Thursday, December 6, 2012

Using awk to list all the unique file types in a directory

stevenh@host:~$ find . -maxdepth 1 -type f | xargs file
./2012-11-09_08-25-58_284.jpg: JPEG image data, EXIF standard
./2012-08-21_16-11-36_312.jpg: JPEG image data, EXIF standard
./2012-11-16_17-59-01_863.jpg: JPEG image data, EXIF standard
./2012-10-22_19-00-57_173.jpg: JPEG image data, EXIF standard
./2012-10-20_21-48-49_804.jpg: JPEG image data, EXIF standard
./2012-08-26_12-56-14_108.jpg: JPEG image data, EXIF standard
./2012-10-03_17-36-48_84.jpg: JPEG image data, EXIF standard
./2012-11-10_21-07-58_506.jpg: JPEG image data, EXIF standard
./2012-10-01_06-37-42_924.jpg: JPEG image data, EXIF standard
...

Now let's take the output of the command above and format it for the purpose in the title:



stevenh@host:~$ find . -maxdepth 1 -type f | xargs file | awk -F': +' '{ split($1,a,"."); print toupper(a[3]),$2}' | sort | uniq


Reading it step by step, this pipeline does the following:

find all regular files in the current directory, without descending into subdirectories
run the file command on those files with xargs
split each line of that output into fields on the regex ": +" (a ":" followed by one or more spaces)
split the first field (the path) into an array "a" on the period "."
print
   the third element of the array "a", converted to uppercase (for a path like ./photo.jpg the pieces are "", "/photo" and "jpg", so the third one is the extension)
   [space]
   field 2 (the description reported by file)
sort the results
show only the unique lines
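
The steps above can also be wrapped in a small function. This is only a sketch under a few assumptions: the name summarize_types is made up for illustration, the input is expected in the "path: description" form that file prints, and paths are expected to start with "./" as find . prints them. Unlike the one-liner, it takes the text after the last dot rather than the third piece, and prints "(none)" when the name has no dot at all.

```shell
# Hypothetical helper (not part of the original one-liner).
# Reads "path: description" lines on stdin, as printed by `file`,
# and prints one "EXT description" line per unique pair.
summarize_types() {
    awk -F': +' '{
        # Split the path on "."; for "./photo.jpg" this gives "", "/photo", "jpg".
        n = split($1, a, ".")
        # More than two pieces means there is a dot in the file name itself.
        ext = (n > 2) ? toupper(a[n]) : "(none)"
        print ext, $2
    }' | sort -u
}

# Possible usage in the current directory (-exec ... {} + passes the names
# to file directly, so names containing spaces are not split up):
#   find . -maxdepth 1 -type f -exec file {} + | summarize_types
```

Here sort -u does the same job as sort | uniq in the original pipeline.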

output will look something like this:


GIF data
GIF GIF image data, version 89a, 710 x 431
JPEG data
JPEG JPEG image data, JFIF standard 1.01
JPG data
JPG JPEG image data
JPG JPEG image data, JFIF standard 1.01
JPG JPEG image data, JFIF standard 1.02
JPG JPEG image data, JFIF standard 1.02, comment
PDF PDF document, version 1.3
PNG data
PNG PNG image data, 420 x 294, 8-bit/color RGBA, non-interlaced
PNG PNG image data, 425 x 237, 8-bit/color RGBA, non-interlaced
PNG PNG image data, 500 x 300, 8-bit/color RGBA, non-interlaced
XLARGE1 JPEG image data, JFIF standard 1.02
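
Several rows in the sample above differ only in details such as image dimensions or the JFIF version. If only the broad type matters, one possible refinement is to keep just the text before the first comma. The function name collapse_details is made up for illustration, and the sketch assumes the "EXT description" lines shown above as input.

```shell
# Hypothetical helper: reads "EXT description" lines on stdin,
# drops everything from the first comma onward, and removes duplicates.
collapse_details() {
    sed 's/,.*$//' | sort -u
}

# Possible usage: append it to the end of the pipeline, for example
#   ... | sort | uniq | collapse_details
```

With this, the three "PNG PNG image data, ..." rows would collapse into a single "PNG PNG image data" row.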



