Finding Unique Files

I've posted before about finding duplicate MP3 files. The other day, I found myself needing to do the opposite, this time with pictures. Pictures are a little easier to identify (at least in my case) because an MD5 hash over the entire file is enough to tell whether two images are the same. MP3s need only the non-ID3 portion of the file hashed, so that tag edits don't make identical audio look different. Here's what happened:

After a small mishap with my photos, I needed to do a partial restore from backup (a nightly rsync -a --delete). I restored some files and then ran fdupes to remove any duplicates. However, since the mishap involved moving photos between folders, renaming some files, and deleting others, I wasn't sure if I had restored all the affected photos. To find out, I used a little shell foo:
$ find /pictures/ /backup/pictures/ -type f -exec md5sum {} \; > md5sums
$ sort md5sums | uniq --check-chars=32 --unique
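The --check-chars=32 option makes uniq compare only the first 32 characters of each line (the MD5 hex digest), and --unique prints only the lines whose digest appears exactly once. This results in a list of files that exist in only one of the two locations, primary or backup. I expected to find some files that I had failed to restore properly. Instead, I was surprised to find that some of my pictures were not backed up! Further investigation revealed that my backup drive was full and that my cron emails were being deposited in the spam folder...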
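
One limitation of the uniq approach is that it counts occurrences across both trees together, so a photo that is duplicated within one tree but missing from the other will not be reported. Below is a minimal sketch of a variant that tracks which tree each hash came from. The function name only_in_one and the A/B labels are my own inventions for illustration, and it assumes GNU md5sum and filenames that contain no newlines or backslashes:

```shell
#!/bin/sh
# only_in_one TREE_A TREE_B   (hypothetical helper, not part of any tool)
# Prints "A <path>" or "B <path>" for every file whose MD5 digest
# occurs in only one of the two trees.
only_in_one() {
    {
        # Tag each md5sum line with the tree it came from.
        find "$1" -type f -exec md5sum {} + | sed 's/^/A /'
        find "$2" -type f -exec md5sum {} + | sed 's/^/B /'
    } | awk '
        {
            tree = $1
            hash = $2
            # Line layout is "<tree> <hash>  <path>"; keep the path whole
            # so that spaces inside filenames survive.
            path = substr($0, length(tree) + length(hash) + 4)
            seen[hash, tree] = 1
            trees[NR] = tree
            hashes[NR] = hash
            paths[NR] = path
        }
        END {
            for (i = 1; i <= NR; i++) {
                other = (trees[i] == "A") ? "B" : "A"
                # Report the file if its digest never showed up in the other tree.
                if (!((hashes[i], other) in seen))
                    print trees[i] " " paths[i]
            }
        }'
}
```

Calling it as only_in_one /pictures /backup/pictures would mark files missing from the backup with A and files that exist only in the backup (candidates for restoring) with B. Using -exec md5sum {} + instead of \; also hands many files to each md5sum invocation, which should be noticeably faster on a large photo collection.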

The moral of this story is that finding unique files is just as useful as finding duplicates: it can help you confirm both that your restore succeeded and that your backup is complete.
