I created a script called findDups.py - and it's one of the more involved Python scripts I've written.
I probably won't take the time to document the whole thing but some key elements used are:
shutil: failed to copy on the mac correctly. next trying commands module.
os.path: for useful path stuff
raw_input: to get input from the user BEFORE blowing away a bunch of files.
getopt: for processing the command line arguments.
operator.add: To allow me to use the built in reduce() function, to sum a list of custom objects.
time: to manage file date and time
And the biggest change had to be installing a new python module Image (PIL) to allow me to get image attributes from the files.
It uses the progress bar module (pb.py) to display text progress bars for many of the operations.
The code uses a dictionary to determine Uniqueness. If files share the same base name, they are considered unique. The dictionary contains a list of every file name in a directory tree. If files are unique, the list will have one item. If files share the same basename, then they are considered duplicates.
The script can print out all the files into a CSV
The script can print a list of ONLY the duplicate files.
The script can filter out all NON image files from processing.
The script can limit the scope to only “safe” image file formats
The script “should” be able to copy only unique files to a new directory (non-destructive culling)
The script should be able to remove duplicates from the existing tree (destructive culling)