[ale] File De-duplication
Jeff Hubbs
jhubbslist at att.net
Fri Oct 18 17:00:23 EDT 2013
When I was running a previous employer's file server (which I built on
Gentoo, btw, referencing the other thread), I would pipe find output
through xargs to md5sum and then to sort, so that I got a text file I
could eyeball to see where the dupes tended to be. In my view it
wasn't a big deal until you had, say, ISO images that a dozen or more
people had copies of - if that's going on, there needs to be some
housecleaning and organization taking place. I suppose if you wanted
you could script something that moved dupes to a common area and
generated links in place of the dupes, but I'm not sure whether that
introduces more problems than it solves.
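The pipeline described above can be sketched roughly like this; the directory argument and exact options are my own illustrative choices, not necessarily the command the original setup used:

```shell
#!/bin/sh
# Hash every regular file under a directory (default: current directory),
# then sort by hash so identical files land on adjacent lines for eyeballing.
dir=${1:-.}

# -print0 / -0 keep filenames containing spaces intact;
# md5sum emits "<hash>  <path>", so sorting groups duplicates together.
find "$dir" -type f -print0 | xargs -0 md5sum | sort
```

With GNU coreutils, appending `| uniq -w32 -D` prints only the lines whose 32-character hash prefix repeats, i.e. just the duplicate candidates; from there a script could replace each extra copy with a hard link (`ln`) to one kept copy, with the caveats noted above.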
As for auto-de-duping filesystems - which I suppose involve some sort
of abstraction between what the OS thinks are files and what actually
goes on disk - I wonder whether some otherwise casual disk operations
could set off a whole flurry of r/w activity and plug up the works for
a little while. Fun to experiment with, I'm sure.
On 10/18/13 12:34 PM, Calvin Harrigan wrote:
> Good Afternoon,
> I'm looking for a little advice/recommendation on file
> de-duplication software. I have a disk filled with files that most
> certainly have duplicates. What's the best way to get rid of the
> duplicates? I'd like to check deeper than just file name/date/size.
> If possible I'd like to check content (checksum?). Are you aware of
> anything like that? Linux or Windows is fine. Thanks.
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
>