[ale] File De-duplication
leam hall
leamhall at gmail.com
Fri Oct 18 13:14:45 EDT 2013
Hmm....
This <pseudo-code> is off the top of my head, so there are probably some
serious issues with it.
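# bash; keeps the first copy of each file under /my_dir, deletes later ones.
SUMS=/tmp/file_of_sums   # placeholder for <file_of_sums>; keep it outside /my_dir
touch "$SUMS"
# -type f skips directories; -print0 / read -d '' survive odd file names.
find /my_dir -type f -print0 | while IFS= read -r -d '' file
do
    # md5sum prints "<hash>  <name>", so keep only the hash.
    MD5=$(md5sum "$file" | cut -d' ' -f1)
    if grep -qxF "$MD5" "$SUMS"
    then
        rm -- "$file"
    else
        echo "$MD5" >> "$SUMS"
    fi
done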
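Since the loop above deletes as it goes, a report-only pass first might be safer. This is a rough sketch, assuming bash plus GNU find, md5sum, sort and awk; list_dupes is a name made up for this example, not an existing tool. It prints each group of files sharing an MD5 sum and removes nothing.

```shell
#!/usr/bin/env bash
# Sketch: print groups of files with identical MD5 sums; delete nothing.
# list_dupes is a hypothetical helper name, not an existing tool.
# Assumes file names without newlines or backslashes (md5sum escapes those).
list_dupes() {
    local dir=$1
    find "$dir" -type f -exec md5sum {} + |
        sort |
        awk '{
            hash = $1
            name = substr($0, 35)      # skip 32-char hash + 2 separator chars
            if (hash == prev) {
                if (!shown) print first    # first file of this group
                print name
                shown = 1
            } else {
                if (shown) print ""        # blank line between groups
                first = name
                shown = 0
            }
            prev = hash
        }'
}
```

Running `list_dupes /my_dir` lists the groups, one file per line, with a blank line between groups. Files that have no duplicate are not printed, so the output can be reviewed before anything gets removed.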
On Fri, Oct 18, 2013 at 12:59 PM, JD <jdp at algoloma.com> wrote:
> Slashdot had a question about this 1-2 yrs ago. Lots of people suggested
> scripting it, others pointed out some C code on sourceforge.
>
> I had a few hrs free that day and wrote some Perl (200+ LOC). Use it all the
> time, but I'd probably go with the C tool for any very large datasets. Mine
> doesn't automatically remove anything and is far from perfect, that is
> certain. It is relatively fast on most types of files, however.
>
> On 10/18/2013 12:34 PM, Calvin Harrigan wrote:
> > Good Afternoon,
> > I'm looking for a little advice/recommendation on file de-duplication
> > software. I have a disk filled with files that most certainly have
> > duplicates. What's the best way to get rid of the duplicates? I'd like to
> > check deeper than just file name/date/size. If possible I'd like to check
> > content (checksum?). Are you aware of anything like that? Linux or Windows
> > is fine. Thanks
--
Mind on a Mission <http://leamhall.blogspot.com/>