[ale] comparing files
    Michelangelo Grigni 
    mic at mathcs.emory.edu
       
    Thu Jul 18 12:59:51 EDT 2002
    
    
  
x3 writes:
> I have 100 text files that are all about 100Kb in size. The data in the files 
> is supposed to be sequential - however, in my haste to backup the files from 
> a dying system, I copied repetitive data in some of them.
> ... Anyone know of a program that can do this in Linux (or even Win)?
To find and report common passages among many text files,
try a plagiarism detector such as "copyfind" at:
  http://plagiarism.phys.virginia.edu/home.html
In the usual application the files are student writing or
programming assignments, so they would tend to be shorter
than your files; I am not sure whether this will become a
performance issue.
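
If you would rather script it yourself, here is a rough sketch of one
way to do the matching in Python. It is not how copyfind works
internally (I have not checked); it simply indexes every run of n
consecutive words and reports the runs that turn up in more than one
file. The names shingles and shared_passages and the default n=6 are
my own choices for illustration.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, n=6):
    """Return the set of all runs of n consecutive words in text."""
    words = text.split()
    # range() is empty when the text has fewer than n words.
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(texts, n=6):
    """Map each pair of names to the n-word runs that both texts contain.

    texts: dict mapping a name (e.g. a file name) to that file's contents.
    Pairs with nothing in common do not appear in the result.
    """
    # Index every n-word run by the names of the texts that contain it.
    owners = defaultdict(set)
    for name, text in texts.items():
        for run in shingles(text, n):
            owners[run].add(name)

    # A run with two or more owners is shared by each pair of those owners.
    shared = defaultdict(set)
    for run, names in owners.items():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)].add(run)
    return dict(shared)
```

To try it, build the texts dict from your files (file name as key,
file contents as value) and print the keys of the result to see which
pairs of files overlap. With 100 files of about 100 KB each the index
holds a few million short strings, which should fit in memory on most
machines, but I have not measured it.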
---
This message has been sent through the ALE general discussion list.
See http://www.ale.org/mailing-lists.shtml for more info. Problems should be 
sent to listmaster at ale dot org.