[ale] diff being boneheaded

Ed Cashin ecashin at noserose.net
Wed Aug 7 14:53:40 EDT 2013


A more widely available alternative to the GNU cat options is sed's "l"
(list) command.  The exact output format varies between sed implementations,
but all of them produce some kind of unambiguous listing.

  sed -n l file1 > file1.listing
...
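
If it helps to see what "unambiguous listing" means, here is a rough
Python sketch of the same idea.  It is only an approximation: the escape
format is loosely modeled on what sed's l prints, it does no line folding,
and it is not guaranteed to match any particular sed.  The function names
(listing_line, write_listing) are made up for illustration.

```python
def listing_line(raw: bytes) -> str:
    """Render one line (newline already removed) so every byte is visible."""
    # Backslash and the common control characters get short escapes.
    named = {
        0x5C: "\\\\", 0x07: "\\a", 0x08: "\\b", 0x0C: "\\f",
        0x0D: "\\r", 0x09: "\\t", 0x0B: "\\v",
    }
    out = []
    for byte in raw:
        if byte in named:
            out.append(named[byte])
        elif 0x20 <= byte < 0x7F:
            # Printable ASCII passes through unchanged.
            out.append(chr(byte))
        else:
            # Everything else becomes a three-digit octal escape.
            out.append("\\%03o" % byte)
    # A trailing "$" marks the end of line, exposing trailing blanks.
    return "".join(out) + "$"


def write_listing(src_path: str, dst_path: str) -> None:
    """Write a listing of src_path to dst_path, one output line per input line."""
    with open(src_path, "rb") as src, open(dst_path, "w", encoding="ascii") as dst:
        for raw in src:
            dst.write(listing_line(raw.rstrip(b"\n")) + "\n")
```

Running write_listing on both files and diffing the two listings should
make tab-versus-space, trailing-whitespace, and CR/LF differences show up
as ordinary visible text.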

Something folks haven't mentioned: some versions of diff have options that
ask diff to work harder to minimize the number of differences it lists (GNU
diff's -d, or --minimal, is one).  There's a tradeoff between running fast
and minimizing the output.

I can imagine a use case where you wouldn't be able to sort and uniq
because order was significant.  In such a case, you could use bdb from
python/ruby/perl/whatever to keep a map of seen lines and their locations
in a bdb file.
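
A minimal sketch of that idea in Python follows.  Some assumptions to flag:
it uses the standard-library dbm module as a stand-in for Berkeley DB
(which backend dbm picks depends on the system), it keys the map on a
SHA-256 digest of each line rather than the line itself so keys stay small
and non-empty, and the function name first_seen_lines is invented for this
example.

```python
import dbm
import hashlib


def first_seen_lines(src_path, db_path):
    """Yield (line_number, line) for the first occurrence of each line.

    Input order is preserved.  The on-disk map at db_path records, for each
    distinct line, the line number where it was first seen.
    """
    # "n" always creates a new, empty database at db_path.
    with dbm.open(db_path, "n") as seen, open(src_path, "rb") as src:
        for number, raw in enumerate(src, start=1):
            line = raw.rstrip(b"\n")
            # Hash the line so the key has a fixed size even for long lines;
            # digest collisions are treated as negligible here.
            key = hashlib.sha256(line).digest()
            if key in seen:
                continue  # duplicate of an earlier line
            seen[key] = str(number).encode("ascii")
            yield number, line
```

Because the map lives on disk rather than in memory, something along
these lines should cope with files of several million lines, at the cost
of a database lookup per line.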

On Wed, Aug 7, 2013 at 1:53 PM, Lightner, Jeff <JLightner at water.com> wrote:

> cat -vt on the files should show non-printing characters, including tabs.
> Writing that output to additional files and running diff on those might
> help, e.g.:
> cat -vt file1 >file1.vt
> cat -vt file2 >file2.vt
> diff file1.vt file2.vt
>
> Also, when I have large files to diff, I usually use sdiff instead
> (widening my terminal session as far as possible first), as it will
> ensure the lines are reasonably close to each other.  (This is a
> side-by-side diff.)
>
>
>
>
>
> -----Original Message-----
> From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of
> Stephen R. Blevins
> Sent: Wednesday, August 07, 2013 1:29 PM
> To: Atlanta Linux Enthusiasts
> Subject: Re: [ale] diff being boneheaded
>
> Could there be tabs in one file's lines, and spaces in the other?  What
> about other unprintable characters?
>
> Stephen R. Blevins
> stephen.r.blevins at gmail.com
>
> On 08/07/2013 12:37 PM, Jim Kinney wrote:
> > I've got 2 text files  > 6M lines each. Each file is sorted in
> > dictionary order. diff is flagging identical lines between them as
> <snip>
> > See JOBS, ANNOUNCE and SCHOOLS lists at
> > http://mail.ale.org/mailman/listinfo
> >
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo



-- 
  Ed Cashin <ecashin at noserose.net>
  http://noserose.net/e/
  http://www.coraid.com/