[ale] rsync comparisons

Greg Freemyer greg.freemyer at gmail.com
Thu Apr 1 13:53:54 EDT 2010


I only do half a TB a night, but my biggest slowdown most nights is
disk I/O load at the destination node.

And I don't think file content is read during the file list creation
process, just the directory and inode metadata.  (i.e., by default,
content is only compared if the metadata shows the file has been
updated since the last rsync.)

So switching between whole-file and normal chunk-size compares will
have no impact on that phase.
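
To picture the metadata-only check described above, here is a rough Python sketch. It is not rsync's actual code: the helper name `needs_transfer` is made up for illustration, and comparing whole-second mtimes is a simplifying assumption. The point is only that the decision can be made from stat() results without opening either file.

```python
import os

def needs_transfer(src_path, dst_path):
    """Hypothetical helper approximating a size + mtime "quick check".

    Only stat() metadata is consulted; file contents are never read.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True  # nothing at the destination yet, so it must be sent
    src = os.stat(src_path)
    # Assumption: treat the file as unchanged only when both the size and
    # the whole-second modification time match.
    return (src.st_size != dst.st_size
            or int(src.st_mtime) != int(dst.st_mtime))
```

Under this sketch, two files with the same size and mtime but different content count as unchanged, which is the case a checksum-based comparison is meant to catch.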

Greg

On Thu, Apr 1, 2010 at 1:39 PM, scott <scott at sboss.net> wrote:
> Time to do the rsync (from someone who rsyncs terabytes of data each
> day) comes down to the following factors:
> * load on both servers (either end of the rsync).  Are you CPU,
> memory, or disk I/O bound?
> * bandwidth between the servers
> * how many files?  Is that 20GB of files 20 x 1GB files or 20,000 x 1MB
> files?  The fewer the files, the faster it goes.
> * full sync vs. incremental (the latter is faster than the former).
> etc.
>
> Now I have noticed that once the two sides of an rsync have been synced
> at least once, the majority of the time for later syncs generally goes
> to "generating the transfer list".  I have 100GB filesystems that
> replicate faster than my 10GB ones, all due to the number of files and
> the percentage of changed files.
>
> So there are many factors that go into why it takes so long.
>
> On Thu, Apr 1, 2010 at 11:41 AM, Robert Coggins <ale at cogginsnet.com> wrote:
>> Well, what I am seeing is the syncing of roughly 20GB taking over an hour
>> for just a few megs of differences.  It stays in the "building file
>> list..." phase for almost all of this time.  I am trying to find a way to
>> speed that up.
>>
>> Rob
>>
>> On 04/01/2010 11:37 AM, scott wrote:
>>> rsync compares at the file level, BUT it compares timestamps and sizes
>>> in "quick mode" (which is the default).  If you want it to compare
>>> file contents, use the "-c" or "--checksum" option.  This puts a heavier
>>> load on both systems, since it does an MD5 checksum on every file
>>> whose size matches on both sides of the sync.  If
>>> you want to force the copy of the whole file instead of the changed
>>> blocks, use the --whole-file option with it.
>>>
>>> I would use this (-c & --whole-file) sparingly.  It is going to slow
>>> down the copies, put heavier loads on both ends and transfer more data
>>> (control data) back and forth.  I don't know your situation so I can't
>>> say whether to use it or not.
>>>
>>> scott
>>>
>>>
>>> On Thu, Apr 1, 2010 at 11:00 AM, Robert Coggins <ale at cogginsnet.com> wrote:
>>>> Is there a way to do file level comparisons and not block level
>>>> comparisons using rsync?
>>>>
>>>> Rob
>>>> _______________________________________________
>>>> Ale mailing list
>>>> Ale at ale.org
>>>> http://mail.ale.org/mailman/listinfo/ale
>>>> See JOBS, ANNOUNCE and SCHOOLS lists at
>>>> http://mail.ale.org/mailman/listinfo
>>>>



-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
CNN/TruTV Aired Forensic Imaging Demo -
   http://insession.blogs.cnn.com/2010/03/23/how-computer-evidence-gets-retrieved/

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


