[ale] parallel processing

jeff hubbs hbbs at mediaone.net
Mon Jan 7 18:40:24 EST 2002


Zyman, Andy wrote:

> Jeff,
> Thank you for the reply.
> Yes, if I specify "&" the job will go in the background. So having a couple
> of background jobs is the answer.
> But the reason I was asking is this:
> 1. I don't know how many files the dir has, so I can't (?) specify 
> "filesA .... fileB should be copied by this job "
> "filesC .... fileD should be copied by this job "
> ....
> 
> I was thinking about this situation - cp  ./dirA/*   ./dirB 
> 
> The files are big enough to drive me nuts waiting for cp to finish
> (each is about 5GB-50GB, 10-15 files in different dirs).
> 2. To copy the files, I create a file listing their locations and use a
> "while read"
> loop to copy them one at a time (which is not efficient :< ).
> I can't really apply "&" here because I need to check that all files are
> copied before proceeding any further - this is the "control point" in this
> operation...
> So I was thinking about something else, not background jobs....
> 
> Thank You
>  Andy
> office: 212 849 3543

Looks like you're running up against the limitations of disk drives. 
You're trying to interleave writes and reads within the same partition. 
This is always a bad scenario, although trying to do the same thing 
between two partitions on the same drive might be worse, almost 
certainly so if the partitions are on opposite ends of the drive.

My feeling is that parallelizing the file copies as we/you are 
suggesting could give you a *slightly* faster experience than copying 
one file at a time.  The multiple processes will be fighting over the 
drive, each trying to both read and write, but to some degree 
happenstance and the design of the drive, the kernel's disk I/O, and the 
file system will wind up helping you a bit.  However, you're never going 
to get much of an edge this way, IMHO.
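For the "control point" Andy needs, the shell's `wait` builtin blocks until background jobs finish, so the `&` approach and the list file can still be combined. This is a minimal sketch, not a tuned solution: the function name `parallel_copy` and its two arguments (a list file with one source path per line, and a destination directory) are hypothetical. Per the caveats above, on a single drive it may not beat copying one file at a time by much.

```shell
#!/bin/sh
# Hypothetical helper: parallel_copy LISTFILE DESTDIR
# Starts one background cp per path in LISTFILE, then waits for all of
# them (the "control point"). Returns non-zero if any copy failed.
parallel_copy() {
    list=$1    # file containing one source path per line
    dest=$2    # destination directory
    pids=""
    while IFS= read -r src; do
        [ -n "$src" ] || continue     # skip blank lines
        cp "$src" "$dest/" &          # run this copy in the background
        pids="$pids $!"               # remember its process ID
    done < "$list"

    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1           # block until this copy ends; record failures
    done
    return $rc
}
```

Usage: `parallel_copy filelist.txt ./dirB && echo "all files copied"` only reports success once every background `cp` has exited cleanly.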

Assuming that you don't have a second drive, do you have enough RAM that 
you could create a ramdisk, copy the files to it, and then copy the 
files from the ramdisk to the destination?  Serializing what the drive 
has to do could give you a faster overall experience. Drives love long, 
sustained reads and writes.
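The ramdisk idea can be sketched in shell as well. This is an adaptation under stated assumptions, not a prescribed method: because the files are 5-50GB each, it stages one file at a time rather than the whole set, and it assumes a RAM-backed scratch directory (on many Linux systems /dev/shm is a tmpfs mount) with room for the largest single file. `staged_copy` and its arguments are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: staged_copy STAGEDIR DESTDIR FILE...
# STAGEDIR is assumed to be RAM-backed (e.g. a tmpfs mount) and large
# enough for the biggest FILE. Each file is read from disk into RAM in one
# pass, then written from RAM to DESTDIR in one pass, so the drive is not
# asked to interleave reads and writes for the same file.
staged_copy() {
    stage=$1
    dest=$2
    shift 2
    for src in "$@"; do
        name=$(basename "$src")
        cp "$src" "$stage/$name" || return 1          # sustained read into RAM
        if ! cp "$stage/$name" "$dest/$name"; then    # sustained write to disk
            rm -f "$stage/$name"
            return 1
        fi
        rm -f "$stage/$name"                          # free the RAM for the next file
    done
}
```

Staging per file trades some throughput for a much smaller RAM requirement; with enough memory, copying the whole set into the ramdisk first (as suggested above) keeps the reads and writes even more separated.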

If you have a choice of disk drives, pick the one with the fastest 
spindle speed and/or the most physical heads (pay no attention to the 
BIOS C/H/S data if the drive is even remotely recent - look up the 
specs via Google).  Drives love being able to read/write *across* the 
heads as opposed to radially across the platters.  I used to work with a 
Compaq box with a 2.1GB Quantum Bigfoot drive - a weird, horrible 
8"x5.25" contraption with only a single platter and two heads.  It was 
about like dealing with a laptop drive.  A large internal cache is good 
too; I think some drives I've got lying around don't have any at all.

I'm going to be facing this issue myself soon, as I've got a big stack 
of drives that were bought from Microseconds at $1 each ranging from 
60MB to 1GB, and I'll be using them as swap drives in boxes that will be 
booting over the network.  It's bad enough that I'll have to rely on 
swap drives, but I want to use the fastest ones if at all possible.

You can use hdparm -t to get some feel for drive I/O speed but there's a 
utility called bonnie (search freshmeat or google) that's a lot better.

- Jeff


---
This message has been sent through the ALE general discussion list.
See http://www.ale.org/mailing-lists.shtml for more info. Problems should be 
sent to listmaster at ale dot org.





