[ale] best way to copy 3Tb of data

Todor Fassl fassl.tod at gmail.com
Tue Oct 27 09:33:37 EDT 2015


One of the researchers I support wants to back up 3TB of data to his space 
on our NAS. The data is on an HPC cluster on another network. It's not 
an ongoing backup. He just needs to save it to our NAS while the HPC 
cluster is rebuilt. Then he'll need to copy it right back.

There is a very stable 1G connection between the two networks. We have 
plenty of space on our NAS. What is the best way to do the copy? 
Ideally, we'd want both the ability to restart the copy if it fails 
partway through and to end up with a compressed archive like a tarball. 
Googling around tends to suggest that it's either rsync or tar. But with 
rsync, you wouldn't end up with a tarball. And with tar, you can't 
restart it in the middle. Any other ideas?

Since the network connection is very stable, I am thinking of suggesting 
tar:

tar zcvf - /datadirectory | ssh user@backup.server "cat > backupfile.tgz"
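
If we do go that route, it's probably worth a quick sanity check on the 
far end afterward, since a truncated stream would otherwise go unnoticed 
until he tries to restore. Something along these lines should catch it 
(same host and file names as in the example above):

# gzip -t reads the whole file and reports an error if the compressed
# stream is corrupt or was cut short during the transfer.
ssh user@backup.server "gzip -t backupfile.tgz"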

If the researcher would prefer his data to be copied to our NAS as 
regular files, he can just use rsync with compression. We don't have an 
rsync server that is accessible to the outside world, so he would run 
rsync over ssh, but I could set up an rsync daemon if it would be 
worthwhile.
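
For what it's worth, the rsync-over-ssh version might look something 
like this (the source path and destination directory are placeholders, 
not our real paths):

# -a preserves permissions/timestamps, -z compresses in transit,
# --partial keeps partly-transferred files so a rerun can pick up
# where it left off. Paths and hostname below are placeholders.
rsync -avz --partial --progress -e ssh /datadirectory/ user@backup.server:/path/on/nas/

Rerunning the same command after a failure only transfers what hasn't 
already arrived.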

Ideas? Suggestions?




He is going to need to copy the data back in a few weeks. It might even 
be worthwhile to send it via tar without uncompressing/unarchiving it on 
the receiving end.
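
If it does stay on the NAS as a single tarball, the copy back could 
also be made restartable by rsyncing that one big file. A rough sketch, 
reusing the same placeholder names as above:

# --partial plus --append-verify (rsync 3.0+) let an interrupted
# transfer of a single large file resume where it stopped instead of
# starting over. No -z, since the .tgz is already compressed.
# The local destination path is a placeholder.
rsync -av --partial --append-verify --progress -e ssh \
    user@backup.server:backupfile.tgz /local/restore/path/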




