[ale] best way to copy 3Tb of data

Scott Plante splante at insightsys.com
Tue Oct 27 11:07:33 EDT 2015


You didn't say what you liked about the tarball. Is it the compression, or having just one file to deal with? If it's the compression, here are some ideas. 


There is a FUSE compression filesystem called fusecompress. I can just install it on my desktop (openSUSE) from my regular OS repos. You may be able to do the same, or otherwise you can get the source and some instructions here: 
https://code.google.com/p/fusecompress/wiki/Usage 


Basically, you could just create a directory on your NAS (say, mkdir /storage/datadirectory) and then type: 
# fusecompress /storage/datadirectory 
That makes a compressed filesystem (which would contain the existing contents of /storage/datadirectory, if there were any) and mounts it back over the same spot. You could then use rsync to copy into /storage/datadirectory and get the compression advantage of a tarball along with the restart advantage of rsync. 


Similarly, if your underlying filesystem happens to be btrfs, it has compression built in, and you can enable it for a directory using chattr. I haven't really played with this, though. 
https://btrfs.wiki.kernel.org/index.php/Compression 
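For reference, the two ways to turn that on look roughly like this. This is a sketch I haven't tested; the device and paths are made up, and lzo/zlib are the compression algorithms btrfs offers:

```
# Whole-filesystem compression via a mount option in /etc/fstab
# (hypothetical device; lzo is faster, zlib compresses better):
/dev/sdb1  /storage  btrfs  compress=lzo  0  2

# Or per-directory: mark the directory so files written under it
# afterward are compressed (existing files stay as-is until rewritten):
# chattr +c /storage/datadirectory
```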


By the way, I've been using rsync for many years and never set up an rsync server. I always use it via ssh. 


Scott 

----- Original Message -----

From: "Todor Fassl" <fassl.tod at gmail.com> 
To: "Atlanta Linux Enthusiasts" <ale at ale.org> 
Sent: Tuesday, October 27, 2015 9:33:37 AM 
Subject: [ale] best way to copy 3Tb of data 

One of the researchers I support wants to back up 3T of data to his space 
on our NAS. The data is on an HPC cluster on another network. It's not 
an on-going backup. He just needs to save it to our NAS while the HPC 
cluster is rebuilt. Then he'll need to copy it right back. 

There is a very stable 1G connection between the 2 networks. We have 
plenty of space on our NAS. What is the best way to do the copy? 
Ideally, it seems we'd want both the ability to restart the copy 
if it fails partway through and to end up with a compressed archive 
like a tarball. Googling around tends to suggest either rsync 
or tar. But with rsync, you wouldn't end up with a tarball, and with 
tar, you can't restart it in the middle. Any other ideas? 
Since the network connection is very stable, I am thinking of suggesting 
tar. 

tar zcvf - /datadirectory | ssh user at backup.server "cat > backupfile.tgz" 
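A self-contained sketch of that pipeline, with the ssh hop replaced by a local redirect so it can be tried anywhere (the directory and file names are made up):

```shell
# Stand-in for /datadirectory
mkdir -p /tmp/datadirectory
echo "payload" > /tmp/datadirectory/sample.txt

# Same shape as the command above, but writing locally instead of
# piping through ssh; -C keeps the archive paths relative.
tar zcf - -C /tmp datadirectory | cat > /tmp/backupfile.tgz

# Confirm the archive is readable and contains the file
tar tzf /tmp/backupfile.tgz
```

The catch, as noted, is that if the pipe dies partway through, you start over from byte zero.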

If the researcher would prefer his data to be copied to our NAS as 
regular files, we'd just use rsync with compression. We don't have an 
rsync server that is accessible to the outside world. He could use 
rsync over ssh, but I could set up an rsync server if it would be worthwhile. 

Ideas? Suggestions? 




He is going to need to copy the data back in a few weeks. It might even 
be worthwhile to send it via tar without uncompressing/unarchiving it on 
the receiving end. 



_______________________________________________ 
Ale mailing list 
Ale at ale.org 
http://mail.ale.org/mailman/listinfo/ale 
See JOBS, ANNOUNCE and SCHOOLS lists at 
http://mail.ale.org/mailman/listinfo 


