[ale] Backup large files to span DVDs

Alex Carver agcarver+ale at acarver.net
Wed Oct 28 14:07:01 EDT 2015


On 2015-10-28 10:03, Brian Mathis wrote:
> On Wed, Oct 28, 2015 at 2:19 AM, Alex Carver <agcarver+ale at acarver.net>
> wrote:
> 
>> On 2015-10-27 19:51, Steve Litt wrote:
>>> On Tue, 27 Oct 2015 08:02:39 -0700
>>> Alex Carver <agcarver+ale at acarver.net> wrote:
>>>
>>>> Ok, Google is failing me today.  I've got some large files (larger
>>>> than 5GB each) that I need to archive to DVD so I'll have to span
>>>> DVDs.  The problem is I can't seem to find the appropriate magic
>>>> incantations to accomplish this.
>>>
>>> You've gotten a bunch of great answers to the question of how to split
>>> files. Another aspect is how to pack them onto DVDs. For instance, that
>>> last file of each split will be smaller than the DVD capacity, and if
>>> you have some files that are naturally smaller than the DVD capacity,
>>> you need to pack them too.
>>>
>>> I once wrote a Ruby program to perform a backtracking algorithm to find
>>> the very tightest way to pack files. That's the tightest alternative,
>>> but years later, I found myself loath to modify that program (when Ruby
>>> changed versions) because it was so complicated. A slightly less
>>> efficient, but *much* simpler algorithm is to keep on writing the
>>> biggest file that will fit in the remaining space. This is especially
>>> good if you have lots of files of widely varying sizes.
>>
>> Packing efficiency doesn't matter to me in general for backups (I care
>> more about having the copy even if it takes extra media) but I also
>> can't do anything unusual in this case because it's a work backup.
>> Archival data has to be recoverable in the future so there's a limited
>> set of tools that I am permitted to use to generate the archive in the
>> first place.  So I could tar all the files into one big lump, split them
>> with split into 4 GB chunks and burn all of those chunks (plus the small
>> chunk that was left over at the end) onto however many DVDs it takes.
>> That's what I did, because tar and split (and later, cat for merging)
>> are standard and accepted; the exact sequence is sketched after this
>> quoted thread.  None of the final files on the DVDs are unusual (other
>> than carrying the default split names of xaa, xab, etc.) and I only
>> have to include a small text file to describe the file and the
>> recovery method.
>>
>> So that's exactly what I did in this case, because I had six files to
>> archive that were over 3GB each (an 18, a 13, a 7, a 4, and a couple
>> 3's, all of them single-file backups from a specific backup program).
>> It worked well: I burned four DVDs with no issues.  Three were full
>> and the last was about 25% full, but that's ok.
>>
>>>
>>>>
>>>> The file image and burning tools available are genisoimage and
>>>> wodim (unfortunately no mkisofs or cdrecord, thanks Debian).
>>>
>>> Wodim is just the new name for cdrecord.
>>>
>>> I wasn't aware Debian didn't offer mkisofs. I used Wheezy til I
>>> dumpstered it in favor of Void two weeks ago, and I *know* it had
>>> growisofs, because I used growisofs every time I did a Blu-ray backup.
>>>
>>
>> I know wodim is supposed to be cdrecord, except it's not quite: there
>> are some bugs in wodim that aren't present in cdrecord.
>>
>> As for mkisofs, yes Debian pulled the wodim stunt and created
>> genisoimage to replace mkisofs.  Growisofs is a frontend that calls
>> genisoimage to build the filesystem (a minimal genisoimage/wodim run
>> is sketched after this quoted thread).
>>
> 
> 
> For archival purposes, be aware that DVDs, especially burned ones, degrade
> pretty quickly over time.  I'd probably make a duplicate of each disc.
> Also, you might consider creating parchive[1] recovery sets of all the
> files, so if there are partial problems when reading them back, you can
> hopefully recover that data from the parchive set.  There is a CentOS rpm
> for that somewhere.
> 
> In the future, instead of breaking down the files into large chunks, break
> them down into smaller ones (like 250MB), and then you can pack them onto
> the disc as needed.  Also, the split command allows you to set a prefix,
> which is really a better way to name files than accepting the default 'xaa'
> format.  Something like:
>     split --bytes=250M datafile datafile-split.
> will result in files named like "datafile-split.aa", "datafile-split.ab",
> etc.
> 
> [1] https://github.com/Parchive/par2cmdline
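
For anyone pulling this out of the archives later, the tar/split/cat
sequence described up-thread looks roughly like this (the filenames
and the 4GB chunk size are illustrative, not a prescription):

    # bundle the backups into one archive, then cut it into 4GB chunks
    tar -cf backup.tar file1 file2 file3
    split --bytes=4000M backup.tar

    # burn the resulting chunks (xaa, xab, ...) across however many
    # DVDs it takes; to recover, copy them back and reassemble:
    cat xaa xab xac > backup.tar
    tar -xf backup.tar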
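
Likewise, a minimal genisoimage/wodim run for one disc might look like
the following; the staging directory and the /dev/sr0 device name are
assumptions for your own setup:

    # build an ISO image (Rock Ridge + Joliet) from a staging directory
    genisoimage -r -J -o disc1.iso /path/to/disc1/

    # burn it; the dev= value depends on the machine
    wodim dev=/dev/sr0 -v disc1.iso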

I do know that the DVDs (and dye-based CDs, Blu-rays, etc.) degrade.
They get regenerated once a year for that reason.  Experiments are
ongoing with a mineral-based recording medium (not just here at work
but at the Library of Congress, too).  I also have other forms of the
same backup; I just need the DVD for off-site storage.

Parchive is not currently on the approved list of utilities.  I can
only use well-known utilities and methods that have a high degree of
longevity in terms of support and understanding.  Things like tar and
cat are very well understood, have been around for decades, and are
not likely to go anywhere for a long time.  Parchive hasn't been
around as long, and its specification and implementation are still
changing.
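
That said, for anyone else reading this who can use it, Brian's
parchive suggestion would look something like this with par2cmdline
(the 10% redundancy level and the filenames are just examples):

    # create recovery data covering the split chunks (10% redundancy)
    par2 create -r10 backup.par2 xaa xab xac

    # after reading the copies back from disc, check and repair
    par2 verify backup.par2
    par2 repair backup.par2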

