[ale] Backup large files to span DVDs

Alex Carver agcarver+ale at acarver.net
Wed Oct 28 16:36:36 EDT 2015


It will help with my personal stuff, so I do plan to use all the
information in this thread for that purpose.  I keep several spinning
disk copies of stuff at home, but I have been planning on investing in an
LTO system (or at least trying to revive my old Travan drives) and a DVD
or Blu-ray backup, so adding a parity system to that would be good.

As in my other email, I will defend work a bit: they're not
totally averse to introducing new technology, but projects here can have
lifespans of decades and new projects sometimes want to see data from
old projects, so retention and retrieval are vital.  Most of the
heavy-duty backups for big projects are in various network storage
systems that have redundancy, multiple off-site live copies and hot
storage in remote data centers, and multiple types of cold storage in
on- and off-site warehouses.

Small projects (like my R&D work) have some access to that
infrastructure, but it's often easier to do more basic backups like you
would at home.  Usually this means things like backup copies of
equipment-specific software, log data from experimental hardware, etc.
I do the sysadmin work for my R&D group as a side job because we'd
otherwise have to contract out to the IT department at a very high
hourly rate.  Doing it myself lets me respond to group and project IT
needs much faster than the standard route (this is not unusual; most
small projects handle their own IT), but I still have to follow certain
practices for data retention.

The basics of the retention policy are:

1. all data should ideally be stored in human-readable form (e.g.
telemetry log files would be text, not binary); if a binary format is
chosen anyway, the data should be duplicated into human-readable form

2. anything that is binary should ideally be in an established,
well-documented format (e.g. images, CAD drawings, etc.)

3. anything in a proprietary binary format must have a readable copy
exported to an open format (e.g. CAD drawings would need to be exported
to a standard format or converted to standard images that could be read
by a human and reconstructed)

4. anything binary that is custom developed in-house must also be
fully documented, with code provided to implement a decoder in a
well-established, standard programming language, and needs a waiver
for implementing custom code

5. changes in any generated data must be version controlled (e.g. CAD
drawings, custom code sources, etc.)

6. multiple media types would be used for storage, including paper prints
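
As a rough illustration of items 1 and 4 (not the actual format or
tooling used at work), a documented decoder for a hypothetical in-house
binary telemetry record could be as small as the sketch below.  The
record layout (little-endian uint32 timestamp, uint16 channel, float32
value), the name records_to_text, and the comma-separated output are all
assumptions invented for the example.

```python
import struct

# Hypothetical record layout, invented for this example:
#   uint32 Unix timestamp, uint16 channel number, float32 reading,
#   little-endian, no padding (10 bytes per record).
RECORD = struct.Struct("<IHf")


def records_to_text(blob: bytes) -> list[str]:
    """Decode packed records into 'timestamp,channel,value' text lines."""
    if len(blob) % RECORD.size:
        raise ValueError("truncated record at end of data")
    return [
        f"{ts},{channel},{value:.3f}"
        for ts, channel, value in RECORD.iter_unpack(blob)
    ]


if __name__ == "__main__":
    # Two made-up records, packed and then decoded back to text.
    sample = RECORD.pack(1446064596, 3, 1.5) + RECORD.pack(1446064597, 3, 1.25)
    print("\n".join(records_to_text(sample)))
```

The text lines are what would be duplicated onto a second medium or a
paper print, and the struct format string plus the comment above it
would serve as the start of the format documentation.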



Sometimes this makes things harder than they are for people on the
outside (this thread showed a lot of useful tools), but you get used to
it.  It's not really a significant portion of my time, so the effort to
make big changes isn't a good use of my time or project money.  If I
were in charge of one of the near-billion-dollar projects then I might
make more of a fuss. :)

On 2015-10-28 12:19, DJ-Pfulio wrote:
> Below won't help Alex.  Some environments are locked down and we have to live
> with it. I wish more were - like chemical plants, power plants, other control
> centers. I understand completely, having deployed one of those.
> 
> Getting **any** new code introduced just isn't worth the effort
> post-systems-deployment. Places like that are in "the devil we know" mode, which
> is completely understandable. Introducing **anything new** into these
> environments can break things ... or worse! There are extremely good reasons for
> this.
> 
> Don't know if parchive is the same code as par2, which I've been using for over
> a decade now.
> 
> par2 has saved some of my data a few times. Some of the optical media is over a
> decade old and only losing a few bits here and there.  Call it "nice to have"
> data, not mission critical.
> http://blog.jdpfu.com/2011/06/12/optical-data-recovery-technique-with-ddrescue-and-par2
>  explains a use to recover almost lost data with assurance that it is the same
> as what was archived.  It has a trivial par2 creation script.
> 
> 
> On 10/28/2015 02:37 PM, Alex Carver wrote:
>> On 2015-10-28 11:24, James Sumners wrote:
>>> On Wed, Oct 28, 2015 at 2:07 PM, Alex Carver <agcarver+ale at acarver.net>
>>> wrote:
>>>
>>>> Parchive is not currently on the approved list of utilities.  I can only
>>>> use very well known utilities and methods that have a high degree of
>>>> longevity in terms of support or understanding.  Things like tar and cat
>>>> are very well understood, have been around for decades and are not
>>>> likely to go anywhere for a long time.  Parchive hasn't been around as
>>>> long and the specification and implementation are still changing.
>>>>
>>>
>>> I'm not clear on your understanding of how Parchive works and what it is
>>> for. It's merely for verifying the integrity of data, and repairing said
>>> data if there is corruption. It is not an archive file format ala tar, zip,
>>> et alii.
>>>
>>> As for longevity, Parchive is nothing more than an application of
>>> Reed-Solomon coding. The algorithm isn't new, and it is used pretty much
>>> everywhere[1].
>>>
>>> [1] -- https://en.wikipedia.org/wiki/Reed–Solomon_error_correction
>>
>>
>> I understand that it's not a file format itself but it does require an
>> additional utility to implement the method (the parchive client) and
>> that's what I'm not allowed to use.  Generating the parity files would
>> just be a waste of time in this case because I would not be able to get
>> the utility approved.  The current approved method for corruption
>> mitigation is multiple media types and duplicate copies (e.g. a magnetic
>> copy, an optical copy, a hard copy for things that can be printed, etc.)
>>
>> I'll certainly consider it for my personal data storage because it looks
>> like a good thing to have, but I just can't do it at work.  This is why
>> there are ten hard drives, many spindles of disc blanks, and lots of
>> binders at my desk.
>> _______________________________________________
>> Ale mailing list
>> Ale at ale.org
>> http://mail.ale.org/mailman/listinfo/ale
>> See JOBS, ANNOUNCE and SCHOOLS lists at
>> http://mail.ale.org/mailman/listinfo
>>
> 
> 


