[ale] initramfs capable of a multi-device btrfs / new initramfs with dracut help?

Pat Regan thehead at patshead.com
Tue Jul 20 04:31:03 EDT 2010


On Tue, 20 Jul 2010 00:35:54 -0400
"Michael B. Trausch" <mike at trausch.us> wrote:

> The idea would be that the bootloader has no dependency on the OS, nor
> the OS' filesystem.  That would also make it amenable to be placed on
> a truly read-only device, such as a ROM chip, a locked SD card, or
> whatever.

The trouble is that there are already too many standards.  The PC world
may have been stuck with the same old BIOS boot sequence for probably
as many years as I can remember, but at least we know what to expect.

> GNU GRUB gets around this, as an example, by installing itself on the
> Linux /boot partition.  That means that the filesystem that
> houses /boot, whether that is the root filesystem or a dedicated
> filesystem, must be able to be read by GRUB.  For those who remember
> and/or still use the LILO boot loader, it solves that problem by being
> very tiny and using a block list, so that it can be filesystem
> agnostic. That is, IMHO, another nasty way to solve that problem.

IIRC, LILO seemed to Do The Right Thing when you told it to use
a /dev/md*.  I'm pretty sure GRUB isn't nearly as intelligent.
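
The usual workaround I've seen for GRUB on a mirrored /boot is to
install it into the MBR of every member disk by hand, so either drive
can bring the box up on its own.  A rough sketch, with the device
names and md number made up:

  # /boot is /dev/md0, a RAID 1 of sda1 and sdb1 (hypothetical layout)
  grub-install /dev/sda
  grub-install /dev/sdb

Some folks do it from the grub shell instead, with a
"device (hd0) /dev/sdb" remap so the second disk still thinks it's
the first when it has to boot alone, but the idea is the same.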

> > I've always been one to champion the durability of flash but I had
> > to RMA my X25-m twice in the first 6 months.
> 
> Eh.  A 128 MB SD card would be sufficient (as a Linux /boot partition,
> that is) if you don't keep too many dusty old kernels around or aren't
> someone who rebuilds the kernel frequently without sweeping up the
> cruft.

The size isn't the issue here.  I'm just trying to say that I've been
endorsing all sorts of flash over the years for its durability.  I'm
just not having any luck lately, and even if the media isn't spinning
I'm still not going to trust my servers' boot processes to a single
piece of media that can fail.

> Personally, I don't RAID the operating system itself.  I will backup
> the operating system partition on occasion, but it's not a critical
> thing since it can be regenerated; it's more important to me to
> ensure that I can bring the system back to an identical state from
> install media and data in my ${HOME}.  I keep the data as separate as
> possible, and that is always backed up.  At this point, the only mode
> of failure that I'm vulnerable to is a nuclear bomb hitting Atlanta
> and wiping out all of the life in the metro area---obviously, if
> something that catastrophic happens, I have more important things to
> worry about than my client's data.  Or rather, I'll not have anything
> to worry about at all.

I don't want that much downtime.  I'd prefer not to have any downtime,
but in this case my frugality was a bad idea :).

Having to reinstall a machine because of a drive failure is
significantly more expensive than just buying the extra drive and
setting up a RAID to begin with.  I don't get paid to reinstall my own
OS when a drive fails :).  Either I spend the time or I have to pay
someone else.  

> > If I'm using software raid it is very likely to be in a lower end
> > machine.  It probably doesn't have swappable drive cages, so I'm
> > going to have to bring it down to replace that boot drive.  If I
> > put a blank drive in there is the BIOS going to tell me no OS
> > found?  It will likely depend on the machine...
> > 
> > I only mention this because I've been thinking about it today.  I'm
> > planning to drive out to replace the boot drive in a colocated
> > machine of mine.  I saved a few bucks buying a server with no hot
> > swap cages. If I could hot swap, I could just pop the drive in and
> > reload grub.
> 
> I always go with an Ubuntu Live CD and a server install disc---and a
> duplicate of the most recent data backup, when possible in a
> drop-in-ready form.

I'm prepared to make things work no matter what happens.  The problem
isn't making it work.  I'm just trying to illustrate a problem I have
with software RAID: you can't be certain how it is going to react
during a hardware failure.  The right kind of failure on the first disk
could leave the machine unbootable, and on some hardware any failure of
that disk will do it.

It is cheap to pay the colo to replace a drive.  Much less so if they
have to fiddle with it to get my OS to boot.

> > I'm being paranoid, I dd-ed down a copy of the drive up to the end
> > of the /boot partition and dd-ed it onto the replacement drive.  :)
> 
> You could keep a local mirror, rsync'd daily... wouldn't change that
> much, but worthwhile if you forget to run a backup immediately after
> changing it (or automatic updates of any fashion are enabled, which I
> should think not, but that's just my own opinion).

I'm very paranoid with regard to backups.  You don't worry about having
your OS installed on a RAID; I don't care about backing up the OS.
Even if I did, the backups aren't any more accessible to me than the
disks in the server.

I'm hoping I'm prepared enough so that I can just bolt in the new
disks, boot it up, start the rebuilds with mdadm and walk out the
door.  I'll be very sad if I need to dig out my laptop.
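
For anyone following along, the plan boils down to something like this
(device names and md numbers are made up, adjust to your own layout):

  # sda is the surviving disk, sdb is the blank replacement
  sfdisk -d /dev/sda | sfdisk /dev/sdb   # clone the partition table
  mdadm /dev/md0 --add /dev/sdb1         # /boot mirror
  mdadm /dev/md1 --add /dev/sdb2         # everything else
  cat /proc/mdstat                       # watch the resync
  grub-install /dev/sdb                  # so the new disk can boot, too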

I usually recommend not running identical drives in a RAID.  If you buy
a stack of identical drives there's a good chance they're from the same
manufacturing batch.  This 1U guy of mine holds four drives, so I bought
two different brands.  He's not in production yet, but he's fixing to be
now, so I was upgrading the host and all the virtual machines the other
day.

During the updates two of the same-brand drives dropped out of their
mirrors in the RAID 10.  So if you're buying drives from different
manufacturers, make certain not to pair up like drives :).
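
If you want to be sure of the pairing up front, the device order on
the mdadm command line is what decides it.  With the default near-2
layout, adjacent devices in the list become mirror pairs, so something
like this (hypothetical names, brand A on sda/sdb, brand B on sdc/sdd)
keeps each mirror split across brands:

  # alternate brands so no mirror is two drives from the same batch
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sda1 /dev/sdc1 /dev/sdb1 /dev/sdd1
  mdadm --detail /dev/md0   # sanity-check the order afterwards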

Pat

