[ale] Linux HA

Christopher Fowler cfowler at outpostsentinel.com
Wed Oct 31 15:21:52 EDT 2007


I've been testing Linux HA today.  Normally we sell 2 servers: one is
a "master" and the other is a "slave".  I've been testing the ability
to use a floating IP address so that the slave can take over for the
master.  I have a few issues that need to be resolved before I can roll
this out.  In my lab and colo I have experienced 2 issues that HA could
not have saved me from.

#1.  Kernel not responding.

In this case I can ping the server, but all connect() calls from
clients hang until they time out.  In this scenario my slave will take
the IP address, but the master will still have it and will still answer
pings.  He will also still answer ARP requests.  HA can't save me here.
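
Failure #1 suggests that a ping is too weak a health check, since the
master can answer ICMP and ARP while no client connection ever gets
served.  A minimal sketch of a stricter check follows, assuming the
service answers some probe string with at least one byte.  The function
name, the probe and the timeout are hypothetical and not part of any
Linux HA tool.

```python
import socket


def service_responds(host, port, probe, timeout=3.0):
    """Return True only if the peer accepts a TCP connection and sends
    at least one byte back after `probe`.

    `timeout` applies to each socket operation separately (connect,
    send, receive), so the worst case is a small multiple of it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(probe)
            # An empty read means the peer closed without answering.
            return len(sock.recv(1)) > 0
    except OSError:
        # Refused connections, unreachable hosts and timeouts all land here.
        return False
```

A host that completes the TCP handshake but never replies is reported
as down, which is the case a plain ping would miss.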

#2.  Kernel and programs still respond but disks are off

In this case I/O to the drives was hosed.  Apache would serve up pages
that were in memory, but any request for a page on disk would result in
that connection hanging forever.  No I/O possible.  In this scenario the
heartbeat agent will probably still see a server that is working, but
the reality would be a DoS condition.  Also, upon seeing this issue I'm
still left with a server who will not relinquish his IP address.
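
Failure #2 could in principle be caught by a check that actually
touches the disk.  The sketch below is one way to do that, under the
assumption that a small write plus fsync is a fair proxy for "I/O
works".  The write runs in a child process because a process stuck on
dead I/O may never return, so the parent only waits a bounded time.
The names disk_responds and probe_cmd are hypothetical.

```python
import subprocess
import sys

# Child script: write a few bytes and force them to disk.
_PROBE = (
    "import os, sys\n"
    "fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)\n"
    "os.write(fd, b'probe')\n"
    "os.fsync(fd)\n"
    "os.close(fd)\n"
)


def disk_responds(path, timeout=5.0, probe_cmd=None):
    """Return True if a write+fsync to `path` finishes within `timeout`.

    `probe_cmd` optionally replaces the default child command.
    """
    cmd = probe_cmd or [sys.executable, "-c", _PROBE, path]
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only.  Do not wait for the child afterwards: a
        # process blocked on hung I/O may never exit, even when killed.
        proc.kill()
        return False
```

The result would still have to reach the slave somehow, for example by
having the master stop sending heartbeats when the check fails.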

In both cases it seems my only recourse is to allow my slave to also
control the power of the master.  If #1 or #2 occurs, the slave can
simply take the floating IP and determine whether he needs to kill
power.  If so, he can kill power and the master can then be repaired.
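
The decision described above can be written down as a small table.
This is only a sketch under the assumption that the slave has three
signals about the master: whether it answers pings, whether its service
answers, and whether its disks answer.  The function and the action
names are hypothetical.

```python
def failover_action(ping_ok, service_ok, disk_ok):
    """Decide what the slave should do about the master.

    Returns one of:
      "none"                  - master looks healthy
      "take_ip"               - master is gone from the network
      "take_ip_and_power_off" - master is broken but still holds the IP
    """
    if service_ok and disk_ok:
        return "none"
    if not ping_ok:
        # Nothing is answering for the address, so there is no conflict.
        return "take_ip"
    # Failures #1 and #2: the master still answers pings and ARP, so
    # it has to be powered off before the takeover can be clean.
    return "take_ip_and_power_off"
```

One caveat: a master that fails to answer pings may just be cut off
from the slave rather than dead, so "take_ip" alone is not safe in
every topology.  Powering off the peer like this is the pattern
usually called fencing, or STONITH in Heartbeat terms.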

Ideas?

Chris




