[ale] Linux Cluster Server Room

Bob Toxen bob at verysecurelinux.com
Sun Apr 25 17:52:22 EDT 2004


On Mon, Apr 19, 2004 at 09:36:10PM -0400, Dow Hurst wrote:
> I understand your philosophy here but have a question:  What if the
> calculations are long and costly to restart?  Shouldn't I look at the value 
> of spent computation that might have to be done over if I lose power?  The 
> code I am most concerned about running on the cluster may or may not be 
> checkpointable.  I think it might be, but I know my users and they won't 
> want power to be an issue with predicting when their jobs will finish. ;-)
You need to look at the time between power failures, the cost of the
UPS devices and generators, and the cost of repeating calculations.  Most
likely, getting UPSs that can supply power for 5 minutes will be the
optimum solution.  A power failure longer than that is rare, happening
perhaps once a year.
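Bob's tradeoff can be framed as an expected yearly cost. The sketch below is a minimal illustration. It assumes hypothetical figures for outage frequency, work lost per outage, node count, cost per node-hour, and the amortized cost of extra runtime or a generator. None of these numbers come from the thread except the rough once-a-year outage rate.

```python
"""Rough break-even check: is extra UPS runtime (or a generator) worth it?

Every figure below is a hypothetical placeholder; substitute your own.
"""


def expected_rerun_cost(outages_per_year: float,
                        hours_lost_per_outage: float,
                        nodes: int,
                        cost_per_node_hour: float) -> float:
    """Expected yearly cost of recomputing work lost to outages the UPS can't ride out."""
    return outages_per_year * hours_lost_per_outage * nodes * cost_per_node_hour


def upgrade_pays_off(upgrade_cost_per_year: float,
                     avoided_cost_per_year: float) -> bool:
    """Extra backup power pays off only if it avoids more rerun cost than it costs."""
    return avoided_cost_per_year > upgrade_cost_per_year


if __name__ == "__main__":
    # Hypothetical: one outage a year outlasts a 5-minute UPS, and each one
    # throws away 48 hours of work on 32 nodes valued at $0.50 per node-hour.
    avoided = expected_rerun_cost(1, 48, 32, 0.50)
    print(f"Rerun cost avoided: ${avoided:,.2f}/year")  # $768.00/year
    # Hypothetical: generator + long-runtime UPS amortized at $5,000/year.
    print("Upgrade pays off:", upgrade_pays_off(5000, avoided))  # False
```

Checkpointing shrinks `hours_lost_per_outage`, which pushes the balance further toward a short-runtime UPS.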

Compare the low-end APC units to the other brands.  My 10-year-old APC is still
running well (after new batteries from Batteries Plus about 3 years ago).
I think that APC units have superior surge protection to the other brands.
Other than that, I wouldn't pay a huge premium for APC.  On the other hand,
you have lots of equipment in your cluster that could get fried.

> Are Best UPSes better performing than Tripp Lite or APC?  I have experience
> with Tripp Lite, APC, and Liebert so far and have never used Best.  I like the
> toughness and quality of the enclosures of the APC and Liebert.  I like the
> quality of all three.  I like the performance and cost of APC and
> Tripp Lite.  Tripp Lite's cases on the low end aren't as nice as APC's,
> but their higher-end UPSes have nice rack enclosures.
> Performance-wise, I haven't been able to tell a difference between the two.
> APC units seem to produce less heat overall.
APC is, by far, the best in my opinion.

> What do you mean by getting the wrong power factor conversion?  Do you mean
> getting 120 V at 60 Hz vs. 220 V at 60 Hz on the output outlets?
Power factor (PF) is how closely the current waveform tracks the voltage
waveform; formally, it is real power (watts) divided by apparent power
(volt-amperes).  As you recall from physics, an inductive or capacitive
load will have a power factor of less than 100%.  It probably is not worth
worrying about here.
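To make the distinction concrete: power factor is real power (W) divided by apparent power (VA), and for sinusoidal voltage and current it equals the cosine of the phase angle between them. Nonlinear loads such as switching power supplies also lower it, which the cosine alone doesn't capture. UPSs are typically rated in both VA and watts, so a load has to fit under both. The sketch below assumes a hypothetical 120 V, 10 A load at PF 0.9 and a hypothetical UPS rated 1500 VA / 980 W.

```python
import math


def displacement_power_factor(phase_deg: float) -> float:
    """PF = cos(phi) for sinusoidal voltage and current phi degrees apart."""
    return math.cos(math.radians(phase_deg))


def real_power_watts(volts_rms: float, amps_rms: float, pf: float) -> float:
    """Real power (W) = apparent power (V x A, in VA) x power factor."""
    return volts_rms * amps_rms * pf


def fits_ups(load_va: float, load_w: float, ups_va: float, ups_w: float) -> bool:
    """A load must stay within BOTH the UPS's VA rating and its watt rating."""
    return load_va <= ups_va and load_w <= ups_w


if __name__ == "__main__":
    print(displacement_power_factor(0))  # purely resistive load: PF = 1.0
    # Hypothetical server load: 120 V, 10 A RMS, PF 0.9.
    va = 120 * 10
    w = real_power_watts(120, 10, 0.9)
    # Hypothetical 1500 VA / 980 W UPS: the VA rating fits, the watt rating does not.
    print(va, w, fits_ups(va, w, ups_va=1500, ups_w=980))
```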

> I appreciate all this advice!
> Dow
Bob



> Jeffrey B. Layton wrote:
> >I'll give you my 2 cents about clusters and UPSs if you wish.
> >
> >A good cluster configuration will treat each compute node as
> >an appliance. You don't really care about it too much and it
> >doesn't hold any data of any importance. What you care about
> >is the master node and/or where the data is stored. These
> >machines can have their own UPS or a single UPS to cover
> >the machines (there may be more than one). Then take the cost
> >savings (if you can) and put them into more nodes, or a better
> >interconnect (if needed), or a large file system, or a better
> >backup system, or .... well, you get the picture.
> >
> >Putting a UPS on only the important parts of the
> >cluster will save you money, time, and headaches. However,
> >if you put a cluster in a server room, all of its power will likely be
> >covered by a single huge UPS and probably a diesel backup
> >generator as well. This goes back to the purpose of a server
> >room: to support independent servers, not clusters. While this
> >is nice and good, it is somewhat wasteful. If you could have
> >a combination of UPS/diesel-backed power and just regular
> >conditioned power, that would be more economical. However,
> >the budget for clusters (computing) and the budget for facilities
> >are never really seen as related by management. Even though
> >they come out of the same overall pot within the company (or
> >university), management has a tendency to compartmentalize
> >things for easier managing (and because of the definite lack of
> >brain power on the part of most managers). Try arguing that you
> >really don't need the giant UPS/diesel combo and you will get IT
> >managers screaming all sorts of things about you. Sigh.
> >
> >Of course, these comments depend on your cluster configuration.
> >If you are running a global filesystem across all of the nodes,
> >so that each node has part of the filesystem, then you might
> >want to think about a good UPS for all of the nodes (try
> >restoring a 20 TB global filesystem from backup after a
> >power outage).
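Jeff's suggestion amounts to sizing the UPS for the critical machines rather than the whole cluster. A minimal sketch, assuming a hypothetical inventory (a 400 W master node, a 600 W file server, and 32 compute nodes at 250 W) and an assumed 25% safety margin:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Machine:
    name: str
    watts: float    # measured or nameplate draw (hypothetical values below)
    critical: bool  # master node, file server, etc.


def ups_watts_needed(machines: Iterable[Machine], critical_only: bool,
                     headroom: float = 1.25) -> float:
    """Total draw to protect, scaled by a safety margin (the 1.25 is an assumption)."""
    load = sum(m.watts for m in machines if m.critical or not critical_only)
    return load * headroom


if __name__ == "__main__":
    cluster = [Machine("master", 400, True), Machine("fileserver", 600, True)]
    cluster += [Machine(f"node{i:02d}", 250, False) for i in range(32)]
    print("Critical only:", ups_watts_needed(cluster, critical_only=True), "W")
    print("Whole cluster:", ups_watts_needed(cluster, critical_only=False), "W")
```

With these made-up numbers the critical-only UPS needs 1,250 W versus 11,250 W for everything, which is the kind of difference Jeff suggests putting toward more nodes or storage instead.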
> >
> >Good Luck!
> >
> >Jeff
> >
> >>What type of UPS system are you using? Do most people install a large UPS
> >>system for the entire server room? If so, how much will this cost?
> >>
> >>Thanks,
> >>Chris
> >>
> >>-----Original Message-----
> >>From: Dow Hurst [mailto:dhurst at kennesaw.edu]
> >>Sent: Monday, April 12, 2004 11:20 AM
> >>To: ale
> >>Subject: Re: [ale] Linux Cluster Server Room
> >>
> >>
> >>Thanks Jonathan!  That is exactly the kind of ballpark I needed!  I don't
> >>need the vendors right now as we are still kicking around ideas.  If anyone
> >>would throw some specs or ideas out there, I'd appreciate it.  Here is a
> >>quick question:  Is planning for double your expected load a good rule?  I
> >>would think that would be a good idea.  How about backup cooling if the
> >>main unit dies?  The fire safe is one I had not thought of.
> >>Dow
> >>
> >>
> >>Jonathan Glass (IBB) wrote:
> >> 
> >>
> >>>How big are the Opteron nodes?  Are they 1U, 2U, or 4U?  How big are the
> >>>power supplies?  What is the maximum draw you expect?  Convert that number
> >>>to figure out how much heat dissipation you'll need to handle.
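Converting expected draw to heat load is a single multiplication: nearly all the electrical power the hardware draws ends up as heat in the room, and 1 W is about 3.412 BTU/hr. A minimal sketch; the node count, per-node wattage, and extra load for switches and storage are hypothetical:

```python
WATTS_TO_BTU_PER_HR = 3.412  # 1 W of electrical load ~ 3.412 BTU/hr of heat


def heat_load_btu_per_hr(nodes: int, watts_per_node: float,
                         other_watts: float = 0.0) -> float:
    """Nearly every watt the hardware draws ends up as heat in the room."""
    total_watts = nodes * watts_per_node + other_watts
    return total_watts * WATTS_TO_BTU_PER_HR


if __name__ == "__main__":
    # Hypothetical: 32 nodes at 300 W max draw plus 400 W of switches and storage.
    print(f"{heat_load_btu_per_hr(32, 300, 400):,.0f} BTU/hr")
```

With those assumed figures this prints 34,120 BTU/hr.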
> >>>
> >>>I have a 3-ton A/C unit in my roughly 14-15' x 14-15' server room, and
> >>>the 24-33 node cluster I just spec'd out from IBM (1U, dual Opterons) was
> >>>rated at a max heat dissipation (is this the right word?) of 18,000 BTU/hr.
> >>>According to my A/C guy, the 3-ton unit can handle a max of 36,000 BTU/hr,
> >>>so I'm well inside my limits.  Getting the 3-ton unit installed in the
> >>>drop-down ceiling, including installing new chilled water lines, was
> >>>around $20K.
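The tonnage check works the same way. One ton of cooling is 12,000 BTU/hr, so a 3-ton unit covers 36,000 BTU/hr, exactly twice the 18,000 BTU/hr cluster described above. That matches the "double your planned load" rule of thumb Dow asks about. The sketch below treats that factor of 2 as an adjustable assumption, not a standard. It also ignores other heat sources such as lighting, people, UPS losses, and heat gain through walls and ceiling.

```python
BTU_PER_HR_PER_TON = 12_000  # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_capacity_btu_per_hr(tons: float) -> float:
    """Convert an A/C rating in tons to BTU/hr."""
    return tons * BTU_PER_HR_PER_TON


def headroom_ratio(tons: float, heat_load_btu_per_hr: float) -> float:
    """How many times over the A/C capacity covers the expected heat load."""
    return cooling_capacity_btu_per_hr(tons) / heat_load_btu_per_hr


def tons_required(heat_load_btu_per_hr: float, safety_factor: float = 2.0) -> float:
    """Tons of cooling needed when planning for safety_factor x the expected load.

    The default of 2.0 is the rule of thumb from this thread, not a standard.
    """
    return heat_load_btu_per_hr * safety_factor / BTU_PER_HR_PER_TON


if __name__ == "__main__":
    # Figures from the message above: a 3-ton unit and an 18,000 BTU/hr cluster.
    print(cooling_capacity_btu_per_hr(3))  # 36000
    print(headroom_ratio(3, 18_000))       # 2.0
    print(tons_required(18_000))           # 3.0
```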
> >>>
> >>>I do have sprinkler fire protection, but that room is set to release its
> >>>water supply independent of the other rooms. Also, supposedly, the fire
> >>>sprinkler heads (whatever they're called) withstand considerably more
> >>>heat than normal ones.  So, the reasoning goes, if it gets hot enough
> >>>for those to go off, I have bigger problems than just water.  Thus, I
> >>>have a fire safe nearby (in the same bldg...yeah, yeah, I know; off-site
> >>>storage!) that holds my tapes, and will shortly hold a hardware
> >>>inventory and admin password list on all my servers.
> >>>
> >>>If you want my list of vendors, send me an email off-list, or call my
> >>>office, and I'll see if I can track down the DPOs for you.
> >>>
> >>>Thanks
> >>>
> >>>Jonathan Glass
> >>>
> >>>On Fri, 2004-04-09 at 17:35, Dow Hurst wrote:
> >>>
> >>>  
> >>>
> >>>>If I needed to take an existing 400-square-foot space with an 8' ceiling
> >>>>(20' x 20' x 8') and add A/C and fire protection for a server room, what
> >>>>kind of cost would be incurred?  Sounds like an algebra problem from
> >>>>high school, doesn't it?  Let's say a full 84" rack of 4-CPU Opteron
> >>>>nodes and supporting hardware were in the room.  Does anyone have
> >>>>any ballpark figures they could throw out there?  Any links I could
> >>>>be pointed to?
> >>>>Thanks a bunch,
> >>>>Dow
> >>>>
> >>>>
> >>>>PS.  I'd like some other type of fire protection than sprinkler 
> >>>>heads. ;-)
Halon!


