[ale] Server room issues
Dow Hurst
dhurst at kennesaw.edu
Thu Apr 22 09:29:07 EDT 2004
Someone added that having enough room in the server room for a movable cart on
big rubber wheels is very convenient. You can wheel it to the rack you're
working on and use it as a bench as well as a cart. I like the cover over the
main kill switch.
Dow
Jim Popovitch wrote:
> Excellent write-up Dow. I only have one point to add.... ALWAYS put a
> plastic cover over the big red manual power kill switch, and for
> sanity's sake: locate the switch at maximum arm-length away from exit
> doors so as to make it highly inconvenient for contractors to hang their
> jacket on. :)
>
> -Jim P.
>
> On Thu, 2004-04-22 at 02:37, Dow Hurst wrote:
>
>>Here is a not so condensed listing of some advice from the list and others.
>>Read this as a rundown of a particular installation with some thoughts
>>injected. I'd appreciate comments or advice if you want to add anything.
>>Names were blacked out to protect someone:
>>
>>Starting philosophy: servers should never shut down and should run 24/7; jobs
>>should be checkpointable for restart in case of failure; people are more
>>important than servers; we are not a 99.99999 facility.
>>
>>1. A/C issues
>>
>> 3 separate systems, each with half the capacity needed to cool the server
>>room, so one can fail and be repaired without downtime on the servers. There is
>>no redundant piping of coolant, so that is a point of failure. Two window units
>>(if windows are available) are installed as extra cooling capacity to handle
>>either extra heat load or a main-unit failure for a short time. The window
>>units make use of the window space, which would have been a leak point for the
>>A/C anyway.
>>
>>Think of A/C and Power supply as a fixed capacity that is set when the room is
>>built. No more A/C or Power will be added due to the expense. So, plan for
>>future heat load and power load for the next 5-10 years.
>>
>>Put in a raised floor as an A/C plenum to deliver air to the underside of the
>>servers. This is extremely valuable and, in XXXXX's opinion, the most important
>>server room feature. Perforated tiles allow control and tuning of the cool air
>>flow in the room. This is important since the airflow is never correct after
>>the room is filled with equipment; it always needs redirection after the
>>servers are installed. The floor needs to be of high-quality design, with steel
>>capable of supporting a 2000lb rack of equipment with ease. XXXXX indicated
>>that the floor should not deflect more than 1/32" under a 4000lb/square inch
>>applied force. A steel loading dock ramp should be put in, not a wooden one,
>>since heavy equipment will be moved up to the raised floor level.
>>
>>
>>An ale.org member, Jonathon Glass, said this:
>>"You should really look at the IBM e325 series (Opterons) for cooling. I have
>>4 of them (demo units) in a cluster. I felt the cases while running a demo,
>>and they were cool, if not cold, to the touch. IBM has spent a lot of money
>>on making these machines rack-ready, and cool running, and it has paid off."
>>
>>Also he said:
>>"How big are the Opteron nodes? Are they 1,2,4U? How big are the power
>>supplies? What is the maximum draw you expect? Convert that number to figure
>>out how much heat dissipation you'll need to handle.
>>
>>I have a 3-ton A/C unit in my 14|15 x 14|15 server room, and the 24-33 node
>>cluster I just spec'd out from IBM (1U, Dual Opterons) was rated at a max heat
>>dissipation (is this the right word?) of 18,000 BTU. According to my A/C guy,
>>the 3-ton unit can handle a max of 36,000 BTU, so I'm well inside my limits.
>>Getting the 3-ton unit installed in the drop-down ceiling, including
>>installing new chilled water lines, was around $25K."
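>>
>>(A quick way to sanity-check numbers like these is to convert watts of
>>electrical draw to BTU/hr of heat and then to tons of cooling. Here is a
>>minimal sketch in Python; the per-node wattage and node count are assumptions
>>for illustration, not measured values.)
>>
>>    # Back-of-the-envelope cooling estimate with assumed numbers.
>>    WATTS_PER_NODE = 200          # assumed average draw per 1U dual-Opteron node
>>    NODES = 30                    # assumed cluster size
>>    BTU_PER_HR_PER_WATT = 3.412   # 1 W of heat ~= 3.412 BTU/hr
>>    BTU_PER_HR_PER_TON = 12000    # 1 ton of A/C = 12,000 BTU/hr
>>
>>    watts = WATTS_PER_NODE * NODES
>>    btu_per_hr = watts * BTU_PER_HR_PER_WATT
>>    tons = btu_per_hr / BTU_PER_HR_PER_TON
>>    print("%d W -> %.0f BTU/hr -> %.1f tons" % (watts, btu_per_hr, tons))
>>    # ~6000 W -> ~20,500 BTU/hr -> ~1.7 tons, the same ballpark as the
>>    # 18,000 BTU figure quoted above, well inside a 3-ton (36,000 BTU/hr) unit.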
>>
>>Another ale.org member, Chris Ricker, had this to say:
>>"Just to give you another price point to compare, we just spec'ed out getting
>>an additional 30 tons A/C (360,000 BTUs), and it's coming in at ~$100,000.
>>That's just for adding two more 15-ton units, as most of the other
>>infrastructure needed for that's already there...."
>>
>>
>>2. Power issues
>>
>> Power cabling is run under the raised floor to receptacles in the floor.
>>All circuits, except a couple of 120V 20A outlets, are 240V 30A single phase
>>with the possibility of three phase if needed. Large twist lock receptacles
>>were used so unplugging a power cord has to be a deliberate action. Some IBM
>>servers need two 60A 240V 3-phase circuits since they were designed to replace
>>older IBM servers that used that type of circuitry.
>>
>>Grounding of servers is thru the raised floor steel structure which would be
>>grounded thru the building ground for safety.
>>
>>No data cables should go inside the raised floor if possible, only power
>>cabling. A high ceiling that allows overhead data cables is ideal since
>>working in the cold air under the floor to install or fix data cables is an
>>unpleasant experience. However, XXXXX says don't sacrifice the raised floor
>>for overhead cable runs.
>>
>>UPSes were used only on the file servers and disk arrays. The main compute
>>servers were supported by space-saving power conditioners that provide a pure
>>sine wave and suppress voltage changes. Space is at a premium in XXXXXXX and
>>the city power is excellent, so this solution saved battery space on large-kVA
>>UPSes and, in the long term, the cost of battery replacements. We may not have
>>that luxury with the southern storms and above-ground power grid. Power strips
>>were put in that have digital readouts showing the amperage currently used on
>>each circuit. These readouts let you know your amperage per circuit in real
>>time and tune the load per circuit, since most server power supplies draw less
>>current than their rating. Plus, the strips will turn on each outlet in a timed
>>sequence during power-up so the server loads are staged and don't hit the
>>circuits all at once. Dial-in modem control or terminal server control is
>>available on the power strips if wanted.
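>>
>>(As a minimal sketch of the kind of per-circuit tuning those readouts make
>>possible, here is some Python with made-up amperage readings and a common 80%
>>continuous-load margin; the circuit names and values are assumptions, not
>>readings from this room.)
>>
>>    # Compare measured amps per circuit against a derated breaker limit.
>>    CIRCUITS = {                      # hypothetical readouts, amps per outlet
>>        "rack1-A (30A/240V)": [4.1, 3.8, 4.0, 3.9],
>>        "rack1-B (30A/240V)": [6.2, 5.9, 6.1, 6.0, 5.8],
>>    }
>>    BREAKER_AMPS = 30
>>    DERATE = 0.80                     # leave headroom for continuous loads
>>
>>    for name, amps in CIRCUITS.items():
>>        total = sum(amps)
>>        limit = BREAKER_AMPS * DERATE
>>        status = "OK" if total <= limit else "OVER - move a server"
>>        print("%s: %.1f A of %.0f A usable -> %s" % (name, total, limit, status))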
>>
>>Jonathon Glass had this to say:
>>"Just for the cluster, I have a 6kVa BestPower UPS. It'll run all 16 nodes
>>for about 15 minutes."
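>>
>>(For scale, a rough sketch of what that implies, assuming something like 300 W
>>per node -- an assumption on my part, not a number from Jonathon:)
>>
>>    # Back out the usable battery energy from "16 nodes for 15 minutes".
>>    WATTS_PER_NODE = 300            # assumed draw per node
>>    NODES = 16
>>    RUNTIME_HOURS = 0.25            # 15 minutes
>>    load_w = WATTS_PER_NODE * NODES        # ~4800 W of load
>>    usable_wh = load_w * RUNTIME_HOURS     # ~1200 Wh actually delivered
>>    print("~%d W load -> ~%.0f Wh of usable battery" % (load_w, usable_wh))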
>>
>>Jeffrey Layton chimed in with this thought:
>>"We run CFD codes (Computational Fluid Dynamics) to explore fluid flow over
>>and in aircraft. The runs can last up to about 48 hours. Our codes
>>checkpoint themselves, so if we lose the nodes (or a node since we're running
>>MPI codes), we just back up to the last checkpoint. Not a big deal. However,
>>if we didn't checkpoint, I would think about it a bit. 48 hours is a long time.
>>If the cluster dies at 47:59 I would be very upset. However, if we're running
>>on a cluster with 256 nodes with UPS and if getting rid of UPS means I can get
>>60 more nodes, then perhaps I could just run my job on those extra nodes and get
>>done faster (reducing the window of vulnerability if you will).
>>
>>You also need to think about how long the UPS' will last. If you need to run
>>48 hours and the UPS kicks in at about 24 hours, will the UPS last 24 hours? If
>>not, you will lose the job anyway (with no check pointing) unless you get some
>>really big UPS'. So in this case, UPS won't help much. However, it would
>>help if you were only a few minutes away from completing a computation and
>>just needed to finish (if it's a long run, the odds are this scenario won't
>>happen often). If you could just touch a file and have your code recognize
>>this so it could quickly check point, then a UPS might be worth it (some of
>>our codes do this). We've got generators that kick in about 10 seconds after
>>power failure. And the best thing is that they get tested every month (I can
>>tell you stories about installations that never tested their diesel).
>>
>>However, like I mentioned below, the ultimate answer really depends. If I can
>>tolerate the loss of my apps running, then I can take the money I would dump
>>into UPS and diesel and buy more nodes. If your codes can't or don't check
>>point, then you might consider UPS and diesel. If you have a global file
>>system on the nodes (like many people are doing today) that you need up or you
>>need to at least gracefully shutdown, then consider a UPS and/or diesel.
>>
>>I guess my ultimate question is, "Is UPS and diesel necessary for all or part
>>of the cluster?" There is no one correct answer. The answer depends upon the
>>situation. However, don't be boxed into a corner that says you have to have
>>UPS and diesel."
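>>
>>(The "touch a file and have your code recognize this" idea Jeffrey mentions is
>>easy to sketch. Here is a minimal Python version; the file names and the fake
>>work loop are made up for illustration, not taken from his codes.)
>>
>>    import os
>>    import pickle
>>
>>    SENTINEL = "CHECKPOINT_NOW"     # operator runs: touch CHECKPOINT_NOW
>>    STATE_FILE = "job.ckpt"
>>
>>    def run(total_steps=1000000):
>>        # Resume from the last checkpoint if one exists.
>>        step, state = 0, {}
>>        if os.path.exists(STATE_FILE):
>>            with open(STATE_FILE, "rb") as f:
>>                step, state = pickle.load(f)
>>
>>        while step < total_steps:
>>            state[step] = step * step       # stand-in for real computation
>>            step += 1
>>            if os.path.exists(SENTINEL):    # someone asked for a quick checkpoint
>>                with open(STATE_FILE, "wb") as f:
>>                    pickle.dump((step, state), f)
>>                os.remove(SENTINEL)
>>                print("checkpointed at step %d, exiting cleanly" % step)
>>                return
>>
>>    run()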
>>
>>
>>3. Fire Detection versus Suppression
>>
>> XXXXX felt suppression systems might endanger the life of someone in the
>>server room when actuated, so a fire detection system was installed instead.
>>This detection system is wired into the A/C and power so it can turn them off
>>if fire is present. The logic is that the A/C or an overheating circuit would
>>be the most likely place for a fire to start, so cutting power to the A/C and
>>servers would be the most likely way to stop the problem.
>>
>>Also, insurance for major equipment items is written in as a rider on the
>>building's or institution's insurance policy.
>>
>>Fire suppression systems have come a long way, and I am getting info on them.
>>Mixtures of gases that suppress fire but still allow people to breathe are
>>available and considered the norm for server rooms in businesses. A
>>particulate-based system that leaves no residue is also available. I've
>>started the process of getting info, but exact room dimensions are required to
>>quote accurately. Probably $15-25K is about right for a suppression system.
>>
>>Building codes are strict in XXXXXXX, so whatever the building codes require is
>>what they had to live with. The sprinkler heads can be switched to
>>high-temperature heads and the piping can be isolated from other areas to
>>prevent disasters, but we may not be able to eliminate sprinklers from the
>>area. XXXXX explained that an extension cable is illegal in their server rooms
>>and in most of XXXXXXX due to building codes. In the South, codes are likely to
>>be much more relaxed.
>>
>>Jonathon Glass said:
>>"I do have sprinkler fire protection, but that room is set to release its
>>water supply independent of the other rooms. Also, supposedly, the fire
>>sprinkler heads (whatever they're called) withstand considerably more heat
>>than normal ones. So, the reasoning goes, if it gets hot enough for those to
>>go off, I have bigger problems than just water. Thus, I have a fire safe
>>nearby (in the same bldg...yeah, yeah, I know; off-site storage!) that holds
>>my tapes, and will shortly hold a hardware inventory and admin password list
>>on all my servers."
>>
>>
>>4. Room Location Caveats
>>
>> Don't be far from the loading dock, to ease the movement of equipment into
>>the server room. Elevators, stairs, tight turns, narrow doorways, and the
>>possibility of flooding or corrosive gases are all obstacles or hazards we
>>don't want near our room. XXXXX has one room at the level of the xxxx river,
>>so flooding is a concern. He mentioned that a large drain in the center of the
>>floor is always nice to have. Large servers may not fit thru doorways or in
>>elevators. IBM ships a large server in two pieces at an extra charge of
>>thousands of dollars because some installations can't fit the server thru a
>>door or up a stair. XXXXX and XXXXX ran into this on their P960 IBM server.
>>They got the installation for free, but it was a pain to deal with.
>>
>>A continuous-hinge steel door is good for sealing in cool air and discouraging
>>theft, but it is hard to remove for equipment installation. A key lock works
>>when there is no power, battery or otherwise. ;-) A keycard electronic lock
>>is good for multiple employees entering the room since you can track entry
>>times and card codes, but the lock should be able to be opened in case of loss
>>of power. (Batteries are used a lot for this, I think. I'll ask XXX how the
>>keycard locks work at KSU since I ought to know that anyway.)
>>
>>
>>5. Server Cabinets
>>
>> Most standard clusters come in standard racks except for blades. Some
>>special large servers like the IBM P960 come in special racks that are a
>>required purchase. The SGI Altix can be put in a standard rack. So, APC
>>makes nice standard racks that can be purchased. We would then install the
>>Altix parts into the standard rack and go from there. Skip the front doors on
>>the racks unless you share the server room and feel you must lock up the
>>servers. Keep the back doors since cabling and sensitive connectors need
>>protection. Standard racks can come in half rack, full rack, and extra tall
>>rack sizes. The extra tall sizes won't fit in elevators and may need some
>>special help to get into the server room. A 47U rack full of servers will
>>easily weigh 2000lb, so expect to place them carefully before filling them up!
>> Sliding rails are preferred for compute nodes but not for disk
>>arrays. Disk arrays usually have disks removable from the front for hot-swap
>>replacement, so sliding the array out doesn't help. Definitely high on XXXXX's
>>list of must-haves is the LCD and keyboard mounted on a sliding rail. You use
>>this to access the servers via a serial connection, so it is immune to network
>>problems. A terminal server for serial access or a KVM switch is recommended.
>> They have a $20K Raritan KVM switch while others rave about a Cyclades
>>terminal server. I'm familiar with both products thru articles and
>>advertisements, but have only used the cheap 4 port or 2 port KVM switches
>>myself. A KVM or terminal server setup might handle up to 128 or 256 console
>>connections that are hardwired via serial cables or a special backplane
>>connection. The IBM blades have a special backplane that has the network
>>wiring and console wiring embedded in it.
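>>
>>(On the serial side, here is a minimal sketch of watching one node's console
>>with the pyserial package; the device path, baud rate, and log file name are
>>assumptions for illustration.)
>>
>>    import serial                   # requires the pyserial package
>>
>>    PORT = "/dev/ttyS0"             # wherever the console cable lands
>>    BAUD = 9600
>>
>>    console = serial.Serial(PORT, BAUD, timeout=1)
>>    log = open("node01-console.log", "a")
>>    while True:
>>        line = console.readline().decode("ascii", errors="replace")
>>        if line:
>>            log.write(line)
>>            log.flush()
>>            print(line, end="")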
>>
>>You need the steel loading ramp when your delivery guys take a running start
>>to get a 1 ton server up onto the raised floor!
>>
>>XXXXX has two 30A 240V circuits per cabinet installed in the floor. I imagine
>>these are in the floor just past the rear of the cabinet's footprint.
>>
>>Eleven inches of clearance are needed for overhead cable raceways if overhead
>>cable runs are put in place.
>>
>>At XXXXX they have one server room for proprietary servers and one room for
>>standard rack based servers.
>>
>>
>>6. Networking
>>
>>Not discussed yet.
>
>
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://www.ale.org/mailman/listinfo/ale
>
--
__________________________________________________________
Dow Hurst Office: 770-499-3428 *
Systems Support Specialist Fax: 770-423-6744 *
1000 Chastain Rd. Bldg. 12 *
Chemistry Department SC428 Email: dhurst at kennesaw.edu *
Kennesaw State University Dow.Hurst at mindspring.com *
Kennesaw, GA 30144 *
************************************************************