[ale] Supporting Linux on super computers?

Leam Hall leamhall at gmail.com
Tue Jun 4 07:47:05 EDT 2024


Nice video!

At one site we tried SuSE on IBM Z. They gave us a several-day class on the hardware, and it was pretty cool. I was a little bit peeved when we ran into a disk count issue: the OS couldn't take more than twenty-some disks. I wasn't upset about the limit itself, but that they already had a patch for it and the IBM support engineers didn't bother applying it until we had lost days on the project.

I don't think the company moved forward after the Proof of Concept, but it was a fun learning experience.

Leam

On 6/3/24 17:55, Brian Stanaland via Ale wrote:
> Lawrence Livermore released a video today talking about application
> tuning. It's pretty interesting.
> https://youtu.be/IZSWymZmkc0?si=BmzMqXEmU0eCG1a2
> 
> Slurm is a big part of scheduling for clusters. Even the commercial
> cluster manager tools like Bright depend on it.
> 
> From the hardware side, they tend to use open source monitoring tools:
> Nagios, Icinga, etc. Plus, manufacturers have their own tools to keep
> an eye on the systems and scripts to collect logs. Lots of the tools
> are the same, just scaled up for more hardware, like MD RAID arrays
> with 106 drives.
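To make the monitoring point concrete, here is a minimal sketch (not taken from any of the tools named above) of a Nagios/Icinga-style check for Linux MD RAID. It assumes the usual plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus a one-line status message) and the kernel's /proc/mdstat format, where a redundant array's status line carries `[expected/working]` device counts. The function name `check_mdstat` and the sample input are hypothetical.

```python
import re

# Nagios/Icinga plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# "md0 : active raid6 sdb1[0] ..." starts the description of one array.
HEADER_RE = re.compile(r"^(md\d+)\s*:")
# "... [106/105] [UUU_U...]": devices the array expects / devices working.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[[U_]+\]")


def check_mdstat(text: str) -> tuple[int, str]:
    """Hypothetical check: return (exit_code, message) for /proc/mdstat text.

    Arrays without a redundancy status line (e.g. RAID 0) are not counted.
    """
    current = None
    healthy = 0
    degraded = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, working = int(status.group(1)), int(status.group(2))
            if working < expected:
                degraded.append(f"{current} {working}/{expected}")
            else:
                healthy += 1
            current = None  # one status line per array
    if degraded:
        return CRITICAL, "CRITICAL - degraded: " + ", ".join(degraded)
    if healthy == 0:
        return UNKNOWN, "UNKNOWN - no redundant md arrays found"
    return OK, f"OK - {healthy} redundant md array(s) healthy"


# Sample input only: a 106-drive RAID 6 array with one member missing.
sample = (
    "Personalities : [raid6] [raid5] [raid4]\n"
    "md0 : active raid6 sdb1[0] sdc1[1]\n"
    "      1000 blocks super 1.2 level 6, 512k chunk, algorithm 2 "
    f"[106/105] [{'U' * 105}_]\n"
    "unused devices: <none>\n"
)
print(check_mdstat(sample))
```

On a real node you would read /proc/mdstat instead of the sample string, print the message, and pass the code to `sys.exit()` so the monitoring server can interpret it. The same logic works whether an array has 2 members or 106.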
> 
> Brian
> 
> On Mon, Jun 3, 2024 at 6:02 PM Scott McBrien via Ale <ale at ale.org> wrote:
>>
>> High Performance Computing (HPC) systems, or supercomputers, are custom built to pool compute resources.  Generally their workloads are scheduled and executed as batch jobs.  So in addition to compute node management, the scheduling software is also critical for the cluster.  Traditionally, one prepares the node with all the applications and data required for its allotted time; however, I have a lot of conversations with people in this segment about containerizing their applications, models, and data.  This makes cleanup after removing a job from the cluster pretty trivial: you just kill off the containers, then pull and run the next containerized job for the next work cycle.
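Scott's containerized work cycle can be sketched as a toy loop. This is a minimal sketch under stated assumptions, not how Slurm or any production scheduler works: it assumes a node-local queue of hypothetical `Job` records and a Podman-compatible CLI (`podman pull` and `podman run --rm`; Docker accepts the same subcommands via `engine="docker"`). The `runner` parameter is injected so the loop can be exercised without actually launching containers.

```python
import subprocess
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Sequence


@dataclass
class Job:
    name: str           # hypothetical job identifier, also used as the container name
    image: str          # container image bundling the application, model, and data
    command: List[str]  # command to run inside the container


def run_cmd(argv: Sequence[str]) -> int:
    """Default runner: execute argv on this node and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def work_cycle(queue: Deque[Job],
               runner: Callable[[Sequence[str]], int] = run_cmd,
               engine: str = "podman") -> Dict[str, int]:
    """Drain the queue one job at a time, returning each job's exit status."""
    results: Dict[str, int] = {}
    while queue:
        job = queue.popleft()
        # Fetch the next job's image; skip the job if the pull fails.
        pulled = runner([engine, "pull", job.image])
        if pulled != 0:
            results[job.name] = pulled
            continue
        # --rm deletes the container when it exits, so nothing from this
        # job is left on the node when the next one starts.
        results[job.name] = runner(
            [engine, "run", "--rm", "--name", job.name, job.image, *job.command]
        )
    return results
```

Enforcing each job's allotted time (for example, stopping the container when its slot ends) is left out here; in a real cluster that is the batch scheduler's job.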
>>
>> Examples of jobs in this realm include:
>> Animation render farming
>> Cryptocurrency mining
>> Scientific compute (like gene sequencing or statistical modeling or prediction)
>> Geospatial analysis
>> Geologic survey analysis
>> Graphic prediction and modeling (such as simulated crash results for automotive engineers)
>> Simulated wind tunnel testing (aeronautics and automotive)
>>
>> Etc.
>>
>> -STM
>>
>>> On Jun 3, 2024, at 4:05 PM, Leam Hall via Ale <ale at ale.org> wrote:
>>>
>>> For those of you who know, what's different about supporting Linux on supercomputers?
>>>
>>> Thanks!
>>>
>>> Leam
>>>
>>>
>>> --
>>> DevSecOps Engineer         (reuel.net/resume)
>>> Scribe: The Domici War     (domiciwar.net)
>>> General Ne'er-do-well      (github.com/LeamHall)
>>> _______________________________________________
>>> Ale mailing list
>>> Ale at ale.org
>>> https://mail.ale.org/mailman/listinfo/ale
>>> See JOBS, ANNOUNCE and SCHOOLS lists at
>>> http://mail.ale.org/mailman/listinfo


