[ale] Supporting Linux on super computers?

Brian Stanaland brian at stanaland.org
Mon Jun 3 18:55:48 EDT 2024


Lawrence Livermore released a video today talking about application
tuning. It's pretty interesting.
https://youtu.be/IZSWymZmkc0?si=BmzMqXEmU0eCG1a2

Slurm is a big part of scheduling for clusters. Even the commercial
cluster manager tools like Bright depend on it.
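
As a rough illustration of what "scheduling" looks like from the user side,
here is a minimal sketch that generates a Slurm batch script. The function
name, job name, and command are made up for this example; the #SBATCH
options shown (--job-name, --nodes, --time, --output) are standard sbatch
options, but real sites usually require more (partition, account, and so on).

```python
def render_sbatch_script(job_name, nodes, time_limit, command):
    """Return the text of a Slurm batch script for a single job.

    All argument values are illustrative; nothing here is site-specific.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",
        # %x expands to the job name and %j to the job ID in the log file name
        "#SBATCH --output=%x-%j.out",
        "",
        # srun launches the command on the nodes allocated to the job
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 4 nodes for one hour running a made-up binary
    print(render_sbatch_script("render", 4, "01:00:00", "./render_frames"))
```

Saving the output to a file and handing it to sbatch would queue the job;
the scheduler then decides when and where it runs.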

From the hardware side, they tend to use open source monitoring tools
such as Nagios, Icinga, etc. Manufacturers also have their own tools to
keep an eye on the systems, plus scripts to collect logs. Lots of the
tools are the same, just scaled up for more hardware, like MD RAID
arrays having 106 drives.
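
To give a flavor of the kind of script that collects this sort of status,
here is a small sketch that scans /proc/mdstat-style text for degraded MD
arrays. The sample text and the function name are invented for this
example; a real check would read /proc/mdstat on each node and would likely
feed the result into whatever monitoring tool is in use.

```python
import re

# Made-up sample in the usual /proc/mdstat layout: a 5-device RAID 6
# array with only 4 members active.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      2929890816 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UUUU_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Map array name -> (expected, active) for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like "md0 : active raid6 ..."
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status lines end with "[expected/active] [UU_U...]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[[U_]+\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active)
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

The same loop works whether an array has 5 members or 106; only the
numbers in the status line change.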

Brian

On Mon, Jun 3, 2024 at 6:02 PM Scott McBrien via Ale <ale at ale.org> wrote:
>
> High Performance Computing (HPC) systems, or supercomputers, are custom built to pool compute resources.  Generally their workloads are scheduled and executed in batch.  So in addition to the compute node management, the scheduling software is also critical for the cluster.  Traditionally, one sets up the node with all the applications and data required for its allotted time; however, I have a lot of conversations with people in this segment about containerizing their applications, models, and data.  This makes the cleanup after removing a job from the cluster pretty trivial, as you just kill off the containers, then pull and run the next containerized job for the next work cycle.
>
> Examples of jobs in this realm would be things like:
> Animation render farming
> Cryptocurrency mining
> Scientific compute (like gene sequencing or statistical modeling or prediction)
> Geospatial analysis
> Geologic survey analysis
> Graphic prediction and modeling (such as simulated crash results for automotive engineers)
> Simulated wind tunnel testing (aeronautics and automotive)
>
> Etc.
>
> -STM
>
> > On Jun 3, 2024, at 4:05 PM, Leam Hall via Ale <ale at ale.org> wrote:
> >
> > For those of you who know, what's different about supporting Linux on supercomputers?
> >
> > Thanks!
> >
> > Leam
> >
> >
> > --
> > DevSecOps Engineer         (reuel.net/resume)
> > Scribe: The Domici War     (domiciwar.net)
> > General Ne'er-do-well      (github.com/LeamHall)
> > _______________________________________________
> > Ale mailing list
> > Ale at ale.org
> > https://mail.ale.org/mailman/listinfo/ale
> > See JOBS, ANNOUNCE and SCHOOLS lists at
> > http://mail.ale.org/mailman/listinfo

