[ale] HPC replies
Vernard Martin
vernard at gmail.com
Mon Dec 30 17:46:23 EST 2024
High Performance Computing has been my predominant specialization area
since 1992, and I am currently employed in the field. Like most things in
IT, there are various levels of involvement. The key thing to understand
is that "high performance" means many things to many folks and also
evolves over time; it is defined more by the techniques used than by the
results. There was a point when "high performance" meant 4 Pentium Pro
CPUs in a double-sized tower case with 64GB of RAM, connected by quad
1Gbit NICs (the first Beowulf clusters). Similarly, a single system with
32 CPUs and 256GB of RAM was considered HPC in that era.
Supercomputing 2024 was held in Atlanta the week of November 18th, and
the entire field was on display there.
In general, these days, really big HPC resources tend to be clusters of
individual servers connected by high-speed and/or low-latency
interconnects: 10Gb Ethernet at the low end, up to 100Gb Ethernet at the
high end, usually with multiple connections for dedicated traffic or
simply for redundancy. You also tend to see a lot of InfiniBand and more
bespoke network protocols for low-latency networking, so that you can do
NUMA-style memory across servers using some sort of message-passing
interface (think RPC on steroids). You may optionally have some sort of
dedicated parallel filesystem so that you get a single namespace across
all your servers and, hopefully, enough bandwidth for all those servers
to talk to the storage without slowing down very much. Finally, you have
some sort of orchestration/scheduling system on top of that, so that
users don't have to think too hard about how to get their jobs to run and
don't hose resources while contending with everybody else doing the same.
Adjacent to all that is the software needed to monitor and maintain that
lovely HPC mess. :)
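
To make that message-passing style concrete, here is a minimal MPI
sketch in C. It's a toy, not anyone's production code: it assumes an MPI
implementation such as Open MPI or MPICH is installed, and the file and
program names are made up for illustration.

    /* hello_mpi.c - rank 0 sends one integer to every other rank.
     * Build: mpicc hello_mpi.c -o hello_mpi
     * Run:   mpirun -np 4 ./hello_mpi
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* my rank (0..size-1) */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total ranks in job  */

        if (rank == 0) {
            int payload = 42;
            /* Rank 0 hands the same integer to everybody else. */
            for (int dest = 1; dest < size; dest++)
                MPI_Send(&payload, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        } else {
            int payload;
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank %d received %d\n", rank, payload);
        }
        MPI_Finalize();
        return 0;
    }

The same binary runs on every node; whether the bytes travel over
Ethernet or InfiniBand is the MPI library's problem, which is exactly
the point.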
Folks tend to specialize in a specific area rather than be insane enough
to touch all of it. So you get a lot of folks who are storage-side
experts and can blather on and on about Lustre, ZFS, GFS, GPFS, Ceph, and
other stuff. You also tend to see a lot of folks who specialize in
scheduling systems such as Slurm, PBS Pro, LSF, and, if you are really
masochistic, Grid Engine. There is also a very large group of folks who
spend their time optimizing applications to run on different
architectures, such as specific CPUs and GPUs. And finally, there are the
observability maniacs who want to monitor and visualize everything about
the environment because, honestly, you can't identify why you are losing
the "P" in "HPC" if you aren't doing that.
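
Since Slurm keeps coming up: for a taste of the scheduling side, a
minimal batch script looks something like this (a sketch; the job and
file names are made up, but the #SBATCH options shown are standard
Slurm):

    #!/bin/bash
    #SBATCH --job-name=hello_mpi
    #SBATCH --nodes=2              # ask the scheduler for two servers
    #SBATCH --ntasks-per-node=4    # run 4 MPI ranks on each of them
    #SBATCH --time=00:10:00        # wall-clock limit before Slurm kills it
    #SBATCH --output=hello-%j.out  # %j expands to the job ID

    # srun launches the tasks across whatever nodes we were allocated
    srun ./hello_mpi

You submit it with "sbatch hello.sh", watch the queue with "squeue -u
$USER", and eyeball the cluster with "sinfo"; figuring out which nodes
you actually land on is the scheduler's job, not yours.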
I live and breathe this stuff, and there is another of us here as well
(looks around for the soon-to-be-retired Jim Kinney), plus the
aforementioned Brian M.
Let me know if you have any questions.
V
On Sun, Dec 29, 2024 at 4:28 PM Leam Hall via Ale <ale at ale.org> wrote:
> Hey all, I just wanted to follow up on this.
>
> I just finished Coursera's short class on Introductory HPC. Learned a
> little Slurm and got to play with it on the course interface. Then,
> naturally, I found out how to install Slurm locally to play with the
> commands. Cool...
>
> Leam
>
>
> On 12/10/24 07:51, Brian MacLeod via Ale wrote:
> > It may be economically inefficient, but if it seems like something you
> > might like or love to do, then you do inefficient things. That's at the
> > very least what a hobby is, if not a specialization in certain minutiae
> > in common areas.
> >
> > I've found that, despite my interest in HPC and my easy access to it (a
> > very economically efficient path), it wasn't for me in the end. But the
> > experience has definitely informed how I deal with filesystem issues,
> > so much so that that has become what I am known for.
> >
> > I know plenty of people who've entered it by less efficient means and
> > don't make nearly as much money as in their studied fields, but they
> > love it and feel satisfied helping others use these technologies.
> >
> >
> > bnm
> >
> >
> >
> >
> > On Mon, Dec 9, 2024 at 9:32 PM Russell L. Carter via Ale <ale at ale.org>
> > wrote:
> >
> >> Greetings!
> >>
> >> I am temporarily, regrettably, located in Douglasville,
> >> GA. Yet I am wondering about the discussion here about,
> >> uh, erm, "HPC computing".
> >>
> >> All the comments so far are true; it's a mess, always
> >> has been.
> >>
> >> But there is a reason for HPC computing: various algorithms
> >> need memory locality to work efficiently. People here surely
> >> know what I mean: your nonlinear PDE solver (a galaxy here)
> >> likely needs locally efficient memory accesses to work
> >> "well enough" to get your PhD and then a low paid (relatively
> >> speaking) job if you get tenure through the publications.
> >>
> >> This all means that unless you are already in the PhD/HPC
> >> ecosystem, it's quite economically inefficient to try to
> >> be employed there.
> >>
> >> Russell L. Carter
> >>
> >>
> >> On 12/9/24 8:32 PM, Dev Null via Ale wrote:
> >>> Dec 9, 2024 17:40:27 matthew.brown--- via Ale <ale at ale.org>:
> >>>
> >>> Well said! I completely agree.
> >>>
> >>
> >>
> >
> >
>
> --
> Linux Software Engineer (reuel.net/career)
> Scribe: The Domici War (domiciwar.net)
> Coding Ne'er-do-well (github.com/LeamHall)
>
> Between "can" and "can't" is a gap of "I don't know", a place of
> discovery. For the passionate, much of "can't" falls into "yet". -- lh
>
> Practice allows options and foresight. -- lh
>