[ale] HPC replies
Russell L. Carter
rcarter at pinyon.org
Mon Dec 30 18:50:36 EST 2024
Yes, exactly so. This is what I meant by economically inefficient.
You can be a specialist employed at, say, a National Lab of some
sort located *not* in an expensive-housing city (i.e., not NASA Ames)
and enjoy the work immensely. I know I did: if you check out the
original published NAS Parallel Benchmarks you will find my name right
there, and that job was the grandest adventure of my life.
But!! I don't know anything about SLURM because my own household/small
biz clusters just can't justify the overhead for *all the other stuff*
that is required to make such a system viable with a low headcount
support crew of ahem uno ein un yep *1*.
I mean, it's all cool stuff, and if you believe it's for you (it was for
me at the time) go for it, but have no illusions that anybody on, say,
Hacker News will understand such a thing. (That could be a good reason
to do it anyway; yep, I get it.)
Basically the whole cloud ecosystem, which is heating up the planet so
successfully, is predicated on *waste*. Maybe somebody has calculated
how much of the cloud is simply tests. You need tests, to be sure, but
real work is always jobs of days, weeks, months. What the cloud doesn't
care about, in the HPC context, is efficiency.
Good luck and all the best,
Russell
Did I just top post... again? I mean &*^@(*&^#$ Firefox for getting
rid of the emacs editing ability.
On 12/30/24 5:46 PM, Vernard Martin via Ale wrote:
> High Performance Computing has been my predominant specialization area
> since 1992 and I am currently employed in this field. Like most things
> in IT, there are various levels of involvement. The key thing to
> understand about it is that "high performance" means many things to many
> folks and also evolves over time. It is the techniques used more than
> the results. There was a point where High Performance was 4 Pentium Pro
> CPUs in a double-sized tower case with 64GB of RAm connected by
> quad-1Gbit NICs (the first Beowulf clusters). Similarly, a single system
> with 32 CPUs and 256GB of RAM in that area was considered HPC.
>
> Supercomputing 2024 was held in Atlanta the week of November 18th and
> the entire field was on display then.
>
> In general, these days, really big HPC resources tend to be clusters of
> individual servers connected by high-speed and/or low-latency
> interconnects: 10Gb Ethernet at the low end up to 100Gb Ethernet at
> the high end, usually with multiple connections for either dedicated
> traffic or just to get redundancy. You also tend to have a lot of
> InfiniBand and more bespoke network protocols for doing low-latency
> networking so that you can do NUMA memory across servers using some sort
> of message-passing-style interface (think RPC on steroids). You may
> optionally have some sort of dedicated parallel filesystem so that you
> can get a single namespace across all your servers and, hopefully, enough
> bandwidth to have all those servers talking to the storage
> without slowing down very much. Finally, you have some sort of
> orchestration/scheduling system on top of that so that the users don't
> have to think too hard about how to get their jobs to run and don't hose
> resources while they contend with everybody else doing the same.
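>
> To make that last part concrete, here is a minimal sketch of the kind of
> batch script a scheduler such as Slurm accepts (the node counts, time
> limit, and file names are just made-up illustration values):
>
>   #!/bin/bash
>   #SBATCH --job-name=hello
>   #SBATCH --nodes=2
>   #SBATCH --ntasks-per-node=4
>   #SBATCH --time=00:05:00
>   #SBATCH --output=hello_%j.out
>
>   # srun launches the tasks on whatever nodes the scheduler allocated
>   srun hostname
>
> You hand that to Slurm with "sbatch hello.sh" and it queues the job,
> places it, runs it, and writes the output file for you.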
>
> Adjacent to all that is the software needed to monitor and maintain that
> lovely HPC mess. :)
>
> Folks tend to specialize in a specific area rather than be insane enough
> to touch all of it. So you get a lot of folks that are storage-side
> experts and can blather on and on about Lustre, ZFS, GFS, GPFS, Ceph,
> and other stuff. You also tend to see a lot of folks that specialize in
> scheduling systems such as Slurm, PBS Pro, LSF, and, if you are really
> masochistic, Grid Engine. There is also a very large group of folks that
> spend time optimizing applications to run on different architectures
> such as specific CPUs and GPUs. And finally, there are the folks who
> are observability maniacs and want to monitor and visualize everything
> about the environment because honestly, you can't identify why you are
> losing the "P" in "HPC" if you aren't doing that.
>
> I live and breathe this stuff, and there is at least one other of us
> here (looks around for the soon-to-be-retired Jim Kinney), as well as
> the aforementioned Brian M.
>
> Let me know if you have any questions.
>
> V
>
> On Sun, Dec 29, 2024 at 4:28 PM Leam Hall via Ale <ale at ale.org> wrote:
>
> Hey all, I just wanted to follow up on this.
>
> I just finished Coursera's short class on Introductory HPC. Learned
> a little Slurm and got to play with it on the course interface.
> Then, naturally, I found out how to install Slurm locally to play
> with the commands. Cool...
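>
> For anybody else wanting the same local playground, a rough sketch (the
> slurm-wlm package name is the Debian/Ubuntu one; other distros differ):
>
>   # single-node setup: controller (slurmctld) and node daemon (slurmd) on one box
>   sudo apt install slurm-wlm
>
>   # once the daemons are up and configured, poke at it
>   sinfo                  # show partitions and node states
>   srun -N1 hostname      # run a one-task job interactively
>   sbatch hello.sh        # submit a batch script
>   squeue                 # watch the queue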
>
> Leam
>
>
> On 12/10/24 07:51, Brian MacLeod via Ale wrote:
> > It may be economically inefficient, but if it seems like something
> > you might like or love to do, then you do inefficient things. That's
> > at the very least what a hobby is, if not a specialization in
> > certain minutiae in common areas.
> >
> > I've found that despite my interest in and easy access to HPC (a very
> > economically efficient path), it wasn't for me in the end. But the
> > experience has definitely informed how I deal with filesystem issues,
> > so much so that it has become what I am known for.
> >
> > I know plenty of people who've entered it by less efficient means and
> > don't make nearly as much money as in their studied fields, but they
> > love it and feel satisfied helping others use these technologies.
> >
> >
> > bnm
> >
> >
> >
> >
> > On Mon, Dec 9, 2024 at 9:32 PM Russell L. Carter via Ale
> > <ale at ale.org> wrote:
> >
> >> Greetings!
> >>
> >> I am temporarily, regrettably, located in Douglasville,
> >> GA. Yet I am wondering about the discussion here about,
> >> uh, erm, "HPC computing".
> >>
> >> All the comments so far are true; it's a mess, always
> >> has been.
> >>
> >> But there is a reason for HPC computing: various algorithms
> >> need memory locality to work efficiently. People here surely
> >> know what I mean: your nonlinear PDE solver (a galaxy here)
> >> likely needs locally efficient memory accesses to work
> >> "well enough" to get your PhD and then a low paid (relatively
> >> speaking) job if you get tenure through the publications.
> >>
> >> This all means that unless you are already in the PhD/HPC
> >> ecosystem, it's quite economically inefficient to try to
> >> be employed there.
> >>
> >> Russell L. Carter
> >>
> >>
> >> On 12/9/24 8:32 PM, Dev Null via Ale wrote:
> >>> Dec 9, 2024 17:40:27 matthew.brown--- via Ale <ale at ale.org>:
> >>>
> >>> Well said! I completely agree.
> >>>
> >>
> >
> >
>
> --
> Linux Software Engineer (reuel.net/career)
> Scribe: The Domici War (domiciwar.net)
> Coding Ne'er-do-well (github.com/LeamHall)
>
> Between "can" and "can't" is a gap of "I don't know", a place of
> discovery. For the passionate, much of "can't" falls into "yet". -- lh
>
> Practice allows options and foresight. -- lh