[ale] Moving last win box to linux
    jc.lightner at comcast.net 
       
    Mon Sep 15 13:07:54 EDT 2025
    
    
  
This conversation made me think of an issue I saw once with a flash disk on a card used in our main development server. It brought the entire server to its knees because of overheating (due to overuse). It turned out the card had been sending log messages telling us it was overheating, but we’d not been looking for them. It supposedly had double redundancy built in so far as the storage flash memory chips were concerned. However, it was the ROM, a single point of failure, that bit the dust due to the heat.
Research indicated most folks bought those cards in pairs, but my employer hadn’t wanted that expense, so we lost data due to that issue. The cards were also only recommended for ephemeral database storage, to increase performance without risking actual database corruption should a failure like that occur. We did that AND also moved the card to a slot in between two vacant slots to help with air flow and prevent heat from nearby cards possibly exacerbating the problem.
I’m wondering if some of the Nvidia issues that people say caused crashes were heat related.   I don’t have any experience using them myself but gather they can produce a fair amount of heat.
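The overheating warnings that went unnoticed in the story above are the sort of thing a small log filter can surface. What follows is only a rough sketch: the keyword list is an assumption (real drivers word their thermal warnings differently), and the function name is made up for illustration.

```python
import re

# Assumed keywords; the actual wording depends on the card and its driver.
THERMAL_PATTERN = re.compile(r"overheat|thermal|temperature", re.IGNORECASE)


def find_thermal_warnings(lines):
    """Return the log lines that mention one of the thermal keywords."""
    return [line.rstrip("\n") for line in lines if THERMAL_PATTERN.search(line)]


# Made-up sample log lines, purely for demonstration.
sample = [
    "kernel: flashcard0: temperature above threshold\n",
    "kernel: eth0: link up\n",
    "kernel: CPU0: Core Thermal throttled\n",
]

for hit in find_thermal_warnings(sample):
    print(hit)
```

In practice the lines would come from wherever the system keeps its logs, for example the output of `journalctl -k` piped in on standard input, and the filter would run on a schedule so that someone actually sees the warnings.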
 
From: Ale <ale-bounces at ale.org> On Behalf Of Jim Kinney via Ale
Sent: Monday, September 15, 2025 11:05 AM
To: Atlanta Linux Enthusiasts <ale at ale.org>
Cc: Jim Kinney <jim.kinney at gmail.com>
Subject: Re: [ale] Moving last win box to linux
 
Many consumer graphics card makers purchase a graphics chip from a big maker like NVidia and build their own board around it. So unless the chip and board are both made by NVidia, there's culpability to spread around.
 
Hardware does fail. But swearing off of all Ford cars because one was a lemon is a bit extreme. 
 
Running large clusters with hundreds to thousands of NVidia GPUs shows the high end devices to be rather sturdy as long as heat is removed as required. 
 
The problems with the consumer ones I've personally used have all turned out to be heat problems, a driver bug, or a desktop environment bug.
 
-- 
James P. Kinney III
Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain
http://heretothereideas.blogspot.com/
 
On Mon, Sep 15, 2025, 10:47 AM DJPfulio--- via Ale <ale at ale.org> wrote:
On 9/15/25 10:06, James Taylor via Ale wrote:
> "All I can tell you about the preceding is I wouldn't be caught dead 
> using an nVidia video card or GPU or whatever it's called with
> Linux. That's playing Russian Roulette with intermittent freezes and 
> spontaneous reboots. This doesn't happen to most people, but it's a 
> well known problem and it happened to me. And when it does, it takes 
> hours and hours to troubleshoot because nobody suspects a bad video 
> card would take down a whole machine."
> 
> That's odd. I've been sourcing nothing but nvidia cards for my
> linux  boxes for over a decade, with no issues other than usual self-
> inflicted ones.
> 
> I've been running exclusively openSUSE, so maybe better support on
> that distro?
> 
I used nVidia for a long time.  Then had one of their cards fail and the failure looked like the CPU+Motherboard had been destroyed.  It was just the GPU, but it left me a little shocked that something like that which should have just caused the MB to do the "Bad GPU" beeps, didn't.
Bought a low-end replacement nVidia, just newer.  Used it for about 5 yrs, then after an OS upgrade, it stopped working except at 768p resolution.  Previously, it was working at 1920x1200.  Seems nvidia decided to drop support and I was stuck using the F/LOSS drivers, which would have been fine, if my desired resolution worked.  That old nVidia GPU will probably work fine today with the resolution I want.
Anyway, when the proprietary drivers weren't rebuilt for the new kernel (which was 2+ yrs old FWIW), I foolishly got another low-end nVidia GPU.  At the time, nVidia made loading their drivers a hassle, but eventually I got it working.  This was at the height of the "can't pay us enough to make GPUs" period.  Since it was a relatively cheap GDDR5 GT 1030 card, nvidia has little reason to do any support for it. Upgraded from a Core i5 CPU to a Ryzen 5 and took the GT 1030 with me.  It was never as stable as I expected - not terrible, just I don't consider any GPU related issue to be good.  A few years later, I moved to a newer Ryzen 5600G where the integrated GPU is faster than the GT 1030. It used 50% less power (CPU+GPU) and performed better.  Pulled the nVidia and haven't had any stability issues since then.  2 yrs later, I replaced an older, slower, Intel Pentium G system with the same Ryzen 5600G and MB I was so pleased with in the other box.  Basically, I have 2 almost identical desktops that can run 100% of my VM and container workloads on either system alone. Only the NIC setup and connected storage are slightly different.
Sure, sometimes I'd like to have a little more powerful AMD GPU, but not enough to spend $80 for a used version and certainly not enough to spend more for something new.  My needs are basic graphics, sometimes, with some media playback.  Zero gaming.  I played with using the iGPU for encoding video in hardware, but the results weren't very good even when I relaxed the file size to be 10x larger than the source.  Instead, to get smaller files with no noticeable artifacts, I use handbrake.  If I'm willing to have a few artifacts (watch once stuff), then I'll use ffmpeg in software with h.264 video encoding, since that will often reduce the file size 50%, though not always.  My playback devices work best with h.264 videos and specific h.265 settings.
They hate Google VP9 videos.  The audio and video get farther and farther out of sync with VP9 until, in less than a minute, it is unwatchable because it has already diverged by 10 seconds.
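For anyone wanting to try the software h.264 route described above, here is a minimal sketch that builds an ffmpeg command line using the libx264 encoder. The CRF value, preset, file names, and helper name are illustrative assumptions, not the settings actually used in this message.

```python
import shlex


def h264_command(src, dst, crf=23, preset="medium"):
    """Build an ffmpeg argument list for a software H.264 (libx264) encode.

    The crf and preset defaults are assumed starting points; a lower crf
    means higher quality and a larger file.
    """
    return [
        "ffmpeg",
        "-i", src,            # input file
        "-c:v", "libx264",    # software H.264 video encoder
        "-preset", preset,    # speed versus compression trade-off
        "-crf", str(crf),     # constant-quality target
        "-c:a", "copy",       # pass the audio through untouched
        dst,
    ]


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(h264_command("input.mkv", "output.mkv")))
```

The list can be handed to `subprocess.run` once the options look right. Whether the result is really about half the size depends heavily on the source material, as noted above.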
Anyway, I came up with a rule for GPUs.  Get the AMD GPU unless you are spending over $1200, then you want the nVidia GPUs with all the problems those bring. AMD GPUs, in the mid-cost and low-end, remove so many hassles since their drivers are part of the kernel now, that it just doesn't make sense to bother with nvidia anymore outside high-end GPUs.  Clearly, if you want to play with LLM stuff, you'll need a $1200-$2000 nvidia GPU anyway.
And for laptops, I choose Intel-based GPUs with iGPUs built-in.  Those have been good enough and the power management with Intel is predictable.  Laptops usually become useless after 3-5 yrs of use for me, so the idea of having 10 yrs of use from an older GPU on a laptop has never been an issue. Plus, my last 2 laptops were less than $320 and just 1-2 generations behind the midrange CPU shipped "new" at the time.  I can't see spending even $500 on a laptop, unless it wasn't my money.  The way I see it, if I need 1 $1200 laptop every decade or (3) $250 (or less) laptops every decade, at least with the cheaper versions, I get faster performance, more RAM, faster wifi/networking, larger SSD, newer batteries every few years, for 75% of the cost.
Of course, others will have different priorities, which is fine.  I'm rockin' a Dell 13.3inch Latitude with an 11th Gen Core i5 now. Great laptop for less than $260 refurbed directly from Dell, dude. No AMD/nvidia GPU in that laptop and I don't recall the last time I used wifi with it, but it has Intel wifi chips.
YMMV, of course.