[ale] Ram, sigstop, swap, pids, etc

Jim Kinney jim.kinney at gmail.com
Mon Feb 8 22:40:30 EST 2021


That's the same test I'll be doing tomorrow on a real node. I have a small numpy app that generates a 10,000x10,000 random matrix, inverts it, and does some other easy but parallelized math. It will eat as many CPU cores as I give it, and at 10k it slurps a boatload of RAM.
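
For anyone who wants to play along, here is a minimal sketch of that kind of job, assuming NumPy is installed. It is not the actual app: the function name, the seed, and the final sanity check are made up for illustration. A 10,000x10,000 float64 matrix is about 800 MB, and the inverse and the product each need another copy of that size. Whether the inversion spreads across cores depends on the BLAS/LAPACK library NumPy was built against.

```python
import numpy as np


def run_job(n=10_000, seed=0):
    """Build an n x n random matrix, invert it, and return a sanity-check error.

    Hypothetical stand-in for the test job: the name, seed, and return value
    are illustrative only.
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))        # n*n float64 values, 8 bytes each
    a_inv = np.linalg.inv(a)      # LAPACK-backed; often multithreaded
    product = a @ a_inv           # should be close to the identity matrix
    # Largest deviation from the identity, as a cheap correctness check.
    return float(np.abs(product - np.eye(n)).max())


if __name__ == "__main__":
    print(run_job())
```

Calling run_job(n=200) is a quick way to check the plumbing before committing the RAM for the full-size run.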

So my plan is to find out which PID to send that SIGSTOP to: the shepherd, or the actual job the shepherd monitors. I'm betting on the job.
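
A rough sketch of the "stop the job, not the shepherd" option, assuming Linux with /proc mounted and assuming the job is a direct child of the shepherd (untested on a real scheduler node, so that second assumption may not hold). The helper names proc_stat, children_of and stop_children are invented here.

```python
import os
import signal


def proc_stat(pid):
    """Return (state, ppid) for pid, read from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # The command name is wrapped in parentheses and may contain spaces,
        # so split on the last ')' before reading the remaining fields.
        fields = f.read().rsplit(")", 1)[1].split()
    return fields[0], int(fields[1])   # state letter, parent PID


def children_of(parent_pid):
    """Return the PIDs whose parent is parent_pid, by scanning /proc."""
    kids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            _, ppid = proc_stat(int(entry))
        except (OSError, IndexError, ValueError):
            continue                   # process exited while we were scanning
        if ppid == parent_pid:
            kids.append(int(entry))
    return sorted(kids)


def stop_children(shepherd_pid):
    """Send SIGSTOP to each direct child of the shepherd; the shepherd keeps running."""
    pids = children_of(shepherd_pid)
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)
    return pids
```

A stopped process shows state "T" in proc_stat(), which makes it easy to confirm that the signal landed on the job and that the shepherd is still alive.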

If this works reliably, I can add pre and post scripts that stop other jobs and then continue them.
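
The pre/post idea could look something like the sketch below: stop a list of PIDs on the way in, and always send SIGCONT on the way out, even if the work in between fails. The name paused is made up, and how a real scheduler would hand over the PIDs is left open.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def paused(pids):
    """SIGSTOP the given PIDs for the duration of the block, then SIGCONT them.

    Hypothetical helper: a scheduler's pre script would do the stopping and
    its post script the continuing; here both live in one context manager.
    """
    stopped = []
    try:
        for pid in pids:
            os.kill(pid, signal.SIGSTOP)
            stopped.append(pid)
        yield stopped
    finally:
        # Resume only what was actually stopped; ignore jobs that have vanished.
        for pid in stopped:
            try:
                os.kill(pid, signal.SIGCONT)
            except ProcessLookupError:
                pass
```

Usage would be along the lines of "with paused([bobs_pid]): run Mary's job", where bobs_pid is whatever PID turns out to be the right target.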

Or I just hork the master up and tomorrow blows up in my face. Good times!

Oh, RAM and swap. I'm letting the kernel deal with that. My test is basically two of the numpy jobs, so the paused one will have to get at least partially swapped out and back in.

On February 8, 2021 7:43:29 PM EST, Steve Litt via Ale <ale at ale.org> wrote:
>On Mon, 08 Feb 2021 14:13:55 -0500
>Jim Kinney via Ale <ale at ale.org> wrote:
>
>
>> I want to send Bob's job a SIGSTOP and let Mary's job run to
>> completion. Then send a SIGCONT and Bob is back running.
>
>I just created a 2.7GB text file, called junk.jnk, that has 293 million
>lines. I ran gkrellm and then ran the following:
>
>sort junk.jnk
>kill -SIGSTOP 18665
>
>CPU usage immediately dropped from 66% to about 2%. A few seconds later
>I did:
>
>kill -SIGCONT 18665
>
>CPU usage went back up to 66%. So based on that, it seems like the
>STOP/CONT combination works well. I think Bob's job would eventually
>swap out. If you REALLY want to swap it out quickly, you could write a
>C program that does nothing but malloc() and copy bogus bytes to the
>newly allocated pointers (because without the bogus bytes, you don't
>really consume RAM). Have it malloc() about the same amount of RAM as
>you expect Mary's process will need. Then free() it all and exit. I
>suspect this quick program would cause Bob's stopped program to swap,
>leaving the path clear for Mary's program to run.
>
>SteveT
>
>Steve Litt 
>Autumn 2020 featured book: Thriving in Tough Times
>http://www.troubleshooters.com/thrive

-- 
Computers amplify human error
Super computers are really cool