[ale] Ram, sigstop, swap, pids, etc

Jim Kinney jim.kinney at gmail.com
Mon Feb 8 15:36:29 EST 2021


I've been looking at criu. My use case is HPC. 

On the performance issues: since Bob is not running while it gets paged out to swap, only Mary will slow down a bit, and only for the duration of the page-out. Bob can suffer, since Mary owns the hardware.
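
One way to watch how much of Bob has actually been paged out is to read the VmRSS and VmSwap lines from /proc/<pid>/status. A minimal sketch in Python, assuming a Linux /proc; the function names are made up for illustration:

```python
import re


def parse_vm_fields(status_text):
    """Pull VmRSS and VmSwap (both in kB) out of /proc/<pid>/status text.

    Kernel threads have no Vm* lines, so the result may be empty.
    """
    fields = {}
    for name in ("VmRSS", "VmSwap"):
        match = re.search(rf"^{name}:\s+(\d+)\s+kB", status_text, re.MULTILINE)
        if match:
            fields[name] = int(match.group(1))
    return fields


def read_vm_fields(pid):
    """Read the live values for one PID (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())
```

Polling read_vm_fields(bob_pid) while Mary ramps up should show VmRSS shrinking and VmSwap growing a bit at a time, not in one jump.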

The thing criu does that I can't see a way to work with is the PID change on restore. 

In SGE and its variants, a shepherd process manages the job's process tree on the HPC nodes. Criu would have to pause either the shepherd process for each job, which breaks the node daemon, or the job itself, which breaks the shepherd. 
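
To make "the job process tree" concrete: given the shepherd's PID, the processes under it can be found by walking parent PIDs from /proc/<pid>/stat. This is only a sketch with made-up function names, it assumes a Linux /proc, and it does not model anything SGE-specific. A job that daemonizes and gets reparented would escape this walk.

```python
import os
from collections import defaultdict, deque


def read_pid_ppid_pairs():
    """Collect (pid, ppid) for every process visible in /proc (Linux only)."""
    pairs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                # comm can contain spaces or parens, so split after the last ')'.
                # The fields that follow are: state, ppid, ...
                rest = f.read().rsplit(")", 1)[1].split()
            pairs.append((int(entry), int(rest[1])))
        except (FileNotFoundError, ProcessLookupError):
            continue  # process exited while we were scanning
    return pairs


def job_tree(pid_ppid_pairs, shepherd_pid):
    """Return PIDs descended from shepherd_pid, breadth-first, shepherd excluded."""
    children = defaultdict(list)
    for pid, ppid in pid_ppid_pairs:
        children[ppid].append(pid)
    found, queue = [], deque([shepherd_pid])
    while queue:
        for child in children[queue.popleft()]:
            found.append(child)
            queue.append(child)
    return found
```

job_tree(read_pid_ppid_pairs(), shepherd_pid) gives the set of PIDs that a pause or checkpoint of "the job" would have to cover while leaving the shepherd itself alone.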

Granted, I'm still in theory land with no practical testing yet. 

If only this hpc process actually worked with cgroups as is claimed....

On February 8, 2021 3:09:29 PM EST, Solomon Peachy via Ale <ale at ale.org> wrote:
>On Mon, Feb 08, 2021 at 02:13:55PM -0500, Jim Kinney via Ale wrote:
>> Will the kernel move Bob's process from ram to swap and back if it 
>> sits in STOP for a while (hours to days)? Unknown how long after Mary
>> starts that it eats all the RAM.
>
>It won't automatically move Bob's process to swap in one fell swoop; 
>instead as Mary's process needs more RAM, Bob's will get incrementally 
>paged out as it's not actively being accessed.
>
>And when Mary's is finished, once Bob's is allowed to resume, it will 
>get incrementally paged back in as its components are needed.  (There's
>probably a tunable or other mechanism to "encourage" it to page back in
>more quickly, beyond running swapoff and forcing everything back.)
>
>Performance is going to suffer while the paging is happening.
>
>Perhaps a better option is the explicit checkpoint/restore mechanism
>using the criu tool.
>
> - Solomon
>-- 
>Solomon Peachy			      pizza at shaftnet dot org (email&xmpp)
>                                     @pizza:shaftnet dot org   (matrix)
>High Springs, FL                      speachy (freenode)

-- 
Computers amplify human error
Super computers are really cool