[ale] Ram, sigstop, swap, pids, etc

Chuck Payne terrorpup at gmail.com
Mon Feb 8 15:47:01 EST 2021


Is this where nice would come into play? Or using CPULimit on a job?
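
For comparison, here is a minimal Python sketch of the difference between the two approaches, assuming a Linux box with /proc mounted. nice only lowers a job's CPU scheduling priority, while SIGSTOP takes it off the CPU entirely. Neither one by itself moves the job's memory anywhere. As far as I know, cpulimit works by sending SIGSTOP/SIGCONT in a loop, so it sits between the two. The sleeping child is a hypothetical stand-in for Bob's job, and the helper names (proc_state, wait_for_state, demo) are made up for illustration.

```python
import os
import signal
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first token after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, predicate, timeout=2.0):
    """Poll until predicate(state) holds or the timeout expires; return the last state."""
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while not predicate(state) and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    return state


def demo():
    # Hypothetical stand-in for "Bob's job": a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # Equivalent of `renice 19`: lowest CPU priority, memory untouched.
        os.setpriority(os.PRIO_PROCESS, child.pid, 19)
        niceness = os.getpriority(os.PRIO_PROCESS, child.pid)

        # SIGSTOP: the job stops running; the kernel reports state "T".
        os.kill(child.pid, signal.SIGSTOP)
        stopped = wait_for_state(child.pid, lambda s: s == "T")

        # SIGCONT: the job becomes schedulable again.
        os.kill(child.pid, signal.SIGCONT)
        resumed = wait_for_state(child.pid, lambda s: s != "T")
        return niceness, stopped, resumed
    finally:
        child.kill()
        child.wait()


if __name__ == "__main__":
    print(demo())
```

A niced job still competes for RAM like any other process, so nice alone would not get Bob out of Mary's way on memory. It would only decide who gets the CPU when both want it.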

On Mon, Feb 8, 2021, 3:36 PM Jim Kinney via Ale <ale at ale.org> wrote:

> I've been looking at criu. My use case is HPC.
>
> On the performance issues: since Bob is not running while it gets paged
> to swap, only Mary will slow down a bit, for the duration of the page-out.
> Bob can suffer, since Mary owns the hardware.
>
> The one thing criu does that I can't see a way to work with is the pid
> change on restore.
>
> In SGE and its variants, there's a shepherd process that manages the job
> process tree that's run on the HPC nodes. Criu would have to either pause
> the shepherd process for each job, which breaks the node daemon, or pause
> the job, which breaks the shepherd.
>
> Granted, I'm still in theory land with no practical testing yet.
>
> If only this hpc process actually worked with cgroups as is claimed....
>
> On February 8, 2021 3:09:29 PM EST, Solomon Peachy via Ale <ale at ale.org>
> wrote:
>>
>> On Mon, Feb 08, 2021 at 02:13:55PM -0500, Jim Kinney via Ale wrote:
>>
>>> Will the kernel move Bob's process from ram to swap and back if it
>>> sits in STOP for a while (hours to days)? Unknown how long after Mary
>>> starts that it eats all the RAM.
>>>
>>
>> It won't automatically move Bob's process to swap in one fell swoop;
>> instead as Mary's process needs more RAM, Bob's will get incrementally
>> paged out as it's not actively being accessed.
>>
>> And when Mary's is finished, once Bob's is allowed to resume, it will
>> get incrementally paged back in as its components are needed. (There's
>> probably a tunable or other mechanism to "encourage" it to page back in
>> more quickly, beyond running swapoff and forcing everything back.)
>>
>> Performance is going to suffer while the paging is happening.
>>
>> Perhaps a better option is the explicit checkpoint/restore mechanism using
>> the criu tool.
>>
>>  - Solomon
>>
>>
> --
> Computers amplify human error
> Super computers are really cool
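
The incremental page-out described in the quoted text can be watched from user space. Below is a minimal sketch, assuming Linux with /proc mounted, that reads the VmRSS (resident) and VmSwap (swapped out) fields from /proc/<pid>/status. The function name memory_fields is hypothetical, and VmSwap may be absent on very old kernels.

```python
import os


def memory_fields(pid):
    """Return VmRSS and VmSwap (in kB) for a pid, read from /proc/<pid>/status."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("VmRSS", "VmSwap"):
                # Lines look like "VmRSS:      1234 kB"; keep the number only.
                fields[key] = int(value.split()[0])
    return fields


if __name__ == "__main__":
    # Inspect this Python process itself; substitute the pid of the stopped job.
    print(memory_fields(os.getpid()))
```

Polling this for a stopped job while another job fills RAM should show VmRSS shrinking and VmSwap growing over time, and the reverse after SIGCONT as pages get touched again.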