[ale] shared research server help
DJ-Pfulio
DJPfulio at jdpfu.com
Thu Oct 5 08:26:31 EDT 2017
I use Task Spooler to queue batch workloads, but I don't know how
to force other users to use it.
https://www.linux.com/news/queuing-tasks-batch-execution-task-spooler
On 10/05/2017 07:52 AM, Jim Kinney wrote:
> Back to the original issue:
>
> A tool like Torque or Slurm is really your best solution for heavily
> shared resources. It prevents two big jobs from eating the same machine
> and can also encourage users to write code that manages resources
> better, so they can run more jobs.
>
> I have the same problem. One heavy GPU machine (4 Tesla P100s) has only
> 64G of RAM. A student tried to load 200+G of data into RAM.
>
> A few crashes later, he can run 2 jobs at once, each using only 30G of
> RAM and one P100.
>
> On October 4, 2017 6:32:32 PM EDT, Todor Fassl <fassl.tod at gmail.com> wrote:
>
> I manage a group of research servers for grad students at a university.
> The grad students use these machines to do the research for their Ph.D.
> theses. The problem is that they pretty regularly kill off each other's
> programs by using up all the RAM. Most of the machines have 256G of RAM.
> One kid uses 200G and another 100G and one or the other, often both,
> die. Sometimes they bring the machines down by hogging the CPU or using
> up all the RAM. Well, the machines never crash but they might as well be
> down.
>
> We really, really don't want to force them to use a scheduling system
> like Slurm. They are just learning and they might run the same piece of
> code 20 times in an hour.
>
> Is there a way to set a limit on the amount of RAM all of a user's
> processes can use? If so, we were thinking of setting it at 50% of the
> on-board RAM. Then it would take 3 students together to trash a machine.
> It might still happen but it would be a lot less frequent.
>
> Any other suggestions? Anything at all? Just keep in mind that we really
> want to keep it easy for the students to play around.
>
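
As a rough illustration of the 50% idea in the quoted question, here is a
minimal Python sketch that launches a job with its address space capped at
half of physical RAM. It is only an approximation of what was asked:
RLIMIT_AS applies to each process separately, not to the sum of a user's
processes, so a true per-user total would need something like the kernel's
cgroup memory controller, which is not shown here. The function names
(physical_ram_bytes, cap_bytes, run_capped) are made up for this example,
and it assumes a Linux host.

```python
import os
import resource
import subprocess


def physical_ram_bytes():
    """Total physical RAM in bytes, as reported by sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def cap_bytes(total_bytes, fraction=0.5):
    """Memory cap as a fraction of total RAM (0.5 mirrors the 50% idea)."""
    return int(total_bytes * fraction)


def run_capped(argv, limit_bytes):
    """Run argv with RLIMIT_AS set to limit_bytes; return its exit code.

    The limit is per process (it is inherited by children, but each one
    gets its own allowance), so this is NOT a per-user total.
    """
    def set_limit():
        # Runs in the child after fork and before exec, so the parent
        # process keeps its own limits unchanged.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(argv, preexec_fn=set_limit).returncode
```

A student could then start a job with something like
run_capped(["python3", "train.py"], cap_bytes(physical_ram_bytes())),
where train.py is a hypothetical script. An allocation that pushes the
process past the cap fails inside that job instead of starving everyone
else on the machine.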
--
Got Linux? Used on smartphones, tablets, desktop computers, media
centers, and servers by kids, Moms, Dads, grandparents and IT
professionals.