Define what is the target number of jobs that can be run on the different clusters #50

satyaog · 2021-08-17T19:04:04Z

The "Data parallelism" section implies that hundreds of jobs can be run. This contradicts what is explained above it.

Originally posted by @tesfaldet in #46 (comment)

How do you define "larger experiments"? How many jobs?

Originally posted by @tesfaldet in #46 (comment)

btravouillon · 2021-09-10T01:49:31Z

For CC, we could refer to their documentation for details.
Should we rephrase it as "...for larger experiments that don't fit in the Mila cluster..."?

For Mila, I guess this recommendation was relevant before AO2, when the number of GPUs was around 170 (a quick search shows this number of 5 was introduced in a pre-pandemic world, April 2019: https://github.com/mila-iqia/AI-HPC-Docs/commit/667d8097913821112fecd2aab24ed4154f069743). We have 500+ GPUs in the cluster now. Moreover, it is the batch scheduler's job to distribute jobs based on defined rules and QoS, not the users' job to limit themselves.

fosterrath-mila · 2021-10-06T18:02:50Z

This should be in the guidelines section.

satyaog changed the title ~~Define what is the target number of jobs that can be run on the Mila cluster~~ Define what is the target number of jobs that can be run on the different clusters Aug 17, 2021

btravouillon self-assigned this Sep 10, 2021

fosterrath-mila added the priority:medium label Oct 6, 2021
