is there a way to count/estimate the number of cores-per-node on a system instance? #6091
Comments
I think the most complete solution might be to require both a cores and nodes limit for jobs, and if either is exceeded the job is rejected. This is what we ended up doing with the flux-core policy limits; it is mentioned in a note in flux-config-policy(5).
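For reference, a job-size limit of the kind described above can be expressed in flux-core's TOML config (key names as documented in flux-config-policy(5); the values here are purely illustrative):

```toml
# Reject any job that exceeds either dimension (illustrative values).
[policy.limits.job-size.max]
nnodes = 8
ncores = 512
```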
OK, that might be an OK start. Are you thinking the limit would be represented like the flux-core policy limits, or something different? And what if a job might exceed a limit?
I was thinking you'd check both values and if either exceeded the configured limit then the job is rejected. If you can't tell how much of a resource is in the jobspec, then just skip that test. That way you are always checking at least one limit.
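A minimal sketch of that per-job check (function and parameter names here are hypothetical, not flux-core's actual API): any resource count that can't be determined from the jobspec is skipped, so at least one limit is always tested.

```python
def check_job_size(jobspec_counts, limits):
    """Reject a job if any resource count determinable from its jobspec
    exceeds the configured limit for that resource.

    jobspec_counts: dict mapping resource name -> requested count, with
        None (or a missing key) meaning "could not be determined".
    limits: dict mapping resource name -> maximum allowed count.
    Returns (accepted, reason).
    """
    for resource, limit in limits.items():
        requested = jobspec_counts.get(resource)
        if requested is None:
            # Can't tell how much of this resource the job wants:
            # skip this test, as suggested above.
            continue
        if requested > limit:
            return False, f"{resource} request {requested} exceeds limit {limit}"
    return True, "ok"
```

For example, a jobspec that only specifies cores is still checked against the cores limit, and the nodes test is simply skipped.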
I think the goal (at least for accounting) here is to be able to enforce a resource limit across all of a user's running jobs. If we go with the above, a job would be rejected if it will exceed either limit. But maybe we could just add a [...]
The goal is to add up the resource usage of all of a user's running jobs and prevent them from starting a new job if their resource usage would exceed some limit. The approach that Mark is suggesting sounds like it would work in most cases. In principle a user could exceed the limit by submitting some jobs that only specify nodes and others that only specify cores, but that would probably be rare in practice. Another possible place where these limits wouldn't work would be jobs that specify a number of cores that isn't an even multiple of the number of cores-per-node on node-exclusive clusters, as it sounds like those jobs will effectively reserve more cores than the number they requested.
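The aggregate check could be sketched like this (hypothetical names, not the flux-accounting implementation). Note how a job whose count for a resource is unknown contributes nothing to that total, which is exactly the loophole described above:

```python
def would_exceed_limit(running_jobs, new_job, max_cores, max_nodes):
    """Sum the nodes/cores of a user's running jobs plus a proposed new
    job, and report whether either aggregate limit would be exceeded.

    Each job is a dict; a missing or None count contributes 0, so a
    user mixing node-only and core-only jobs can slip past one total.
    """
    total_cores = sum(job.get("ncores") or 0 for job in running_jobs)
    total_nodes = sum(job.get("nnodes") or 0 for job in running_jobs)
    total_cores += new_job.get("ncores") or 0
    total_nodes += new_job.get("nnodes") or 0
    return total_cores > max_cores or total_nodes > max_nodes
```

For example, with one 64-core/2-node job running and a 128-core aggregate limit, a new 96-core job would push the core total to 160 and be rejected.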
Very good points @ryanday36.
No problem @grondo - I probably should've given more background as to why the limit needed to be there in the first place. So it sounds like we should keep separate counts of both nodes and cores. This is mainly why I asked if there was a function to gather total node/core counts on a system. (Actually, now that I think about it, if the above sounds okay, then I'm not sure keeping track of [...] is still needed.)
Just to continue to be a pain here: if we're going to convert things, it probably makes more sense to convert to [...]
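One way to sketch a conversion of that kind (assuming a uniform, known cores-per-node; the helper name is hypothetical) is to normalize every job to a single core count before summing:

```python
def normalize_to_cores(ncores, nnodes, cores_per_node):
    """Express a job's request as a core count, assuming every node
    has the same number of cores (a hypothetical helper)."""
    if ncores:
        return ncores
    if nnodes:
        # Node-only request: each node contributes all of its cores.
        return nnodes * cores_per_node
    return 0
```

This avoids the lossy rounding of going the other way (cores to nodes), but it still breaks down on systems where cores-per-node is not uniform.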
To answer your original question, you can get access to the resources in an instance by fetching [...]
Thanks for the advice here. After some time playing around with this, I think I was able to get somewhere. I've opened a PR over in flux-framework/flux-accounting#469 that proposes adding some work during plugin initialization where it tries to at least estimate the cores-per-node on the system it's loaded on by fetching [...]
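As an illustration of that kind of estimate, here is a self-contained sketch that derives an average cores-per-node from an R-like resource object. The JSON layout and idset strings used here are assumptions modeled loosely on Flux's resource format; the real document the plugin fetches may differ:

```python
def idset_count(idset):
    """Count the members of a simple idset-style string such as
    "0-3" or "0,2,4-7" (format assumed for this sketch)."""
    total = 0
    for part in idset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            total += int(hi) - int(lo) + 1
        else:
            total += 1
    return total

def estimate_cores_per_node(resource_obj):
    """Estimate cores-per-node from an R-like resource object whose
    entries each cover a set of ranks (nodes) with identical cores.

    Layout assumed here:
      {"execution": {"R_lite": [{"rank": "0-3",
                                 "children": {"core": "0-63"}}]}}
    """
    total_cores = 0
    total_nodes = 0
    for entry in resource_obj["execution"]["R_lite"]:
        nranks = idset_count(entry["rank"])
        ncores = idset_count(entry["children"]["core"])
        total_nodes += nranks
        total_cores += nranks * ncores
    # Integer average; only exact when cores-per-node is uniform.
    return total_cores // total_nodes
```

On a heterogeneous system this yields only an average, which is one reason the thread treats the result as an estimate rather than a fact.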
I'm trying to make some progress on flux-framework/flux-accounting#349 and, as a start, am just trying to see if I can reliably count or estimate the number of nodes used by a job.
If a user does not specify the number of nodes for their job, jobspec will report `0` for `nnodes`. If we know how many cores-per-node there are on a system (particularly a node-exclusive one), however, we might be able to just count the number of cores reported by jobspec and convert this to `nnodes` to use for accounting. This probably won't work for systems that do not have the same number of cores-per-node across all of their nodes.

Another potential option comes from an offline conversation with @ryanday36:
Although I don't believe this is the case in jobspec, perhaps there is somewhere else where we could query and store this information to be used.
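The cores-to-nodes conversion described above might look like this (a hypothetical helper; assumes a uniform cores-per-node). Rounding up matters on node-exclusive systems, where a partial node still occupies a whole node:

```python
import math

def estimate_nnodes(ncores, cores_per_node):
    """Estimate how many nodes a core-only request will occupy,
    assuming the same number of cores on every node."""
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    # Round up: e.g. 65 cores on 64-core nodes occupies 2 whole nodes.
    return math.ceil(ncores / cores_per_node)
```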