plugin: add estimation of cores-per-node count on system during initialization #469
Some early comments on a first pass-through
    if (flux_kvs_lookup_get_unpack (f,
                                    "{s{s[{s{s:s}}]}}",
                                    "execution",
                                    "R_lite",
                                    "children",
                                    "core", &core) < 0) {
        flux_log_error (h, "flux_kvs_lookup_unpack");
        return -1;
    }
Normally I would request that this synchronous get be replaced with an asynchronous flux_future_then(3) and the parsing of R handled in a callback. However, it seems like the result is required for validation of jobs, and you probably don't want to let some jobs through erroneously while waiting for resource.R, so perhaps this is actually the right solution. The job manager will block here while waiting for R when mf_priority.so is loaded, but it should be a very short time, and while this is occurring job management will also pause. Most of the time this will occur during job manager module load, which has other synchronous work anyway.
I'd at least suggest a comment here describing why a synchronous get is used in this case.
@garlick: any other thoughts?
The FLUX_KVS_WATCH flag should be dropped. That's only used when you want to receive a response for every change to the key, and here the future is being destroyed after the first response.
IRL, we only load mf_priority.so in the system instance, so resource.R should exist already and the KVS lookup should be fast. FLUX_KVS_WAITCREATE is probably needed in tests, though, where the plugin is loaded in a test instance and resource.R is dynamically discovered.
Yes, a comment would be good since synchronous activities always raise eyebrows.
                                    "{s{s[{s{s:s}}]}}",
                                    "execution",
                                    "R_lite",
                                    "children",
                                    "core", &core) < 0) {
R may have multiple entries in the R_lite array. To handle possible heterogeneity, you could iterate each entry and use the maximum number of cores found.
Probably OK if this is just a first cut, though. If so, I'd put a comment stating that "equal number of cores on all nodes in R is assumed, so we only look at the first entry" or similar.
Thanks for mentioning this, I was unaware that it could have multiple entries. If it has multiple entries, would it look like this?
{
  "version": 1,
  "execution": {
    "R_lite": [
      {
        "rank": "19-22",
        "children": {
          "core": "0-47"
        }
      },
      {
        "rank": "23-29",
        "children": {
          "core": "0-15"
        }
      }
    ]
  }
}
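For an R with several R_lite entries like the one above, the "use the maximum number of cores found" suggestion could be sketched roughly as follows. This is only an illustration, not the plugin's code: `count_ids` and `max_cores_per_node` are hypothetical names, and `count_ids` is a simplified stand-in so the sketch compiles without flux-core. In the plugin itself, the counting would presumably be done with idset_decode(3) and idset_count(3) from libflux-idset, and the "core" strings would come from unpacking each R_lite entry rather than from a vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for idset_decode(3) + idset_count(3): counts the ids
// in a simple idset string such as "0-47" or "0-3,8-11". It assumes
// non-overlapping ranges and does not handle brackets or strides.
static std::size_t count_ids (const std::string &idset)
{
    std::size_t total = 0;
    std::istringstream ss (idset);
    std::string tok;
    while (std::getline (ss, tok, ',')) {
        std::size_t dash = tok.find ('-');
        if (dash == std::string::npos) {
            (void) std::stoul (tok);  // throws if tok is not a number
            total += 1;
        } else {
            unsigned long lo = std::stoul (tok.substr (0, dash));
            unsigned long hi = std::stoul (tok.substr (dash + 1));
            if (hi < lo)
                throw std::invalid_argument ("descending range in idset");
            total += hi - lo + 1;
        }
    }
    return total;
}

// Given the "core" idset string of every R_lite entry, return the largest
// per-node core count. Returns 0 if there are no entries.
static std::size_t max_cores_per_node (
    const std::vector<std::string> &core_idsets)
{
    std::size_t max_cores = 0;
    for (const auto &cores : core_idsets)
        max_cores = std::max (max_cores, count_ids (cores));
    return max_cores;
}
```

With the two entries shown above ("0-47" and "0-15"), `max_cores_per_node` would pick the 48-core entry. Taking the maximum is one possible policy; whether an over-estimate or an under-estimate is safer depends on how the count is later used for limit enforcement.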
src/plugins/mf_priority.cpp
    // calculate number of cores-per-node on system
    ncores_per_node = calculate_range (core);
Instead of parsing the idset with custom code here, you can use libflux-idset.so exported by flux-core.
See idset_decode(3), then just use idset_count(3) to get the number of cores.
    flux jobtap query mf_priority.so > query.json &&
    test_debug "jq -S . <query.json" &&
    cat query.json &&
    jq -e ".ncores_per_node == 2" <query.json
Will this check work on all systems? If you need to check the actual cores/node you could use nproc or possibly flux resource R --include=0 | flux R decode --count=core (sorry if I missed someplace where you've guaranteed 2 cores per node).
0c4af54 to 5747337
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #469      +/-   ##
==========================================
- Coverage   83.30%   83.08%   -0.22%
==========================================
  Files          23       23
  Lines        1557     1573      +16
==========================================
+ Hits         1297     1307      +10
- Misses        260      266       +6
Thank you both for the first pass! I've force-pushed up some changes to drop I've also added some comments in code where suggested.
5747337 to 409649f
Problem: The paths for FLUX_IDSET_LIBS/FLUX_IDSET_CFLAGS are not listed in the output of ./configure. There is also slight misformatting in the output of the FLUX_CORE_CFLAGS label. Switch the period and colon at the end of the FLUX_CORE_CFLAGS line. Add FLUX_IDSET_LIBS and FLUX_IDSET_CFLAGS to the output of ./configure.
Problem: The priority plugin does not know about basic system information it will need in order to enforce a max-cores limit per association, such as the number of cores on a node. Add an estimate of the cores-per-node count during the initialization of the priority plugin by fetching resource.R from the KVS. Store this estimate in a global variable in the plugin. Add this estimate to the list of information returned in the plugin.query callback.
Problem: flux-accounting has no tests for estimating the cores-per-node count by fetching resource.R from the KVS and then querying it from the plugin. Add some basic tests.
409649f to 2dec9cc
Problem
The priority plugin does not know about basic system information it will need in order to enforce a max-cores limit per association, such as the number of cores on a node.
This PR adds an estimate of the cores-per-node count during the initialization of the priority plugin by fetching resource.R from the KVS. It stores this estimate in a global variable in the plugin. The plan is to use this count when calculating the number of cores used by a job when only nnodes are specified. I think that this count might not be exactly right for systems where the per-node core count differs across the system, but I figure this could at least be a good estimate and a start for tracking and enforcing resource limits per association across all of their running jobs (see conversation in flux-framework/flux-core#6091). This count is also added to the list of information returned in the callback for plugin.query.