Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: add estimation of cores-per-node count on system during initialization #469

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

cmoussa1
Copy link
Member

Problem

The priority plugin does not know about basic system information it will need in order to enforce a max-cores limit per association, such as the number of cores on a node.


This PR adds an estimation of a cores-per-node count estimate during the initialization of the priority plugin by fetching resource.R from the KVS. It stores this estimate in a global variable in the plugin. The plan is to use this count when calculating the number of cores used by a job when only nnodes are specified. I think that this count might not be exactly right for systems where the core count per-node might be different throughout the system, but I figure this could at least be a good estimate and a start for tracking and enforcing resource limit per-association across all of their running jobs (see conversation in flux-framework/flux-core#6091).

This count is also added to the list of information returned in the callback for plugin.query.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some early comments on a first pass-through

Comment on lines +1161 to +1220
if (flux_kvs_lookup_get_unpack (f,
"{s{s[{s{s:s}}]}}",
"execution",
"R_lite",
"children",
"core", &core) < 0) {
flux_log_error (h, "flux_kvs_lookup_unpack");
return -1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally I would request that this synchronous get be replaced with an asynchronous flux_future_then(3) and the parsing of R handled in a callback. However, it seems like the result is required for validation of jobs, and you probably don't want to let some jobs through erroneously while waiting for resource.R, so perhaps this is actually the right solution. The job manager will block here while waiting for R when mf_priority.so is loaded, but it should be a very short time, and while this is occurring job management will also pause. Most of the time this will occur during job manager module load, which has other synchronous work anyway.

I'd at least suggest a comment here describing why a synchronous get is used in this case.

@garlick: any other thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FLUX_KVS_WATCH flag should be dropped. That's only used when you want to receive a response for every change to the key, and here the future is being destroyed after the first response.

IRL, we only load mf_priority.so in the system instance so resource.R should exist already and the KVS lookup should be fast. FLUX_KVS_WAITCREATE is probably is needed in test where the plugin is loaded in a test instance and resource.R is dynamically discovered though.

Yes a comment would be good since synchronous activities always raise eyebrows.

Comment on lines +1162 to +1217
"{s{s[{s{s:s}}]}}",
"execution",
"R_lite",
"children",
"core", &core) < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R may have multiple entries in the R_lite array. To handle possible heterogeneity, you could iterate each entry and use the maximum number of cores found.

Probably ok if this is just a first cut, though. If so, I'd put a comment stating that "equal number of cores on all nodes in R is assumed, so we only look at the first entry" or simimlar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for mentioning this, I was unaware that it could have multiple entries. If it has multiple entries, would it look like this?

{
  "version": 1,
  "execution": {
    "R_lite": [
      {
        "rank": "19-22",
        "children": {
          "core": "0-47",
        }
      },
      {
        "rank": "23-29",
        "children": {
          "core": "0-15",
        }
      }
    ]
  }
}

Comment on lines 1178 to 1179
// calculate number of cores-per-node on system
ncores_per_node = calculate_range (core);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of parsing the idset with custom code here, you can use libflux-idset.so exported by flux-core.
See idset_decode(3), then just use idset_count(3) to get the number of cores.

flux jobtap query mf_priority.so > query.json &&
test_debug "jq -S . <query.json" &&
cat query.json &&
jq -e ".ncores_per_node == 2" <query.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this check work on all systems? If you need to check the actual cores/node you could use nproc or possibly flux resource R --include=0 | flux R decode --count=core (sorry if I missed someplace where you've guaranteed 2 cores per node)

@cmoussa1 cmoussa1 force-pushed the resource.counting.plugin branch 2 times, most recently from 0c4af54 to 5747337 Compare July 18, 2024 18:33
Copy link

codecov bot commented Jul 18, 2024

Codecov Report

Attention: Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.08%. Comparing base (a6bd897) to head (5747337).
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #469      +/-   ##
==========================================
- Coverage   83.30%   83.08%   -0.22%     
==========================================
  Files          23       23              
  Lines        1557     1573      +16     
==========================================
+ Hits         1297     1307      +10     
- Misses        260      266       +6     
Files Coverage Δ
src/plugins/mf_priority.cpp 82.19% <64.70%> (-0.87%) ⬇️

@cmoussa1
Copy link
Member Author

Thank you both for the first pass! I've force-pushed up some changes to drop FLUX_KVS_WATCH, use libflux-idset, and change how the test is structured to instead use flux resource R --include=0 | flux R decode --count=core (this check makes a lot more sense than what I had).

I've also added some comments in code where suggested.

Problem: The paths for FLUX_IDSET_LIBS/FLUX_IDSET_CFLAGS are not listed
in the output of ./configure. There is also slight misformatting in the
output of the FLUX_CORE_CFLAGS label.

Switch the period and colon at the end of the FLUX_CORE_CFLAGS line.

Add FLUX_IDSET_LIBS and FLUX_IDSET_CFLAGS to the output of ./configure.
Problem: The priority plugin does not know about basic system
information it will need in order to enforce a max-cores limit per
association, such as the number of cores on a node.

Add an estimation of a cores-per-node count estimate during the
initialization of the priority plugin by fetching resource.R from the
KVS. Store this estimate in a global variable in the plugin.

Add this estimate to the list of information returned in the
plugin.query callback.
Problem: flux-accounting has no tests for estimating the cores-per-node
count by fetching resource.R from the KVS and then querying it from the
plugin.

Add some basic tests.
@cmoussa1 cmoussa1 force-pushed the resource.counting.plugin branch from 409649f to 2dec9cc Compare January 6, 2025 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature new feature plugin related to the multi-factor priority plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants