Flexible Scheduling with or_slot [WIP] #1296
base: master
69c8c91 to 29bb462
Capturing the original greedy solution here, since I am going to collapse that commit into the DP solution:
29bb462 to b64c82b
Problem: the traverser does not support options for flexible scheduling. Add support for a logical-OR type of resource group, or_slots. or_slots are alternative resource configurations that the traverser considers when selecting resources.
Problem: The traverser primes the jobspec with the counts of resources whose types are specified as pruning filter types. With flexible scheduling, this additive accumulation sums the counts of alternative or_slot options, producing counts that can be much higher than the counts available in the planner. These inflated counts cause the subplanner-based pruning to stop the traversal, so no match is found even when one is available. Add a new accumulation option, min_if, which takes the lowest count instead of the sum of all resource counts. Use it when the parent type is or_slot_rt.
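The difference between the two accumulation modes can be sketched as below. This is a minimal illustration, not Fluxion's code: `accum_mode_t` and `prime_count` are hypothetical names, and a plain vector of per-child counts stands in for whatever structure the traverser actually primes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical accumulation modes; the names are illustrative only,
// not Fluxion's actual identifiers.
enum class accum_mode_t { sum, min_if };

// Combine the per-child counts of one pruning-filter type (e.g. "core")
// into the count the parent is primed with.
//  - sum:    children are all required together, so their counts add up.
//  - min_if: children are alternatives (or_slot options), so only the
//            cheapest option has to fit for a match to be possible.
int64_t prime_count (const std::vector<int64_t> &child_counts, accum_mode_t mode)
{
    if (child_counts.empty ())
        return 0;
    if (mode == accum_mode_t::min_if)
        return *std::min_element (child_counts.begin (), child_counts.end ());
    int64_t total = 0;
    for (int64_t c : child_counts)
        total += c;
    return total;
}
```

With three or_slot options that need 4, 8 and 2 cores, summing asks the subplanner for 14 cores, so a vertex with 6 free cores would be pruned even though two of the options fit; the minimum (2) is still a valid lower bound for placing at least one or_slot.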
b64c82b to c4fbdb4
Problem: there are no tests for or_slots. Add tests.
c4fbdb4 to 5095b76
After a discussion with @milroy, we decided that it does not make sense to modularize the policy and selection algorithm for or_slots yet; we can save that for when we have more than one policy or have decided how a policy can be expressed in the jobspec. This is similar to how we want to deal with or_slot counts, for which we do not have a good solution yet either. Additionally, I have done some performance testing on or_slots to see how this algorithm affects scheduling time, and I uploaded the CSV with this comment. The columns in the CSV are:
f011ad7 to 6465c9d
Problem: the or_config map uses a string as an index. Fluxion is moving away from using strings wherever possible. Create a struct and a custom hashing function so that the map of resource counts itself can be used directly as the index.
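A minimal sketch of a struct-plus-hash key of this kind follows. It is an assumption-laden illustration rather than the PR's code: `or_config_t`, `or_config_hash_t` and `or_config_map_t` are hypothetical names, and a `uint64_t` resource-type id stands in for an interned type in place of a type-name string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <unordered_map>

// Hypothetical key: the resource counts of one or_slot configuration,
// keyed by an integer resource-type id instead of a string. std::map
// keeps entries sorted, so equal configurations iterate in the same order.
struct or_config_t {
    std::map<uint64_t, int64_t> counts;
    bool operator== (const or_config_t &o) const { return counts == o.counts; }
};

// Combine the hashes of the sorted (type, count) pairs, using the
// classic boost::hash_combine mixing step.
struct or_config_hash_t {
    std::size_t operator() (const or_config_t &cfg) const
    {
        std::size_t seed = cfg.counts.size ();
        for (const auto &[type, count] : cfg.counts) {
            seed ^= std::hash<uint64_t>{} (type) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
            seed ^= std::hash<int64_t>{} (count) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// Example map: configuration -> number of times it was selected.
using or_config_map_t = std::unordered_map<or_config_t, int, or_config_hash_t>;
```

Because the key is built from a sorted map, two configurations that list the same counts in a different order compare equal and hash identically, which removes the need to canonicalize a string representation.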
6465c9d to a35f8a8
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff            @@
##           master    #1296     +/-   ##
========================================
+ Coverage    75.3%    75.5%    +0.2%
========================================
  Files         111      111
  Lines       15300    16119     +819
========================================
+ Hits        11531    12183     +652
- Misses       3769     3936     +167
This PR introduces flexible scheduling to the traverser. Flexible scheduling allows the user to define several acceptable resource configurations, from which the traverser can select while walking the resource graph. These configurations are specified as or_slots, where a single or_slot is one acceptable compute unit. At any given point in the graph traversal, or_slots are selected with equal preference from whichever options are available. Any combination of the or_slots may be selected, and they should be thought of as interchangeable configurations.
Implementation:
or_slot:
or_slots are a new slot-like resource type. They are searched for and treated similarly to slots, but unlike slots, sibling or_slots may appear at the same level. or_slots also add the complexity of selecting the best configuration of or_slots for the available resources. At the time of submitting this WIP PR, this is done by taking the union of all resources specified in all of the or_slots and completing the traversal with all of those resources to obtain their counts. With those resource counts, the best configuration can be selected and scheduled as usual with slots.
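The union step can be sketched as follows, assuming each or_slot option is flattened to a map from resource type to per-slot count. `slot_option_t`, `union_types` and `densify` are hypothetical helpers, and string type names are used only for readability; the dense vectors they produce are what a configuration search can compare against the counts returned by the traversal.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation of one or_slot option:
// resource type -> count requested per slot, e.g. {"core": 4, "gpu": 1}.
using slot_option_t = std::map<std::string, int64_t>;

// Union of every resource type named by any or_slot option. The traversal
// is run once over all of these types so that the available count of each
// one is known before a configuration is chosen.
std::set<std::string> union_types (const std::vector<slot_option_t> &options)
{
    std::set<std::string> types;
    for (const auto &opt : options)
        for (const auto &kv : opt)
            types.insert (kv.first);
    return types;
}

// Express each option as a dense vector over the unioned (sorted) types,
// with 0 where an option does not mention a type.
std::vector<std::vector<int64_t>> densify (const std::vector<slot_option_t> &options,
                                           const std::set<std::string> &types)
{
    std::vector<std::vector<int64_t>> dense;
    for (const auto &opt : options) {
        std::vector<int64_t> row;
        for (const auto &t : types) {
            auto it = opt.find (t);
            row.push_back (it == opt.end () ? 0 : it->second);
        }
        dense.push_back (row);
    }
    return dense;
}
```

For options {core: 4} and {core: 2, gpu: 1}, the union is {core, gpu} and the dense rows are [4, 0] and [2, 1].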
slot configuration selection:
After the counts for all possible resources in all of the or_slots are collected, the slot configuration is determined. I was not able to find an efficient way to find the best configuration in terms of score. A first attempt landed on a greedy algorithm that did well in terms of score but occasionally failed to find a match when one could be found. This PR shows an example of a DP algorithm that maximizes the number of or_slots that can be scheduled with the given resources. How the configuration is selected could be made configurable, similar to how match policies are selected, and considerable effort could go into making this selection process more flexible and feature-rich. One example would be to use the same DP algorithm but weight some configurations more than others.
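A DP of this general shape, maximizing the number of or_slots that fit, can be sketched as a memoized search over the remaining resource counts. This is a hypothetical illustration, not the PR's algorithm: `counts_t`, `selection_t` and `or_slot_selector_t` are invented names, options are dense count vectors over the unioned types, and each option may be chosen any number of times. Memoizing on the full remaining-count vector is only practical when counts are small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using counts_t = std::vector<int64_t>;  // one entry per unioned resource type

// Best number of or_slots found, and how many of each option it uses.
struct selection_t {
    int64_t total = 0;
    std::vector<int64_t> per_option;
};

class or_slot_selector_t {
public:
    explicit or_slot_selector_t (std::vector<counts_t> options)
        : m_options (std::move (options)) {}

    // Maximum number of or_slots (any mix of options, each usable any
    // number of times) whose combined request fits within `avail`.
    selection_t select (const counts_t &avail) { return best (avail); }

private:
    std::vector<counts_t> m_options;
    std::map<counts_t, selection_t> m_memo;  // remaining counts -> best result

    bool fits (const counts_t &opt, const counts_t &rem) const
    {
        for (std::size_t i = 0; i < rem.size (); ++i)
            if (opt[i] > rem[i])
                return false;
        return true;
    }

    selection_t best (const counts_t &rem)
    {
        auto it = m_memo.find (rem);
        if (it != m_memo.end ())
            return it->second;
        selection_t res;
        res.per_option.assign (m_options.size (), 0);
        for (std::size_t k = 0; k < m_options.size (); ++k) {
            const counts_t &opt = m_options[k];
            // Skip all-zero options (they would recurse forever) and ones that do not fit.
            bool empty = std::all_of (opt.begin (), opt.end (), [] (int64_t c) { return c == 0; });
            if (empty || !fits (opt, rem))
                continue;
            counts_t next (rem);
            for (std::size_t i = 0; i < rem.size (); ++i)
                next[i] -= opt[i];
            selection_t sub = best (next);
            if (sub.total + 1 > res.total) {  // take option k, then the best for what is left
                res = sub;
                res.total += 1;
                res.per_option[k] += 1;
            }
        }
        m_memo.emplace (rem, res);
        return res;
    }
};
```

With options A = {core: 2, gpu: 1} and B = {core: 1} and 3 cores plus 1 GPU available, the search returns 3 slots (three of B). A simple first-fit greedy, which is not necessarily the greedy from this PR, takes A, then B, and stops at 2, so a request for 3 slots would go unmatched. Replacing the `+ 1` with a per-option weight would give the weighted variant mentioned above.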
Caveats:
To do Items:
Example Job Spec: