Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: add "age" as a factor to priority calculation for jobs #455

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

cmoussa1
Copy link
Member

Problem

The priority plugin does not factor in the age of a job in its priority calculation. For the sake of flux-accounting's priority plugin, age will be considered the time that a job was released by the scheduler to be run but could not due to another constraint (e.g a resource constraint).


This PR looks to add age as a factor when calculating a job's priority. It adds a new callback to the plugin for job.state.sched where it will store the current timestamp in a map along with the job ID if the job does not have an urgency of 0. If the job is still active when jobs are reprioritized and job.priority.get is called, this timestamp will be subtracted from the current time to calculate an age, boosting the job's integer priority. When the job finishes running and enters job.state.inactive, the job ID/timestamp key-value pair is removed from the map.

A set of tests are added to the testsuite that look to simulate the following scenarios:

  1. jobs are submitted but one of them cannot run due to a resource constraint; jobs are reprioritized and the job that could not run is expected to have its priority increase as a result.
  2. a user submits a job with --urgency=0, preventing their job from ever running. The urgency of the job is updated, but the priority of the job should not increase since the job was purposely held from running.
  3. jobs are submitted but one of them cannot run due to a resource constraint but one of the jobs is submitted with --urgency=0. The urgency of the job is updated while both jobs are active, so it cannot run. Jobs are reprioritized and the job that could not run is expected to have its priority increase as a result.

Leaving this as [WIP] to get feedback on the implementation to see if this is a viable approach.

In terms of testing, you'll also notice that I've inserted various sleep 1 or sleep 1.5 throughout the test file in order to simulate a job having to wait at least a second so that it's age factor will actually contribute to the calculation of a job's priority. I know we like to try and stray away from putting sleep commands in our test file, so if there is a better approach here, I'm all ears. :-)

@cmoussa1 cmoussa1 added new feature new feature plugin related to the multi-factor priority plugin labels May 30, 2024
@grondo
Copy link
Contributor

grondo commented May 30, 2024

I haven't looked yet, so I apologize if this is obvious, but what happens if a user submits a job, it goes into SCHED state and gets a priority adjustment due to age, and then a user holds the job by setting an urgency of 0? When the job is released again with a future urgency update, is the job's age calculated from the timestamp it originally entered the SCHED state, or is it the most recent urgency event?

@cmoussa1
Copy link
Member Author

Ah, I believe the job's age is calculated from the timestamp it originally entered the SCHED state, not the most recent urgency event, so the scenario you proposed could result in someone boosting their own job's priority. Perhaps a watcher for an urgency event needs to be added to check for this? Do you happen to know if this watcher would follow a similar format to the other "job.update.attributes.*" callbacks?

@grondo
Copy link
Contributor

grondo commented May 30, 2024

Do you happen to know if this watcher would follow a similar format to the other "job.update.attributes.*" callbacks?

Actually, any urgency, priority, or jobspec-update event posted to a job in SCHED state will transition the job back to PRIORITY as shown in the RFC 21 state diagram. Once the new priority event is posted, then the job transitions back to SCHED state, and job.state.sched is called.

I wonder if this means there's nothing more to do here and your current approach would work as is? That is, the age is updated with the most recent transition to SCHED which is what we want. That works out nicely if so.

@grondo
Copy link
Contributor

grondo commented May 30, 2024

If this is the case, I would add a comment to the age calculation with the details above because it probably isn't clear at first look.

@cmoussa1
Copy link
Member Author

Ah, I believe you are right. The age is in fact updated with the most recent transition to SCHED. If I simulate the scenario you proposed above, the eventlog for the job looks like this:

{"timestamp":1717103762.321717,"name":"submit","context":{"userid":1001,"urgency":16,"flags":0,"version":1}}
{"timestamp":1717103762.3332443,"name":"jobspec-update","context":{"attributes.system.bank":"bankA"}}
{"timestamp":1717103762.3332648,"name":"validate"}
{"timestamp":1717103762.3449147,"name":"depend"}
{"timestamp":1717103762.3449488,"name":"priority","context":{"priority":500}}
{"timestamp":1717103763.945509,"name":"priority","context":{"priority":660}}
{"timestamp":1717103763.9557166,"name":"urgency","context":{"userid":58985,"urgency":0}}
{"timestamp":1717103763.9557536,"name":"priority","context":{"priority":0}}
{"timestamp":1717103763.9587595,"name":"urgency","context":{"userid":58985,"urgency":16}}
{"timestamp":1717103763.9588048,"name":"priority","context":{"priority":661}}
{"timestamp":1717103769.0532241,"name":"alloc","context":{"annotations":{"sched":{"resource_summary":"rank0/core[0-1]"}}}}

And the various priority calculations (in order) in the eventlog are as follows:

  1. job does not have an age because it has not reached SCHED state yet - thus, priority is as expected
  2. all jobs have are prioritized while job was waiting for resources - priority gets boosted due to age
  3. priority moves down to 0 because job was updated to have an urgency of 0
  4. job was updated to have an urgency of 16 and priority for the job is recalculated using the timestamp from the last time it reached the SCHED state

I can add a comment with your description above. Thanks!

@cmoussa1 cmoussa1 force-pushed the age.factor.plugin branch from 1945b12 to 9b6e4d7 Compare May 30, 2024 21:35
@cmoussa1 cmoussa1 changed the title [WIP] plugin: add "age" as a factor to priority calculation for jobs plugin: add "age" as a factor to priority calculation for jobs Jun 3, 2024
@cmoussa1 cmoussa1 force-pushed the age.factor.plugin branch from 9b6e4d7 to 9f50970 Compare June 3, 2024 15:49
@cmoussa1 cmoussa1 marked this pull request as ready for review June 3, 2024 15:58
@cmoussa1 cmoussa1 requested a review from grondo June 3, 2024 15:59
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a quick pass I had some questions, so thought I'd stop here for now.

Comment on lines 66 to 144
* age: a factor that considers the time that a job was released by the
* scheduler to be run but could not due to another constraint (e.g a
* resource constraint). Any urgency, priority, or jobspec-update event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is unclear to me, but I may just be reading it wrong. What do you mean by "released by the scheduler to run"?

Would it be clearer to just say: "The age factor considers the most recent timestamp that a job entered the SCHED state and therefore represents the amount of time the job has been eligible for scheduling." Or is that statement incorrect (I haven't fully reviewed this PR yet, so apologies if so.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe you just mean "released to the scheduler"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes I mean to the scheduler here, not by. I can fix this - thanks!

t_released = std::chrono::duration_cast <std::chrono::duration<double>> (
std::chrono::system_clock::now ().time_since_epoch ()).count ();
// store job id, t_released in map
released_jobs[FLUX_JOBTAP_CURRENT_JOB] = t_released;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FLUX_JOBTAP_CURRENT_JOB is a constant, is that really what you want to use here? I could be misunderstanding use of the map, but I think with multiple jobs you'd end up overwriting each job's t_released?

Seems like you could keep a map of released jobs by jobid, then you wouldn't need to set the value in the aux item list of the job, or you can allocate space for t_released and just store it in the job's aux item list without the need for the released_jobs global?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, it's my own misunderstanding of FLUX_JOBTAP_CURRENT_JOB - I thought that I could use that in place of unpacking the jobid.

And you are absolutely right about not setting anything in the aux item list of the job if I use a map. Originally, I had tried allocating space for t_released instead of using a map but wasn't able to figure it out - probably because I was doing something wrong and don't understand memory allocation fully.

Can I allocate space for a double* in one callback and clean it up in another? In this case, allocate in job.state.sched and clean it up in job.state.inactive?

Copy link
Contributor

@grondo grondo Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be an easier way using a C++ smart pointer, but in C I'd do something like:

    // Allocate space for a double, e.g. in `job.state.sched`
    double *t_released = malloc (sizeof (double));
    if (!t_released)
        // handle enomem error
   *t_released = value;

    // Place value in job_aux hash and schedule destruction with job
    // (be sure to check return value in actual code)
    flux_jobtap_job_aux_set (p,
                             FLUX_JOBTAP_CURRENT_JOB,
                             "mf_priority::t_released",
                             t_released,
                             free);

I think the above will work, but untested so there may be typos, etc.

Then you'd access the value the way you're currently doing it via flux_jobtap_job_aux_get(3). The value will automatically be freed when the job is destroyed, so you don't have to manually do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried playing around with this implementation for a bit and was just never able to get it working. For now, I've pushed up a fix to just store/retrieve job IDs and their associated timestamps with a map.

I think the nice thing with this method is the ability to return the information stored in this map with the plugin.query callback. I'm definitely open to more discussion, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: wait, I'm sorry, I think I might have just figured it out. 😆 I don't think I was correctly managing memory with my C++ pointers, but if I declare a C pointer (like you suggested), I think this actually becomes pretty straightforward. I can push up an additional commit that uses this approach of aux_set for comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if @trws would have a suggestion for using smart pointers stashed in a the job aux item hash, but o/w using a C pointer works fine too.

However, I liked your idea of exposing the age (and maybe other factors too) in the plugin stats returned from flux jobtap query!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! In that case, I'll leave my proposal as is. 👍 But it is good to know that we have the aux_set/aux_get implementation for the future!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable. FWIW, using a smart pointer in the aux array would kinda defeat the purpose, I could make something for us that makes it easy to capture smart pointers into something managed by the aux array, so things like lambdas or classes with complex state would be easier to manage, but doing it directly wouldn't really buy much unless you had a smart pointer already for some reason.

"mf_priority:t_released"));
if (timestamp) {
// remove the entry from the map
auto it = released_jobs.find (FLUX_JOBTAP_CURRENT_JOB);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it may not have been clear but FLUX_JOBTAP_CURRENT_JOB is a constant. To get the actual jobid you'll have to unpack it from flux_plugin_arg_t *args

@cmoussa1 cmoussa1 force-pushed the age.factor.plugin branch from 9f50970 to 147085a Compare June 4, 2024 17:07
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've made another pass through this PR with some comments and questions.

While I was looking at this, another question occurred to me.

It seems like once age is considered as a factor, all jobs have to be reprioritized periodically so that the priority assigned due to age is fair. I don't think this occurs by default anywhere now?

For an example, one job is submitted and immediately proceeds to SCHED state and gets a small priority bump due to its age. Later another job is submitted and the same occurs. Its age is less than the first job, but they will have similar priorities because the priority of the first job has not been recomputed in the intervening time.

This means any time one job has a priority adjustement, all other jobs priorities should be recalculated? This may create quite large eventlogs for jobs on systems where jobs stay pending for awhile...

Comment on lines 103 to 109
age_weight = priority_weights["age"];

// check values of priority factor weights; if not configured,
// these will be set to -1, so just use default weights
if (fshare_weight == -1) fshare_weight = 100000;
if (queue_weight == -1) queue_weight = 10000;
if (age_weight == -1) age_weight = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup suggestion: place defaults as defines at top of the file and initialize each `priority_weights[factor] = DEFAULT`` so that you don't have to repeat these checks for every job.

Comment on lines 129 to 142
auto it = released_jobs.find (jobid);
if (it != released_jobs.end ()) {
// job has an "age" associated with it; fetch the timestamp
t_released = it->second;
// get current time
t_current = std::chrono::duration_cast <std::chrono::duration<double>>
(std::chrono::system_clock::now ().time_since_epoch ()).count ();
// calculate age of job
age_factor = t_current - t_released;
} else {
// job does not have an age; don't factor in the age in
// the priority calculation
age_factor = 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest refactoring this find into a function, something like double released_job_age (jobid).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally chrono::duration<double> as the return there rather than a regular double unless you have a reason you need it as a double in all cases. I'll comment a couple other places, but try to avoid using .count() unless you're interfacing with C code or really, really need a bare number. Chrono prevents a lot of mistakes when you use its types, and it's a bit of a shame to immediately convert that away.

Comment on lines 855 to 944
* When a job gets to the job.state.sched state, pack a timestamp to be
* associated with the job in the case that it is eligible to be run but
* cannot (e.g due to resource constraints). The timestamp will be used
* to calculate the "age" of a job and increase its priority according to
* the time it has been waiting to run.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "pack a timestamp" -> "generate a timestamp"

Comment on lines 1166 to 1264
auto it = released_jobs.find (jobid);
if (it != released_jobs.end ())
// job had an "age" associated with it; remove it from the map that
// stores its ID
released_jobs.erase (it);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like std::map erarse() accepts a key, so could this be replaced with released_jobs.erase (jobid)?

Comment on lines 879 to 969
if (urgency > 0) {
// the job is not being held, and therefore we need to get the current
// time to keep track of how long the job has been released and waiting
t_released = std::chrono::duration_cast <std::chrono::duration<double>>
(std::chrono::system_clock::now ().time_since_epoch ()).count ();
// store job id, t_released in map
released_jobs[jobid] = t_released;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: If urgency is 0, should released_jobs.erase (jobid) be called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there may be times when the t_released for a job should not be updated in SCHED state. E.g. if a priority adjustment is made and the job was not previously held, shouldn't the age stay the same?

Comment on lines +31 to +38
test_expect_success 'disable age factor in multi-factor priority plugin' '
cat >config/test.toml <<-EOT &&
[accounting.factor-weights]
age = 0
EOT
flux config reload
'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as an FYI for next time, I think this could be a one liner:

echo accounting.factor-weights.age=0 | flux config load

@cmoussa1
Copy link
Member Author

cmoussa1 commented Jun 4, 2024

Thanks @grondo for your review and help so far. I could definitely be misunderstanding the intention behind implementing it, but my thoughts on how "age" should be factored into the calculation were that it was only considered a factor if the job was still in SCHED state by the time all jobs were called to reprioritized. If it is in SCHED state and waiting to be run before a call was made to reprioritize all jobs, its age is not considered in the calculation of its priority if it later proceeds to RUN (even just before a reprioritization of all jobs).

In the example you laid out and with my proposal, I think the first job submitted that immediately proceeds to SCHED state would not get a priority bump due to its age unless it never proceeded to RUN before a reprioritization was called. If a second job is submitted and the same occurs, wouldn't the first job still have a significantly higher priority after all jobs were reprioritized because it was waiting for a longer period of time? Sorry if I am misunderstanding your example!

In any case, you bring up a good point about if one job needs a priority adjustment, all other jobs need their priority to be recalculated. Maybe the plugin needs to implement a flux_jobtap_reprioritize_job () call somewhere? Either in a callback or somewhere else?

flux_plugin_arg_t *args,
void *data)
{
double t_released = 0.0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example here, this should be a chrono::time_point of whatever type you're using.

@@ -43,6 +44,7 @@ std::map<std::string, Queue> queues;
std::map<int, std::string> users_def_bank;
std::vector<std::string> projects;
std::map<std::string, int> priority_weights;
std::map<flux_jobid_t, double> released_jobs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a map of jobid-> time_point also I believe.

// job has an "age" associated with it; fetch the timestamp
t_released = it->second;
// get current time
t_current = std::chrono::duration_cast <std::chrono::duration<double>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then if those are time_point, you can directly subtract the system_clock::now result instead of doing this duration_cast at all. If you need the double in age_factor then use the cast on the result of the subtraction.

@grondo
Copy link
Contributor

grondo commented Jun 5, 2024

wouldn't the first job still have a significantly higher priority after all jobs were reprioritized because it was waiting for a longer period of time? Sorry if I am misunderstanding your example!

Yes, that sounds correct. I wasn't really considering when all jobs are reprioritized because I didn't really see where that is happening in the context of this PR. Does it occur periodically as information is pulled from the accounting DB and pushed into the priority plugin?

In any case, you bring up a good point about if one job needs a priority adjustment, all other jobs need their priority to be recalculated.

Yeah, I think this was the main point I was trying to make. Implementing this could be tricky because you don't want to reprioritize jobs recursively (i.e. you have to tell the difference between a job being reprioritized for other reasons, vs one being reprioritized because all jobs are being reprioritized due to the first job's prioritization)

@cmoussa1
Copy link
Member Author

cmoussa1 commented Jun 6, 2024

Yes, that sounds correct. I wasn't really considering when all jobs are reprioritized because I didn't really see where that is happening in the context of this PR. Does it occur periodically as information is pulled from the accounting DB and pushed into the priority plugin?

Yes, .reprioritize is called every time flux-accounting data is pushed to the plugin, which will in turn call flux_jobtap_reprioritize_all () . These updates occur periodically (every few minutes I'm pretty sure?). This is currently the only sport in the plugin where flux_jobtap_reprioritize_* () is called.

Yeah, I think this was the main point I was trying to make. Implementing this could be tricky because you don't want to reprioritize jobs recursively (i.e. you have to tell the difference between a job being reprioritized for other reasons, vs one being reprioritized because all jobs are being reprioritized due to the first job's prioritization)

I'm trying to think of how reprioritizing a job due to its age might make sense in the case where it should be the only job that should have its priority recalculated - my thought was that this wouldn't be necessary. I guess I figured that something like an age factor would/should only be considered when every pending job's priority was being reprioritized, not just a single one. Does that make sense? Sorry if I am misunderstanding you.

@grondo
Copy link
Contributor

grondo commented Jun 6, 2024

I'm trying to think of how reprioritizing a job due to its age might make sense in the case where it should be the only job that should have its priority recalculated - my thought was that this wouldn't be necessary. I guess I figured that something like an age factor would/should only be considered when every pending job's priority was being reprioritized, not just a single one. Does that make sense? Sorry if I am misunderstanding you.

I think we're saying the same thing.

My question is about the implementation in the current PR. I may be missing a subtlety, but with the following assumptions:

  • The mf_priority plugin returns the current priority whenever a job enters the PRIORITY state.
  • if the job has a t_released assigned, then age is a factor used in the priority calculation returned in the PRIORITY state
  • whenever a job reaches the SCHED state, the job is assigned a t_released timestamp if urgency > 0
  • An urgency or jobspec-update event will send a job in SCHED state back to PRIORITY

Therefore, any job with an urgency update will have its priority recalculated using age. In this case, not all jobs are reprioritzed, just the one affected job, therefore this gives the job an unfair priority bump?

@cmoussa1
Copy link
Member Author

cmoussa1 commented Jun 7, 2024

I think I see what you're saying - sorry for it taking me until now to understand. 🤦 Any urgency/jobspec update will send a job in SCHED state back to PRIORITY, and if they already have an age, their priority will get bumped up.

It sounds to me like, then, the plugin needs to handle additional cases where a job's:

  • urgency is updated from a value > 0 to another value > 0 - the age of the job should be reset when recalculating the job's priority
  • attributes are updated (e.g their queue or bank) - the age of the job should be reset when recalculating the job's priority

Does that sound right?

@grondo
Copy link
Contributor

grondo commented Jun 7, 2024

Yeah, I guess if a job's age is reset any time the job is modified then that is fair and wouldn't require recalculating the priority for all other jobs at the time of that job's priority update, since it would by definition have the smallest age.
I'm not sure I'm the right person to answer if that is how the age factor is supposed to work, so I'd get @ryanday36's opinion there.

Should this rule only apply if the user themselves requested the job update? E.g. an urgency >16 set by the instance owner should perhaps not reset the age? Another question for which I'm not sure I have the answer.

Problem: The priority plugin does not unpack the priority weight
associated with an "age" factor for a job's priority calculation.

Add the "age" factor's associated integer weight to the plugin's
internal map that stores the weights of all of the priority factors by
unpacking it in the callback for conf.update.
@ryanday36
Copy link
Contributor

Hopefully I'm actually understanding the questions(s) here. I think that the ideal age factor would be based on the total time that the job is eligible to be run (i.e. urgency!=0, not in any sort of depend state). That said, considering the age to be the time since the urgency was set to >0 seems like a perfectly reasonable approximation of this. We shouldn't reset the age factor when a job goes from a 'default' urgency (16) to an 'expedite' urgency though.

@cmoussa1 cmoussa1 force-pushed the age.factor.plugin branch from 147085a to 811dc2e Compare June 11, 2024 17:11
cmoussa1 added 2 commits June 11, 2024 10:29
Problem: The priority plugin does not factor in the age of a job in its
priority calculation.

Add a new callback to the plugin for job.state.sched and store the
timestamp of a job in SCHED state in a map where the key is the job ID
and the value is the timestamp. When the job enters job.priority.get,
the timestamp will be used with the time at that moment to calculate the
"age" of the job, where the older the age, the more of a priority bump
the job gets.

If a job currently accumulating age has its urgency changed to 0, the
age of the job will be saved and no longer grow until the urgency of the
job is changed back to a value > 0.

If a job currently accumulating age has one of its attributes updated,
such as the queue or the bank it is running under, any
previously-accumulated age will be cleared and reset for the job.

When the job enters job.state.inactive, the key-value pair for the job
ID and the timestamp of it entering state SCHED is removed.
Problem: A number of tests in the testsuite look for a specific priority
value that becomes very difficult to test with the addition of the "age"
factor.

In the tests that look for a specific priority value, disable the "age"
factor in the priority calculation of a job.
cmoussa1 added 3 commits June 11, 2024 10:29
Problem: flux-accounting has no tests for the implementation of the
"age" factor in the priority plugin.

Add some tests.
Problem: The flux-config-accounting manpage does not describe the "age"
factor in the priority plugin even though it is a factor in the
calculation of a job's priority.

Add "age" to the description of the priority factors in the
flux-config-accounting manpage.
Problem: The accounting guide does not list "queue" or "age" as factors
considered by the plugin when calculating a job's priority.

Add both of these factors to the listed description of factors used
when calculating a job's priority in the priority plugin.
@cmoussa1 cmoussa1 force-pushed the age.factor.plugin branch from 811dc2e to 37cad48 Compare June 11, 2024 17:33
@cmoussa1
Copy link
Member Author

OK, I've force-pushed up some changes (notably to the plugin: add "age" factor in priority calculation commit) based on the feedback above that I hope begins to capture the number of cases we talked about. I can break this up into multiple commits if needed, but I thought it might be simpler to have it all in just one commit.

More specifically, I've added logic to pause the accumulation of age for a job when it is held with an urgency update of 0. This check is made in job.state.priority, and if the urgency is found to be 0, the age up to that point is saved (and paused from growing further) to the job until it is updated to have a positive urgency value. When the priority calculation is performed for the job, any previously accumulated age for the job is then factored in to the overall calculation of age.

If a job's attributes are updated (for example, if they update the bank or queue they want their job to run under), the age for the job is now cleared and reset. I've added a short set of tests to t1035-mf-priority-age.t which ensures that the priority of a job that previously accumulated age and then was updated to run under a different bank would have its age reset.

One thing to note with this most recent iteration I've pushed up: updating a job from one urgency value > 0 to another value <= 16 will not reset the age it accumulated before the urgency update since we are considering the age to be based on the total time that the job is eligible to be run (i.e. urgency != 0, not in any sort of DEPEND state). I'm wondering here if the plugin needs to completely reset the age of the job in this case, or if it is okay to leave as is. @ryanday36 do you have any thoughts or opinions on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature new feature plugin related to the multi-factor priority plugin
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

4 participants