-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
testsuite: change check for specific job states #393
Conversation
I've just rebased after #399, so I think I'll mark this as ready for review. |
t/t1005-max-jobs-limits.t
Outdated
flux job wait-event -vt 60 $jobid3 dependency-add && | ||
flux job info $jobid3 eventlog > eventlog.out && | ||
grep "dependency-add" eventlog.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validate
event should transition a job from NEW to DEPEND, so it may be possible to replace the test for state == DEPEND
with flux job wait-event JOBID validate
. However, I may be misunderstanding the intent here.
Also, it probably isn't necessary to use flux job wait-event
and then grep for the event in the eventlog fetched from flux job info
. In this case, if flux job wait-event
has reported that the event has appeared, it is guaranteed that the event is posted to the eventlog. Therefore I think you could skip the two extra steps here as redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this pointer! The intent with this test is to ensure that the job has a dependency added to it by the priority plugin. In this case, should I still change the check to instead use your suggestion flux job wait-event JOBID validate
? I should update the description of the test to make it clearer what it is trying to do, sorry about that.
Thanks also for your suggestion about adding flux job info
after a flux job wait-event
- I can just remove those from this test and the others that have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misunderstood the intent. If the intent is to ensure a dependency is added, the flux job wait-event dependency-add
is the correct approach.
You could also consider using the flux job wait-event
-m, --match-context
option to ensure that the context matches the dependency you expect to be adding too.
Sorry for my confusion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem! I just force-pushed up a change to use --match-context
with flux job wait-event
per your suggestion above (I also updated the test description to hopefully be a little bit clearer), as well as removed the redundant flux job info
calls throughout the changed tests that look at job state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Noted a couple commit message issues inline.
jobid3=$(flux python ${SUBMIT_AS} 5011 sleep 60) && | ||
test $(flux jobs -no {state} ${jobid3}) = DEPEND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update commit message to remove "and grep-ing for certain events..."
@@ -175,8 +175,9 @@ test_expect_success 'run update-usage and update-fshare commands' ' | |||
' | |||
|
|||
test_expect_success 'check that job usage and fairshare values get updated' ' | |||
flux account-shares -p $(pwd)/FluxAccountingTest.db > post_update2.test && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like perhaps two commits were accidentally squashed here? (Commit message has two Problem statements)
Problem: There are some tests in the flux-accounting testsuite that might produce a race condition, especially when run under valgrind. Some of the tests look at specific job states, but there is a chance the job may not have reached a certain state, like DEPEND, by the time flux job info is run. In the testsuite, change the way that specific job states are checked by using "flux job wait-event".
Problem: When run under valgrind, there is a chance that the timeout is hit while waiting for jobs to be posted to flux-core's job archive. Increase the timeout from 50 to 100.
Problem: Checking for an exact job usage value can be challenging when running tests under valgrind because the value might be ever so slightly different. Change the way the job usage value is checked to instead just look that the job usage value is greater than or equal to the expected value.
Problem: Running t1019-mf-priority-info-fetch.t under valgrind causes the third submitted job to enter an alloc event by the time the first two are canceled, which we don't want for the purpose of the tests because we are checking for certain cur_run_jobs and cur_active_jobs values when we query the plugin's internal map. Change the order of the job cancellations to cancel the last submitted job first so that it never enters the alloc event while the first two are in the process of being cancelled.
Problem: t1020-mf-priority-issue262.t uses spaces instead of tabs, which is inconsistent with the rest of the sharness tests in flux-accounting. Convert the spaces to tabs in this test file.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #393 +/- ##
========================================
Coverage 83.34% 83.34%
========================================
Files 23 23
Lines 1555 1435 -120
========================================
- Hits 1296 1196 -100
+ Misses 259 239 -20 |
Thanks @grondo! I believe I've addressed both commit suggestions above. Will set MWP here |
Problem
As mentioned in #390, there are some tests in the flux-accounting testsuite that might produce a race condition when run under
valgrind
. Some of the tests look at specific job states, but there is a chance the job may not have reached a certain state, like DEPEND, by the timeflux job info
is run. It might be better to useflux job wait-event
to look fordepend
oralloc
events instead of looking at DEPEND or RUN states of jobs.This PR makes a number of changes to the testsuite so that flux-accounting tests can be run under
valgrind
. The first commit changes the checks for certain job states to useflux job wait-event
andflux job info $jobid eventlog
to check for certain events and states in the tests.The next commit removes a couple of checks for specific job usage values because running this set of tests in
t1011-job-archive-interface.t
undervalgrind
might cause the job usage values as a result of the submitted job to be different than what is checked.The third commit changes the order of job cancellations of the submitted jobs in
t1019-mf-priority-info-fetch.t
because we don't want the last submitted job to enter thealloc
event while the first two jobs are being canceled, since in this test we're checking for certainrunning_jobs
andactive_jobs
values in the tests.The last commit just converts
t1020-mf-priority-issue262.t
to use tabs instead of spaces, which would make it consistent with the rest of the tests in the flux-accounting testsuite.If desired, perhaps I could add a
valgrind
builder to the GitHub actions for flux-accounting in this PR? Is that possible? I can't remember if flux-core has one; I tried doing a quick GitHub search but couldn't find anything.