Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testsuite: change check for specific job states #393

Merged
merged 5 commits into from
May 28, 2024

Conversation

cmoussa1
Copy link
Member

@cmoussa1 cmoussa1 commented Sep 20, 2023

Problem

As mentioned in #390, there are some tests in the flux-accounting testsuite that might produce a race condition when run under valgrind. Some of the tests look at specific job states, but there is a chance the job may not have reached a certain state, like DEPEND, by the time flux job info is run. It might be better to use flux job wait-event to look for depend or alloc events instead of looking at DEPEND or RUN states of jobs.


This PR makes a number of changes to the testsuite so that flux-accounting tests can be run under valgrind. The first commit changes the checks for certain job states to use flux job wait-event and flux job info $jobid eventlog to check for certain events and states in the tests.

The next commit removes a couple of checks for specific job usage values because running this set of tests in t1011-job-archive-interface.t under valgrind might cause the job usage values as a result of the submitted job to be different than what is checked.

The third commit changes the order of job cancellations of the submitted jobs in t1019-mf-priority-info-fetch.t because we don't want the last submitted job to enter the alloc event while the first two jobs are being canceled, since in this test we're checking for certain running_jobs and active_jobs values in the tests.

The last commit just converts t1020-mf-priority-issue262.t to use tabs instead of spaces, which would make it consistent with the rest of the tests in the flux-accounting testsuite.

If desired, perhaps I could add a valgrind builder to the GitHub actions for flux-accounting in this PR? Is that possible? I can't remember if flux-core has one; I tried doing a quick GitHub search but couldn't find anything.

@cmoussa1 cmoussa1 added the testing issues that deal with testing label Sep 20, 2023
@cmoussa1 cmoussa1 changed the title [WIP] testsuite: change check for specifc job states testsuite: change check for specifc job states Nov 7, 2023
@cmoussa1 cmoussa1 marked this pull request as ready for review November 7, 2023 21:45
@cmoussa1
Copy link
Member Author

cmoussa1 commented Nov 7, 2023

I've just rebased after #399, so I think I'll mark this as ready for review.

@cmoussa1 cmoussa1 changed the title testsuite: change check for specifc job states testsuite: change check for specific job states Feb 9, 2024
Comment on lines 95 to 97
flux job wait-event -vt 60 $jobid3 dependency-add &&
flux job info $jobid3 eventlog > eventlog.out &&
grep "dependency-add" eventlog.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validate event should transition a job from NEW to DEPEND, so it may be possible to replace the test for state == DEPEND with flux job wait-event JOBID validate. However, I may be misunderstanding the intent here.

Also, it probably isn't necessary to use flux job wait-event and then grep for the event in the eventlog fetched from flux job info. In this case, if flux job wait-event has reported that the event has appeared, it is guaranteed that the event is posted to the eventlog. Therefore I think you could skip the two extra steps here as redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this pointer! The intent with this test is to ensure that the job has a dependency added to it by the priority plugin. In this case, should I still change the check to instead use your suggestion flux job wait-event JOBID validate? I should update the description of the test to make it clearer what it is trying to do, sorry about that.

Thanks also for your suggestion about adding flux job info after a flux job wait-event - I can just remove those from this test and the others that have it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misunderstood the intent. If the intent is to ensure a dependency is added, the flux job wait-event dependency-add is the correct approach.

You could also consider using the flux job wait-event -m, --match-context option to ensure that the context matches the dependency you expect to be adding too.

Sorry for my confusion!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem! I just force-pushed up a change to use --match-context with flux job wait-event per your suggestion above (I also updated the test description to hopefully be a little bit clearer), as well as removed the redundant flux job info calls throughout the changed tests that look at job state.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Noted a couple commit message issues inline.

jobid3=$(flux python ${SUBMIT_AS} 5011 sleep 60) &&
test $(flux jobs -no {state} ${jobid3}) = DEPEND
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update commit message to remove "and grep-ing for certain events..."

@@ -175,8 +175,9 @@ test_expect_success 'run update-usage and update-fshare commands' '
'

test_expect_success 'check that job usage and fairshare values get updated' '
flux account-shares -p $(pwd)/FluxAccountingTest.db > post_update2.test &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like perhaps two commits were accidentally squashed here? (Commit message has two Problem statements)

cmoussa1 added 5 commits May 28, 2024 14:56
Problem: There are some tests in the flux-accounting testsuite that
might produce a race condition, especially when run under valgrind.
Some of the tests look at specific job states, but there is a chance
the job may not have reached a certain state, like DEPEND, by the time
flux job info is run.

In the testsuite, change the way that specific job states are checked
by using "flux job wait-event".
Problem: When run under valgrind, there is a chance that the timeout is
hit while waiting for jobs to be posted to flux-core's job archive.

Increase the timeout from 50 to 100.
Problem: Checking for an exact job usage value can be challenging when
running tests under valgrind because the value might be ever so
slightly different.

Change the way the job usage value is checked to instead just look that
the job usage value is greater than or equal to the expected value.
Problem: Running t1019-mf-priority-info-fetch.t under valgrind causes
the third submitted job to enter an alloc event by the time the first
two are canceled, which we don't want for the purpose of the tests
because we are checking for certain cur_run_jobs and cur_active_jobs
values when we query the plugin's internal map.

Change the order of the job cancellations to cancel the last submitted
job first so that it never enters the alloc event while the first two
are in the process of being cancelled.
Problem: t1020-mf-priority-issue262.t uses spaces instead of tabs, which
is inconsistent with the rest of the sharness tests in flux-accounting.

Convert the spaces to tabs in this test file.
Copy link

codecov bot commented May 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.34%. Comparing base (2485a88) to head (0f7e1a5).
Report is 7 commits behind head on master.

Current head 0f7e1a5 differs from pull request most recent head 07816f3

Please upload reports for the commit 07816f3 to get more accurate results.

Additional details and impacted files
@@           Coverage Diff            @@
##           master     #393    +/-   ##
========================================
  Coverage   83.34%   83.34%            
========================================
  Files          23       23            
  Lines        1555     1435   -120     
========================================
- Hits         1296     1196   -100     
+ Misses        259      239    -20     

see 3 files with indirect coverage changes

@cmoussa1
Copy link
Member Author

cmoussa1 commented May 28, 2024

Thanks @grondo! I believe I've addressed both commit suggestions above. Will set MWP here

@mergify mergify bot merged commit 351ae02 into flux-framework:master May 28, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-passing testing issues that deal with testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants