change time range for incomplete jobs #1506
Merged
Description
During load testing we found that the scheduled check_job_status task assumes that any job that started between 30 and 35 minutes ago and has not finished is 'incomplete'. It then kicks off and tries to complete the job from what it thinks was the last row the job processed.
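The check described above can be sketched roughly as follows. This is a minimal illustration only, not the actual task code: the function name `looks_incomplete`, its parameters, and the choice of an inclusive lower bound and exclusive upper bound are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def looks_incomplete(
    started_at: datetime,
    finished: bool,
    now: datetime,
    min_age: timedelta = timedelta(minutes=30),  # assumed lower bound of the window
    max_age: timedelta = timedelta(minutes=35),  # assumed upper bound of the window
) -> bool:
    """Hypothetical stand-in for the 'incomplete job' test.

    A job is flagged when it has not finished and its start time
    falls inside the [min_age, max_age) window before `now`.
    """
    if finished:
        return False
    age = now - started_at
    return min_age <= age < max_age


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# A job that has been running for 31 minutes is flagged, even if it is
# still making progress, which is the failure mode seen in the load test.
still_running = looks_incomplete(now - timedelta(minutes=31), False, now)

# The same job is not flagged once "too long" means 4 hours.
with_longer_threshold = looks_incomplete(
    now - timedelta(minutes=31),
    False,
    now,
    min_age=timedelta(hours=4),
    max_age=timedelta(hours=4, minutes=5),  # assumed window width, for illustration
)
```

Note that the check only looks at the start time, so it cannot tell a stalled job from a slow but healthy one.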
In practice, during the load test the job was almost finished: it had been running for 30 minutes and had sent more than 9000 of the 10000 messages. check_job_status nevertheless declared the job incomplete, somehow determined that it should resume from row 5525, and started trying to save 4000+ notifications that were already in the db. This resulted in IntegrityErrors, multiple retries, and a long period of chaos during which the db was being hammered.
So for the short term, this PR leaves the logic unchanged but changes the definition of "too long" from 30 minutes to 4 hours. This gives us time to send over 50k messages, which is more than we currently support in one job, so all jobs should be able to finish before the check_job_status() task jumps in and declares them incomplete.
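As a rough sanity check on the 50k figure, the numbers from the load test can be extrapolated. This is a back-of-the-envelope sketch that assumes the send rate stays steady at the observed minimum (9000 messages in 30 minutes); the variable names are illustrative only.

```python
# Figures taken from the load test described above.
observed_messages = 9000   # lower bound: "more than 9000" were sent
observed_minutes = 30

# Assumption: the send rate is constant over the life of the job.
messages_per_minute = observed_messages // observed_minutes

new_threshold_minutes = 4 * 60
messages_within_threshold = messages_per_minute * new_threshold_minutes

print(messages_per_minute)        # 300
print(messages_within_threshold)  # 72000
```

At that rate a job has room for roughly 72000 messages before the 4 hour mark, which is consistent with the "over 50k" estimate.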
Security Considerations
N/A