Sidekiq jobs are getting run twice, leading to database locks #2333
A Sidekiq thread on migrating to a new Redis suggests using Redis replication.
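A minimal sketch of that replication-based cutover, assuming the redis-client gem; the hostnames here are hypothetical placeholders, not our actual servers:

```ruby
require "redis-client"

# Connect to the *new* Redis server (hostname is a placeholder).
new_redis = RedisClient.config(host: "new-redis.example.com", port: 6379).new_client

# Make the new server a replica of the old one so it copies the full dataset,
# including Sidekiq's queues and scheduled sets.
new_redis.call("REPLICAOF", "old-redis.example.com", 6379)

# Wait until INFO replication reports master_link_status:up before cutting over.
puts new_redis.call("INFO", "replication")

# Once in sync (and Sidekiq is paused), promote the new server and repoint clients.
new_redis.call("REPLICAOF", "NO", "ONE")
```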
Please make sure to use the central redis var described in https://github.com/pulibrary/princeton_ansible/blob/main/roles/redis/README.md
Thanks @hackartisan! I had missed this.
Is Sidekiq connecting to Redis twice?
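One quick way to check, from a Rails console on the worker box; a hedged sketch assuming Sidekiq 7, whose checked-out connections are redis-client objects exposing #call:

```ruby
# Print which Redis server each checked-out Sidekiq connection talks to.
# If run_id differs between invocations, Sidekiq is reaching two different servers.
Sidekiq.redis do |conn|
  info = conn.call("INFO", "server")
  puts info.lines.grep(/redis_version|run_id|tcp_port/)
end
```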
The production Redis server is still on Redis 6.0, which is too old for our current version of Sidekiq. See the Princeton Ansible ticket to upgrade this server.
We should talk to ops about how to transition the production environment to the same Redis version as staging, or discuss alternative plans.
This may be related to #1959, a newer Honeybadger error. It may be addressed by a Postgres configuration change.
For more detailed notes, see the Datadog notebook.
Ensure Sidekiq jobs are not created twice with the same job id. This is very likely a Redis latency issue.
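One possible guard, not the repo's actual code: have the job take a short-lived Redis lock on its own job id before doing any work, so a re-delivered duplicate exits early. The key name and TTL below are illustrative assumptions:

```ruby
class AlmaDumpTransferJob < ApplicationJob
  def perform(*args)
    # SET ... NX EX succeeds only for the first execution of this job id;
    # a duplicate delivery of the same job sees the existing key and bails out.
    first_run = Sidekiq.redis do |conn|
      conn.call("SET", "jobs:seen:#{job_id}", "1", "NX", "EX", 24 * 3600)
    end
    return unless first_run

    # ... actual dump transfer work ...
  end
end
```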
History
We have a monitoring alert that is going off repeatedly for bibdata, saying that Postgres queries are taking a very long time, sometimes as much as 15 minutes (at which point they're probably being killed by Postgres rather than finishing).
The Postgres queries that are taking so long seem to be getting called from the AlmaDumpTransferJob.
It seems possible that this job is somehow getting called twice with different GlobalIDs, causing a database lock. If that's the case, it could be a Redis latency issue or a Sidekiq thread management issue (again, see the Datadog notebook).
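To test the lock theory, one could inspect what the long-running queries are waiting on; a sketch using standard pg_stat_activity columns, run from a Rails console:

```ruby
rows = ActiveRecord::Base.connection.select_all(<<~SQL)
  SELECT pid,
         now() - query_start AS runtime,
         wait_event_type,   -- 'Lock' here would support the database-lock theory
         wait_event,
         left(query, 120)   AS query
  FROM pg_stat_activity
  WHERE state <> 'idle'
  ORDER BY runtime DESC NULLS LAST
SQL
rows.each { |row| puts row.inspect }
```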
Acceptance Criteria
Run
journalctl -u bibdata-workers.service --grep AlmaDumpTransferJob
on the worker box to confirm this issue isn't happening any longer.
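Sidekiq's default job logger tags each line with the job's JID, so if the issue recurs, the journal should show two start entries carrying the same JID.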