
Sidekiq jobs are getting run twice, leading to database locks #2333

Closed
2 of 4 tasks
maxkadel opened this issue Apr 3, 2024 · 8 comments · Fixed by pulibrary/princeton_ansible#4840
Labels
bug: The application does not work as expected because of a defect

Comments

maxkadel (Contributor) commented Apr 3, 2024

For more detailed notes, see the Datadog notebook.

Ensure Sidekiq jobs are not created twice with the same job ID. This is very likely a Redis latency issue.

History

We have a monitoring alert that is going off repeatedly for bibdata, saying that Postgres queries are taking a very long time, sometimes as much as 15 minutes (at which point they are probably being killed by Postgres rather than finishing).

The Postgres query that is taking so long is:

SELECT dump_types.* FROM dump_types WHERE dump_types.constant = ? LIMIT ?

This query appears to be issued from the AlmaDumpTransferJob.

It seems possible that this job is somehow getting called twice with different GlobalIDs, causing a database lock. If so, it could be a Redis latency issue or a Sidekiq thread-management issue (again, see the Datadog notebook).
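For context, the query shape above is what ActiveRecord emits for a find_by against the constant column. A hypothetical sketch (the class usage and value below are assumptions, not the actual bibdata code):

```ruby
# Hypothetical sketch -- not the actual AlmaDumpTransferJob code.
# A lookup like this through ActiveRecord produces the slow query above:
#   SELECT dump_types.* FROM dump_types WHERE dump_types.constant = ? LIMIT ?
dump_type = DumpType.find_by(constant: "ALL_RECORDS") # constant value is illustrative

# If two copies of the job do the same work concurrently and each wraps this
# lookup plus later writes in a transaction, the second copy can sit waiting on
# row locks held by the first, which would look like a 15-minute SELECT.
```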

Acceptance Criteria

  • Reconfigure staging to use the central staging Redis server
  • Reconfigure qa to use the central staging Redis server
  • Reconfigure production to use the central production Redis server (we need to make sure to flush existing jobs and stop the workers before switching production over)
  • Check journalctl -u bibdata-workers.service --grep AlmaDumpTransferJob on the worker box to confirm this issue is no longer happening (a complementary console check is sketched after this list)
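To complement the journalctl check in the last item, a console sketch like the following (public Sidekiq API; the queue name is an assumption) could surface duplicate AlmaDumpTransferJob enqueues directly:

```ruby
# Sketch: look for duplicated AlmaDumpTransferJob entries from a Rails console
# on the worker box. The queue name "default" is an assumption.
require "sidekiq/api"

jobs = Sidekiq::Queue.new("default").select { |j| j.display_class == "AlmaDumpTransferJob" }

# The same job ID appearing more than once, or the same arguments enqueued
# twice, would both support the "run twice" hypothesis.
dup_jids = jobs.group_by(&:jid).select { |_, group| group.size > 1 }.keys
dup_args = jobs.group_by(&:display_args).select { |_, group| group.size > 1 }.keys
puts({ duplicate_jids: dup_jids, duplicate_args: dup_args }.inspect)
```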
maxkadel added the bug label Apr 3, 2024
maxkadel changed the title from "AlmaDumpTransferJob generates long-running / locking postgres queries" to "Sidekiq jobs are getting run twice, leading to database locks" Apr 3, 2024
maxkadel (Contributor, Author) commented Apr 4, 2024

A Sidekiq thread on migrating to a new Redis suggests using Redis replication.

To migrate, you set up the new server as a replica of the old primary, let it replicate, shut down the workers, shut down the old primary, promote the new replica to primary, and start the workers back up.
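A rough sketch of that sequence using the redis gem (host names are placeholders; in practice this would more likely be done with redis-cli or Ansible than from Ruby):

```ruby
# Rough sketch of the replication-based cutover; host names are placeholders.
require "redis"

new_redis = Redis.new(host: "new-redis.example.edu", port: 6379)

# 1. Make the new server a replica of the old primary and let it sync.
new_redis.slaveof("old-redis.example.edu", 6379)

# 2. Stop the Sidekiq workers, wait for replication to catch up, stop the old primary.

# 3. Promote the new server to primary (equivalent to SLAVEOF NO ONE).
new_redis.slaveof("no", "one")

# 4. Point the workers' Redis URL at the new server and start them back up.
```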

Beck-Davis (Contributor) commented Apr 4, 2024

  • change BIBDATA_REDIS_URL in group_vars
  • ssh to each worker box & make sure no jobs are running/queued
  • stop sidekiq workers
  • run entire bibdata playbook from branch with site-config flag
  • restart sidekiq workers on the boxes
  • make sure sidekiq is still running
  • run a background job on bibdata & check how many jobs are run (see the sketch after this list)
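For that last step, a minimal console sketch (SomeTestJob is a placeholder for any inexpensive existing job):

```ruby
# Sketch: enqueue one job and confirm it is processed exactly once.
# SomeTestJob is a placeholder; substitute any cheap existing job.
require "sidekiq/api"

before = Sidekiq::Stats.new.processed
SomeTestJob.perform_later          # .perform_async for a plain Sidekiq worker class
sleep 30                           # give the workers time to pick it up
after = Sidekiq::Stats.new.processed

puts "processed delta: #{after - before}"   # expect 1; 2 would reproduce the bug
```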

hackartisan (Member):

Please make sure to use the central redis var described in https://github.com/pulibrary/princeton_ansible/blob/main/roles/redis/README.md

maxkadel (Contributor, Author) commented Apr 5, 2024

Thanks @hackartisan! I had missed this.

maxkadel (Contributor, Author) commented Apr 9, 2024

Is Sidekiq connecting to Redis twice?

INFO: Sidekiq Pro 7.2.0, commercially licensed. Thanks for your support!
Apr 08 06:27:53 bibdata-alma-worker-staging1 sidekiq[865]: 2024-04-08T10:27:53.704Z pid=865 tid=49l INFO: Sidekiq 7.2.2 connecting to Redis with options {:size=>10, :pool_name=>"internal", :url=>"redis://lib-redis-staging1.princeton.ed>
Apr 08 06:27:53 bibdata-alma-worker-staging1 sidekiq[865]: 2024-04-08T10:27:53.714Z pid=865 tid=49l INFO: Sidekiq 7.2.2 connecting to Redis with options {:size=>2, :pool_name=>"default", :url=>"redis://lib-redis-staging1.princeton.edu:>
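Both lines above come from the same pid (865) and name different pools (internal vs. default), so this looks like one process opening two connection pools rather than two Sidekiq processes polling the same Redis. One way to check how many worker processes are actually registered is the Sidekiq process set:

```ruby
# Sketch: list the Sidekiq worker processes registered in Redis (public Sidekiq API).
require "sidekiq/api"

Sidekiq::ProcessSet.new.each do |process|
  puts [process["hostname"], process["pid"], process["concurrency"], process["queues"]].inspect
end
```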

maxkadel (Contributor, Author):

The production Redis server is still on Redis 6.0, which is too old for our current version of Sidekiq. See the Princeton Ansible ticket to upgrade this server.
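(Sidekiq 7 requires Redis 6.2 or newer.) A quick way to confirm what version a given server is running, assuming the redis gem is available and substituting the real URL:

```ruby
# Sketch: check the server version behind a Redis URL; Redis#info returns a
# hash that includes "redis_version". The env var name mirrors the Ansible
# variable and is an assumption -- substitute the actual URL if it differs.
require "redis"

url = ENV.fetch("BIBDATA_REDIS_URL", "redis://localhost:6379")
puts Redis.new(url: url).info["redis_version"]
```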

kevinreiss (Member):

We should talk to Ops about how to transition the production environment to the same version as staging, or discuss alternative plans.

maxkadel (Contributor, Author):

This may be related to #1959 (a newer Honeybadger error); it may be addressed by a Postgres configuration change.

christinach added and then removed the template-update label Apr 23, 2024