Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Deadlock in PostgreSQL. #4445

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

armory-abedonik
Copy link
Contributor

@armory-abedonik armory-abedonik commented Apr 6, 2023

In PostgreSQL the rows will be locked as they are updated - in fact, the way this actually works is that each tuple (version of a row) has a system field called xmin to indicate which transaction made that tuple current (by insert or update) and a system field called xmax to indicate which transaction expired that tuple (by update or delete). When we access data, it checks each tuple to determine whether it is visible to our transaction, by checking our active "snapshot" against these values.

If we are executing an UPDATE and a tuple which matches our search conditions has an xmin which would make it visible to our snapshot and an xmax of an active transaction, it blocks, waiting for that transaction to complete. If the transaction which first updated the tuple rolls back, our transaction wakes up and processes the row; if the first transaction commits, our transaction wakes up and takes action depending on the current transaction isolation level.

Obviously, a deadlock is the result of this happening to rows in different order. There is no row-level lock in RAM which can be obtained for all rows at the same time, but if rows are updated in the same order we can't have the circular locking. Unfortunately, the suggested IN(1, 2) syntax doesn't guarantee that. Different sessions may have different costing factors active, a background "analyze" task may change statistics for the table between the generation of one plan and the other, or it may be using a seqscan and be affected by the PostgreSQL optimization which causes a new seqscan to join one already in progress and "loop around" to reduce disk I/O.

Copy link
Contributor

@kkotula kkotula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

ovidiupopa07
ovidiupopa07 previously approved these changes Apr 12, 2023
@mattgogerly
Copy link
Member

I'm assuming there's no way to get a test to demonstrate this?

@@ -281,7 +281,8 @@ class SqlQueue(
return
}

candidates.shuffle()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SO MOSTLY concerned b/c if I'm reading the comments ABOVE correct:

 To minimize lock contention, this is a non-locking read. The id's returned may be
 locked or removed by another instance before we can acquire them. We read more id's
 than [maxMessages] and shuffle them to decrease the likelihood that multiple instances
 polling concurrently are all competing for the oldest ready messages when many more
 than [maxMessages] are read.

The shuffle shouldn't have mattered. Except... it turns out it DOES matter b/c of lock behavior. Wondering if we can update the comments or explanations on this... OR refactor this code to be less... confusing on how it operates in combination with below which does seem to use a lock mechanism...

@ovidiupopa07 ovidiupopa07 dismissed their stale review April 13, 2023 17:36

Not yet to be merged

@armory-abedonik
Copy link
Contributor Author

armory-abedonik commented Apr 14, 2023

Benchmark:

cpu: "100m"
memory: "128Mi"

MySQL

10 records = 23 ms / 13 ms / 11 ms / 12 ms
25 records = 20 ms / 18 ms / 16 ms / 17 ms
50 records = 31 ms / 28 ms / 40 ms / 32 ms
75 records = 41 ms / 38 ms / 32 ms / 38 ms

PostgreSQL

10 records = 5 ms / 7 ms / 6 ms / 5 ms
25 records = 10 ms / 14 ms / 9 ms / 8 ms
50 records = 15 ms / 15 ms / 15 ms / 23 ms
75 records = 15 ms / 18 ms / 16 ms / 26 ms

@armory-abedonik armory-abedonik marked this pull request as draft April 19, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants