fix: Deadlock in PostgreSQL. #4445

armory-abedonik · 2023-04-06T17:47:31Z

In PostgreSQL, rows are locked as they are updated. More precisely, each tuple (a version of a row) has a system field called xmin, which records the transaction that made the tuple current (by insert or update), and a system field called xmax, which records the transaction that expired it (by update or delete). When we access data, PostgreSQL checks each tuple against our active "snapshot" and these values to determine whether the tuple is visible to our transaction.

If we are executing an UPDATE and a tuple which matches our search conditions has an xmin which would make it visible to our snapshot and an xmax of an active transaction, it blocks, waiting for that transaction to complete. If the transaction which first updated the tuple rolls back, our transaction wakes up and processes the row; if the first transaction commits, our transaction wakes up and takes action depending on the current transaction isolation level.

A deadlock results when this happens to rows in a different order in two transactions, so that each holds a row the other is waiting for. There is no row-level lock in RAM that can be obtained for all rows at once, but if every transaction updates rows in the same order, circular locking cannot occur. Unfortunately, the suggested IN(1, 2) syntax doesn't guarantee that order. Different sessions may have different costing factors active; a background "analyze" task may change the table's statistics between the generation of one plan and the other; or the query may use a seqscan and be affected by the PostgreSQL optimization that lets a new seqscan join one already in progress and "loop around" to reduce disk I/O.
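To make the ordering argument concrete, here is a minimal Kotlin sketch. It is a toy model, not code from SqlQueue.kt: it assumes each session locks row ids one at a time, in list order, and the function name `canDeadlock` and the ids `m1`..`m3` are made up for illustration. Under that assumption, two sessions can deadlock only if they lock some pair of shared rows in opposite order.

```kotlin
// Toy model (assumption): a session locks the given row ids one by one, in list order.
// Ids are assumed to be distinct within each list.
fun canDeadlock(first: List<String>, second: List<String>): Boolean {
    // Position of every id in the second session's lock order.
    val positionInSecond = second.withIndex().associate { (index, id) -> id to index }
    // Rows both sessions want, listed in the first session's lock order.
    val shared = first.filter { it in positionInSecond }
    // If any adjacent pair is reversed in the second session, a circular wait is possible.
    return shared.zipWithNext().any { (a, b) ->
        positionInSecond.getValue(a) > positionInSecond.getValue(b)
    }
}

fun main() {
    val sessionA = listOf("m1", "m2", "m3")
    val sessionB = listOf("m3", "m1")

    // Opposite order on m1/m3: A can hold m1 while B holds m3.
    println(canDeadlock(sessionA, sessionB)) // true

    // Same (sorted) order in both sessions: no circular wait.
    println(canDeadlock(sessionA.sorted(), sessionB.sorted())) // false
}
```

The sketch only models explicit, one-at-a-time locking. For a single multi-row statement the database chooses the order in which rows are visited, which is why sorting the id list on the client side is not sufficient on its own.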

kkotula

LGTM 🚀

keiko-sql/src/main/kotlin/com/netflix/spinnaker/q/sql/SqlQueue.kt

mattgogerly · 2023-04-12T13:00:04Z

I'm assuming there's no way to get a test to demonstrate this?

jasonmcintosh · 2023-04-12T18:28:38Z

keiko-sql/src/main/kotlin/com/netflix/spinnaker/q/sql/SqlQueue.kt

@@ -281,7 +281,8 @@ class SqlQueue(
      return
    }

-    candidates.shuffle()


SO MOSTLY concerned b/c if I'm reading the comments ABOVE correctly:

To minimize lock contention, this is a non-locking read. The id's returned may be locked or removed by another instance before we can acquire them. We read more id's than [maxMessages] and shuffle them to decrease the likelihood that multiple instances polling concurrently are all competing for the oldest ready messages when many more than [maxMessages] are read.

The shuffle shouldn't have mattered. Except... it turns out it DOES matter b/c of lock behavior. Wondering if we can update the comments or explanations on this... OR refactor this code to be less... confusing on how it operates in combination with below which does seem to use a lock mechanism...
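One way the two goals in this comment could be reconciled, sketched here purely as an illustration and not as what this PR does: keep the random pick so instances don't all chase the oldest messages, then sort the picked ids so that every instance attempts its locks in the same order. The helper name `pickCandidates` and its parameters are hypothetical, and the sketch assumes locks are then taken one id at a time in the returned order.

```kotlin
import kotlin.random.Random

// Hypothetical helper, not the code in SqlQueue.kt.
// readyIds: ids returned by the non-locking read (more than maxMessages of them).
fun pickCandidates(
    readyIds: List<String>,
    maxMessages: Int,
    random: Random = Random.Default
): List<String> =
    readyIds
        .shuffled(random)   // spread instances across the ready messages
        .take(maxMessages)  // keep only as many as we intend to acquire
        .sorted()           // attempt locks in one global order (by id)

fun main() {
    val ready = listOf("m5", "m2", "m9", "m1", "m7")
    // The chosen ids vary per call, but the result is always in ascending order.
    println(pickCandidates(ready, maxMessages = 3))
}
```

The random subset still differs between instances, so contention on the oldest messages is spread out, while the ascending order removes the opposite-order acquisition that a circular wait needs. This only helps if the locking step honours the list order.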

Not yet to be merged

armory-abedonik · 2023-04-14T13:44:30Z

Benchmark:

cpu: "100m"
memory: "128Mi"

MySQL

10 records = 23 ms / 13 ms / 11 ms / 12 ms
25 records = 20 ms / 18 ms / 16 ms / 17 ms
50 records = 31 ms / 28 ms / 40 ms / 32 ms
75 records = 41 ms / 38 ms / 32 ms / 38 ms

PostgreSQL

10 records = 5 ms / 7 ms / 6 ms / 5 ms
25 records = 10 ms / 14 ms / 9 ms / 8 ms
50 records = 15 ms / 15 ms / 15 ms / 23 ms
75 records = 15 ms / 18 ms / 16 ms / 26 ms

armory-abedonik force-pushed the BOB-31035 branch from 69eb4d0 to 6559f2b Compare April 6, 2023 17:50

fix: Deadlock in PostgreSQL.

9ef84c3

armory-abedonik force-pushed the BOB-31035 branch from e29d93f to 9ef84c3 Compare April 10, 2023 11:28

kkotula approved these changes Apr 11, 2023

kkotula reviewed Apr 11, 2023

keiko-sql/src/main/kotlin/com/netflix/spinnaker/q/sql/SqlQueue.kt (outdated)

feat: add comment.

c0788cb

ovidiupopa07 previously approved these changes Apr 12, 2023

jasonmcintosh reviewed Apr 12, 2023

armory-abedonik force-pushed the BOB-31035 branch from 940bb17 to dc8eb74 Compare April 14, 2023 14:44

feat: replace multiple updates requests with a single requests.

a81744e

armory-abedonik force-pushed the BOB-31035 branch from dc8eb74 to a81744e Compare April 14, 2023 15:03

armory-abedonik mentioned this pull request Apr 14, 2023

chore(test): Add HikariConfig to initDatabase util used for testing. spinnaker/kork#1045

Merged

armory-abedonik and others added 3 commits April 14, 2023 17:53

feat: Add comment.

2b6e899

fix: tests.

dc3ea00

Merge branch 'master' into BOB-31035

0d66eef

armory-abedonik marked this pull request as draft April 19, 2023 14:57

Merge branch 'master' into BOB-31035

d5dab60
