Add id column as PRIMARY KEY for evm.logs & evm.log_poller_blocks #1441

reductionista · 2024-09-14T02:06:35Z

Problem:

The query for pruning expired logs with a max limit set was taking longer than it should. This was in part due to needing to join on an awkward combination of columns, because there was no single primary key.

Solution:

- Add id column as PRIMARY KEY for evm.logs & evm.log_poller_blocks
- Join on id column instead of previous primary keys
- Replace all SELECT *'s with helper functions for selecting all columns
- Refactor nestedBlockQuery into withConfs, and make a bit more use of it

## Motivation

While adding the id column, we can't just remove the old primary key because the index on it was helping to accelerate some queries.
Instead of just resurrecting it as-is, I took the opportunity to clean up several of the indices on the logs & blocks tables. Some indexed columns (e.g. created_at) were never actually being used,
while others were not ordered optimally for accelerating the queries we have. Also, at least one of them was redundant with the primary key, just in a different order.

reductionista · 2024-09-16T06:47:09Z

core/chains/evm/logpoller/orm_test.go

@@ -1565,7 +1565,7 @@ func TestSelectLatestBlockNumberEventSigsAddrsWithConfs(t *testing.T) {
 			events:              []common.Hash{event1, event2},
 			addrs:               []common.Address{address1, address2},
 			confs:               0,
-			fromBlock:           3,


This behavior looked wrong to me; hopefully there aren't any upstream clients depending on it.
Everywhere else, passing a fromBlock implies you're searching for blocks >= fromBlock. This PR updates the code to do the same, instead of matching only on blocks > fromBlock.
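
To make the boundary change concrete, here is a minimal in-memory sketch of the inclusive semantics described above. The function name blocksFrom and the sample data are hypothetical and are not taken from the PR; the real change lives in a SQL query.

```go
package main

import "fmt"

// blocksFrom is a hypothetical helper (not from the PR) that returns the
// block numbers satisfying the inclusive lower bound: block >= fromBlock.
func blocksFrom(blockNumbers []int64, fromBlock int64) []int64 {
	var out []int64
	for _, n := range blockNumbers {
		// The old behavior corresponded to n > fromBlock, which skipped
		// the block at fromBlock itself.
		if n >= fromBlock {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	fmt.Println(blocksFrom([]int64{1, 2, 3, 4}, 3)) // prints [3 4]
}
```

With the previous exclusive comparison, the same call would have returned only block 4, which is why the test's fromBlock value had to change.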

BCI-3492 [LogPoller]: Allow withObservedExecAndRowsAffected to report non-zero rows affected (#14057)

* Fix withObservedExecAndRowsAffected
  Also:
  - Change behavior of DeleteExpiredLogs to delete logs which don't match any filter
  - Add a test case to ensure the dataset size is published properly during pruning
* pnpm changeset
* changeset #fix -> #bugfix

Also:
- Add UNIQUE INDEXes to replace previous primary keys (still necessary, both for optimizing queries and for enforcing uniqueness constraints)
- Replace all SELECT *'s with helper functions for selecting all columns
- Refactor nestedBlockQuery into withConfs, and make a bit more use of it

On a node with more than one chain, each LogPoller would have deleted all logs from chains it's not running on! Because of the LEFT JOIN, ON evm_chain_id = $1 does not filter out any rows where evm_chain_id != $1; only WHERE evm_chain_id = $1 can do that
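
The LEFT JOIN pitfall can be modeled in memory. The sketch below is not the PR's query: the Log and Filter types, their fields, and the join predicate are hypothetical simplifications, chosen only to show why a chain condition in ON behaves differently from the same condition in WHERE when selecting rows that have no matching filter.

```go
package main

import "fmt"

// Log and Filter are hypothetical, simplified stand-ins for a log row and a
// filter row; the field names are assumptions for this example.
type Log struct {
	ID      int64
	ChainID int64
	Address string
}

type Filter struct {
	ChainID int64
	Address string
}

// onClause models: ON f.chain = l.chain AND f.address = l.address AND l.chain = $1
func onClause(l Log, f Filter, chainID int64) bool {
	return f.ChainID == l.ChainID && f.Address == l.Address && l.ChainID == chainID
}

// unmatched models "logs LEFT JOIN filters ON ... WHERE filter IS NULL".
// When whereChain is true, it also applies WHERE l.chain = $1.
func unmatched(logs []Log, filters []Filter, chainID int64, whereChain bool) []int64 {
	var ids []int64
	for _, l := range logs {
		if whereChain && l.ChainID != chainID {
			continue // WHERE removes rows from other chains entirely
		}
		matched := false
		for _, f := range filters {
			if onClause(l, f, chainID) {
				matched = true
				break
			}
		}
		// A LEFT JOIN keeps every log row; a row with no ON match gets a
		// NULL filter and therefore counts as unmatched.
		if !matched {
			ids = append(ids, l.ID)
		}
	}
	return ids
}

func main() {
	logs := []Log{{1, 1, "0xA"}, {2, 1, "0xB"}, {3, 2, "0xA"}}
	filters := []Filter{{1, "0xA"}, {2, "0xA"}}
	fmt.Println(unmatched(logs, filters, 1, false)) // prints [2 3]
	fmt.Println(unmatched(logs, filters, 1, true))  // prints [2]
}
```

In this model, log 3 belongs to chain 2 and matches a chain 2 filter, yet with the chain condition only in ON it is still reported as unmatched for chain 1, which corresponds to the cross-chain deletion described above.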

cl-sonarqube-production · 2024-09-20T16:20:06Z

Quality Gate failed

Failed conditions
74.7% Coverage on New Code (required ≥ 75%)
C Reliability Rating on New Code (required ≥ A)

reductionista marked this pull request as ready for review September 16, 2024 06:44

reductionista requested a review from a team as a code owner September 16, 2024 06:44

reductionista force-pushed the log_poller_id_columns branch from c03e447 to 0d1c216 Compare September 17, 2024 01:59

reductionista requested a review from a team as a code owner September 18, 2024 02:13

reductionista force-pushed the log_poller_id_columns branch from b20e122 to 52d080c Compare September 19, 2024 19:54

reductionista force-pushed the log_poller_id_columns branch from 52d080c to 069660e Compare September 19, 2024 20:35

reductionista added 5 commits September 20, 2024 08:56

Clean up db indexes

9ace2c0

Some of the columns in these indexes (such as created_at) are no longer used. Others were not optimized for the queries we need.

Fix 2 unrelated bugs I noticed

90dd2e7

Update ExpiredLogs query

ac60768

reductionista added 8 commits September 20, 2024 08:56

Update test for fromBlock >= :block_number

bb4d81b

Previously it was using fromBlock > :block_number which is inconsistent with the other fromBlocks in queries

Increase staggering of initial pruning runs

e8268f0

Decrease retention periods for CCIP events, for testing

e1fc86b

restore retention periods

5bfea8c

Set LogPrunePageSize = 2001

7a0568d

Update DeleteBlocksBefore query to use block_number index instead of …

1ff9472

…LIMIT

Split off SelectUnmatchedLogs from DeleteExpiredLogs

a99e37f

reductionista force-pushed the log_poller_id_columns branch from 94e15fa to a99e37f Compare September 20, 2024 15:57
