[Transaction Status Service] Batch status and memo writes to DB. #3026

Open · wants to merge 8 commits into master
Conversation

fkouteib

Problem

The transaction status service issues an individual write to the DB backend for each transaction memo and for each account/pubkey update within a transaction. This is inefficient, at least from a CPU execution-time perspective, even if the DB backend aggregates and batches IO updates to disk.

Summary of Changes

  • Batch transaction status writes before writing them to the DB backend.
  • Batch transaction memo writes before writing them to the DB backend.
  • Perform the batching at the transaction batch level (typically 64 transactions); a rough sketch follows.
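For illustration, here is a minimal sketch of the batching idea, assuming the `rocksdb` crate is used directly with placeholder key/value encodings. The actual change goes through the blockstore's column abstractions, and `write_transaction_status_batch` is a hypothetical name, not the PR's API:

```rust
// Minimal sketch, assuming the rocksdb crate; the key/value encodings are
// placeholders, not the blockstore's actual column formats.
use rocksdb::{WriteBatch, DB};

/// Hypothetical helper: stage one write per status/memo entry into a single
/// WriteBatch, then commit with one DB write instead of one write per entry.
fn write_transaction_status_batch(
    db: &DB,
    entries: &[(Vec<u8>, Vec<u8>)], // (key, serialized status or memo)
) -> Result<(), rocksdb::Error> {
    let mut batch = WriteBatch::default();
    for (key, value) in entries {
        batch.put(key, value); // staged in memory, no IO yet
    }
    db.write(batch) // single commit for the whole transaction batch
}
```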

fkouteib (Author) commented Sep 29, 2024

This data was collected using a synthetic workload from an internal test, on a 2-node local cluster running on a single physical machine (Linux). The test runs in about a minute. The transaction mix is mostly test transactions with 35 account pubkeys including the fee payer (these land in TSS in 64-transaction batches), plus single vote transactions (these land in batches of 1). The numbers below are measured execution times for the match arm that handles transaction status batches in the transaction status service.

| Stat | Baseline exec time (µs) | Batched exec time (µs) |
| --- | --- | --- |
| min | 22 | 16 |
| max | 226,770 | 49,679 |
| mean | 517.01 | 186.23 |
| median | 233 | 87 |
| std dev | 2,581.53 | 795.16 |
| datapoints | 112,066 | 124,027 |

I also ran the same synthetic workload and internal test on a small multi-node distributed private test cluster and compared high-level disk IO metrics: number of sectors written and number of writes completed. The ratios were close between the two runs, and averaged out they suggest the writes are generally 128 KiB commands (at 512 bytes per sector, a sectors-to-writes ratio of about 256 works out to 128 KiB per write).
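The comment doesn't say how these counters were collected; one common source is `/proc/diskstats`, where the sector unit is fixed at 512 bytes. A minimal sketch of the ratio computation, assuming that source:

```rust
// Hedged illustration: "num writes completed" and "num sectors written" are
// fields 8 and 10 of /proc/diskstats (1-indexed, per device). Sectors in
// diskstats are always 512 bytes, so a sectors/writes ratio of ~256
// corresponds to ~128 KiB per write command.
use std::fs;

fn main() -> std::io::Result<()> {
    for line in fs::read_to_string("/proc/diskstats")?.lines() {
        let f: Vec<&str> = line.split_whitespace().collect();
        let dev = f[2];
        let writes: f64 = f[7].parse().unwrap(); // writes completed
        let sectors: f64 = f[9].parse().unwrap(); // sectors written
        if writes > 0.0 {
            // Average write size in KiB: sectors * 512 bytes / writes / 1024.
            println!("{dev}: {:.1} KiB/write", sectors * 512.0 / writes / 1024.0);
        }
    }
    Ok(())
}
```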

@bw-solana

> …compared high-level disk IO metrics: number of sectors written and number of writes completed. The ratios were close between the two runs…

This is interesting, given that the high-level timings look significantly better after this change. Do we think this is just due to the reduction in CPU time from merging the writes internally?

@bw-solana left a comment

Left a couple of suggestions.

I'm also wondering whether we have an existing bencher for some of these operations. If not, it might be nice to add one and confirm it shows a benefit relative to the unbatched behavior.
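If no bencher exists, a sketch along these lines could compare the two paths. This is a hypothetical Criterion benchmark against a bare `rocksdb::DB`, not an existing bench in the repo:

```rust
// Hypothetical Criterion bench sketch; assumes the criterion, rocksdb, and
// tempfile crates. Compares unbatched puts vs. one WriteBatch per 64 entries.
use criterion::{criterion_group, criterion_main, Criterion};
use rocksdb::{WriteBatch, DB};

fn bench_status_writes(c: &mut Criterion) {
    let dir = tempfile::tempdir().unwrap();
    let db = DB::open_default(dir.path()).unwrap();
    let entries: Vec<(Vec<u8>, Vec<u8>)> = (0u64..64)
        .map(|i| (i.to_be_bytes().to_vec(), vec![0u8; 256]))
        .collect();

    c.bench_function("unbatched_64", |b| {
        b.iter(|| {
            for (k, v) in &entries {
                db.put(k, v).unwrap(); // one DB write per entry
            }
        })
    });

    c.bench_function("batched_64", |b| {
        b.iter(|| {
            let mut batch = WriteBatch::default();
            for (k, v) in &entries {
                batch.put(k, v); // staged in memory
            }
            db.write(batch).unwrap(); // single commit for all 64 entries
        })
    });
}

criterion_group!(benches, bench_status_writes);
criterion_main!(benches);
```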

ledger/src/blockstore.rs — two review threads (outdated, resolved)
```diff
@@ -9618,6 +9653,7 @@ pub mod tests {
         .map(|key| (key, true)),
     TransactionStatusMeta::default(),
     counter,
+    None,
 )
```


We should probably have a unit test that confirms the new batching behavior
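A loose sketch of what such a test could assert, using a bare `rocksdb::DB` for illustration rather than the blockstore's real write path (which this PR extends):

```rust
// Illustrative only: exercises a batched write against a bare RocksDB
// instance and checks every entry round-trips after the single commit. A
// real test would go through the blockstore's batched status-write path.
// Assumes the rocksdb and tempfile crates.
use rocksdb::{WriteBatch, DB};

#[test]
fn batched_writes_round_trip() {
    let dir = tempfile::tempdir().unwrap();
    let db = DB::open_default(dir.path()).unwrap();

    // Stage 64 entries, mirroring a typical transaction batch size.
    let mut batch = WriteBatch::default();
    for i in 0u64..64 {
        batch.put(i.to_be_bytes(), i.to_le_bytes());
    }
    db.write(batch).unwrap(); // single commit

    // Every staged entry must be visible after the one commit.
    for i in 0u64..64 {
        let got = db.get(i.to_be_bytes()).unwrap().unwrap();
        assert_eq!(got, i.to_le_bytes());
    }
}
```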

@bw-solana

@lijunwangs - I'm thinking you're the best person to review these changes while Tyera is out. Let me know if there is a better candidate.

@lijunwangs

> This data was collected using a synthetic workload from an internal test, on a 2-node local cluster running on a single physical machine (Linux). The test runs in about a minute. The transaction mix is mostly test transactions with 35 account pubkeys including the fee payer (these land in TSS in 64-transaction batches), plus single vote transactions (these land in batches of 1). The numbers below are measured execution times for the match arm that handles transaction status batches in the transaction status service.
>
> | Stat | Baseline exec time (µs) | Batched exec time (µs) |
> | --- | --- | --- |
> | min | 22 | 16 |
> | max | 226,770 | 49,679 |
> | mean | 517.01 | 186.23 |
> | median | 233 | 87 |
> | std dev | 2,581.53 | 795.16 |
> | datapoints | 112,066 | 124,027 |
>
> I also ran the same synthetic workload and internal test on a small multi-node distributed private test cluster and compared high-level disk IO metrics: number of sectors written and number of writes completed. The ratios were close between the two runs, and averaged out they suggest the writes are generally 128 KiB commands.

What are the exact metrics being used for this data?

fkouteib (Author) commented Oct 2, 2024

> What are the exact metrics being used for this data?

@lijunwangs It's not an official metric that uploads to the metrics db; I posted the debug code used for timing it here.
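For context, such timing debug code is typically a `std::time::Instant` wrapper around the section of interest. The linked snippet may differ; a minimal sketch:

```rust
// Hypothetical timing sketch, not necessarily the linked snippet: measure
// the elapsed wall-clock time of a code section in microseconds and log it.
use std::time::Instant;

fn main() {
    let start = Instant::now();

    // ... section under measurement, e.g. the match arm that writes a
    // transaction status batch to the blockstore ...
    std::thread::sleep(std::time::Duration::from_micros(250));

    let elapsed_us = start.elapsed().as_micros();
    eprintln!("tss batch write took {elapsed_us} us"); // collect from logs
}
```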
