[audit] fix: avoid reprocessing withdrawals #20

bendanzhentan · 2023-12-06T02:05:39Z

This PR introduces additional checks to prevent redundant processing of withdrawTo transactions in the ProcessBotDelegatedWithdrawals function.

We now compare the chain nonce with the pending nonce before processing. A discrepancy implies that transactions from the same address are still in the node’s tx-pool, so processing is deferred until the chain and pending nonces match. This avoids needlessly reprocessing withdrawals that are already proven or finalized.
A local LRU table serves as a tombstone, storing the timestamps of recently processed transactions. Before processing a transaction, we check whether it was processed within the past 5 minutes; if so, the transaction is skipped.

Implementing these checks makes the code more efficient and reduces the risk of errors caused by reprocessing transactions.
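To illustrate the two checks described above, here is a minimal Go sketch. It is not the PR's code: `NonceSource`, `fixedNonces`, and `Tombstone` are hypothetical names, the account is a plain string, and a plain map stands in for the LRU cache. Only the name `isPendingAndChainNonceEqual` comes from the review discussion; its body here is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// NonceSource is a hypothetical, simplified stand-in for an L1 client.
// go-ethereum's ethclient has NonceAt and PendingNonceAt for this purpose,
// but those take a context and a common.Address rather than a string.
type NonceSource interface {
	ChainNonce(account string) (uint64, error)
	PendingNonce(account string) (uint64, error)
}

// isPendingAndChainNonceEqual reports whether the pending nonce equals the
// chain nonce, i.e. the account has nothing waiting in the node's tx-pool.
func isPendingAndChainNonceEqual(src NonceSource, account string) (bool, error) {
	chain, err := src.ChainNonce(account)
	if err != nil {
		return false, fmt.Errorf("chain nonce: %w", err)
	}
	pending, err := src.PendingNonce(account)
	if err != nil {
		return false, fmt.Errorf("pending nonce: %w", err)
	}
	return chain == pending, nil
}

// Tombstone records when each withdrawal ID was last processed.
// Assumption: a plain map for brevity; the PR describes an LRU table.
type Tombstone map[uint]time.Time

// recentlyProcessed reports whether id was processed less than window ago.
func (t Tombstone) recentlyProcessed(id uint, now time.Time, window time.Duration) bool {
	last, ok := t[id]
	return ok && now.Sub(last) < window
}

// fixedNonces is a fake NonceSource used only for the demonstration below.
type fixedNonces struct{ chain, pending uint64 }

func (f fixedNonces) ChainNonce(string) (uint64, error)   { return f.chain, nil }
func (f fixedNonces) PendingNonce(string) (uint64, error) { return f.pending, nil }

func main() {
	// Pending nonce is ahead of the chain nonce: defer processing.
	ok, _ := isPendingAndChainNonceEqual(fixedNonces{chain: 7, pending: 9}, "0xbot")
	fmt.Println("safe to process:", ok)

	// Withdrawal 42 was handled two minutes ago: skip it for now.
	now := time.Now()
	ts := Tombstone{42: now.Add(-2 * time.Minute)}
	fmt.Println("skip withdrawal 42:", ts.recentlyProcessed(42, now, 5*time.Minute))
}
```

The nonce check guards the whole batch, while the tombstone guards individual withdrawal IDs.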

owen-reorg · 2023-12-06T12:38:43Z

cmd/bot/run.go

@@ -100,6 +115,11 @@ func ProcessUnprovenBotDelegatedWithdrawals(ctx context.Context, log log.Logger,
 	}

 	for _, unproven := range unprovens {
+		// In order to avoid re-processing the same withdrawal, we use a tombstone to mark the withdrawal as processed.
+		if hasWithdrawalRecentlyProcessed(&unproven, tombstone) {


If we already run the isPendingAndChainNonceEqual check before calling ProcessUnprovenBotDelegatedWithdrawals, this check should be redundant.

Since there are no pending txns, this loop will send at most once for the same ID.
Without this check, the event gets the error OptimismPortal: withdrawal hash has already been proven in the dry run and is then marked as proven.
But with this check, we have to wait at least 10 min before marking it as proven.

What we want to ensure is that there are no pending txns handling the same ID. But this check can't achieve that goal.

I agree with your analysis. At first, I only implemented the nonce check, but during testing I found that re-processing still occurred. I have no clue why re-processing still happens even with the nonce check; I just suspect it’s related to the L1 node(?)

I didn't spend time troubleshooting this issue; I just added the local tombstone check to make the check stricter.

> but during testing, I found that re-processing still existed. I have no clue that why re-processing still happens even with nonce check, just suspect it’s related to the L1 node(?)

The pending status is inconsistent and varies depending on which node is called.

One easy and reliable way is to compare the nonce of the transaction with the latest on-chain nonce.
If the latest on-chain nonce >= the txn nonce, then the txn has either been mined or been replaced by another txn, which causes this txn to be dropped. Either way, we can confirm that this txn is no longer pending.

Consider the implementation below:

type PendingTxnCheck struct {
	pendingTxns map[uint]uint64 // event ID -> txn nonce
}

// Add the event ID and the nonce of the associated txn to the map.
func (c *PendingTxnCheck) AddPendingTx(id uint, nonce uint64)

// Check whether there is a pending txn for a specific event ID.
// Call this before sending the tx for an event in both the prove and finalize processes; skip the event if it returns true.
func (c *PendingTxnCheck) ExistsPendingTxn(id uint) bool {
	// return whether the id exists in the map
}

func (c *PendingTxnCheck) CheckLoop() {
	// Loop over the map and check whether each txn is still pending; if not, remove it from the map.
	for {
		// get the latest on-chain nonce
		for id, nonce := range c.pendingTxns {
			// remove the item whose nonce is <= the on-chain nonce
		}
		sleep()
	}
}
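For reference, a compilable sketch of the idea might look like the following. The mutex, the `NewPendingTxnCheck` constructor, and the `Prune` method (one pass of the check loop, with the on-chain nonce passed in by the caller) are assumptions added for illustration. The `<=` comparison follows the rule as stated above; whether it should be strict depends on whether the on-chain nonce means the last mined nonce or the account's next nonce.

```go
package main

import (
	"fmt"
	"sync"
)

// PendingTxnCheck tracks, per event ID, the nonce of the txn sent for it.
type PendingTxnCheck struct {
	mu          sync.Mutex
	pendingTxns map[uint]uint64 // event ID -> txn nonce
}

// NewPendingTxnCheck is a hypothetical constructor.
func NewPendingTxnCheck() *PendingTxnCheck {
	return &PendingTxnCheck{pendingTxns: make(map[uint]uint64)}
}

// AddPendingTx records that a txn with the given nonce was sent for id.
func (c *PendingTxnCheck) AddPendingTx(id uint, nonce uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pendingTxns[id] = nonce
}

// ExistsPendingTxn reports whether a txn for id is still being tracked.
func (c *PendingTxnCheck) ExistsPendingTxn(id uint) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.pendingTxns[id]
	return ok
}

// Prune performs one pass of the check loop: it drops every entry whose
// nonce is <= onChainNonce, since such a txn was either mined or replaced.
func (c *PendingTxnCheck) Prune(onChainNonce uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for id, nonce := range c.pendingTxns {
		if nonce <= onChainNonce {
			delete(c.pendingTxns, id) // deleting during range is safe in Go
		}
	}
}

func main() {
	c := NewPendingTxnCheck()
	c.AddPendingTx(1, 10)
	c.AddPendingTx(2, 11)

	c.Prune(10) // pretend the chain has caught up to nonce 10

	fmt.Println("event 1 pending:", c.ExistsPendingTxn(1))
	fmt.Println("event 2 pending:", c.ExistsPendingTxn(2))
}
```

A caller would run `Prune` from a goroutine on a timer, fetching the on-chain nonce before each pass.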

Another thing we can change is to always manage the nonce manually.

After we have ensured that the pending nonce and the latest on-chain nonce are equal, we keep the latest on-chain nonce, use it to send the next txn, and increment it by 1 after each send.

Thus we don't depend on the underlying function to fetch the nonce. And even if something goes wrong (e.g. some txn gets stuck), we can replace it with a new txn.
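A rough sketch of such a local nonce manager, assuming the on-chain nonce is the next nonce the account should use. `NonceManager`, `Sync`, `Next`, and `ErrNotSynced` are hypothetical names, not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotSynced is returned when Next is called before Sync.
var ErrNotSynced = errors.New("nonce manager has not been synced with the chain")

// NonceManager hands out nonces locally instead of asking the node each time.
type NonceManager struct {
	mu     sync.Mutex
	next   uint64
	synced bool
}

// Sync adopts chainNonce as the next nonce to use. It should only be called
// once the pending nonce and the on-chain nonce have been confirmed equal.
func (m *NonceManager) Sync(chainNonce uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.next = chainNonce
	m.synced = true
}

// Next returns the nonce for the next txn and then increments the counter.
func (m *NonceManager) Next() (uint64, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.synced {
		return 0, ErrNotSynced
	}
	n := m.next
	m.next++
	return n, nil
}

func main() {
	var m NonceManager
	m.Sync(42) // value obtained after the pending/chain nonce check passed

	for i := 0; i < 3; i++ {
		n, err := m.Next()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("send txn with nonce", n)
	}
}
```

Calling `Sync` again with the current on-chain nonce is also how a stuck txn could be replaced: the next send reuses the stuck nonce.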

Hi @owen-reorg, thanks for your review and sorry for the late reply.

I have removed the hasWithdrawalRecentlyProcessed check and added the PendingTxnCheck check at c18eb3c.

Thanks again.

owen-reorg · 2023-12-06T12:54:07Z

cmd/bot/run.go

 		case <-ctx.Done():
 			return
 		}
 	}
 }

-func ProcessUnprovenBotDelegatedWithdrawals(ctx context.Context, log log.Logger, db *gorm.DB, l1Client *core.ClientExt, l2Client *core.ClientExt, cfg core.Config) {
+func ProcessUnprovenBotDelegatedWithdrawals(ctx context.Context, log log.Logger, db *gorm.DB, l1Client *core.ClientExt, l2Client *core.ClientExt, cfg core.Config, tombstone *lru.Cache[uint, time.Time]) {
 	processor := core.NewProcessor(log, l1Client, l2Client, cfg)
 	limit := 1000
 	maxBlockTime := time.Now().Unix() - cfg.Misc.ProposeTimeWindow


Using time.Now().Unix() - cfg.Misc.ProposeTimeWindow to filter the events to prove is unreliable.
Most of the time the output root will be submitted faster than this, but when op-proposer fails to submit the output root in time, we will retry the event and get an error in every loop.

Consider using a global variable to track the latestBlockNumber in the oracle contract and updating it periodically (say every minute) in a separate goroutine.
Then you can compare the block number of the event with this number to ensure that the events you take from the db can be proven.
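One possible shape for this, as a sketch only: `LatestBlockTracker`, its methods, and the `fetch` callback are hypothetical. In the bot, `fetch` would wrap a call to the oracle contract's latestBlockNumber; here it is a plain function so the example runs on its own.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// LatestBlockTracker caches the latest L2 block number known to the oracle.
type LatestBlockTracker struct {
	fetch  func() (uint64, error) // hypothetical wrapper around the contract call
	latest atomic.Uint64
}

func NewLatestBlockTracker(fetch func() (uint64, error)) *LatestBlockTracker {
	return &LatestBlockTracker{fetch: fetch}
}

// Refresh queries the oracle once and stores the result.
func (t *LatestBlockTracker) Refresh() error {
	n, err := t.fetch()
	if err != nil {
		return err
	}
	t.latest.Store(n)
	return nil
}

// Run refreshes immediately and then once per interval until ctx is done.
func (t *LatestBlockTracker) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := t.Refresh(); err != nil {
			fmt.Println("refresh latestBlockNumber failed:", err)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// CanProve reports whether an event at the given L2 block is already
// covered by the latest block number stored in the tracker.
func (t *LatestBlockTracker) CanProve(eventBlock uint64) bool {
	return eventBlock <= t.latest.Load()
}

func main() {
	tr := NewLatestBlockTracker(func() (uint64, error) { return 1200, nil })
	if err := tr.Refresh(); err != nil {
		fmt.Println("error:", err)
		return
	}
	// In the bot, `go tr.Run(ctx, time.Minute)` would keep the value fresh.
	fmt.Println("block 1100 provable:", tr.CanProve(1100))
	fmt.Println("block 1300 provable:", tr.CanProve(1300))
}
```

Events whose block number is above the cached value would simply be left in the db for a later loop, avoiding the failing dry run.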

Comparing with the latestBlockNumber could be more accurate!

Will apply this suggestion soon. Thanks

Applied this suggestion at 5b2740b.

There is one difference: instead of maintaining a latestBlockNumber in the background, I call L2OutputOracle.latestBlockNumber() every time.

Note that there are some changes to the DB types at ef5844c.

owen-reorg · 2023-12-06T13:01:27Z

cmd/bot/run.go

 		}
 	}
 }

-func ProcessUnfinalizedBotDelegatedWithdrawals(ctx context.Context, log log.Logger, db *gorm.DB, l1Client *core.ClientExt, l2Client *core.ClientExt, cfg core.Config) {
+func ProcessUnfinalizedBotDelegatedWithdrawals(ctx context.Context, log log.Logger, db *gorm.DB, l1Client *core.ClientExt, l2Client *core.ClientExt, cfg core.Config, tombstone *lru.Cache[uint, time.Time]) {
 	processor := core.NewProcessor(log, l1Client, l2Client, cfg)
 	limit := 1000
 	maxBlockTime := time.Now().Unix() - cfg.Misc.ChallengeTimeWindow


The ChallengeTimeWindow starts from the time when the txn is proven. So in the hour between eventTime + cfg.Misc.ChallengeTimeWindow and proveTime + cfg.Misc.ChallengeTimeWindow, we'll keep trying and getting an error for the event.

I suggest that when we mark the event as proven, we also update the timestamp to the time it happens.
But then the field should not be block_time. Maybe we can add a new proven_time field for this.

The accurate way is to monitor the prove event and use the timestamp in the event. But that will take more effort. The solution above should be good enough to reduce the error cases (i.e. reduce the dry-run calls).

if strings.Contains(err.Error(), "OptimismPortal: withdrawal hash has already been proven") {
	// The withdrawal has already been proven, mark it
	result := db.Model(&unproven).Update("proven", true)
	if result.Error != nil {
		log.Error("failed to update proven l2_contract_events", "error", result.Error)
	}
}
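The proven_time idea above could gate finalization roughly as follows. This is a sketch under assumptions: `canFinalize` and `challengeWindowElapsed` are hypothetical helpers, and the window is taken to be in seconds because the diff subtracts cfg.Misc.ChallengeTimeWindow from time.Now().Unix().

```go
package main

import (
	"fmt"
	"time"
)

// challengeWindowElapsed reports whether at least windowSeconds have passed
// between provenUnix and nowUnix (all values are Unix seconds).
func challengeWindowElapsed(provenUnix, windowSeconds, nowUnix int64) bool {
	return nowUnix-provenUnix >= windowSeconds
}

// canFinalize reports whether a withdrawal is worth trying to finalize.
// provenTime is nil while the withdrawal has not been marked as proven.
func canFinalize(provenTime *time.Time, windowSeconds int64, now time.Time) bool {
	if provenTime == nil {
		return false
	}
	return challengeWindowElapsed(provenTime.Unix(), windowSeconds, now.Unix())
}

func main() {
	now := time.Now()
	provenLongAgo := now.Add(-2 * time.Hour)
	provenJustNow := now.Add(-1 * time.Minute)

	const window = 3600 // one hour, for illustration only

	fmt.Println("proven 2h ago:", canFinalize(&provenLongAgo, window, now))
	fmt.Println("proven 1m ago:", canFinalize(&provenJustNow, window, now))
	fmt.Println("not proven:   ", canFinalize(nil, window, now))
}
```

Because the locally recorded time can only be later than the on-chain prove time, this gate errs on the side of waiting slightly longer.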

Makes sense. This optimization can reduce the number of retries. Will update soon.

Updated at 5b2740b
PTAL, thanks.

1. Rename L2ContractEvent to BotDelegatedWithdrawal
2. Add unique constraint idx_bot_delegated_withdrawals_transaction_hash_log_index_key
3. Add new field `InitiatedBlockNumber int64` to indicate the L2 block number of the initiated withdrawal transaction
4. Add new fields `ProvenTime *Time` and `FinalizedTime *Time` to indicate the local time of the L1 proven transaction and finalized transaction
5. Change `FailureReason` to type `FailureReason *string`

welkin22

LGTM

bendanzhentan changed the base branch from main to migrate-to-mysql December 6, 2023 02:05

bendanzhentan requested review from owen-reorg and welkin22 December 6, 2023 02:11

owen-reorg reviewed Dec 6, 2023

bendanzhentan force-pushed the migrate-to-mysql branch 2 times, most recently from 78c88d4 to 66af473 Compare December 8, 2023 03:39

Base automatically changed from migrate-to-mysql to develop December 8, 2023 03:40

bendanzhentan force-pushed the avoid-reprocessing-withdrawals branch 3 times, most recently from 094a157 to c6292a3 Compare December 8, 2023 07:25

bendanzhentan requested a review from owen-reorg December 8, 2023 07:26

bendanzhentan force-pushed the avoid-reprocessing-withdrawals branch from c6292a3 to 5b2740b Compare December 8, 2023 07:26

bendanzhentan added 4 commits December 12, 2023 10:46

fix: avoid re-processing withdrawals

0ff28f1

1. check pending nonce and chain nonce before processing
2. check recently processed using local records before processing

improve: compare timings of proven and finalized more precisely

a5cd966

1. Determine the proven timing based on the `L2OutputOracle.latestBlockNumber`
2. Determine the finalized timing based on the db `proven_time`

bindings: update binding

c274f49

bendanzhentan force-pushed the avoid-reprocessing-withdrawals branch from 5b2740b to c18eb3c Compare December 12, 2023 08:16

feat: manage nonce locally

494b941

bendanzhentan force-pushed the avoid-reprocessing-withdrawals branch from c18eb3c to 494b941 Compare December 12, 2023 10:59

config: update bot contract

26474e1

owen-reorg approved these changes Dec 15, 2023

welkin22 approved these changes Dec 15, 2023

bendanzhentan merged commit 3c81f35 into develop Jan 4, 2024
2 checks passed

bendanzhentan deleted the avoid-reprocessing-withdrawals branch January 4, 2024 03:02
