Impossible to Spin-Up a BASE/OP Node - Execution service is busy, cannot assemble blocks & wrong head block #170
Comments
Update: We are seeing these issues in nodes syncing from genesis as well.
We are facing a similar issue
Erigon Logs
@andreclaro @harshsingh-cs Thanks for reporting this issue so thoroughly, guys. I'm trying to reproduce it on my own node with the same configuration, but that node is still syncing to the chain head. I'll be able to debug further once I have a node showing the same issue; for now, I'm only reviewing the codebase. I did find a potential bug that might be related, but I can't guarantee that fixing it will resolve this issue. Do you have any spare nodes you can experiment with? Can you check out …? If you don't have any development nodes, please give me a few days to sync and reproduce this issue on my end 😄
@mininny I tried the branch, but it did not help.
By the way, I tried it on a node that was already broken.
@andreclaro @harshsingh-cs We have increased op-erigon's FCU timeout to the same value as op-node's FCU timeout. Can you try the latest release? https://github.com/testinprod-io/op-erigon/releases/tag/v2.60.0-0.6.2
It helped: the node that was not syncing is now syncing.
Hi! We tried upgrading to
These are the corresponding logs for
@ncavedale-xlabs Are you using our official docker image? We increased the FCU timeout from 1s to 5s, but your logs said
My bad, I was still using the old binary (we're compiling from source). The errors are gone now 😄, but I'll keep monitoring and let you know if I see any odd behaviour.
Summary
Update (2024-05-22): We are also seeing these issues in nodes syncing from genesis as well.
In the past few days, I have been trying, with the support of an op-erigon developer, to spin up nodes for OP and Base mainnet from the snapshots they provide (https://snapshot.testinprod.io/). However, the nodes sync so slowly that they fall further and further behind the latest block.
The warning and error messages below are also observed on other nodes syncing from genesis.
System information
Versions:
OS & Version:
Erigon Command (with flags/config):
Consensus Layer: op-node
Consensus Layer Command (with flags/config):
Chain/Network: Optimism Mainnet and Sepolia, Base Mainnet and Sepolia
Expected behaviour
Syncing fast without errors.
Actual behaviour
Getting errors on op-erigon and op-node.
Looking at the source code, the logs and the OP Stack specification, here is what is going on:
1. On a fork choice update, op-erigon starts flushing the in-memory state built by the previous newPayload:
[INFO] [05-19|12:21:20.341] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
2. The flush does not finish within the FCU timeout (1s here), so op-erigon returns a forkChoiceUpdated timeout error, which forces op-node to retry:
[WARN] [05-19|12:21:21.341] [rpc] served conn=127.0.0.1:37458 method=engine_forkchoiceUpdatedV3 reqid=3166 t=1.000392006s err="forkChoiceUpdated timeout"
3. The retry is rejected immediately because the execution service is still busy:
[WARN] [05-19|12:21:23.555] [rpc] served conn=127.0.0.1:37458 method=engine_forkchoiceUpdatedV3 reqid=3167 t=180.32µs err="[ForkChoiceUpdated]: execution service is busy, cannot assemble blocks"
4. op-erigon then reports a wrong head block and starts unwinding:
[WARN] [05-19|12:21:27.806] wrong head block current=0xa1d4ccf32c6d2eb88f9e20761600312198176995a6a538af96d533399ee01cb4 requested=0xbce4dfb53b04b1c369f0b52f00c8d3be128eefbb40f96606c34c8c71e7ec9e73 executionAt=120231966
May 19 12:21:27 m-optimism-02 erigon[1576]: [INFO] [05-19|12:21:27.806] [5/7 HashState] Unwinding started from=120231966 to=120231965 storage=false codes=true
This cycle repeats continuously, making synchronization from the snapshot extremely slow, to the point where the node falls further and further behind the latest block.
As recommended by the op-erigon team, we tried different settings (disabling snapshots and enabling snap.keepblocks) and ran the same software versions as the node where the snapshot was taken (op-erigon v2.55.1-0.4.3 and op-node v1.7.0), but observed no improvement.
Steps to reproduce the behaviour
The quickest way to reproduce this issue is to take one of the latest snapshots available at https://snapshot.testinprod.io and try to spin up a new node using op-erigon and op-node.
Logs
OP-Erigon:
OP-Node:
Source code related to the logs messages
- UpdateForkChoice function: https://github.com/testinprod-io/op-erigon/blob/cc6aa2f869d3a2aed28f10f6405d04fef25c228a/turbo/execution/eth1/forkchoice.go#L69C35-L97
- forkChoiceUpdated timeout: op-erigon/turbo/execution/eth1/forkchoice.go, lines 85 to 87 in cc6aa2f
- execution service is busy, cannot assemble block: op-erigon/turbo/engineapi/engine_server.go, line 545 in cc6aa2f
- UnwindOnHistoryV3 function: op-erigon/eth/stagedsync/stage_hashstate.go, line 719 in cc6aa2f
- failed to make the new L2 block canonical via forkchoice: https://github.com/ethereum-optimism/optimism/blob/40eee4f55f253d17e8c1cebe71e6765174ad7f6d/op-node/rollup/derive/engine_update.go#L190