Miner bug: CommitFailed with nil CommR causes panic #12397

Open
rvagg opened this issue Aug 16, 2024 · 0 comments

rvagg commented Aug 16, 2024

From Slack: https://filecoinproject.slack.com/archives/CPFTWMY7N/p1723758975588969

2024-08-16T05:33:52.389+0800	ERROR	evtsm	[email protected]/machine.go:116	executing step: panic: runtime error: invalid memory address or nil pointer dereference
goroutine 35499 [running]:
github.com/filecoin-project/lotus/storage/pipeline.(*Sealing).Plan.func1.1()
	/home/runner/work/lotus/lotus/lotus/storage/pipeline/fsm.go:48 +0x7b
panic({0x4b32940?, 0xa1df940?})
	/opt/hostedtoolcache/go/1.21.12/x64/src/runtime/panic.go:914 +0x21f
github.com/filecoin-project/lotus/storage/pipeline.(*Sealing).checkCommit(_, {_, _}, {{0xc01e075940, 0xc}, 0x226c, 0xd, 0x66ad4e8e, {0xc03fea1220, 0x1, ...}, ...}, ...)
	/home/runner/work/lotus/lotus/lotus/storage/pipeline/checks.go:240 +0x43b
github.com/filecoin-project/lotus/storage/pipeline.(*Sealing).handleCommitFailed(_, {{_, _}, _}, {{0xc01e075940, 0xc}, 0x226c, 0xd, 0x66ad4e8e, {0xc03fea1220, ...}, ...})
	/home/runner/work/lotus/lotus/lotus/storage/pipeline/states_failed.go:333 +0x44c
github.com/filecoin-project/lotus/storage/pipeline.(*Sealing).Plan.func1({{_, _}, _}, {{0xc01e075940, 0xc}, 0x226c, 0xd, 0x66ad4e8e, {0xc03fea1220, 0x1, ...}, ...})
	/home/runner/work/lotus/lotus/lotus/storage/pipeline/fsm.go:63 +0xd6
reflect.Value.call({0x4a9fc80?, 0xc016ce1fc0?, 0x55eefc?}, {0x5013777, 0x4}, {0xc029d17f98, 0x2, 0x5559c5?})
	/opt/hostedtoolcache/go/1.21.12/x64/src/reflect/value.go:596 +0xce7
reflect.Value.Call({0x4a9fc80?, 0xc016ce1fc0?, 0x4c203a7261762076?}, {0xc029d17f98?, 0x434152545f425553?, 0x454352554f535245?})
	/opt/hostedtoolcache/go/1.21.12/x64/src/reflect/value.go:380 +0xb9
github.com/filecoin-project/go-statemachine.(*StateMachine).run.func3()
	/home/runner/go/pkg/mod/github.com/filecoin-project/[email protected]/machine.go:113 +0x269
created by github.com/filecoin-project/go-statemachine.(*StateMachine).run in goroutine 19110
	/home/runner/go/pkg/mod/github.com/filecoin-project/[email protected]/machine.go:109 +0x656

It fails here: https://github.com/filecoin-project/lotus/blob/v1.28.2/storage/pipeline/checks.go#L240

User has a CommitFailed sector that won't go away:

 2024-08-03 18:37:29 +0800 CST:  [event;sealing.SectorForceState]        {"User":{"State":"PreCommit2"}}
5489.   2024-08-03 18:37:29 +0800 CST:  [event;sealing.SectorForceState]        {"User":{"State":"PreCommit2"}}
5490.   2024-08-03 18:37:29 +0800 CST:  [error;*xerrors.wrapError]      state machine error: running planner for state Committing failed: planCommitting got event of unknown type sealing.SectorRetrySealPreCommit1, events: [{User:{}} {User:sector had nil commR or commD} {User:{State:PreCommit2}} {User:{State:PreCommit2}}]
5491.   2024-08-03 18:37:29 +0800 CST:  [event;sealing.SectorRetrySealPreCommit1]       {"User":{}}
5492.   2024-08-03 18:37:29 +0800 CST:  [event;sealing.SectorCommitFailed]      {"User":{}}
        sector had nil commR or commD
5493.   2024-08-03 18:37:29 +0800 CST:  [event;sealing.SectorForceState]        {"User":{"State":"PreCommit2"}}
5494.   2024-08-03 18:37:29 +0800 CST:  [event;sealing.SectorForceState]        {"User":{"State":"PreCommit2"}}
5495.   2024-08-03 18:37:29 +0800 CST:  [error;*xerrors.wrapError]      state machine error: running planner for state Committing failed: planCommitting got event of unknown type sealing.SectorRetrySealPreCommit1, events: [{User:{}} {User:sector had nil commR or commD} {User:{State:PreCommit2}} {User:{State:PreCommit2}}]
5496.   2024-08-03 18:41:20 +0800 CST:  [event;sealing.SectorCommitFailed]      {"User":{}}
        sector had nil commR or commD
5497.   2024-08-03 18:41:20 +0800 CST:  [event;sealing.SectorRetryWaitSeed]     {"User":{}}
5498.   2024-08-03 18:41:20 +0800 CST:  [event;sealing.SectorSeedReady] {"User":{"SeedValue":"zsVNjgIhKPS1dWJybMxGVmVb/XM5nl5325FOAjZwB9s=","SeedEpoch":4145422}}
5499.   2024-08-03 18:41:20 +0800 CST:  [event;sealing.SectorCommitFailed]      {"User":{}}
        sector had nil commR or commD

When this sector enters the pipeline, it hits handleCommitting and fails with "sector had nil commR or commD" here: https://github.com/filecoin-project/lotus/blob/v1.28.2/storage/pipeline/states_sealing.go#L583-L585

It then moves into handleCommitFailed, which calls checkCommit, which in turn tries to dereference the nil CommR: https://github.com/filecoin-project/lotus/blob/v1.28.2/storage/pipeline/states_failed.go#L333

So the sector stays stuck in CommitFailed and never gets resolved.
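
To illustrate the failure mode, here is a minimal sketch of a nil guard in a commit check. It is not the Lotus code: `sectorInfo`, `checkCommitSketch` and `errNilCommitment` are hypothetical names, and `*string` stands in for the `*cid.Cid` fields so the example has no dependencies. The assumption is only that the check compares a locally stored CommR against an on-chain value by dereferencing a pointer.

```go
package main

import (
	"errors"
	"fmt"
)

// sectorInfo is a hypothetical stand-in for the pipeline's sector state.
// The real struct uses CID pointers; *string keeps this sketch dependency-free.
type sectorInfo struct {
	CommR *string
	CommD *string
}

// errNilCommitment mirrors the message seen in the sector log above.
var errNilCommitment = errors.New("sector had nil commR or commD")

// checkCommitSketch compares the local CommR with the on-chain sealed value.
// Without the nil guard, *si.CommR would panic with a nil pointer dereference
// whenever the sector reached this check with CommR unset.
func checkCommitSketch(si sectorInfo, onChainSealed string) error {
	if si.CommR == nil || si.CommD == nil {
		// Return an error the caller can act on instead of panicking.
		return errNilCommitment
	}
	if *si.CommR != onChainSealed {
		return fmt.Errorf("on-chain sealed value %q does not match local CommR %q",
			onChainSealed, *si.CommR)
	}
	return nil
}
```

With this guard, `checkCommitSketch(sectorInfo{}, "sealed")` returns `errNilCommitment` rather than panicking, so a failure handler could route the sector to an earlier sealing state instead of crashing the state machine step. Which state that should be is a design decision for the maintainers and is not covered here.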
