-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: migration("re-indexing"), backfilling and diasgnostics tooling for the ChainIndexer
#12450
base: feat/msg-eth-tx-index
Are you sure you want to change the base?
feat: migration("re-indexing"), backfilling and diasgnostics tooling for the ChainIndexer
#12450
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the PR title to match https://github.com/filecoin-project/lotus/blob/master/CONTRIBUTING.md#pr-title-conventions
chainindex/api.go
Outdated
return &types.IndexValidation{ | ||
TipsetKey: ts.Key().String(), | ||
Height: uint64(ts.Height()), | ||
Backfilled: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fill out the other fields here using SQL queries.
@rvagg Would be great to have a first round of review here when you get the time. |
ChainIndexer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the PR title to match https://github.com/filecoin-project/lotus/blob/master/CONTRIBUTING.md#pr-title-conventions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the PR title to match https://github.com/filecoin-project/lotus/blob/master/CONTRIBUTING.md#pr-title-conventions
ChainIndexer
ChainIndexer
ChainIndexer
ChainIndexer
ChainIndexer
ChainIndexer
chain/index/api.go
Outdated
return revertedCount, nonRevertedCount, nil | ||
} | ||
|
||
func (si *SqliteIndexer) ChainValidateIndex(ctx context.Context, epoch abi.ChainEpoch, backfill bool) (*types.IndexValidation, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we did this all in a single transaction, would that remove the need for a write lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, transaction provides atomicity, not exclusivity. Imagine this scenario:
Backfill
is called for epoch e -> it reads tipsetts
at that epochRevert
executes forts
and marks the tipset as reverted inTX1
Backfill
executes forts
and marks it as applied inTX2
TX1
is committed first and then TX2
is comitted.
Basically, it's hard to reason about state here without this lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One work-around here it to only allow backfilling for "safe" epochs i.e. <=head - 30
.
Then, the rewrite can only ever happen in an edge case.
|
||
// fetch the non-reverted tipset at this epoch | ||
var indexedTsKeyCidBytes []byte | ||
err = si.stmts.getNonRevertedTipsetAtHeightStmt.QueryRowContext(ctx, epoch).Scan(&indexedTsKeyCidBytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the LIMIT 1
below if you're only expecting one result
chain/index/api.go
Outdated
} | ||
|
||
return &types.IndexValidation{ | ||
TipsetKey: expectedTs.Key().String(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below, I think just the key would be good here, also note capitalisation of TipSetKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
chain/index/api.go
Outdated
return &types.IndexValidation{ | ||
TipsetKey: ts.Key().String(), | ||
Height: uint64(ts.Height()), | ||
Backfilled: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: other fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* feat: finish todos of validation api * feat: add indexed data verification with chain store * feat: address comments and finish TODO * fix: build issue * address comments * fix: ci issue
Thanks for all your work here @akaladarshi 👍 |
Co-authored-by: Rod Vagg <[email protected]>
NonRevertedEventsCount uint64 | ||
Backfilled bool | ||
IndexedMessagesCount uint64 | ||
IndexedEventsCount uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename this to IndexedNonRevertedMsgCount
and IndexedNonRevertedEventsCount
to be precise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akaladarshi That is an implementation detail that will confuse users. They're only asking for the indexed entries at a specific epoch here and so ofcourse expect only non-reverted tipsets back.
* feat: add lotus-shed command for backfilling chain indexer * feat: add lotus-shed command for inspecting the chain indexer * feat: use single lotus-shed command to inspect and backfill * fix: remove the unused queries * small changes * add change log
@BigLep This is now ready for review. Working on a user guide for this tooling as discussed offline. |
@rvagg This is now in a good shape and has been tested on a calibnet node. Please take a look when you get the time. |
_, _ = fmt.Fprintf(cctx.App.Writer, "chain index validation progress: %.2f%%\n", progress) | ||
} | ||
|
||
indexValidateResp, err := api.ChainValidateIndex(ctx, abi.ChainEpoch(epoch), backfill) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aarshkshah1992 what will happen if user provided chainHead
as the fromEpoch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getNextTipset
will error out, I think.
cmd/lotus-shed/chain_index.go
Outdated
backfilledEpochs = append(backfilledEpochs, epoch) | ||
} | ||
|
||
if output { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also log the result of each epoch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, didn't see the new changes.
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d (%s)\n", epoch, string(jsonData)) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a log for completion as well.
I think we can also print how many epochs were success out of total epoch or something like that.
if strings.Contains(err.Error(), "retry") { | ||
log.Warnf("epoch %d; failed to validate index with re-tryable error: %s. retrying...", epoch, err) | ||
epoch++ // Increment epoch to retry the same epoch | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we sleep here for sometime because retry will happen due to re-orgs and that might take sometime?.
// is it a null round ? | ||
if indexValidateResp.IsNullRound { | ||
if logGood { | ||
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d; null round\n", epoch) | ||
} | ||
continue | ||
} | ||
|
||
// log success | ||
if logGood { | ||
jsonData, err := json.Marshal(indexValidateResp) | ||
if err != nil { | ||
return fmt.Errorf("failed to marshal results to JSON: %w", err) | ||
} | ||
|
||
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d (%s)\n", epoch, string(jsonData)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// is it a null round ? | |
if indexValidateResp.IsNullRound { | |
if logGood { | |
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d; null round\n", epoch) | |
} | |
continue | |
} | |
// log success | |
if logGood { | |
jsonData, err := json.Marshal(indexValidateResp) | |
if err != nil { | |
return fmt.Errorf("failed to marshal results to JSON: %w", err) | |
} | |
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d (%s)\n", epoch, string(jsonData)) | |
} | |
// log success | |
if logGood { | |
if indexValidateResp.IsNullRound { | |
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d; null round\n", epoch) | |
} else { | |
jsonData, err := json.Marshal(indexValidateResp) | |
if err != nil { | |
return fmt.Errorf("failed to marshal results to JSON: %w", err) | |
} | |
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d (%s)\n", epoch, string(jsonData)) | |
} | |
} | |
ChainIndexer Migration and Diagnostics Tooling
This PR implements the "migration" (really re-indexing / backfilling), and diagnostics tooling for the
ChainIndexer
implemented in PR #12450, and is part of the work for #12453. This tooling takes the form of both RPC APIs on the daemon andlotus-shed
CLI commands.Re-indexing Process
The re-indexing tool enables clients to index their entire existing ChainState in the
ChainIndexer
. This process is necessary due to the removal of the existingMsgIndex
,EthTxIndex
, andEventIndex
from Lotus.Why Re-index Instead of Migrate?
We've chosen to re-index rather than migrate data from existing indices for two primary reasons:
Instead, we're re-indexing the
Chainstore
/Chainstate
on the node into theChainIndexer
. This ensures that all re-indexed entries have gone through the indexing logic of the newChainIndexer
and that the Index is in sync/reflects the actual contents of theChainstore
/Chainstate
post re-indexing.Diagnostics Tooling
This PR introduces diagnostic tools for detecting corrupt Index entries at specific epochs or epoch ranges.
While this PR implements functionality for optionally backfilling missing Index entries, it does not yet include the capability to "repair" corrupted Indexed entries. The repair functionality will be introduced in a subsequent PR. This approach allows us to first gather and analyze user reports, helping us understand the types and causes of corrupted Indexed entries(and if all they exist in the new
ChainIndexer
) before implementing repair mechanisms.Core API
The fundamental building block for this tooling is the following RPC API:
This API has the following features:
Chainstore
state and the Indexed entries (tipset messages/events)lotus-shed
CLI toolingThe
lotus-shed
CLI tooling for both re-indexing/backfilling and diagnostics can then invoke this RPC API over epoch ranges. The correspondinglotus-shed backfill index [from, to]
andlotus-shed inspect index [from, to]
can then backfill/inspect/diagnose the Index for the given epoch ranges.TODO
lotus-shed
commands