Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: migration("re-indexing"), backfilling and diasgnostics tooling for the ChainIndexer #12450

Open
wants to merge 10 commits into
base: feat/msg-eth-tx-index
Choose a base branch
from

Conversation

aarshkshah1992
Copy link
Contributor

@aarshkshah1992 aarshkshah1992 commented Sep 11, 2024

ChainIndexer Migration and Diagnostics Tooling

This PR implements the "migration" (really re-indexing / backfilling), and diagnostics tooling for the ChainIndexer implemented in PR #12450, and is part of the work for #12453. This tooling takes the form of both RPC APIs on the daemon and lotus-shed CLI commands.

Re-indexing Process

The re-indexing tool enables clients to index their entire existing ChainState in the ChainIndexer. This process is necessary due to the removal of the existing MsgIndex, EthTxIndex, and EventIndex from Lotus.

Why Re-index Instead of Migrate?

We've chosen to re-index rather than migrate data from existing indices for two primary reasons:

  1. Known issues: The existing indices have multiple known problems, and migrating could perpetuate incorrect index entries.
  2. Lack of garbage collection: Existing indices contain many entries for which the corresponding tipset messages/events no longer exist in the ChainStore due to splitstore GC.

Instead, we're re-indexing the Chainstore/Chainstate on the node into the ChainIndexer. This ensures that all re-indexed entries have gone through the indexing logic of the new ChainIndexer and that the Index is in sync/reflects the actual contents of the Chainstore/Chainstate post re-indexing.

Diagnostics Tooling

This PR introduces diagnostic tools for detecting corrupt Index entries at specific epochs or epoch ranges.

While this PR implements functionality for optionally backfilling missing Index entries, it does not yet include the capability to "repair" corrupted Indexed entries. The repair functionality will be introduced in a subsequent PR. This approach allows us to first gather and analyze user reports, helping us understand the types and causes of corrupted Indexed entries(and if all they exist in the new ChainIndexer) before implementing repair mechanisms.

Core API

The fundamental building block for this tooling is the following RPC API:

type IndexValidation struct {
	TipsetKey string
	Height    uint64

	TotalMessages  uint64
	TotalEvents    uint64
	EventsReverted bool

	Backfilled bool
}

func (si *SqliteIndexer) ChainValidateIndex(ctx context.Context, epoch abi.ChainEpoch, backfill bool) (*types.IndexValidation, error)

This API has the following features:

  • Optionally backfills the Index with a tipset on the canonical chain for the given epoch if it is absent in the Index
  • Returns some aggregated stats for an indexed entry for diagnostics/inspection
  • Reports errors/corrupted indexed entries at the given epoch. Forms of Index corruption that can be diagnosed includes:
    • Presence of multiple non-reverted tipsets at the given epoch
    • Complete absence of a non-reverted tipset at the given epoch that does contain reverted tipsets
    • Mismatch between the Chainstore state and the Indexed entries (tipset messages/events)
    • Incorrect Indexing of null rounds at the given epoch

lotus-shed CLI tooling

The lotus-shed CLI tooling for both re-indexing/backfilling and diagnostics can then invoke this RPC API over epoch ranges. The corresponding lotus-shed backfill index [from, to] and lotus-shed inspect index [from, to] can then backfill/inspect/diagnose the Index for the given epoch ranges.

TODO

  • lotus-shed commands
  • RPC API should compare the Indexed entries with the chain store to detect a mismatch between indexed content and chain store content
  • Fill out all the diagnostics information in the RPC response by queying the messages and events tables in the Index.
  • automated tests

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chainindex/api.go Outdated Show resolved Hide resolved
return &types.IndexValidation{
TipsetKey: ts.Key().String(),
Height: uint64(ts.Height()),
Backfilled: true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fill out the other fields here using SQL queries.

chain/types/tipset.go Outdated Show resolved Hide resolved
chainindex/indexer.go Outdated Show resolved Hide resolved
chainindex/indexer.go Outdated Show resolved Hide resolved
chainindex/api.go Outdated Show resolved Hide resolved
chainindex/api.go Outdated Show resolved Hide resolved
chainindex/api.go Outdated Show resolved Hide resolved
chainindex/ddls.go Outdated Show resolved Hide resolved
chainindex/ddls.go Outdated Show resolved Hide resolved
github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 changed the title [WIP] Chain Index Validation API Chain Index Validation API Sep 12, 2024
github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992
Copy link
Contributor Author

@rvagg Would be great to have a first round of review here when you get the time.

github-actions[bot]

This comment was marked as duplicate.

github-actions[bot]

This comment was marked as duplicate.

github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 changed the title Chain Index Validation API Migration and Diasnostics tooling for the ChainIndexer Sep 12, 2024
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarshkshah1992 aarshkshah1992 changed the title Migration and Diasnostics tooling for the ChainIndexer Migration("Re-Indexing") and Inspection tooling for the ChainIndexer Sep 12, 2024
github-actions[bot]

This comment was marked as duplicate.

github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 changed the title Migration("Re-Indexing") and Inspection tooling for the ChainIndexer Migration("Re-Indexing") and Diasgnostics tooling for the ChainIndexer Sep 12, 2024
github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 self-assigned this Sep 12, 2024
github-actions[bot]

This comment was marked as duplicate.

@aarshkshah1992 aarshkshah1992 changed the title Migration("Re-Indexing") and Diasgnostics tooling for the ChainIndexer feat: migration("re-indexing") and diasgnostics tooling for the ChainIndexer Sep 12, 2024
chain/index/api.go Outdated Show resolved Hide resolved
return revertedCount, nonRevertedCount, nil
}

func (si *SqliteIndexer) ChainValidateIndex(ctx context.Context, epoch abi.ChainEpoch, backfill bool) (*types.IndexValidation, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we did this all in a single transaction, would that remove the need for a write lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, transaction provides atomicity, not exclusivity. Imagine this scenario:

  1. Backfill is called for epoch e -> it reads tipset ts at that epoch
  2. Revertexecutes for ts and marks the tipset as reverted in TX1
  3. Backfill executes for ts and marks it as applied in TX2

TX1 is committed first and then TX2 is comitted.

Basically, it's hard to reason about state here without this lock.

Copy link
Contributor Author

@aarshkshah1992 aarshkshah1992 Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg

One work-around here it to only allow backfilling for "safe" epochs i.e. <=head - 30.

Then, the rewrite can only ever happen in an edge case.

chain/index/api.go Outdated Show resolved Hide resolved
chain/index/api.go Outdated Show resolved Hide resolved
chain/index/ddls.go Outdated Show resolved Hide resolved

// fetch the non-reverted tipset at this epoch
var indexedTsKeyCidBytes []byte
err = si.stmts.getNonRevertedTipsetAtHeightStmt.QueryRowContext(ctx, epoch).Scan(&indexedTsKeyCidBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the LIMIT 1 below if you're only expecting one result

}

return &types.IndexValidation{
TipsetKey: expectedTs.Key().String(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below, I think just the key would be good here, also note capitalisation of TipSetKey

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return &types.IndexValidation{
TipsetKey: ts.Key().String(),
Height: uint64(ts.Height()),
Backfilled: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: other fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* feat: finish todos of validation api

* feat: add indexed data verification with chain store

* feat: address comments and finish TODO

* fix: build issue

* address comments

* fix: ci issue
@aarshkshah1992
Copy link
Contributor Author

Thanks for all your work here @akaladarshi 👍

NonRevertedEventsCount uint64
Backfilled bool
IndexedMessagesCount uint64
IndexedEventsCount uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename this to IndexedNonRevertedMsgCount and IndexedNonRevertedEventsCount to be precise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akaladarshi That is an implementation detail that will confuse users. They're only asking for the indexed entries at a specific epoch here and so ofcourse expect only non-reverted tipsets back.

aarshkshah1992 and others added 4 commits September 18, 2024 15:16
* feat: add lotus-shed command for backfilling chain indexer

* feat: add lotus-shed command for inspecting the chain indexer

* feat: use single lotus-shed command to inspect and backfill

* fix: remove the unused queries

* small changes

* add change log
@aarshkshah1992 aarshkshah1992 marked this pull request as ready for review September 19, 2024 09:10
@aarshkshah1992
Copy link
Contributor Author

@BigLep This is now ready for review. Working on a user guide for this tooling as discussed offline.

@aarshkshah1992
Copy link
Contributor Author

@rvagg This is now in a good shape and has been tested on a calibnet node. Please take a look when you get the time.

_, _ = fmt.Fprintf(cctx.App.Writer, "chain index validation progress: %.2f%%\n", progress)
}

indexValidateResp, err := api.ChainValidateIndex(ctx, abi.ChainEpoch(epoch), backfill)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarshkshah1992 what will happen if user provided chainHead as the fromEpoch?

Copy link
Contributor

@akaladarshi akaladarshi Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getNextTipset will error out, I think.

backfilledEpochs = append(backfilledEpochs, epoch)
}

if output {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also log the result of each epoch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, didn't see the new changes.

_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d (%s)\n", epoch, string(jsonData))
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a log for completion as well.

I think we can also print how many epochs were success out of total epoch or something like that.

if strings.Contains(err.Error(), "retry") {
log.Warnf("epoch %d; failed to validate index with re-tryable error: %s. retrying...", epoch, err)
epoch++ // Increment epoch to retry the same epoch
continue
Copy link
Contributor

@akaladarshi akaladarshi Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we sleep here for sometime because retry will happen due to re-orgs and that might take sometime?.

Comment on lines +121 to +137
// is it a null round ?
if indexValidateResp.IsNullRound {
if logGood {
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d; null round\n", epoch)
}
continue
}

// log success
if logGood {
jsonData, err := json.Marshal(indexValidateResp)
if err != nil {
return fmt.Errorf("failed to marshal results to JSON: %w", err)
}

_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d (%s)\n", epoch, string(jsonData))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// is it a null round ?
if indexValidateResp.IsNullRound {
if logGood {
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d; null round\n", epoch)
}
continue
}
// log success
if logGood {
jsonData, err := json.Marshal(indexValidateResp)
if err != nil {
return fmt.Errorf("failed to marshal results to JSON: %w", err)
}
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d (%s)\n", epoch, string(jsonData))
}
// log success
if logGood {
if indexValidateResp.IsNullRound {
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d; null round\n", epoch)
} else {
jsonData, err := json.Marshal(indexValidateResp)
if err != nil {
return fmt.Errorf("failed to marshal results to JSON: %w", err)
}
_, _ = fmt.Fprintf(cctx.App.Writer, "✓ Epoch %d (%s)\n", epoch, string(jsonData))
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ⌨️ In Progress
Development

Successfully merging this pull request may close these issues.

4 participants