Add reference implementation for Flat state to resharding NEP (#575)
Trisfald authored Nov 15, 2024
1 parent 65ece27 commit 615a92f
Showing 1 changed file with 85 additions and 0 deletions.
85 changes: 85 additions & 0 deletions neps/nep-0568.md
@@ -225,6 +225,91 @@ During a resharding event, at the boundary of the epoch, when we need to split t

![Hybrid MemTrie diagram](assets/nep-0568/NEP-HybridMemTrie.png)

### State Storage - Flat State

Resharding Flat State is a time-consuming operation that runs in parallel with block processing for several block heights.
Thus, there are a few important aspects to consider during implementation:

- Flat State's own status should be resilient to application crashes.
- The parent shard's Flat State should be split at the correct block height.
- New shards' Flat States should eventually converge to the same representation the chain is using to process blocks (MemTries).
- Resharding should work correctly in the presence of chain forks.
- Retired shards are cleaned up.

Note that the Flat States of the newly created shards won't be available until resharding is completed. This is fine because the temporary MemTries are
built instantly and they can satisfy all block processing needs.

The main component responsible for carrying out resharding on Flat State is the [FlatStorageResharder](https://github.com/near/nearcore/blob/f4e9dd5d6e07089dfc789221ded8ec83bfe5f6e8/chain/chain/src/flat_storage_resharder.rs#L68).

#### Flat State's status persistence

Every shard's Flat State has a status associated with it and stored in the database, called `FlatStorageStatus`. We propose to extend the existing object
by adding a new enum variant named `FlatStorageStatus::Resharding`. This approach has two benefits. First, the progress of any Flat State resharding is
persisted to disk, which makes the operation resilient to a node crash or restart. Second, resuming resharding on node restart shares the same code path as Flat
State creation (see `FlatStorageShardCreator`), reducing code duplication.

`FlatStorageStatus` is updated at every committable step of resharding. The commit points are the following:

- Beginning of resharding or, in other words, the last block of the old shard layout.
- Scheduling of the _"split parent shard"_ task.
- Execution, cancellation or failure of the _"split parent shard"_ task.
- Execution or failure of any _"child catchup"_ task.
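
The commit points above can be pictured as a small state machine whose current state is written to the database. The following is a minimal sketch, not the nearcore definition: `ReshardingStatus`, its variants and fields, `StatusStore` and `action_on_restart` are hypothetical names introduced for illustration, and the real `FlatStorageStatus` has more variants than shown here.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified payload of `FlatStorageStatus::Resharding`.
/// Variant names and fields are illustrative assumptions.
#[derive(Clone, Debug, PartialEq)]
enum ReshardingStatus {
    /// Recorded at the last block of the old shard layout.
    CreatingChildren,
    /// The "split parent shard" task has been scheduled.
    SplittingParent { scheduled_at_height: u64 },
    /// A child is applying the deltas accumulated since the resharding block.
    CatchingUp { applied_up_to_height: u64 },
}

/// Simplified stand-in for the persisted status of one shard's Flat State.
#[derive(Clone, Debug, PartialEq)]
enum FlatStorageStatus {
    Empty,
    Ready { height: u64 },
    Resharding(ReshardingStatus),
}

/// In-memory stand-in for the database column holding one status per shard.
#[derive(Default)]
struct StatusStore {
    by_shard: HashMap<u32, FlatStorageStatus>,
}

impl StatusStore {
    /// Persists the status of a shard at a commit point.
    fn commit(&mut self, shard: u32, status: FlatStorageStatus) {
        self.by_shard.insert(shard, status);
    }

    /// Describes what a restarted node would do for a shard, based only on
    /// the persisted status.
    fn action_on_restart(&self, shard: u32) -> &'static str {
        match self.by_shard.get(&shard) {
            Some(FlatStorageStatus::Resharding(ReshardingStatus::CreatingChildren)) => {
                "schedule split"
            }
            Some(FlatStorageStatus::Resharding(ReshardingStatus::SplittingParent { .. })) => {
                "restart split"
            }
            Some(FlatStorageStatus::Resharding(ReshardingStatus::CatchingUp { .. })) => {
                "resume catch-up"
            }
            Some(FlatStorageStatus::Ready { .. }) | Some(FlatStorageStatus::Empty) | None => {
                "nothing to do"
            }
        }
    }
}
```

Because the decision on restart depends only on the persisted value, a crash between two commit points loses at most the work done since the last commit.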

#### Splitting a shard Flat State

When the shard layout changes at the end of an epoch, we identify a so-called _resharding block_, which is the last block of the current epoch.
A task to split the parent shard's Flat State is scheduled to run after the _resharding block_ becomes final. We wait for finality
to avoid splitting on a block that might be excluded from the canonical chain; such a situation would lock the node
into an erroneous state.
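
The finality condition can be sketched as a small decision function. This is an illustration under stated assumptions, not the nearcore logic: `BlockHash` is a `u64` stand-in for a real hash, `split_decision` and `SplitDecision` are hypothetical names, and treating a resharding block that is missing from the final chain as a cancellation is an assumption of this sketch.

```rust
/// Stand-in for a real block hash (assumption: a plain integer is enough here).
type BlockHash = u64;

#[derive(Debug, PartialEq)]
enum SplitDecision {
    /// The resharding block is not final yet: keep waiting.
    Wait,
    /// The resharding block is final: the split task can be scheduled.
    Schedule,
    /// A different block was finalized at that height: do not split.
    Cancel,
}

/// Decides what to do with the pending split, given the height of the last
/// final block and a lookup into the final chain.
fn split_decision(
    resharding_height: u64,
    resharding_hash: BlockHash,
    final_height: u64,
    final_chain_hash_at: impl Fn(u64) -> Option<BlockHash>,
) -> SplitDecision {
    if final_height < resharding_height {
        return SplitDecision::Wait;
    }
    match final_chain_hash_at(resharding_height) {
        Some(hash) if hash == resharding_hash => SplitDecision::Schedule,
        _ => SplitDecision::Cancel,
    }
}
```

The point of the check is that the split only ever runs on a block that can no longer be removed from the canonical chain.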

Inside the split task we iterate over the parent's Flat State and copy each element into one child or both. This routine is performed in batches to lessen the performance
impact on the node.

Finally, if the split completes successfully, the parent shard's Flat State is removed from the database and the children's Flat States enter a catch-up phase.

One current technical limitation is that, upon a node crash or restart, the _"split parent shard"_ task will start copying all elements again from the beginning.
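
The batched copy can be sketched as follows. This is a minimal illustration rather than the code of `split_shard_task`: the Flat State is modeled as an in-memory `BTreeMap`, and `Target`, `split_in_batches` and the `route` callback are hypothetical names. A real implementation would commit a database transaction at the end of each batch.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a shard's Flat State (assumption for illustration).
type FlatState = BTreeMap<Vec<u8>, Vec<u8>>;

/// Which child, or children, inherit a given key.
#[derive(Clone, Copy, Debug)]
enum Target {
    Left,
    Right,
    Both,
}

/// Copies every parent entry into the children, `batch_size` entries at a time.
/// Returns the number of batches that were completed.
fn split_in_batches(
    parent: &FlatState,
    left: &mut FlatState,
    right: &mut FlatState,
    batch_size: usize,
    route: impl Fn(&[u8]) -> Target,
) -> usize {
    assert!(batch_size > 0, "batch size must be positive");
    let mut committed_batches = 0;
    let mut in_current_batch = 0;
    for (key, value) in parent {
        match route(key.as_slice()) {
            Target::Left => {
                left.insert(key.clone(), value.clone());
            }
            Target::Right => {
                right.insert(key.clone(), value.clone());
            }
            Target::Both => {
                left.insert(key.clone(), value.clone());
                right.insert(key.clone(), value.clone());
            }
        }
        in_current_batch += 1;
        if in_current_batch == batch_size {
            // A real implementation would commit the batch to the database
            // here and give other work a chance to run.
            committed_batches += 1;
            in_current_batch = 0;
        }
    }
    if in_current_batch > 0 {
        committed_batches += 1;
    }
    committed_batches
}
```

The sketch keeps no cursor between batches, which mirrors the limitation described above: after a restart the iteration begins again from the first key. Re-copying is harmless because inserting the same key and value twice leaves the children unchanged.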

A reference implementation of splitting a Flat State can be found in [FlatStorageResharder::split_shard_task](https://github.com/near/nearcore/blob/fecce019f0355cf89b63b066ca206a3cdbbdffff/chain/chain/src/flat_storage_resharder.rs#L295).

#### Values assignment from parent to child shards

Key-value pairs in the parent shard Flat State are inherited by children according to the rules stated below.

Elements inherited by the child shard which tracks the `account_id` contained in the key:
- `ACCOUNT`
- `CONTRACT_DATA`
- `CONTRACT_CODE`
- `ACCESS_KEY`
- `RECEIVED_DATA`
- `POSTPONED_RECEIPT_ID`
- `PENDING_DATA_COUNT`
- `POSTPONED_RECEIPT`
- `PROMISE_YIELD_RECEIPT`

Elements inherited by both children:
- `DELAYED_RECEIPT_OR_INDICES`
- `PROMISE_YIELD_INDICES`
- `PROMISE_YIELD_TIMEOUT`
- `BANDWIDTH_SCHEDULER_STATE`

Elements inherited only by the lowest-index child:
- `BUFFERED_RECEIPT_INDICES`
- `BUFFERED_RECEIPT`
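
The three rules above can be written down as a lookup from key type to inheritance rule. The sketch below is illustrative only: `KeyKind`, `Inheritance`, `inheritance` and `children_for` are hypothetical names, the enum variants simply mirror the key types listed above, and the child indices passed in are made-up values.

```rust
/// One variant per key type listed above (hypothetical enum for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
enum KeyKind {
    Account,
    ContractData,
    ContractCode,
    AccessKey,
    ReceivedData,
    PostponedReceiptId,
    PendingDataCount,
    PostponedReceipt,
    PromiseYieldReceipt,
    DelayedReceiptOrIndices,
    PromiseYieldIndices,
    PromiseYieldTimeout,
    BandwidthSchedulerState,
    BufferedReceiptIndices,
    BufferedReceipt,
}

#[derive(Debug, PartialEq)]
enum Inheritance {
    /// Goes to the child that tracks the `account_id` contained in the key.
    ByAccount,
    /// Copied to both children.
    BothChildren,
    /// Goes only to the child with the lowest index.
    LowestIndexChild,
}

/// Maps a key type to its inheritance rule.
fn inheritance(kind: KeyKind) -> Inheritance {
    use KeyKind::*;
    match kind {
        Account | ContractData | ContractCode | AccessKey | ReceivedData
        | PostponedReceiptId | PendingDataCount | PostponedReceipt
        | PromiseYieldReceipt => Inheritance::ByAccount,
        DelayedReceiptOrIndices | PromiseYieldIndices | PromiseYieldTimeout
        | BandwidthSchedulerState => Inheritance::BothChildren,
        BufferedReceiptIndices | BufferedReceipt => Inheritance::LowestIndexChild,
    }
}

/// Resolves the rule to concrete child indices. `account_child` is the child
/// tracking the account in the key; it is ignored for account-less keys.
fn children_for(kind: KeyKind, children: [u32; 2], account_child: u32) -> Vec<u32> {
    match inheritance(kind) {
        Inheritance::ByAccount => vec![account_child],
        Inheritance::BothChildren => children.to_vec(),
        Inheritance::LowestIndexChild => vec![children[0].min(children[1])],
    }
}
```

Using an exhaustive `match` means that adding a new key type without deciding how it is inherited would fail to compile, which is a useful property for this kind of rule table.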

#### Bring children shards up to date with the chain's head

The children shards' Flat States obtain a complete view of their content at the height of the `resharding block` at some point during the new epoch
after resharding. By then, many new blocks have already been processed, and these will most likely contain updates for the new shards. A catch-up step is necessary to apply all Flat State deltas accumulated up to that point.

This phase of resharding doesn't need extra steps to handle chain forks. On the one hand, the catch-up task doesn't start until the parent shard
split is done, and at that point we know the `resharding block` is final; on the other hand, Flat State deltas handle forks automatically.

The catch-up task commits Flat State deltas to the database in batches. If the application crashes or restarts, the task resumes from where it left off.

Once all Flat State deltas are applied, the child shard's status is changed to `Ready` and leftover Flat State deltas are cleaned up.
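
The catch-up loop can be sketched as below. This is a simplified model, not the code of `shard_catchup_task`: deltas are keyed by height on a single chain (so fork handling is left out), values are plain integers, and `ChildFlatState`, `Delta` and `catch_up_batch` are hypothetical names. The persisted `flat_head` is what allows the task to resume after a restart.

```rust
use std::collections::BTreeMap;

/// Changes made by one block: `Some(v)` writes a value, `None` deletes the key.
type Delta = BTreeMap<String, Option<u64>>;

/// Simplified child Flat State (assumption: everything lives in memory).
struct ChildFlatState {
    values: BTreeMap<String, u64>,
    /// Height the Flat State currently reflects; persisted with every batch.
    flat_head: u64,
    ready: bool,
}

/// Applies at most `batch_size` deltas above the flat head, in height order.
/// Returns `true` once the child has caught up with `chain_head`.
fn catch_up_batch(
    state: &mut ChildFlatState,
    deltas: &mut BTreeMap<u64, Delta>,
    chain_head: u64,
    batch_size: usize,
) -> bool {
    let flat_head = state.flat_head;
    // BTreeMap keys are sorted, so the heights come out in ascending order.
    let heights: Vec<u64> = deltas
        .keys()
        .copied()
        .filter(|h| *h > flat_head && *h <= chain_head)
        .take(batch_size)
        .collect();
    for height in heights {
        let delta = deltas.remove(&height).expect("height was listed above");
        for (key, change) in delta {
            match change {
                Some(value) => {
                    state.values.insert(key, value);
                }
                None => {
                    state.values.remove(&key);
                }
            }
        }
        // Moving the head together with the values is the commit point.
        state.flat_head = height;
    }
    let new_head = state.flat_head;
    let done = !deltas.keys().any(|h| *h > new_head && *h <= chain_head);
    if done {
        state.ready = true;
        // Clean up leftover deltas at or below the flat head.
        deltas.retain(|h, _| *h > new_head);
    }
    done
}
```

Calling `catch_up_batch` repeatedly until it returns `true` plays the role of the catch-up task; an interrupted run simply continues from the stored `flat_head` on the next call.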

A reference implementation of the catch-up task can be found in [FlatStorageResharder::shard_catchup_task](https://github.com/near/nearcore/blob/fecce019f0355cf89b63b066ca206a3cdbbdffff/chain/chain/src/flat_storage_resharder.rs#L564).

#### Failure of Flat State resharding

In the current proposal any failure during Flat State resharding is considered non-recoverable.
`neard` will attempt resharding again on restart, but no automatic recovery is implemented.

### State Storage - State mapping

To enable efficient shard state management during resharding, Resharding V3 uses the `DBCol::ShardUIdMapping` column.