Adopt the Turbo-Geth database layout #779
What Turbo Geth does
The Turbo Geth database consists of a few sections:
I believe that Turbo Geth only stores the leaf nodes for each trie; it keeps all the intermediate nodes for the most recent trie in memory. For instance, the account storage section only stores the leaves. I believe that account history is stored as reverse diffs; Turbo Geth calls their history store "reverse differences". This means pruning can be supported by simply dropping all keys older than the oldest block you want to keep.
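For concreteness, here's a toy sketch of the reverse-diff idea. The key encoding (20-byte address followed by an 8-byte big-endian block number) and the dict standing in for LevelDB are my assumptions for illustration, not Turbo Geth's actual on-disk format:

```python
import struct

def history_key(address: bytes, block_number: int) -> bytes:
    # Assumed encoding: 20-byte address ++ 8-byte big-endian block number.
    # Big-endian numbers keep one account's history contiguous and sorted.
    return address + struct.pack('>Q', block_number)

# A reverse diff records what an account looked like *before* a block
# changed it, so walking diffs backwards rewinds the state.
history = {
    history_key(b'\xaa' * 20, 100): b'account-rlp-before-block-100',
    history_key(b'\xaa' * 20, 205): b'account-rlp-before-block-205',
}

def prune(history: dict, horizon: int) -> None:
    # Pruning is just dropping every key older than the horizon; no
    # reference counting or trie garbage collection is needed.
    for key in [k for k in history
                if struct.unpack('>Q', k[20:])[0] < horizon]:
        del history[key]
```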
Turbotrie
From an email thread with someone else working on this problem. Leaving out some details and their identity in case they don't want this to be made public yet. turbotrie has a couple of other improvements based on simplifying the data model and being more space efficient. In terms of layout, nodes are stored as:
Should we store intermediate nodes?
Which changes will need to be made?
I believe that many of the changes will need to be made to py-evm; maybe this issue should be transferred there.
What's the MVP?
This is going to be a large undertaking; what's the smallest useful change which can be tested to work and then expanded upon? Jason suggested running both databases in parallel: I can implement some subset of functionality, such as storing reverse diffs of the account history, and have all calls persist both the old format and the new format. Reads would also consult both databases and fail if any discrepancies are noticed. This allows starting with some subset of the functionality and then expanding it until it includes everything the old format supported (see the sketch after the list below). Potential starting points:
Add an …
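As a sketch of the parallel-databases MVP, something like the wrapper below could route every write to both layouts and cross-check reads. The class and the dict-backed stores are illustrative, not existing py-evm APIs, and key translation between the two layouts is glossed over:

```python
class MirroredDB:
    """Writes go to both layouts; reads cross-check them."""

    def __init__(self, old_db: dict, new_db: dict) -> None:
        self.old_db = old_db
        self.new_db = new_db

    def set(self, key: bytes, value: bytes) -> None:
        # Persist in the old format and the new format on every write.
        self.old_db[key] = value
        self.new_db[key] = value

    def get(self, key: bytes) -> bytes:
        # Read from both and fail loudly on any discrepancy, so bugs
        # in the new layout surface immediately.
        old_value = self.old_db[key]
        new_value = self.new_db[key]
        if old_value != new_value:
            raise ValueError(f"layouts disagree for key {key.hex()}")
        return old_value
```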
Some Questions
Some RPCs which involve inspecting the account / state history
Special Bonus
I'm very OK with intermediate solutions that fail to support some of the JSON-RPC methods if the trade-off means a functional client sooner.
Yeah, I absolutely agree! My hope is that there's a small core of functionality here that can be implemented quickly and then expanded upon. I agree that some features (like …) can come later.
Storing Account history
Almost all the RPCs which require looking into history require a lookup on … The leveldb docs state that forward range queries are "somewhat" faster than backward range queries. It's worth experimenting to see how drastic the difference is, but that probably means the common case should be scanning forward from … (a sketch of such a forward scan follows below). Storing reverse diffs means that pruning the oldest blocks is relatively simple, though it makes rewinding blocks a little complicated.
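Here's a minimal sketch of that forward scan, assuming the address ++ big-endian-block key encoding from earlier; a sorted list plus bisect stands in for a LevelDB iterator:

```python
import bisect
import struct
from typing import Optional

def history_key(address: bytes, block_number: int) -> bytes:
    # Assumed encoding: 20-byte address ++ 8-byte big-endian block.
    return address + struct.pack('>Q', block_number)

# LevelDB keeps keys sorted; a sorted list emulates that here.
keys = sorted([
    history_key(b'\xaa' * 20, 100),
    history_key(b'\xaa' * 20, 205),
    history_key(b'\xbb' * 20, 7),
])

def diff_at_or_after(address: bytes, block_number: int) -> Optional[bytes]:
    # Seek forward from (address, block_number): the first key we land
    # on, if it still belongs to this address, holds the next reverse
    # diff, i.e. the account's value as of `block_number`.
    i = bisect.bisect_left(keys, history_key(address, block_number))
    if i < len(keys) and keys[i][:20] == address:
        return keys[i]
    return None  # no later diff; the current state already answers
```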
Reading and Writing older states
This might be the most complicated operation. The problem with …

If the block to be imported (the importing block) is a child of a block in the canonical chain, …

It's more complicated if the block to be imported is a child of a block which is also not canonical. In that case Trinity will have to find the latest ancestor which is part of the canonical chain, then look up and apply the state transitions for all ancestor non-canonical blocks (all of this in-memory), and finally compute the state transitions for the block to be imported. If this block is the new canonical block then the database will have to be rewound and the state transitions from the newly canonical blocks applied. (A diagram is probably appropriate here.)

This leaves a few questions:
I believe that this requires an index of … Rewinding the state for a single account is easy; however, rewinding the state for all accounts is difficult unless you already know which accounts need to be changed, and for this an index from block number to the set of changed accounts is needed (a sketch follows below).
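A toy version of that rewind, where the per-block reverse diff doubles as the index of which accounts a block touched. The dict shapes are assumptions for illustration:

```python
from typing import Dict

def rewind(state: Dict[bytes, bytes],
           reverse_diffs: Dict[int, Dict[bytes, bytes]],
           from_block: int,
           to_block: int) -> None:
    # Walk backwards one block at a time; each block's reverse diff
    # lists exactly the accounts that block changed, so it serves as
    # the needed block -> changed-accounts index.
    for block in range(from_block, to_block, -1):
        for address, old_value in reverse_diffs[block].items():
            if old_value:
                state[address] = old_value
            else:
                # Empty old value: the account was created in this
                # block, so rewinding deletes it.
                state.pop(address, None)
```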
A simpler alternative which drops support for RPCs
All of the above is pretty complicated, and it's complicated because of the …
Importing non-canonical blocks still works the same way: an in-memory view of the database is created which undoes and applies transitions to the current state until it's ready for the importing block to be run (sketched below). To make very old states easier to reach, checkpoints could be created which materialize the entire state at different points along the chain.
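A sketch of such an in-memory view: mutations from undoing and re-applying transitions land in an overlay, while the on-disk state stays untouched. The names are illustrative, not Trinity APIs:

```python
from typing import Dict, Optional

class OverlayState:
    """Read-through view that buffers changes in memory."""

    def __init__(self, disk_state: Dict[bytes, bytes]) -> None:
        self.disk_state = disk_state
        # A value of None marks a key deleted in the overlay.
        self.overlay: Dict[bytes, Optional[bytes]] = {}

    def apply_diff(self, diff: Dict[bytes, Optional[bytes]]) -> None:
        # Undo (reverse diff) and redo (forward diff) both just update
        # the overlay; the database underneath never changes.
        self.overlay.update(diff)

    def get(self, key: bytes) -> Optional[bytes]:
        if key in self.overlay:
            return self.overlay[key]
        return self.disk_state.get(key)
```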
The current state
It's possible to quickly serve single-account requests using the …
Current Plan
Phase 0: Preparation
Phase 1: Current Account State
In more detail:
Phase 2: Current Storage Item State
Phase 3: Reverse account diffs (a combined sketch of Phases 1 and 3 follows this list)
Phase 4:
Some open items:
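To make Phases 1 and 3 concrete, here's a sketch of how a block import might update the current-account-state table and record its reverse diff in one logical step. The helper names and dict-backed stores are invented for illustration:

```python
from typing import Dict

def apply_block(state: Dict[bytes, bytes],
                diff_store: Dict[int, Dict[bytes, bytes]],
                block_number: int,
                changes: Dict[bytes, bytes]) -> None:
    reverse_diff = {}
    for key, new_value in changes.items():
        # Remember the pre-state (empty bytes if absent) so the block
        # can be unwound later.
        reverse_diff[key] = state.get(key, b'')
        state[key] = new_value
    # With LevelDB, the state updates and the diff would go into a
    # single WriteBatch so a crash can't leave them out of sync.
    diff_store[block_number] = reverse_diff
```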
Getting back onto this train: I still think it's important, for Trinity and also for Ethereum 1.x, that we give adopting the TurboGeth layout a real shot. Saving block diffs is mostly finished, though it needs some more thorough testing. Today I started working on the current account state database; I think I have a prototype, and now I need to write a bunch of tests for it. Some open questions:
The current account database, when hooked up, passes all core tests and all fixtures, though I should write some more tests! Some things left on my list:
Serving
The current account database is nearing completion. Some things which now work:
Some next steps:
A todo list with smaller items:
Exciting! Do you have a WIP branch online (the linked one doesn't seem to be up to date)?
Seconding what @cburgdorf said. Can you make sure your work is available for us to look at? Visibility into in-progress work is important!
Yeah, sorry, I thought I had already posted it @cburgdorf! Here's a compare view for the branch I've been using: ethereum/py-evm@master...lithp:lithp/turbo-account-state. It's still very much in "make it work" mode and will have to be cleaned up a lot before it can be merged in, but here it is!
@lithp you should open a "Draft" PR for that so that it is more easily findable.
Here's the draft PR: ethereum/py-evm#1879
What is wrong?
In order for Trinity to benefit from the improved sync speed Firehose makes possible, it needs to change the way it lays out data on disk. Specifically, account and storage data is currently stored as sha(object) -> object, which means that syncing requires a lot of random reads and writes.

How can it be fixed
I'm not sure what the best approach is; I'll add some comments below exploring the options.
To limit scope I'm going to try to stick to using leveldb. Switching it out for a different database might lead to further performance improvements, but it would drastically increase the amount of work required.
Some requirements & opportunities
The new layout needs to be able to handle re-orgs, meaning it needs to be able to rewind blocks. This requires that some kind of history is kept.
The new layout needs to be able to quickly generate chunks of leaves and serve them to clients, so other clients will be able to quickly sync.
In order to be a good citizen on the network the new layout needs to be able to quickly serve eth/63 and les/2. In particular, requests like GetNodeData require keeping around a materialized trie, or maybe more. Alternatively, we might want to suggest an eth/64 which does nothing but replace GetNodeData with something easier to serve.

The new layout also needs to support quick responses to all the JSON-RPCs.
Currently Trinity runs as an archive node and doesn't perform any pruning. In fact, sha(object) -> object makes pruning very difficult because it naturally de-dupes, and the only way to know that you've removed the last reference to an object is to do a very expensive garbage-collection pass of the entire database. Ideally the new layout would make pruning easier so that Trinity won't have to save everything.

Trie nodes in random order are about the least compressible disk pages possible. Storing all the accounts next to each other is likely to improve the compression ratio of the database, since leveldb compresses on a per-page basis (a small illustration follows below).