[Perf] Only use raw iterators with RocksDB and speed up ledger load #2561
Motivation
Replaces #2518
Original message:
The signatures of iteration-related methods (with the rocksdb feature) have been haunting my heap profiles for a long time now. I initially believed this was something we could fine-tune with some configuration, but all such attempts were unsatisfactory. It turns out that the "default" (higher-level) RocksDB iterators are just inherently inefficient, and only the DBRawIterator is able to avoid a massive number of allocations, some of which may linger in the RSS and contribute to its inflation over time. The raw iterator is somewhat trickier to use correctly, but since it's what the regular iterator is built upon, there is nothing particularly novel about it, and we're already using it to count records.
With these changes I ran profiling with both an instrumented OS allocator and jemalloc and - contrary to the usual results - it was the latter that experienced larger relative differences when loading the ledger:
In addition, heaptrack ran 73% faster and produced a 99% smaller profile, which is very practical for future profiling needs.
Alongside these changes, I realized that the current len_confirmed method, while working correctly and passing all tests, can be made more robust by including iterator validity checks. These checks would only truly be needed if the method was called on the very last map in the database while that map was empty, and this commit guards against that edge case.
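To illustrate the edge case, here is a minimal sketch of a prefix count that checks iterator validity before reading a key. It is not the code from this PR: `MockRawIter` is a hypothetical in-memory stand-in whose method names mirror `seek`, `valid`, `key`, and `next` on rust-rocksdb's `DBRawIterator`, and `count_with_prefix` is an assumed helper name, not `len_confirmed` itself.

```rust
/// Hypothetical stand-in for a raw iterator over a sorted key space.
/// Only the method names are borrowed from `DBRawIterator`; this is not the real type.
struct MockRawIter<'a> {
    keys: &'a [Vec<u8>], // assumed to be sorted in ascending byte order
    pos: usize,
}

impl<'a> MockRawIter<'a> {
    /// Creates an iterator that is not positioned on any key yet (invalid).
    fn new(keys: &'a [Vec<u8>]) -> Self {
        Self { keys, pos: keys.len() }
    }

    /// Positions the iterator at the first key that is >= `target`.
    fn seek(&mut self, target: &[u8]) {
        self.pos = self.keys.partition_point(|k| k.as_slice() < target);
    }

    /// Returns `true` while the iterator points at an existing key.
    fn valid(&self) -> bool {
        self.pos < self.keys.len()
    }

    /// Returns the current key as a borrowed slice, or `None` if invalid.
    fn key(&self) -> Option<&'a [u8]> {
        self.keys.get(self.pos).map(|k| k.as_slice())
    }

    /// Advances to the next key; does nothing once the iterator is invalid.
    fn next(&mut self) {
        if self.valid() {
            self.pos += 1;
        }
    }
}

/// Counts the keys that start with `prefix` (assumed helper, for illustration only).
fn count_with_prefix(iter: &mut MockRawIter<'_>, prefix: &[u8]) -> usize {
    let mut count = 0;
    iter.seek(prefix);
    // If the prefix belongs to the last map and that map is empty, `seek`
    // lands past the final key, so `valid()` must be checked before `key()`.
    while iter.valid() {
        match iter.key() {
            Some(key) if key.starts_with(prefix) => count += 1,
            _ => break, // reached a key that belongs to a different map
        }
        iter.next();
    }
    count
}
```

In this sketch, seeking to a prefix that sorts after every stored key leaves the iterator invalid, so the loop body never runs and the count is 0 rather than an error.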
The final commit is a further improvement on one of the changes from #2515 (and made possible with the switch to raw iterators).
Test Plan
CI run link