Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Index File Backfill with Bitmap Indexer Backfill #744

Closed
Tracked by #740
darunrs opened this issue May 17, 2024 · 1 comment · Fixed by #803
Closed
Tracked by #740

Replace Index File Backfill with Bitmap Indexer Backfill #744

darunrs opened this issue May 17, 2024 · 1 comment · Fixed by #803
Assignees

Comments

@darunrs
Copy link
Collaborator

darunrs commented May 17, 2024

Fully replace Index file backfill with Bitmap query backfill. This involves switching out the delta lake get_matching_block_heights function call with the BlockHeightStream function of the same name. The yield of the BlockHeightStream call is a single block height instead of a full list, so some behavior of backfill could change, potentially for the better. In addition, the first iteration of the new backfill may have some performance problems. Thus, the performance needs to be evaluated and any less complex improvements should be made (Such as preloading the next day's bitmaps while yielding the current day).

@darunrs darunrs self-assigned this May 17, 2024
@darunrs darunrs changed the title Integrate grphQL querying into Block height Stream Query Bitmap Indexer for Bitmaps in Stream May 22, 2024
@darunrs darunrs changed the title Query Bitmap Indexer for Bitmaps in Stream Replace Index File Backfill with Bitmap Indexer Backfill Jun 11, 2024
@darunrs
Copy link
Collaborator Author

darunrs commented Jun 22, 2024

Leaving Context here regarding #803 before I go on a 2 week Leave.

I've migrated the code to use the new production indexer. I reran tests and it all still passes. There's a couple things I have not done explicitly which I'd like to do for manual testing, before releasing to dev.

  • I want to add a unit test or at least manually test once that the pagination works correctly. To do this, we'd need to query at least 1000 receivers. Maybe *near (Not *.near) would work.
  • Run a backfill on all indexers simultaneously and see if it impacts the performance of the indexer while its running in prod.
  • Get feedback from Morgan so he can call out any improvements to the implementation.
  • Doing a harder backfill from as early a block as possible

There's various callouts of unused code by Cargo, which are definitely false. For example, that the new() constructor for CompressedBitmap, or StreamExt. Not sure what to do about those.

For next steps, the main performance problem is that we only query graphql next when we exhaust the current stream of block heights. I would like to prefetch the next stream while yielding the current one. In many cases, the yielding of the block will happen extremely fast, so there's wait time between block dates. Otherwise, it's mainly optimizing the indexer for space or query speed.

For a quick explanation of the algorithm:
We store base64 encoded bitmaps in the indexer. Decoded, they look like this:
[Sign Bit][EG of SIgn Bit][EG of Other Bit][EG of Sign Bit]... An example would be: 1 00101 0001110

We encode the EG for compressing consecutive bits down. Essentially encoding X number of Y value bits instead of having Y bits X times. The Elias Gamma of a single bit is the bit itself. So no compression is done. Otherwise, it takes the following form:
[N number of 0s]1[binary of remainder, of length N]. If there are 10 consecutive 1s for example, we find the largest power of 2 which fits inside the number. In this case its 8 so the power is 3. We write three zeros: 000. Then we write a 1, to indicate the 0s portion of the EG is done: 0001. Finally, we enncode the remainder (10 - 8 = 2) as binary, padded left such that the length of the binary is N. So, in this case, its 010. Combined, we get 000 1 010. This is done repeatedly for the other bit value until the end is reached.

The first bit in the bitmap is special as it merely encodes what sign the first elias gamma encodes. The remaining EGs can be guessed as to what they encode by counting them. We do explicitly write what bit value the last EG is encoding as when we want to append a bit to the end of the bitmap, we can decompress only the last EG instead of all of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants