Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement Block Streamer Bitmap Operations #747

Merged
merged 11 commits into from
Jun 4, 2024

Conversation

darunrs
Copy link
Collaborator

@darunrs darunrs commented May 24, 2024

The bitmap indexer will return a list of bitmaps in the form of base 64 strings, and associated start block heights. We need a way to convert all that data into a single block height and an associated bitmap.

This PR introduces a new BitmapOperator class which holds all the operations necessary to perform the function of returning a combined binary bitmap with the lowest start block height as index 0.

@darunrs darunrs marked this pull request as ready for review May 24, 2024 19:51
@darunrs darunrs requested a review from a team as a code owner May 24, 2024 19:51
Copy link
Collaborator

@morgsmccauley morgsmccauley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I have no idea what's going on here, mainly due to unfamiliarity with the algorithm used. It will take some time for me to understand, so rather than block you, I've provided the best review I can with my limited knowledge. I'm trusting that you know what's going on 😄.

If it's possible, it would be good to break up the code up in to more well-defined/cohesive methods, but that doesn't need to happen here.

There are a lot of clippy errors, could you run cargo clippy and fix please?

@@ -33,6 +33,7 @@ tonic = "0.10.2"
wildmatch = "2.1.1"

registry-types = { path = "../registry/types" }
base64 = "0.22.1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you place with the other versioned imports please?

Self {}
}

pub fn get_bit(&self, byte_array: &[u8], bit_index: usize) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn get_bit(&self, byte_array: &[u8], bit_index: usize) -> bool {
pub fn get_bit(&self, bytes: &[u8], bit_index: usize) -> bool {

Nit: bytes seems sufficient here?

(byte_array[byte_index] & (1u8 << (7 - bit_index_in_byte))) > 0
}

fn set_bit(&self, byte_array: &mut [u8], bit_index: usize, bit_value: bool, write_zero: bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we take both bit_value and write_zero? Would bit_value alone be sufficient?

Copy link
Collaborator Author

@darunrs darunrs Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary because we sometimes want to write the 0 over a 1, when we usually don't want to. Specifically when we are replacing one Elias Gamma encoding over another, as the length might be shorter (leaving extra 1's that should be zero). Technically we don't need it in the current code, but I ported it over as its exactly how we have it in the indexer logic.

}
}

fn get_number_between_bits(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is number? Can we be more explicit?

Copy link
Collaborator Author

@darunrs darunrs Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm basically we encoded a number as binary and are simply reading the binary value from a particular stretch of bits. Perhaps I can rename this to read_integer_from_binary? Even though all our functions deal with binary, maybe this can explicitly state this binary is utilized to build an integer.

&self,
byte_array: &[u8],
start_bit_index: usize,
) -> anyhow::Result<usize> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> anyhow::Result<usize> {
) -> Option<usize> {

Option seems more idiomatic here

Comment on lines 85 to 88
return EliasGammaDecoded {
value: 0,
last_bit_index: 0,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return EliasGammaDecoded {
value: 0,
last_bit_index: 0,
};
return EliasGammaDecoded::default()

Could use Default here, but you'll need to derive it on EliasGammaDecoded

Copy link
Collaborator Author

@darunrs darunrs Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea actually. It would definitely make the match look nicer too. It seems the default for usize is 0 anyway.

fn decompress_bitmap(&self, compressed_bitmap: &[u8]) -> Vec<u8> {
let compressed_bit_length: usize = compressed_bitmap.len() * 8;
let mut current_bit_value: bool = (compressed_bitmap[0] & 0b10000000) > 0;
let mut decompressed_byte_array: Vec<u8> = Vec::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know the length of this upfront? Vec::with_capacity() would avoid unnecessary re-allocations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we knew capacity, maybe we could just use &[u8] 🤔

Copy link
Collaborator Author

@darunrs darunrs Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know the size upfront. We need to decompress the EG to know how long the bit sequence is for each EG, and we can have many of them. We do know the upper bound, which is 86000 bits, since 1 bit per block and 86000 seconds in a day. But, I felt it was unnecessary to create 12KB byte arrays every time as we usually don't need that many.

decompressed_byte_array
}

fn merge_compressed_bitmap_into_base_bitmap(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between base_bitmap and compressed_bitmap? Maybe this would be more obvious if we just had a merge function, and called decompress from the outside?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge could even be defined on the Bitmap struct instead for further clarity

Copy link
Collaborator Author

@darunrs darunrs Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the confusion is that it is doing two different things. Decompression, and merging. Before decompression, it matters which bitmap is the compressed one as we want to ensure bits re written to the decompressed one. But if the bitmaps are both decompressed, this is no longer an issue.

I think the better way to go forward is creating a merge_bitmap function like you mentioned but keep it in BitmapOperator. Then we do a three step sequence in the public get_merged_bitmap function: decode, decompress, merge. This I think would be clear while retaining BitmapOperator as a stateless utility class. I'm a little confused with how a Bitmap struct function would perform merge. I imagine it would need to have a BitmapOperator internally. I think it might make things confusing regarding who owns these data operator functions.

If there's a more clear way to structure this class, I'm happy to rework it when you're back!

Ok(())
}

pub fn get_merged_bitmap(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn get_merged_bitmap(
pub fn merge_bitmaps(

Nit: this seems more clear?

Copy link
Collaborator Author

@darunrs darunrs Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I was originally thinking of merge_compressed_bitmaps but maybe its not worth requiring someone to know they're compressed before calling the function? Especially since the argument type is Base64Bitmap which should only really be received form the graphQL query.

use super::*;

#[test]
fn test_getting_bit_from_array() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn test_getting_bit_from_array() {
fn getting_bit_from_array() {

Nit: test_ seems superfluous here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I'll reword the test names.

@darunrs darunrs merged commit 989c432 into main Jun 4, 2024
8 checks passed
@darunrs darunrs deleted the block-streamer-bitmap-operations branch June 4, 2024 20:21
@darunrs darunrs linked an issue Jun 4, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Bitmap Operations into Block Streamer
2 participants