Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select vs. ranked for succinct counting blocked bloom filter #28

Open
fredpflintstone opened this issue Sep 23, 2021 · 2 comments
Open

Comments

@fredpflintstone
Copy link

What is the difference between the select version and the ranked version? Thanks.

@thomasmueller
Copy link
Contributor

Yes, documentation is needed... I'm currently away, and won't have time this week sorry. But next week should be ok.

@thomasmueller
Copy link
Contributor

thomasmueller commented Oct 5, 2021

So, the "succinct counting blocked Bloom filter", as the name implies, is a blocked Bloom filter, and at the same time a counting Bloom filter. The data structure contains 3 areas:

  • First area is identical to the blocked Bloom filter (which makes lookup identical to blocked Bloom filter).
  • Second area succinctly counts which bits in the first area are set how many times (counting part).
  • Third area is overflow area, in case counts overflow somewhere.

There are two ways to succinctly count how often the bits in the first area were set: using "rank", and using "select".

Using "select" may be a bit easier to explain. Say, in the Bloom filter area, we have a block of 64 bits as follows:

00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001

So only bit 0 and bit 8 are set. The question is, how often (the count). This is stored in a block of 64 bits in the second area. Only bits that are set in the first area need to be counted, so we only need 2 counters. Those are encoded in the number of zeros between two bit that are set. E.g. for counts 1 (for Bloom filter bit 0) and 3 (for Bloom filter bit 8), it is:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001001

If we increment the count for bit 0 from 1 to 2, we shift all the bits to the left, and get:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010010

If we decrement the count for bit 0 from 1 to 0, we reset the low bit and shift, so we get:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000100

This was for "select". (I called it "select" because the count is read using the "select" operation).

"Rank" is similar to a https://en.wikipedia.org/wiki/Hash_array_mapped_trie. For "rank", we count the number of bits that are set in the Bloom filter area (in this case it's 2). So the lowest 2 bits are used to say whether the count is higher than 1. For those that are set, we need one more bit in this area, to the left, until there are none left. For counts 1 (for Bloom filter bit 0) and 3 (for Bloom filter bit 8), it is:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000110

If we increment the count for bit 0 from 1 to 2, we set the last bit and shift, to get:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001011

Both "rank" and "select" use roughly the same space (I think rank uses a bit less on average).

The overflow area is needed if a block in the counter area grows too large, in which case a pointer to the overflow area is stored, and the highest bit is set.

Well I see it is very, very hard to explain, and understand. I'm afraid I need to give up here... In order to explain this, more time, and more examples are needed...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants