Skip to content
This repository has been archived by the owner on Jan 12, 2021. It is now read-only.

HyperLogLog.add() have a return value. #59

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

pocorall
Copy link

HyperLogLog.add() returns true when the buckets are updated. It has nice use cases for caching or persistence.

@facebook-github-bot
Copy link

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@pocorall
Copy link
Author

OK, I signed it.

@facebook-github-bot
Copy link

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@pocorall
Copy link
Author

Protected access of the class members enables some useful subclass extensions like custom bucket format. I use base64 encoded String for buckets, and directly access to byte[] buckets will be more efficient than using int[] buckets().

@jesboat
Copy link
Contributor

jesboat commented Sep 4, 2020

If previous == 0 and lowestBitPosition == 0, then the code does nonZeroBuckets++ but would still return false. I don't know HLL well enough to say whether previous = lowestBitPosition = 0 can happen or whether the behavior of nonZeroBuckets++ modifies the HLL state in a way that add 'should' return true.

@pocorall
Copy link
Author

pocorall commented Sep 4, 2020

We use HyperLogLog to store view count of user article. buckets represents internal representation of the count, which is persisted in DB, and estimate() converts buckets to a number. nonZeroBuckets used for estimate() to speedup count non zero buckets, in other words, it totally depends on buckets. So we only care for change of buckets.

@pocorall
Copy link
Author

pocorall commented Sep 4, 2020

After I reviewed the HLL paper, I found that the updating of nonZeroBuckets in add() function should be moved into if (lowestBitPosition > previous) { ... } part, otherwise nonZeroBuckets quickly grow larger than buckets.length without changing buckets itself.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants