Skip to content

Latest commit

 

History

History
 
 

evaluation

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 

Evaluation

In order to verify the accuracy of the HyperLogLog implementations we performed two experiments.

In the first experiment we generated 500 files containing 1e5 randomly generated numbers each. The numbers were added to the sketch and the cardinality was estimated with each addition. The relative error is displayed in figures 1, 3.

In the second experiment we added 1e9 randomly generated numbers to the sketch. We measured cardinality estimations at predefined (actual) cardinalities which followed a geometric series with ratio 1.007. The relative error for 500 runs is displayed in figures 2, 4.

HyperLogLog++ displays lower relative error compared to HyperLogLog in small cardinalities because the sparse representation of the sketch is used which allows us to perform estimations with much higher precision.

HyperLogLog

hyperloglog-figure-0

hyperloglog-figure-1

HyperLogLog++

hyperloglogplus-figure-0 hyperloglogplus-figure-1

Usage

To reproduce the results follow the steps in exp.py.