This repo contains implementations of sketching algorithms for join-size estimation. Sketch update performance can be improved significantly by sketching only a sample of the data, without significant loss of accuracy. This repo uses Bernoulli sampling. For details of the sampling algorithms and sketching techniques, please check out the references page.
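To make the idea concrete, here is a minimal Python sketch of sampling before sketching. It is not the code in this repo: it assumes a basic AGMS (tug-of-war) sketch, which may differ from the sketches implemented here, and all names (`bernoulli_sample`, `AGMSSketch`, `estimate_join_size`) are hypothetical. Each tuple is kept independently with probability `p`, only the kept tuples are sketched, and the estimate is scaled by `1 / (p_a * p_b)` to compensate for the sampling.

```python
import random


def bernoulli_sample(stream, p, rng):
    """Keep each tuple independently with probability p (Bernoulli sampling)."""
    return [x for x in stream if rng.random() < p]


class AGMSSketch:
    """Basic AGMS sketch: one counter per row, each with its own +/-1 sign per key.

    Illustrative only. Signs are stored in a table over the whole domain,
    which is simple but not memory-efficient.
    """

    def __init__(self, dom_size, rows, seed):
        rng = random.Random(seed)
        self.signs = [[rng.choice((-1, 1)) for _ in range(dom_size)]
                      for _ in range(rows)]
        self.counters = [0] * rows

    def update(self, key, count=1):
        # Add the signed count of this key to every row counter.
        for r, row in enumerate(self.signs):
            self.counters[r] += row[key] * count


def estimate_join_size(sk_a, sk_b, p_a=1.0, p_b=1.0):
    """Average the per-row counter products, then undo the sampling rates.

    Both sketches must be built with the same seed so they share signs.
    """
    products = [a * b for a, b in zip(sk_a.counters, sk_b.counters)]
    return (sum(products) / len(products)) / (p_a * p_b)


def exact_join_size(stream_a, stream_b, dom_size):
    """Reference value: sum over keys of freq_a(key) * freq_b(key)."""
    fa, fb = [0] * dom_size, [0] * dom_size
    for x in stream_a:
        fa[x] += 1
    for x in stream_b:
        fb[x] += 1
    return sum(a * b for a, b in zip(fa, fb))


if __name__ == "__main__":
    # Assumed toy values, not defaults of this repo.
    dom_size, tuples, rows, p = 50, 5000, 200, 0.2
    rng = random.Random(1)
    a = [rng.randrange(dom_size) for _ in range(tuples)]
    b = [rng.randrange(dom_size) for _ in range(tuples)]

    sk_a = AGMSSketch(dom_size, rows, seed=7)
    sk_b = AGMSSketch(dom_size, rows, seed=7)  # same seed: shared signs
    for x in bernoulli_sample(a, p, rng):
        sk_a.update(x)
    for x in bernoulli_sample(b, p, rng):
        sk_b.update(x)

    print("exact:   ", exact_join_size(a, b, dom_size))
    print("estimate:", estimate_join_size(sk_a, sk_b, p, p))
```

With sampling probability `p`, only about a fraction `p` of the tuples trigger a sketch update, which is where the speed-up comes from. The estimate is noisy, and its variance grows as `p` shrinks or as the number of rows decreases.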
If you are using a Mac, follow these steps to install GSL (the GNU Scientific Library):
- launch the terminal
- run
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
- run
brew install gsl
For other systems, please check out the GSL documentation.
- run
make
- run
./sketch_bernoulli_sampling.out
followed by these parameters:
dom_size
tuples_no
buckets_no
rows_no
DIST_PARAM
DIST_SHUFF
SAMP_PROB
num_runs
For details of each parameter, please check out the documentation on the GitHub Wiki.
- run
make clean
to remove all intermediate files.