Releases: gakhov/pdsa
Implement frequency algorithms
NEW
- Count Sketch algorithm implementation
- Count-Min Sketch algorithm implementation
Count Sketch and Count–Min Sketch are simple space-efficient probabilistic data structures
that are used to estimate frequencies of elements in data streams and can address the Heavy hitters problem.
Count Sketch was proposed by Moses Charikar, Kevin Chen, and Martin Farach-Colton in 2002.
Count–Min Sketch was presented in 2003 by Graham Cormode and Shan Muthukrishnan and published in 2005.
Maintenance release
FIXES
- Fix README to indicate that the minimal required Cython version is 0.28 (Thank you @jseabold)
- Explicitly require Cython 0.28+ and Python 3.5+ in
setup.py
- Fix cardinality tests (or at least come one step closer to the correct test approach without actual big data)
Implemented HyperLogLog algorithm
NEW
- HyperLogLog algorithm implementation
HyperLogLog algorithm was proposed by Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier in 2007. There is a number of modifications of the algorithm, but this implementation uses the original version with a 32-bit hash function.
FIXES
- Fix cardinality estimation in Bloom filters
- Correct the inline example for q-digest
- Removed support for Python < 3.5
Implemented algorithms for cardinality and rank estimation
In this pre-release implemented the following algorithms:
Cardinality problem
- Linear counter
- Probabilistic counter (Flajolet–Martin algorithm)
Rank problem
- Quantile Digest (q-digest)
Additionally, the overall code has been improved, added tests and examples how to use the implemented data structures.
Implement classical data structures for Membership queries
In this release first 2 classical data structures have been implemented:
- Classical Bloom Filter
- Counting Bloom Filter
In order to support memory-efficient representation we also developed the BitCounter and BitVector.