HyperLogLog-based set cardinality estimation library
Cardinality Estimation is a library to estimate the number of unique elements in a set, in a quick and memory-efficient manner. It’s based on the following:
- Flajolet et al., “HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm”, DMTCS proc. AH 2007, http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
- Heule, Nunkesser and Hall 2013, “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm”, http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf
The accuracy/memory usage are user selectable. Typically, a cardinality estimator will give a perfect estimate of small cardinalities (up to 100 unique elements), and 97% accuracy or better (usually much better) for any cardinality up to near 2^64, while consuming several KB of memory (no more than 16KB).
How to use CardinalityEstimator
Usage is very simple:
ICardinalityEstimator<string> estimator = new CardinalityEstimator();
estimator.Add("Alice");
estimator.Add("Bob");
estimator.Add("Alice");
estimator.Add("George Michael");
ulong numberOfuniqueElements = estimator.Count(); // will be 3
Source Code
The library is available as an open-source project at GitHub: https://github.com/saguiitay/CardinalityEstimation
Nuget Package
This code is available as the Nuget package CardinalityEstimation
. To install, run the following command in the Package Manager Console:
Install-Package CardinalityEstimation
Version History
- Version 1.11:
- Remove usage of LINQ
- Cosmetic changes
- Changed Add to return bool
- Less sensitive UTs
- Add copy CTOR to CardinalityEstimator and implement IEquatable
- Version 1.10:
- Update to .Net Core 3.1
- Make direct counting option
- Version 1.9:
- Align all versions (.Net Core/Standard, signed/unsigned)
- Version 1.8:
- .NET Core support
- Fix merging bug.
- Version 1.7:
- .NET Standard conversion
- Version 1.6:
- Create signed version and fix deserialization bug
- Version 1.5:
- Expose BinaryReader/Writer in Serializer
- Version 1.4:
- Relax Merge to IEnumerable
- Version 1.3:
- Serializer allows keeping stream open
- Version 1.2.1:
- Fixed concurrency issue in Murmur3
- Version 1.2:
- Switch to Murmur3 hash for improved accuracy; added CountAdditions
- Version 1.1:
- Added a Serializer class