Cardinality Estimation (HyperLogLog implemenation in C#)

Nuget Logo


HyperLogLog-based set cardinality estimation library

Cardinality Estimation is a library to estimate the number of unique elements in a set, in a quick and memory-efficient manner. It’s based on the following:

  1. Flajolet et al., “HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm”, DMTCS proc. AH 2007, http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
  2. Heule, Nunkesser and Hall 2013, “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm”, http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf

The accuracy/memory usage are user selectable. Typically, a cardinality estimator will give a perfect estimate of small cardinalities (up to 100 unique elements), and 97% accuracy or better (usually much better) for any cardinality up to near 2^64, while consuming several KB of memory (no more than 16KB).

How to use CardinalityEstimator

Usage is very simple:

ICardinalityEstimator<string> estimator = new CardinalityEstimator();

estimator.Add("Alice");
estimator.Add("Bob");
estimator.Add("Alice");
estimator.Add("George Michael");

ulong numberOfuniqueElements = estimator.Count(); // will be 3

Source Code

The library is available as an open-source project at GitHub: https://github.com/saguiitay/CardinalityEstimation

Nuget Package

This code is available as the Nuget package CardinalityEstimation. To install, run the following command in the Package Manager Console:

Install-Package CardinalityEstimation

Version History

  • Version 1.11:
    • Remove usage of LINQ
    • Cosmetic changes
    • Changed Add to return bool
    • Less sensitive UTs
    • Add copy CTOR to CardinalityEstimator and implement IEquatable
  • Version 1.10:
    • Update to .Net Core 3.1
    • Make direct counting option
  • Version 1.9:
    • Align all versions (.Net Core/Standard, signed/unsigned)
  • Version 1.8:
    • .NET Core support
    • Fix merging bug.
  • Version 1.7:
    • .NET Standard conversion
  • Version 1.6:
    • Create signed version and fix deserialization bug
  • Version 1.5:
    • Expose BinaryReader/Writer in Serializer
  • Version 1.4:
    • Relax Merge to IEnumerable
  • Version 1.3:
    • Serializer allows keeping stream open
  • Version 1.2.1:
    • Fixed concurrency issue in Murmur3
  • Version 1.2:
    • Switch to Murmur3 hash for improved accuracy; added CountAdditions
  • Version 1.1:
    • Added a Serializer class