HyperLogLog – a near-optimal cardinality estimation algorithm
UPDATE: My implementation is no longer updated. I strong suggest you take a look at Microsoft’s CardinalityEstimator project, which implements the HyperLogLog algorithm, along with a few optimizations Over the last several months I’ve been immersing myself further and further into the big data world, and into developing and operating a SaaS. I’ve been trying … Read more