This document discusses using HyperLogLog in PostgreSQL to estimate cardinality or unique counts within a small memory footprint. It introduces HyperLogLog concepts like KMV, bit patterns, and stochastic averaging. It then demonstrates creating a PostgreSQL extension, inserting data into an HLL column, and using HLL functions to estimate unique counts across rows and tables. It also covers tuning HLL parameters and best practices like batching updates. The document presents HLL as a way to estimate large unique counts with only 1280 bytes and a few percent error rate.
22. Inserting data
UPDATE helloworld
SET set = hll_add(set, hll_hash_integer(12345))
WHERE id = 1;

UPDATE helloworld
SET set = hll_add(set, hll_hash_text('hello world'))
WHERE id = 1;
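To read the estimate back, hll_cardinality can be applied to the stored value. The following is a minimal sketch, not the presenter's code: it assumes the hll extension is installed and that helloworld has an integer id column and an hll column named set, which this excerpt does not show. The first query needs no table at all and simply chains the same two hll_add calls onto an empty HLL.

```sql
-- Self-contained: build an HLL from two hashed values and estimate its cardinality.
SELECT hll_cardinality(
         hll_add(
           hll_add(hll_empty(), hll_hash_integer(12345)),
           hll_hash_text('hello world')
         )
       ) AS estimated_uniques;

-- Assumes the helloworld table described above: estimate for the row updated earlier.
SELECT hll_cardinality(set) AS estimated_uniques
FROM helloworld
WHERE id = 1;
```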
24. Real world
INSERT INTO daily_uniques(date, users)
SELECT date, hll_add_agg(hll_hash_integer(user_id))
FROM users
GROUP BY 1;
25. Real world
SELECT EXTRACT(MONTH FROM date) AS month,
       hll_cardinality(hll_union_agg(users))
FROM daily_uniques
WHERE date >= '2012-01-01'
  AND date < '2013-01-01'
GROUP BY 1;
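Because a user active in several months appears in each monthly estimate, the monthly figures cannot simply be summed to get a yearly total. A sketch of the yearly query, assuming the same daily_uniques table (a date column and an hll column named users), unions every daily HLL for the year before estimating:

```sql
-- Unique users across all of 2012, computed from the stored daily HLLs.
-- hll_union_agg merges the sketches, so repeat visitors are counted once.
SELECT hll_cardinality(hll_union_agg(users)) AS yearly_uniques
FROM daily_uniques
WHERE date >= '2012-01-01'
  AND date < '2013-01-01';
```

The raw user rows are not touched here; only the small per-day aggregates are read.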
36. Tuning Parameters
• log2m - log base 2 of the number of registers
• Between 4 and 17
• Each increase of 1 doubles storage
• regwidth - bits per register
• expthresh - threshold for promoting the explicit representation to sparse
• sparseon - turns the sparse representation on or off
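The storage effect of log2m and regwidth can be checked with a little arithmetic. The sketch below assumes log2m = 11 and regwidth = 5, which are believed to be the extension's defaults and which reproduce the 1280-byte figure quoted in this talk. The expthresh value of -1 and sparseon value of 1 passed to hll_empty are likewise assumptions chosen for illustration, not settings taken from the slides.

```sql
-- Size of the full register array in bytes:
-- 2^log2m registers * regwidth bits per register / 8 bits per byte.
SELECT (1 << 11) * 5 / 8 AS register_bytes;  -- 1280

-- An empty HLL with all four parameters spelled out, in the order
-- log2m, regwidth, expthresh, sparseon (values here are assumptions).
SELECT hll_empty(11, 5, -1, 1);
```

Raising log2m to 12 in the first query doubles the result to 2560, which is the "each increase of 1 doubles storage" rule from the list above.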