PGDay UK 2016 -- Performance for queries with grouping

1/50
PostgreSQL
Performance for queries with grouping
Alexey Bashtanov, Brandwatch, Brighton
05 Jul 2016

2/50
What is it all about?
This talk will cover optimisation of
Grouping
Aggregation
Unfortunately it will not cover optimisation of
Getting the data
Filtering
Joins
Window functions
Other data transformations

3/50
Outline
1 What is a grouping?
2 How does it work?
Aggregation functions under the hood
Grouping algorithms
3 Optimisation: avoid sorts
Simple group-by
Count distinct values
Ordered aggregates
4 Summation optimisation
5 Denormalized data aggregation
6 Arg-maximum
7 Some magic: loose index scan
8 The future: parallel execution

5/50
What is a grouping?
What do we call a grouping/aggregation operation?
An operation that splits the input data into several classes and
then compiles each class into one row.
[Figure: input rows split into several classes, with each class compiled into one output row]

6/50
Examples
SELECT department_id,
       avg(salary)
FROM employees
GROUP BY department_id;

SELECT DISTINCT department_id
FROM employees;

7/50
Examples
SELECT DISTINCT ON (department_id)
department_id,
employee_id,
salary
FROM employees
ORDER BY department_id,
salary DESC

8/50
Examples
SELECT max(salary)
FROM employees;

SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1;

10/50
[Diagram: INITCOND -> SFUNC (applied to each input row, producing a new state) -> ... -> FINALFUNC -> Result]
An aggregate function is defined by:
State, input and output types
Initial state (INITCOND)
Transition function (SFUNC)
Final function (FINALFUNC)

10/50
sum: state = 0; SFUNC is state += input
input 2 -> state = 2
input 3 -> state = 5
input 7 -> state = 12
Result: sum = 12
SELECT sum(column1),
avg(column1)
FROM (VALUES (2), (3), (7)) _

10/50
avg: cnt = 0, sum = 0; SFUNC is cnt++, sum += input
input 2 -> cnt = 1, sum = 2
input 3 -> cnt = 2, sum = 5
input 7 -> cnt = 3, sum = 12
FINALFUNC: sum / cnt
Result: avg = 4
SELECT sum(column1),
avg(column1)
FROM (VALUES (2), (3), (7)) _
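The state machine described on the slides above can be modelled in a few lines of Python. This is only an illustrative sketch, not PostgreSQL code: the `Aggregate` class and the `toy_sum` / `toy_avg` names are hypothetical, and NULL handling is ignored.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class Aggregate:
    """Toy model of an aggregate: INITCOND, SFUNC and an optional FINALFUNC."""
    initcond: Any
    sfunc: Callable[[Any, Any], Any]                   # (state, input) -> new state
    finalfunc: Optional[Callable[[Any], Any]] = None   # state -> result

    def run(self, rows: Iterable[Any]) -> Any:
        state = self.initcond
        for value in rows:
            state = self.sfunc(state, value)           # one transition per input row
        return self.finalfunc(state) if self.finalfunc else state


# sum: the state is the running total, so no final function is needed
toy_sum = Aggregate(initcond=0, sfunc=lambda state, x: state + x)

# avg: the state is a (cnt, sum) pair and the final function divides
toy_avg = Aggregate(
    initcond=(0, 0),
    sfunc=lambda state, x: (state[0] + 1, state[1] + x),
    finalfunc=lambda state: state[1] / state[0] if state[0] else None,
)

rows = [2, 3, 7]
print(toy_sum.run(rows))  # 12
print(toy_avg.run(rows))  # 4.0
```

The same three inputs (2, 3, 7) as in the slides give sum = 12 and avg = 4.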

11/50
SFUNC and FINALFUNC functions can be written in
C — fast (SFUNC may reuse state variable)
SQL
PL/pgSQL — SLOW!
any other language
SFUNC and FINALFUNC functions can be declared STRICT
(i.e. not called on null input)

12/50
Grouping algorithms
PostgreSQL uses two algorithms to feed grouped data to
aggregate functions:
GroupAggregate: get the data sorted and apply
aggregation function to groups one by one
HashAggregate: store state for each key in a hash table
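The difference between the two strategies can be sketched in Python. This is a simplified model under stated assumptions, not the executor's implementation: the function names are hypothetical, rows are `(key, value)` pairs, and the aggregate is a plain sum by default.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple

Row = Tuple[Hashable, int]  # (group key, value)


def add(state: int, value: int) -> int:
    return state + value


def group_aggregate(sorted_rows: Iterable[Row],
                    sfunc: Callable[[int, int], int] = add,
                    initcond: int = 0) -> Iterator[Row]:
    """Input must be sorted by key. One state is kept; each group is emitted as it ends."""
    no_group = object()                       # sentinel: no group started yet
    current_key, state = no_group, initcond
    for key, value in sorted_rows:
        if key != current_key:
            if current_key is not no_group:
                yield current_key, state      # group finished: emit it right away
            current_key, state = key, initcond
        state = sfunc(state, value)
    if current_key is not no_group:
        yield current_key, state              # emit the last group


def hash_aggregate(rows: Iterable[Row],
                   sfunc: Callable[[int, int], int] = add,
                   initcond: int = 0) -> Dict[Hashable, int]:
    """Input may be unsorted. One state per key is kept; results exist only at the end."""
    states: Dict[Hashable, int] = {}
    for key, value in rows:
        states[key] = sfunc(states.get(key, initcond), value)
    return states


rows = [("red", 1), ("blue", 2), ("red", 3), ("green", 2), ("blue", 6)]
print(list(group_aggregate(sorted(rows))))   # sorted by key first, results come out sorted
print(hash_aggregate(rows))                  # no sort needed, but all states live in memory
```

Note that `group_aggregate` is a generator and needs memory for one state only, while `hash_aggregate` holds a state for every distinct key until the input is exhausted.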

13/50
GroupAgg
[Animation: the input values 1 3 1 2 2 3 1 3 2 1, sorted by group, are consumed one group at a time using a single running state; each group's result is emitted as soon as the group ends. Final output: 5 8 6]

14/50
HashAggregate
[Animation: the unsorted input values 1 2 3 2 3 1 2 1 3 1 are consumed row by row; a separate state is kept for each group key in a hash table, and the results are returned only once all input has been read. Final output: 6 8 5]

15/50
GroupAggregate vs. HashAggregate
GroupAggregate
− Requires sorted data
+ Needs less memory
+ Returns sorted data
+ Returns data on the fly
+ Can perform count(distinct x), array_agg(x order by y) etc.
+ On cardinality misestimation will sort on disk
HashAggregate
+ Accepts unsorted data
− Needs more memory
− Returns unsorted data
− Returns data at the end
− Can perform only basic aggregation
− On groups count misestimation will OOM

16/50
Optimisation: avoid sorts

17/50
Simple group-by: avoid sorts
Sorts are really slow. Prefer HashAggregation if possible.
[Chart: time (s) against number of groups (10^0 to 10^7) for
SELECT a, COUNT(*) FROM t_10m GROUP BY a;
series: HashAgg, Sort + GroupAgg]

17/50
What to do if you get something like this?
EXPLAIN
SELECT region_id,
avg(age)
FROM people
GROUP BY region_id
GroupAggregate (cost=149244.84..156869.46 rows=9969 width=10)
-> Sort (cost=149244.84..151744.84 rows=1000000 width=10)
Sort Key: region_id
-> Seq Scan on people (cost=0.00..15406.00 rows=1000000 width=10)
1504.474 ms

17/50
set enable_sort to off? No!
GroupAggregate (cost=10000149244.84..10000156869.46 rows=9969 width=10)
  -> Sort (cost=10000149244.84..10000151744.84 rows=1000000 width=10)
       Sort Key: region_id
1497.167 ms

17/50
Increase work_mem: set work_mem to '100MB'
HashAggregate (cost=20406.00..20530.61 rows=9969 width=10)
685.689 ms
Increase sanely to avoid OOM

18/50
How to spend less memory to allow HashAggregation?
Don’t aggregate joined
SELECT p.region_id,
r.region_description,
avg(age)
FROM people p
JOIN regions r using (region_id)
GROUP BY region_id,
region_description
Join aggregated instead
SELECT a.region_id,
r.region_description,
a.avg_age
FROM (
SELECT region_id,
avg(age) avg_age
FROM people p
GROUP BY region_id
) a
JOIN regions r using (region_id)

19/50
Count distinct: avoid sorts as well
How to avoid sorts for count(DISTINCT ...)?
SELECT location_id,
count(DISTINCT visitor_id)
FROM visits
GROUP BY location_id
GroupAggregate (actual time=2371.992..4832.437 rows=1000 loops=1)
Group Key: location_id
-> Sort (actual time=2369.322..3488.261 rows=10000000 loops=1)
Sort Key: location_id
Sort Method: quicksort Memory: 818276kB
-> Seq Scan on visits (actual time=0.007..943.090 rows=10000000 loops=1)

20/50
Count distinct: avoid sorts as well!
Two levels of HashAggregate could be faster!
SELECT location_id,
count(*)
FROM (
SELECT DISTINCT location_id,
visitor_id
FROM visits
) _
GROUP BY location_id
HashAggregate (actual time=2409.378..2409.471 rows=1000 loops=1)
Group Key: visits.location_id
-> HashAggregate (actual time=2235.069..2235.156 rows=1000 loops=1)
Group Key: visits.location_id, visits.visitor_id
-> Seq Scan on visits (actual time=0.005..884.194 rows=10000000 loops=1)
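The two-level plan above can be mimicked in Python to show why no sort is needed. This is a sketch of the idea only; the function name and the sample data are made up for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def count_distinct_two_level(
    rows: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, int]:
    """rows are (location_id, visitor_id) pairs in any order."""
    # Level 1: hash on the (location, visitor) pair, which removes duplicates.
    distinct_pairs = set(rows)
    # Level 2: hash on the location alone and count the surviving pairs.
    return dict(Counter(location for location, _visitor in distinct_pairs))


visits = [(1, "ann"), (1, "bob"), (1, "ann"), (2, "ann"), (2, "ann")]
print(count_distinct_two_level(visits))
```

Both levels only do hash lookups, so the cost of this approach is memory for the set of distinct pairs rather than time spent sorting all rows.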

21/50
Or use an extension by Tomáš Vondra:
https://github.com/tvondra/count_distinct
SELECT location_id,
count_distinct(visitor_id)
FROM visits
GROUP BY location_id
Warning: this algorithm can use a lot of memory in certain
circumstances

22/50
There is another extension that calculates an approximate
number of distinct values using a constant amount of memory:
https://github.com/aggregateknowledge/postgresql-hll
SELECT location_id,
hll_cardinality(
hll_add_agg(hll_hash_integer(visitor_id))
)
FROM visits
GROUP BY location_id

23/50
[Chart: time (s) against distinct values per group (10^0 to 10^4),
count-distinct from a 10M-row table by 1000 groups;
series: Sort+GroupAgg, HashAgg+HashAgg, Count_distinct ext., Postgres_hll ext.]

24/50
Ordered aggregates: avoid massive sorts
How to avoid sorts for array_agg(...ORDER BY ...)?
SELECT
visit_date,
array_agg(visitor_id ORDER BY visitor_id)
FROM visits
GROUP BY visit_date
GroupAggregate (actual time=5433.658..8010.309 rows=10000 loops=1)
-> Sort (actual time=5433.416..6769.872 rows=4999067 loops=1)
Sort Key: visit_date
Sort Method: external merge Disk: 107504kB

25/50
Avoiding sorts
It might be better to sort each group's array separately
SELECT
visit_date,
(
select array_agg(i ORDER BY i)
from unnest(visitors_u) i
)
FROM (
SELECT visit_date,
array_agg(visitor_id) visitors_u
FROM visits
GROUP BY visit_date
) _
Subquery Scan on _ (actual time=2504.915..3767.300 rows=10000 loops=1)
-> HashAggregate (actual time=2504.757..2555.038 rows=10000 loops=1)
SubPlan 1
-> Aggregate (actual time=0.120..0.121 rows=1 loops=10000)
-> Function Scan on unnest i (actual time=0.033..0.055 rows=500 loops=10000)
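The rewrite above replaces one big sort of all rows with a hash step followed by many small sorts. A rough Python model of that idea, with a hypothetical function name and sample data, might look like this:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def ordered_arrays_per_group(
    rows: Iterable[Tuple[Hashable, int]]
) -> Dict[Hashable, List[int]]:
    """rows are (visit_date, visitor_id) pairs in any order."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)     # hash step: collect values unsorted
    # sort each small array on its own instead of sorting the whole input
    return {key: sorted(values) for key, values in groups.items()}


visits = [("mon", 7), ("tue", 3), ("mon", 2), ("tue", 9), ("mon", 5)]
print(ordered_arrays_per_group(visits))
```

Sorting k arrays of n/k elements each is cheaper than sorting n elements at once, and each small array is more likely to fit in memory.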

27/50
Summation: integer data types
[Bar chart: time (s) to sum 100M numbers of type smallint, int, bigint and numeric, PostgreSQL 9.4 vs 9.5; bar labels as extracted: 11, 11, 23, 20 and 12, 12, 12, 21]
sum(bigint), which returns numeric, was slow in 9.4 because it
converted every input value to numeric.

28/50
Summation: zeroes
[Chart: time (s, 0 to 2) against share of non-zero values (0 % to 100 %),
summation of 10M numerics; series:
SELECT SUM(a) FROM t
SELECT SUM(a) FROM t WHERE a <> 0
SELECT SUM(a) FROM t, nulls stored instead of zeroes]

29/50
Denormalized data aggregation

30/50
Sometimes we need to aggregate denormalized data
Most common solution is
SELECT account_id,
account_name,
sum(payment_amount)
FROM payments
GROUP BY account_id,
account_name
The planner does not know that account_id and account_name
are correlated. This can lead to wrong estimates and a suboptimal plan.

31/50
A somewhat less well-known approach is
SELECT account_id,
min(account_name),
sum(payment_amount)
FROM payments
GROUP BY account_id
Works only if the type of the "denormalized payload" supports
comparison operators.

32/50
Also we can write a custom aggregate function
CREATE FUNCTION frst (text, text)
RETURNS text IMMUTABLE LANGUAGE sql AS
$$ select $1; $$;
CREATE AGGREGATE a (text) (
SFUNC=frst,
STYPE=text
);
SELECT account_id,
a(account_name),
sum(payment_amount)
FROM payments
GROUP BY account_id

33/50
Or even write it in C:
https://github.com/bashtanov/argm
SELECT account_id,
anyold(account_name),
sum(payment_amount)
FROM payments
GROUP BY account_id

34/50
And what is the fastest?
It depends on the width of "denormalized payload":
Payload width  1       10      100     1000    10000
dumb           366ms   374ms   459ms   1238ms  53236ms
min            375ms   377ms   409ms   716ms   16747ms
SQL            1970ms  1975ms  2031ms  2446ms  2036ms*
C              385ms   385ms   408ms   659ms   436ms*
* The more data, the faster we proceed?
This is because we do not need to extract TOASTed values.

36/50
Arg-maximum
Max
Population of the largest city in each country
Date of last tweet by each author
The highest salary in each department
Arg-max
What is the largest city in each country
What is the last tweet by each author
Who gets the highest salary in each department

37/50
Arg-maximum
Max is built-in. How to perform Arg-max?
Self-joins?
Window-functions?

37/50
Arg-maximum
Self-joins?
Window-functions?
Use DISTINCT ON() (PG-specific, not in SQL standard)
SELECT DISTINCT ON (author_id)
author_id,
twit_id
FROM twits
ORDER BY author_id,
twit_date DESC
But it still can be performed only by sorting, not by hashing :(

38/50
Arg-maximum
We can emulate Arg-max by ordinary max and dirty hacks
SELECT author_id,
(max(array[
twit_date,
date 'epoch' + twit_id
]))[2] - date 'epoch'
FROM twits
GROUP BY author_id;
But such type tweaking is not always possible.

39/50
Arg-maximum
It’s time to write more custom aggregate functions
CREATE TYPE amax_ty AS (key_date date, payload int);
CREATE FUNCTION amax_t (p_state amax_ty, p_key_date date, p_payload int)
RETURNS amax_ty IMMUTABLE LANGUAGE sql AS
$$
SELECT CASE WHEN p_state.key_date < p_key_date
OR (p_key_date IS NOT NULL AND p_state.key_date IS NULL)
THEN (p_key_date, p_payload)::amax_ty
ELSE p_state END
$$;
CREATE FUNCTION amax_f (p_state amax_ty) RETURNS int IMMUTABLE LANGUAGE sql AS
$$ SELECT p_state.payload $$;
CREATE AGGREGATE amax (date, int) (
SFUNC = amax_t,
STYPE = amax_ty,
FINALFUNC = amax_f,
INITCOND = '(,)'
);
SELECT author_id,
amax(twit_date, twit_id)
FROM twits
GROUP BY author_id;
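The transition logic of amax_t can be traced in Python to make the NULL handling explicit. This is a sketch that mirrors the SQL above under the assumption that `None` stands in for NULL; the function names and sample rows are hypothetical.

```python
from datetime import date
from typing import Dict, Hashable, Iterable, Optional, Tuple

State = Tuple[Optional[date], Optional[int]]   # (key_date, payload)
INITCOND: State = (None, None)                 # plays the role of INITCOND '(,)'


def amax_transition(state: State, key: Optional[date], payload: Optional[int]) -> State:
    """Keep the payload that belongs to the largest key seen so far."""
    state_key, _ = state
    # A NULL key never wins; any non-NULL key beats an empty state.
    if key is not None and (state_key is None or state_key < key):
        return (key, payload)
    return state


def argmax_per_group(
    rows: Iterable[Tuple[Hashable, Optional[date], Optional[int]]]
) -> Dict[Hashable, Optional[int]]:
    """rows are (author_id, twit_date, twit_id); grouping is done by hashing."""
    states: Dict[Hashable, State] = {}
    for group, key, payload in rows:
        states[group] = amax_transition(states.get(group, INITCOND), key, payload)
    return {group: payload for group, (_key, payload) in states.items()}  # final function


twits = [
    ("alice", date(2016, 1, 1), 1),
    ("bob", date(2016, 3, 1), 2),
    ("alice", date(2016, 2, 1), 3),
    ("bob", date(2016, 1, 5), 4),
    ("alice", None, 5),
]
print(argmax_per_group(twits))
```

Because the state is a single (key, payload) pair per group, this aggregate works with hashing and needs no sort, unlike DISTINCT ON.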

40/50
Arg-maximum
Argmax is similar to amax, but written in C
https://github.com/bashtanov/argm
SELECT author_id,
argmax(twit_date, twit_id)
FROM twits
GROUP BY author_id;

41/50
Arg-maximum
Who wins now?
             100^2  333^2  1000^2  3333^2   5000^2
DISTINCT ON  6ms    42ms   342ms   10555ms  30421ms
Max(array)   5ms    47ms   399ms   4464ms   10025ms
SQL amax     38ms   393ms  3541ms  39539ms  90164ms
C argmax     5ms    37ms   288ms   3183ms   7176ms
SQL amax finally outperforms DISTINCT ON on 10^9-ish rows

42/50
Some magic: loose index scan

43/50
Loose index scan
Slow distinct, max or arg-max query?
Sometimes we can fetch the rows one-by-one using index:
CREATE TABLE balls(colour_id int, label int);
INSERT INTO balls ...
CREATE INDEX ON balls(colour_id);
-- find the very first colour
SELECT colour_id FROM balls
ORDER BY colour_id LIMIT 1;
-- find the next colour
SELECT colour_id FROM balls
WHERE colour_id > ?
ORDER BY colour_id LIMIT 1;
-- and so on ...

44/50
Loose index scan
CREATE FUNCTION loosescan() RETURNS
TABLE (o_colour_id int) AS $$
BEGIN
  o_colour_id := -1; -- less than all real ids
  LOOP
    -- fetch the smallest colour_id above the previous one via the index
    SELECT colour_id
    INTO o_colour_id
    FROM balls
    WHERE colour_id > o_colour_id
    ORDER BY colour_id LIMIT 1;
    EXIT WHEN NOT FOUND;
    RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM loosescan();

45/50
Loose index scan
Or better do it in pure SQL instead
WITH RECURSIVE d AS (
(
SELECT colour_id
FROM balls
ORDER BY colour_id LIMIT 1
)
UNION
SELECT (
SELECT b.colour_id
FROM balls b
WHERE b.colour_id > d.colour_id
ORDER BY b.colour_id LIMIT 1
) colour_id
FROM d
)
SELECT * FROM d WHERE colour_id IS NOT NULL;
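The jump-to-next-value idea behind both versions can be modelled in Python, with a sorted list standing in for the B-tree index on colour_id and a binary search standing in for each index probe. This is an analogy only, and the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List


def loose_index_scan(index: List[int]) -> List[int]:
    """index is a sorted list of colour_id values, duplicates included."""
    distinct: List[int] = []
    pos = 0
    while pos < len(index):
        value = index[pos]
        distinct.append(value)
        # One probe: jump straight past every duplicate of the current value,
        # like "WHERE colour_id > value ORDER BY colour_id LIMIT 1".
        pos = bisect_right(index, value, lo=pos)
    return distinct


index = sorted([3, 2, 1, 4, 2, 2, 1, 3, 3, 1, 0])
print(loose_index_scan(index))  # [0, 1, 2, 3, 4]
```

The number of probes equals the number of distinct values, not the number of rows, which is why the approach wins only when distinct values are few.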

46/50
Loose index scan
One-by-one retrieval by index
+ Incredibly fast unless it returns too many rows
− Needs an index
Fetching distinct values from a 10M-row table:
          10^1    10^3    10^5    10^6    10^7
HashAgg   1339ms  1377ms  2945ms  4086ms  5130ms
LIS proc  0ms     9ms     815ms   8004ms  80800ms
LIS SQL   0ms     6ms     555ms   5460ms  56153ms

47/50
Loose index scan
It is possible to apply a similar approach to max and argmax
+ Incredibly fast unless it returns too many rows
− Needs an index
− SQL version needs tricks if the data types differ
          100^2  333^2  1000^2  3333^2  5000^2
LIS proc  2ms    6ms    12ms    42ms    63ms
LIS SQL   1ms    4ms    11ms    29ms    37ms

48/50
The future: parallel execution

49/50
The future: parallel execution
PostgreSQL 9.6 (currently Beta 2) introduces parallel execution
of many nodes including aggregation.
Parallel aggregation extension is already available:
http://www.cybertec.at/en/products/agg-parallel-aggregations-postgresql/
+ Up to 30 times faster
+ Speeds up SeqScan as well
− Mostly useful for complex row operations
− Requires PG 9.5+
