3. 12:49:093
New optimizer features
● Subqueries (FROM, IN, others)
● Batched Key Access (MRR)
● Index Condition Pushdown
● Extended Keys
● EXPLAIN UPDATE/DELETE
● PERFORMANCE_SCHEMA
● Engine-independent statistics
● InnoDB persistent statistics
5. 12:49:095
Subqueries in MySQL
● Subqueries are practically unusable
● e.g. Facebook disabled them in the parser
● Reason - “naive execution”.
6. 12:49:096
Naive subquery execution
● For IN (SELECT ...) subqueries:
select * from hotel
where
  hotel.country='USA' and
  hotel.name IN (select hotel_stays.hotel
                 from hotel_stays
                 where hotel_stays.customer='John Smith')
● Naive execution:
for (each hotel in USA) {
  if (john smith stayed here) {
    …
  }
}
● Slow!
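To make the cost difference concrete, here is a minimal Python sketch of the naive strategy next to one alternative: running the subquery once and keeping its result in a set. The sample rows, the function names, and the choice of this particular alternative are assumptions for illustration; the slide only states that naive execution is slow.

```python
# Toy data; table and column names follow the slide's example query.
hotels = [
    {"name": "Grand", "country": "USA"},
    {"name": "Plaza", "country": "USA"},
    {"name": "Ritz", "country": "France"},
    {"name": "Motel 6", "country": "USA"},
]
hotel_stays = [
    {"hotel": "Plaza", "customer": "John Smith"},
    {"hotel": "Ritz", "customer": "John Smith"},
    {"hotel": "Grand", "customer": "Jane Doe"},
]


def naive_in(hotels, hotel_stays, customer):
    """Re-run the subquery for every outer row; returns (names, stay rows scanned)."""
    scanned = 0
    result = []
    for hotel in hotels:
        if hotel["country"] != "USA":
            continue
        found = False
        for stay in hotel_stays:  # the subquery runs once per USA hotel
            scanned += 1
            if stay["customer"] == customer and stay["hotel"] == hotel["name"]:
                found = True
                break
        if found:
            result.append(hotel["name"])
    return result, scanned


def materialized_in(hotels, hotel_stays, customer):
    """Run the subquery once, keep its result in a set, then probe the set."""
    scanned = 0
    stayed = set()
    for stay in hotel_stays:
        scanned += 1
        if stay["customer"] == customer:
            stayed.add(stay["hotel"])
    result = [h["name"] for h in hotels
              if h["country"] == "USA" and h["name"] in stayed]
    return result, scanned
```

Both functions return the same hotels; only the number of hotel_stays rows scanned differs, and the gap grows with the number of outer rows.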
7. 12:49:097
Naive subquery execution (2)
● For FROM (SELECT …) subqueries:
select *
from
  (select *
   from hotel
   where hotel.rooms > 500
  ) as big_hotel
where
  big_hotel.nearest_airport='AMS';
● Naive execution:
1. Retrieve all hotels with > 500 rooms, store in a temporary table big_hotel;
2. Search in big_hotel for hotels near AMS.
● Slow!
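The two naive steps can be sketched in Python and compared with a single pass that checks both conditions at once. The sample rows and function names are hypothetical, and folding the subquery into the outer query is only one possible way to avoid the temporary table; the slide does not say which strategy is used.

```python
# Hypothetical rows for the hotel table from the slide's query.
hotel = [
    {"name": "A", "rooms": 800, "nearest_airport": "AMS"},
    {"name": "B", "rooms": 600, "nearest_airport": "JFK"},
    {"name": "C", "rooms": 100, "nearest_airport": "AMS"},
    {"name": "D", "rooms": 900, "nearest_airport": "LHR"},
]


def naive_derived(hotel, airport):
    """Materialize the subquery first; returns (names, rows stored in the temp table)."""
    # Step 1: copy every hotel with > 500 rooms into a temporary table.
    big_hotel = [row for row in hotel if row["rooms"] > 500]
    # Step 2: scan the temporary table for the outer condition.
    result = [row["name"] for row in big_hotel
              if row["nearest_airport"] == airport]
    return result, len(big_hotel)


def merged_derived(hotel, airport):
    """Check both conditions in one pass over hotel; nothing is materialized."""
    result = [row["name"] for row in hotel
              if row["rooms"] > 500 and row["nearest_airport"] == airport]
    return result, 0
```

In a real server the merged form also lets the optimizer use an index on either column, which the temporary table does not have.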
8. 12:49:098
New subquery optimizations
● Handle IN (SELECT ...)
● Handle FROM (SELECT …)
● Handle a lot of cases
● Comparison with PostgreSQL
– ~1000x slower before
– ~same order of magnitude now
● Releases
– MySQL 6.0
– MariaDB 5.5
● Sheeri Kritzer @ Mozilla seems happy with this one
– MySQL 5.6
● Subset of MariaDB 5.5's features
9. 12:49:099
Subquery optimizations - summary
● Subqueries were generally unusable before MariaDB 5.3/5.5
● “Core” subquery optimizations are in
– MariaDB 5.3/5.5
– MySQL 5.6
● MariaDB has extra additions
● Further information:
https://kb.askmonty.org/en/subquery-optimizations/
11. 12:49:0911
Batched Key Access - background
● Big, IO-bound joins were slow
– DBT-3 benchmark could not finish*
● Reason?
● Nested Loops join hits the second table at random locations.
12. 12:49:0912
Batched Key Access idea
[Figure: Nested Loops Join vs. Batched Key Access]
Speedup reasons
● Fewer disk head movements
● Cache-friendliness
● Prefetch-friendliness
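A rough Python sketch of the idea, assuming the inner table's rows sit at known file positions: plain nested loops read them in whatever order the outer table produces keys, while batching collects keys into a buffer and reads each batch in position order. The position map, the key order, and the "head travel" cost measure are all made up for illustration.

```python
def nested_loops_positions(outer_keys, row_position):
    """Look up inner rows one key at a time, in the order the outer table yields them."""
    return [row_position[k] for k in outer_keys]


def batched_key_access_positions(outer_keys, row_position, buffer_size):
    """Collect keys into a buffer, then read each batch in on-disk order."""
    visited = []
    for start in range(0, len(outer_keys), buffer_size):
        batch = outer_keys[start:start + buffer_size]
        visited.extend(sorted(row_position[k] for k in batch))
    return visited


def head_travel(positions):
    """Total distance between consecutive reads: a crude stand-in for seek cost."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


# Hypothetical mapping: join key -> position of the matching row in the inner table.
row_position = {k: k * 10 for k in range(8)}
outer_keys = [5, 0, 7, 2, 6, 1, 4, 3]

random_order = head_travel(nested_loops_positions(outer_keys, row_position))
small_buffer = head_travel(batched_key_access_positions(outer_keys, row_position, 4))
large_buffer = head_travel(batched_key_access_positions(outer_keys, row_position, 8))
```

With this toy data the travel drops from 300 (no batching) to 180 (buffer of 4) to 70 (buffer of 8), which is why the buffer size matters.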
13. 12:49:0913
Batched Key Access benchmark
set join_cache_level=6; -- enable BKA
select max(l_extendedprice)
from orders, lineitem
where
  l_orderkey=o_orderkey and
  o_orderdate between $DATE1 and $DATE2;
Run with
● Various join_buffer_size settings
● Various sizes of the $DATE1...$DATE2 range
15. 12:49:0915
Batched Key Access summary
● Optimization for big, IO-bound joins
– Orders-of-magnitude speedups
● Available in
– MariaDB 5.3/5.5 (more advanced)
– MySQL 5.6
● Not fully automatic yet
– Needs to be manually enabled
– Needs buffer sizes to be set
17. 12:49:0917
Index Condition Pushdown
● A new feature in MariaDB 5.3 / MySQL 5.6
alter table lineitem add index s_r (l_shipdate, l_receiptdate);
select count(*) from lineitem
where
  l_shipdate between '1993-01-01' and '1993-02-01' and
  datediff(l_receiptdate,l_shipdate) > 25 and
  l_quantity > 40;
+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
| 1 | SIMPLE | lineitem | range | s_r | s_r | 4 | NULL | 158854 | Using index condition; Using where |
+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
1. Read index records in the range
l_shipdate between '1993-01-01' and '1993-02-01'
2. Check the index condition ← New! Filters out records before table rows are read
datediff(l_receiptdate,l_shipdate) > 25
3. Read full table rows
4. Check the WHERE condition
l_quantity > 40
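The four steps can be mimicked in Python to see where the savings come from. This is a simplified sketch with made-up rows: the index is a sorted list of (l_shipdate, l_receiptdate, row id) tuples, and a linear scan stands in for the range seek a real engine would do.

```python
from datetime import date

# Hypothetical table rows: (l_shipdate, l_receiptdate, l_quantity).
lineitem = [
    (date(1993, 1, 5), date(1993, 1, 10), 45),
    (date(1993, 1, 7), date(1993, 2, 5), 50),
    (date(1993, 1, 20), date(1993, 2, 20), 10),
    (date(1993, 3, 1), date(1993, 4, 15), 60),
]
# Index s_r: both indexed columns plus a pointer to the full row.
index_s_r = sorted((ship, receipt, rowid)
                   for rowid, (ship, receipt, _) in enumerate(lineitem))


def count_rows(use_icp):
    """Return (matching rows, full table rows read) for the slide's query."""
    matches = 0
    table_reads = 0
    for ship, receipt, rowid in index_s_r:
        # 1. Keep index records in the l_shipdate range.
        if not date(1993, 1, 1) <= ship <= date(1993, 2, 1):
            continue
        # 2. With ICP, check the datediff condition on index columns only.
        if use_icp and (receipt - ship).days <= 25:
            continue
        # 3. Read the full table row.
        table_reads += 1
        row_ship, row_receipt, quantity = lineitem[rowid]
        # 4. Check the WHERE condition (the full condition is re-checked here).
        if (row_receipt - row_ship).days > 25 and quantity > 40:
            matches += 1
    return matches, table_reads
```

Both modes count the same rows; with use_icp=True one fewer full row is read in this toy data, because the record that fails the datediff check is discarded at the index.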
18. 12:49:0918
Index Condition Pushdown - conclusions
Summary
● Applicable to any index-based access (ref, range, etc)
● Checks parts of WHERE after reading the index
● Reduces number of table records to be read
● Speedup can be like in “Using index”
– Great for IO-bound load (5x, 10x)
– Some for CPU-bound workload (2x)
Conclusions
● Have a selective condition on a column?
– Put the column into the index, at the end.
19. 12:49:0919
Extended keys
● Secondary indexes in InnoDB have an invisible “tail”: the primary key columns
CREATE TABLE tbl (
  pk1 sometype,
  pk2 sometype,
  ...
  col1 sometype,
  col2 sometype,
  ...
  KEY indexA (col1, col2)
  ...
  PRIMARY KEY (pk1, pk2)
) ENGINE=InnoDB
indexA is stored as: col1, col2, pk1, pk2
● Before: optimizer has limited support for “tail” columns
– 'Using index' supports it
– ORDER BY col1, col2, pk1 supports it
● After MariaDB 5.5 / MySQL 5.6
– all parts of the optimizer (ref access, range access, etc.) can use the “tail”
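A small Python sketch of why the "tail" is useful for lookups, assuming the secondary index is a sorted list of (col1, col2, pk1, pk2) tuples. The sample rows and the lookup_prefix helper are hypothetical; they only illustrate that a longer key prefix narrows the search.

```python
from bisect import bisect_left

# Hypothetical rows: (pk1, pk2, col1, col2).
rows = [
    (1, 1, "a", "x"),
    (1, 2, "a", "x"),
    (2, 1, "a", "x"),
    (2, 2, "a", "y"),
    (3, 1, "b", "x"),
]
# indexA as stored: the declared columns followed by the primary key columns.
index_a = sorted((col1, col2, pk1, pk2) for pk1, pk2, col1, col2 in rows)


def lookup_prefix(index, prefix):
    """Return index entries whose leading columns equal `prefix`."""
    lo = bisect_left(index, prefix)  # a shorter tuple sorts before its extensions
    hi = lo
    while hi < len(index) and index[hi][:len(prefix)] == prefix:
        hi += 1
    return index[lo:hi]


# Query: col1='a' AND col2='x' AND pk1=2
declared_only = lookup_prefix(index_a, ("a", "x"))    # then filter on pk1 afterwards
with_tail = lookup_prefix(index_a, ("a", "x", 2))     # pk1 is part of the lookup key
```

Using only the declared columns returns three entries that still need filtering on pk1; using the tail goes straight to the single matching entry.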
21. 12:49:0921
Better EXPLAIN in MySQL 5.6
● EXPLAIN for UPDATE/DELETE/INSERT … SELECT
– shows the query plan for finding the records to update/delete
mysql> explain update customer set c_acctbal = c_acctbal - 100 where c_custkey=12354;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | customer | range | PRIMARY | PRIMARY | 4 | NULL | 1 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
● EXPLAIN FORMAT=JSON
– Produces [big] JSON output
– Shows more information:
● Shows conditions attached to tables
● Shows whether “Using temporary; using filesort” is done to handle GROUP BY or ORDER BY.
● Shows where subqueries are attached
– No other known additions
– Will be in MariaDB 10.0
The most useful addition!
22. 12:49:0922
EXPLAIN FORMAT=JSON
What are the “conditions attached to tables”?
explain
select
count(*)
from
orders, customer
where
customer.c_custkey=orders.o_custkey and
customer.c_mktsegment='BUILDING' and
orders.o_totalprice > customer.c_acctbal and
orders.o_orderpriority='1-URGENT'
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
| 1 | SIMPLE | customer | ALL | PRIMARY | NULL | NULL | NULL | 1509871 | Using where |
| 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | dbt3sf10.customer.c_custkey | 7 | Using where |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
24. 12:49:0924
EXPLAIN ANALYZE (kind of)
● Does EXPLAIN match the reality?
● Where is most of the time spent?
● MySQL/MariaDB don't have “EXPLAIN ANALYZE” ...
select
count(*)
from
orders, customer
where
customer.c_custkey=orders.o_custkey and
customer.c_mktsegment='BUILDING' and orders.o_orderpriority='1-URGENT'
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| 1 | SIMPLE | customer | ALL | PRIMARY | NULL | NULL | NULL | 149415 | Using where |
| 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | customer.c_custkey | 7 | Using index |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
26. 12:49:0926
Newer solution: userstat
● In Facebook patch, Percona, MariaDB:
mysql> set global userstat=1;
mysql> flush table_statistics;
mysql> flush index_statistics;
mysql> {query}
mysql> show table_statistics;
+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| dbt3sf1 | orders | 303959 | 0 | 0 |
| dbt3sf1 | customer | 150000 | 0 | 0 |
+--------------+------------+-----------+--------------+-------------------------+
mysql> show index_statistics;
+--------------+------------+-------------+-----------+
| Table_schema | Table_name | Index_name | Rows_read |
+--------------+------------+-------------+-----------+
| dbt3sf1 | orders | i_o_custkey | 303959 |
+--------------+------------+-------------+-----------+
● Counters are per-table
– Ok as long as you don't have self-joins
● Overhead is negligible
● Counters are server-wide (other queries affect them, too)
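One way to use these counters is to set them against the optimizer's estimates. The Python sketch below assumes the counters on this slide were collected for the query whose EXPLAIN output appears on the “EXPLAIN ANALYZE (kind of)” slide (rows=149415 for customer, rows=7 per lookup for orders); the helper name is made up.

```python
# Estimates taken from the EXPLAIN output; for "orders" the value is per customer row.
explain_rows = {"customer": 149415, "orders": 7}
# Counters taken from SHOW TABLE_STATISTICS.
rows_read = {"customer": 150000, "orders": 303959}


def estimated_reads(explain_rows, join_order):
    """Multiply the per-table `rows` values along the join order."""
    estimates = {}
    fanout = 1
    for table in join_order:
        fanout *= explain_rows[table]
        estimates[table] = fanout
    return estimates


estimates = estimated_reads(explain_rows, ["customer", "orders"])
# Ratio of actual to estimated reads; far from 1.0 means EXPLAIN was off.
ratio = {table: rows_read[table] / estimates[table] for table in estimates}
```

Under that assumption the customer estimate is nearly exact, while orders reads are about 0.29 of the estimate. A plausible reason is that the product of `rows` values does not account for customers filtered out by the c_mktsegment condition.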
27. 12:49:0927
Latest addition: PERFORMANCE_SCHEMA
● Allows measuring the *time* spent reading each table
● Has some visible overhead (Facebook's tests: 7%)
● Counters are system-wide
● Still no luck with self-joins
mysql> truncate performance_schema.table_io_waits_summary_by_table;
mysql> {query}
mysql> select
object_schema,
object_name,
count_read,
sum_timer_read, -- this is picoseconds
sum_timer_read / (1000*1000*1000*1000) as read_seconds -- this is seconds
from
performance_schema.table_io_waits_summary_by_table
where
object_schema = 'dbt3sf1' and object_name in ('orders','customer');
+---------------+-------------+------------+----------------+--------------+
| object_schema | object_name | count_read | sum_timer_read | read_seconds |
+---------------+-------------+------------+----------------+--------------+
| dbt3sf1 | orders | 334101 | 5739345397323 | 5.7393 |
| dbt3sf1 | customer | 150001 | 1273653046701 | 1.2737 |
+---------------+-------------+------------+----------------+--------------+
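The unit conversion in the query is easy to get wrong, so here is the same arithmetic in Python using the numbers from the output above. The function names are illustrative; the only fact relied on is that the timer columns are in picoseconds, as the query's comment says.

```python
PICOSECONDS_PER_SECOND = 10**12

# (count_read, sum_timer_read) per table, copied from the result set above.
io_waits = {
    "orders": (334101, 5739345397323),
    "customer": (150001, 1273653046701),
}


def read_seconds(sum_timer_read):
    """Convert a picosecond timer value to seconds."""
    return sum_timer_read / PICOSECONDS_PER_SECOND


def share_of_read_time(io_waits, table):
    """Fraction of the total read time that was spent on `table`."""
    total = sum(timer for _, timer in io_waits.values())
    return io_waits[table][1] / total


orders_seconds = round(read_seconds(io_waits["orders"][1]), 4)
customer_seconds = round(read_seconds(io_waits["customer"][1]), 4)
orders_share = round(share_of_read_time(io_waits, "orders"), 2)
```

The converted values match the read_seconds column (5.7393 and 1.2737), and roughly 82% of the measured read time went to orders.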
29. 12:49:0929
What are table/index statistics?
select
count(*)
from
customer, orders
where
customer.c_custkey=orders.o_custkey and customer.c_mktsegment='BUILDING';
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| 1 | SIMPLE | customer | ALL | PRIMARY | NULL | NULL | NULL | 148305 | Using where |
| 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | customer.c_custkey | 7 | Using index |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
MariaDB > show table status like 'orders'\G
*************************** 1. row ***************************
Name: orders
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1495152
.............
MariaDB > show keys from orders where key_name='i_o_custkey'\G
*************************** 1. row ***************************
Table: orders
Non_unique: 1
Key_name: i_o_custkey
Seq_in_index: 1
Column_name: o_custkey
Collation: A
Cardinality: 212941
Sub_part: NULL
.................
Where does rows=7 come from?
1495152 / 212941 ≈ 7
“There are on average 7 orders for a given c_custkey”
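The same estimate as a few lines of Python, using the Rows and Cardinality values shown above. The function name is made up; the division is the one on the slide.

```python
def avg_rows_per_key(table_rows, index_cardinality):
    """Average number of rows sharing one index key value."""
    return table_rows / index_cardinality


# Rows from SHOW TABLE STATUS, Cardinality from SHOW KEYS.
orders_per_customer = avg_rows_per_key(1495152, 212941)
# EXPLAIN shows the value as a whole number.
rows_shown = round(orders_per_customer)
```

The exact quotient is a little over 7, which is the `rows` value in the orders line of the EXPLAIN output.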
30. 12:49:0930
The problem with index statistics and InnoDB
MySQL 5.5, InnoDB
● Statistics are calculated on-the-fly
– When the table is opened (server restart, DDL)
– When a sufficient number of records has been updated
– ...
● Calculation uses random sampling
– @@innodb_stats_sample_pages
● Result:
– Statistics change without warning
=> Query plans change without warning
● For example, DBT-3 benchmark
– 22 analytics queries
– Plans-per-query: avg=2.8, max=7.
31. 12:49:0931
Persistent table statistics
Persistent statistics v1
● Percona Server 5.5 (ported to MariaDB 5.5)
– Need to enable it: innodb_use_sys_stats_table=1
● Statistics are stored inside InnoDB
– User-visible through information_schema.innodb_sys_stats (read-only)
● Setting innodb_stats_auto_update=OFF prevents unexpected updates
Persistent statistics v2
● MySQL 5.6
– Enabled by default: innodb_stats_persistent=1
● Stored in regular InnoDB tables
– mysql.innodb_table_stats, mysql.innodb_index_stats
● Setting innodb_stats_auto_recalc=OFF prevents unexpected updates
● Can also specify persistence/auto-recalc as a table option
32. 12:49:0932
Persistent table statistics - summary
● Percona, then MySQL
– Made statistics persistent
– Disallowed automatic updates
● Remaining issue #1: it's still random sampling
– DBT-3 benchmark
– scale=30
– Re-ran EXPLAINs for benchmark queries
– Counted different query plans
● Remaining issue #2: limited amount of statistics
– Only on index columns
– Only AVG(#different_values)
33. 12:49:0933
Upcoming: Engine-independent statistics
MariaDB 10.0: Engine-independent statistics
● Collected/used on SQL layer
● No auto updates, only ANALYZE TABLE
– 100% precise statistics
● More statistics
– Index statistics (like before)
– Table statistics (like before)
– Column statistics
● MIN/MAX values
● Number of NULL / not NULL values
● Histograms
● => Optimizer will be smarter and more reliable
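To show what column statistics buy the optimizer, here is a Python sketch that collects the listed items from a full scan and uses the histogram for a selectivity estimate. The equi-height histogram, the bucket count, and the sample values are assumptions; the slide does not say which histogram type is used.

```python
def column_statistics(values, buckets):
    """Collect MIN/MAX, NULL counts and an equi-height histogram from a full scan."""
    non_null = sorted(v for v in values if v is not None)
    if not non_null or buckets > len(non_null):
        raise ValueError("need at least one non-NULL value per bucket")
    # Upper bound of each bucket; every bucket covers about the same number of rows.
    bounds = [non_null[len(non_null) * i // buckets - 1]
              for i in range(1, buckets + 1)]
    return {
        "min": non_null[0],
        "max": non_null[-1],
        "nulls": len(values) - len(non_null),
        "not_nulls": len(non_null),
        "histogram": bounds,
    }


def estimate_fraction_le(stats, x):
    """Estimate the fraction of non-NULL values <= x, counting whole buckets only."""
    full_buckets = sum(1 for bound in stats["histogram"] if bound <= x)
    return full_buckets / len(stats["histogram"])


column = [5, None, 1, 9, 3, None, 7, 2, 8, 4]  # hypothetical column values
stats = column_statistics(column, buckets=4)
```

Because the values come from a full scan rather than a random sample, re-running the collection on unchanged data gives the same statistics every time.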
34. 12:49:0934
Conclusions
● Lots of new query optimizer features recently
– Subqueries now just work
– Big joins are much faster
● Need to turn it on
– More diagnostics
● Even more is coming
● Releases with features
– MariaDB 5.5
– MySQL 5.6
– (upcoming) MariaDB 10.0