Adrian Hardy's slides from PHPNW08
Once you have your query returning the correct results, speed becomes an important factor. Speed can either be an issue from the outset, or can creep in as your dataset grows. Understanding the EXPLAIN command is essential to helping you solve and even anticipate slow queries.
Associated video: http://blip.tv/file/1791781
2. Before we begin...
What you need to know
How and why we add indexes to tables
●
The benefits of correct field typing
●
Understanding of the ideals of 3NF
●
Basic understanding of SQL JOINs
●
This presentation
Very quick introduction to EXPLAIN
●
Improve understanding of MySQL and indexing
●
Simplified examples / results
●
3. Introduction - Using MySQL EXPLAIN
Prefix a SELECT query with EXPLAIN
MySQL won't actually execute the query, just analyse it
●
EXPLAIN helps us understand how and when MySQL
●
will use indexes
EXPLAIN returns a table of data from which you identify
●
potential improvements
Optimise queries in three ways
●
Modify or create indexes
●
Modify query structure
●
Modify data structure
●
Optimised queries = faster results, lower server load...
●
4. Introduction - Review of Indexing
Fast, compact structure for identifying row locations
●
Keep indexes in memory by trimming the fat:
●
Can I reduce the characters in that VARCHAR index?
●
Can I use a TINYINT instead of a BIGINT?
●
Can I use an INTEGER to describe a status or flag (rather
●
than a textual description)?
Chop down your result set as quickly as possible
●
MySQL will only use one index per query/table – it cannot
●
combine two separate indexes to make a useful one *
Understanding and preparation brings about Indexing Strategy
* Not strictly true - look up “Index Merge” operations
6. EXPLAIN – Worked Example
EXPLAIN SELECT * FROM attendees WHERE
conference_id = 123 AND registration_status > 0
table possible_keys key rows
attendees NULL NULL 14052
The three most important columns returned by EXPLAIN
1)Possible keys
All the possible indexes which MySQL could have used
●
Based on a series of very quick lookups and
●
calculations
2)Chosen key
3)Rows scanned
Indication of effort required to identify your result set
●
7. EXPLAIN – Worked Example
EXPLAIN SELECT * FROM attendees WHERE
conference_id = 123 AND registration_status > 0
table possible_keys key rows
attendees NULL NULL 14052
Interpreting the results
No suitable indexes for this query
●
MySQL had to do a full table scan
●
Full table scans are almost always the slowest query
●
Full table scans, while not always bad, are usually an
●
indication that an index is required
8. EXPLAIN – Worked Example
ALTER TABLE ADD INDEX conf (conference_id);
ALTER TABLE ADD INDEX reg (registration_status);
EXPLAIN SELECT * FROM attendees WHERE conference_id
= 123 AND registration_status > 1;
table possible_keys key rows
attendees conf, reg conf 331
MySQL had two indexes to choose from, but discarded “reg”
●
“reg” isn't sufficiently unique
●
The spread of values can also be a factor (e.g when 99% of
●
rows contain the same value)
Index “uniqueness” is called cardinality
●
There is scope for some performance increase...
●
Lower server load, quicker response
●
9. EXPLAIN – Worked Example
ALTER TABLE ADD INDEX reg_conf_index
(registration_status, conference_id);
EXPLAIN SELECT * FROM attendees WHERE conference_id =
123 AND registration_status > 1;
table possible_keys key rows
reg, conf,
attendees reg_conf_index 204
reg_conf_index
reg_conf_index is a much better choice
●
Note that the other two keys are still available, just
●
not as effective
Our query is now served well by the new index
●
10. EXPLAIN – Worked Example
DELETE INDEX conf; DELETE INDEX reg;
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
table possible_keys key rows
attendees NULL NULL 14052
Without the “conf” index, we're back to square one
●
The order in which fields were defined in a composite index
●
affects whether it is available for use in a query
● Remember, we defined our index : (registration_status,
conference_id)
Potential workaround:
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
AND registration_status >= -1
table possible_keys key rows
attendees reg_conf_index reg_conf_index 204
11. EXPLAIN – Example 2
EXPLAIN SELECT * FROM attendees WHERE surname LIKE 'har%';
table possible_keys key rows
attendees surname surname 234
MySQL uses an index on surname – which is good.
EXPLAIN SELECT * FROM attendees WHERE surname LIKE '%har%';
table possible_keys key rows
attendees NULL NULL 14052
MySQL doesn't even try to use an index!
12. EXPLAIN – Example 3
EXPLAIN SELECT * FROM conferences WHERE location_id = 2 OR
topic_id IN (4,6,1)
table possible_keys key rows
location_id,
conferences NULL 5043
topic_id
MySQL doesn't use an index, because of the OR
ALTER TABLE ADD INDEX location_topic (location_id,
topic_id);
EXPLAIN SELECT * FROM conferences WHERE location_id = 2
OR topic_id IN (4,6,1)
table possible_keys key rows
location_id,
conferences location_topic 15
topic_id,
location_topic
Full table scan avoided – could also use UNION (ALL) trick
13. EXPLAIN – Example 4
EXPLAIN SELECT * FROM attendees WHERE MD5(conference_id) =
MD5(123)
table possible_keys key rows
attendees NULL NULL 14052
Understandably, MySQL has to do a full table scan
A more realistic example?
EXPLAIN SELECT * FROM conferences WHERE
DATE_FORMAT(date,'%a') = 'Sat'
table possible_keys key rows
conferences NULL NULL 5043
A good candidate for Optimisation #3 – Modify Data Structure
14. JOINs
JOINing together large data sets (>= 100,000) is really
●
where EXPLAIN becomes useful
Each JOIN in a query gets its own row in EXPLAIN
●
Make sure each JOIN condition is FAST
●
Make sure each joined table is getting to its result set
●
as quickly as possible
● The benefits compound if each join requires less
effort
15. JOINs – Simple Example
EXPLAIN SELECT * FROM
conferences INNER JOIN attendees USING (conference_id)
WHERE conferences.location_id = 2 AND
conferences.topic_id IN (4,6,1) AND
attendees.registration_status > 1
table type possible_keys key rows
conferences ref conference_topic conference_topic 15
attendees ALL NULL NULL 14052
Looks like I need an index on attendees.conference_id
There are 13 different values for “type”
● Another indication of effort, aside from rows scanned
● Here, “ALL” is bad – we should be aiming for “ref”
● Common values are “const”, “ref”, and “all”
● http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
16. The “extra” column
With every EXPLAIN, you get an “extra” column, which
shows additional operations invoked to get your result set.
table possible_keys key rows extra
Using where
attendees conf conf 331
Using filesort
Some example “extra” values:
Using where
●
Using temporary table
●
Using filesort
●
Using index
●
There are many more “extra” values which are discussed in
the MySQL manual.
17. “Using filesort”
Avoid, because:
● Doesn't use an index
● Involves a full scan of your result set
● Employs a generic (i.e. one size fits all)
algorithm
● Uses the filesystem (eeek)
● Will get slower with more data
It's not all bad...
Perfectly acceptable provided you get to your
●
result set as quickly as possible, and keep it
predictably small
Sometimes unavoidable - ORDER BY RAND()
●
ORDER BY operations can use indexes to do the
●
sorting!
18. “Using filesort” – Example
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
ORDER BY surname
table possible_keys key rows Extra
attendees conference_id conference_id 331 Using filesort
MySQL is using an index, but it's sorting the results slowly
ALTER TABLE attendees ADD INDEX conf_surname (conference_id,
surname);
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
ORDER BY surname
table possible_keys key rows Extra
conference_id,
attendees conf_surname 331
conf_surname
We've avoided a filesort
19. “Using index”
Celebrate, because:
MySQL got your results just by consulting the index,
●
● Which could well have been sat in memory
●MySQL didn't need to even look at the table to get you your
results
● Opening a table can be an expensive operation.
●MySQL can answer the next query more quickly
●The fastest way for you to get your data?
Particularly useful...
When you're just interested in a single date or an id
●
●Or the COUNT(), SUM(), AVG() etc. of a field
20. “Using index” – Example
EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id
= 123
table possible_keys key rows Extra
attendees conference_id conference_id 331
Nothing is actually wrong with this query – it could just be quicker!
ALTER TABLE attendees ADD INDEX conf_age (conference_id,
age);
EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id
= 123
table possible_keys key rows Extra
conference_id,
attendees conf_surname 331 Using index
conf_surname
Outside of caching, the fastest way to get your data *
*Not a guarantee
21. Moving forward...
Just because your queries are fast now, doesn't mean that they will stay
that way forever
Enable MySQL's Slow Query Log
● --log-slow-queries=/var/lib/mysql/slow-query.log
● Defaults to logging queries which take more than 10 seconds
● --long_query_time=1
● Use Percona's “microslow” patch for values < 1 second
● Find the query in the log, EXPLAIN it, improve it, rinse and repeat
22. Moving forward...
Use the command line to identify more general problems
● mysqladmin -u dbuser -p -r -i 10 extended-status
● Figures are relative, updated every 10 seconds
● Slow_queries = number of slow queries in last period
● Select_Scan = full table scans
● Select_full_join = full scans to complete join operations
● Created_tmp_disk_tables = filesorts
● Key_read_requests/Key_write_requests
● Determine write/read weighting of our application and alter your
indexes accordingly
23. MySQL Resources
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
●
High Performance MySQL - Baron Schwartz
●
ISBN 0596101716
–
£20 (Money well spent)
–
http://www.mysqlperformanceblog.com
●
Regular posts
–