1. InnoDB Performance and
Usability Patches
MySQL CE 2009
Vadim Tkachenko,
Ewen Fortune
Percona Inc
MySQLPerformanceBlog.com
2. Who are we?
• Vadim Tkachenko
– Co-Founder of Percona Inc
• Lead of R&D department
– Co-Author MySQLPerformanceBlog.com
– Co-Author “High Performance MySQL” 2nd edition book
• Ewen Fortune
– Consultant, Percona Inc
• Special thanks to Yasufumi Kinoshita
– Not here, but the author of most of the patches
3. What is this talk about?
• Patches made by Percona for InnoDB Storage
Engine
• Two main focuses
– Performance improvement patches
– “Usability” patches
• Make InnoDB a bit more friendly
• The world has changed since the days of the 100MHz Pentium and 8MB of RAM
– But many assumptions from that era are still in the InnoDB code
4. Why we do it
• Most requirements and changes come from
practical work with customers
• We need InnoDB to fully utilize modern hardware today
– 16 cores
– RAID arrays
– SSD / FusionIO / other storage technologies
• InnoDB team is “conservative” in making
improvements in this area
5. Future
• Why patches? Why can't they be included in InnoDB?
– We are often asked this, but it is really a question for the InnoDB team
• (left blank due to the uncertainty of MySQL's future at Oracle)
• In any case, we will continue our work
6. Versions
• 5.0
– Set of patches
– SHOW PATCHES to see full list
• 5.1
– Storage engine XtraDB
– Based on InnoDB + patches; not a real competitor to InnoDB, but an enhanced drop-in replacement
8. Scalability
• Enhanced read_write locks
– Improves InnoDB scalability on systems with 8-16 cores
– Similar to the Google implementation and InnoDB-plugin-1.0.3
– Our implementation is an alternative
• Which one is better is a topic for research
• InnoDB-plugin may be preferred: the InnoDB team did the hard work of porting it to many platforms
– And now in 5.4
• Split buffer_pool mutex even more
– Additional split of buffer_pool mutex to 5.0.33
9. IO patches
• InnoDB IO patches
– Partly similar to Google’s InnoDB IO patches, but again an alternative implementation
– Several parts – some of them now in 5.4
10. IO – multiple threads
• Read_io_threads
– Number of threads for read requests (by default 1)
– Not really useful, as it is used only for read-ahead requests
• Write_io_threads
– Number of threads for write requests (by default 1)
– This is the one you may want to raise on a system with multiple disks
• Io_capacity
– Number of IO operations per second InnoDB assumes the server can do (by default 100, which is not a reasonable assumption for modern systems)
11. IO – Adaptive checkpoint
• InnoDB flushing of dirty buffer_pool pages may be intensive
• A lack of free pages can be controlled with innodb_max_dirty_pages_pct
• Flushing at the moment of a checkpoint is not controllable, is intensive, and may hurt performance
15. IO Control of Insert buffer
• Ibuf_max_size – maximum size of the insert buffer (by default it can grow to half of the buffer_pool)
• Ibuf_accel_rate – IO rate for the background thread, works together with io_capacity
16. IO – multiple pages
• Read_ahead = (both | linear | random)
– Controls whether InnoDB's internal read-ahead logic is used
• Flush_neighbor_pages = (yes|no)
– By default InnoDB also writes the neighbors of the pages being flushed
• All of these optimizations were designed for disks where random reads are expensive (in time); they may not be needed for SSD / FusionIO / other devices with cheap random reads
17. Extra rollback segments
• By default InnoDB uses a single rollback segment protected by a mutex
• A point of contention under intensive parallel insert load
18. Fix group commit
• “Broken” in 5.0
– The problem appears on slow disks with binary logging enabled
19. Benchmark
• Tpcc-like workload
• 100 Warehouses (about 10GB of data)
• Buffer_pool=5GB
• System: Dell PowerEdge R900, RAID 10 on 8
disks, RAM 32GB
– O_DIRECT for InnoDB, xfs filesystem, mounted with
nobarrier
• 5.0.77 vs 5.0.77-percona
– We have not had a chance to test 5.4 yet
23. Limit data dictionary
• Problem:
– Once a table has been opened, its data dictionary entry is kept in memory forever (or until the table is dropped)
– Not a problem for regular usage (100-1000 tables)
– A problem for instances with 10K+ tables
• 10GB+ of memory allocated just for data dictionary entries
• Our solution:
– LRU-based list of data dictionary entries
– Remove the oldest entries from memory when the limit is reached
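The LRU idea on this slide can be sketched in a few lines of Python. This is only a toy illustration, not the patch itself: the class `DictionaryCache`, its `max_entries` limit, and the `load_definition` callback are hypothetical names, and a real implementation would also have to avoid evicting entries for tables that are currently in use.

```python
from collections import OrderedDict


class DictionaryCache:
    """Toy LRU cache of table-definition entries (hypothetical, not InnoDB code)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # table name -> entry, least recently used first

    def open_table(self, name, load_definition):
        """Return the cached entry for `name`, loading it on a miss."""
        if name in self._entries:
            self._entries.move_to_end(name)  # mark as most recently used
            return self._entries[name]
        entry = load_definition(name)
        self._entries[name] = entry
        # Evict least recently used entries once the limit is exceeded.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return entry

    def cached_tables(self):
        """Table names currently cached, least recently used first."""
        return list(self._entries)


cache = DictionaryCache(max_entries=2)
for table in ["t1", "t2", "t1", "t3"]:
    cache.open_table(table, lambda n: {"name": n})
print(cache.cached_tables())  # ['t1', 't3']: t2 was the least recently used
```

With an unbounded cache, every table ever opened would stay in the dictionary; with the limit, memory use is capped by `max_entries` at the cost of reloading evicted definitions.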
26. Show memory usage
• Extended information about memory consumption
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 328830416; in additional pool allocated 2117120
+ Internal hash tables (constant factor + variable factor)
+ Adaptive hash index 4839388 (4425628 + 413760)
+ Page hash 138716
+ Dictionary cache 3383508 (3320220 + 63288)
+ File system 41848 (41336 + 512)
+ Lock system 332788 (332468 + 320)
+ Recovery system 0 (0 + 0)
+ Threads 41900 (41348 + 552)
Buffer pool size 16384
+ Buffer pool size, bytes 268435456
Free buffers 12396
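As a reading aid for the output above, the short Python sketch below parses the per-component lines and checks that each total equals its constant factor plus its variable factor. The parser, its regular expression, and the function names are illustrative assumptions, not part of any MySQL or Percona tool; the numbers are copied from the sample.

```python
import re

# Component lines from the sample output ("+ " markers kept as shown).
SAMPLE = """\
+ Adaptive hash index 4839388 (4425628 + 413760)
+ Page hash 138716
+ Dictionary cache 3383508 (3320220 + 63288)
+ File system 41848 (41336 + 512)
+ Lock system 332788 (332468 + 320)
+ Recovery system 0 (0 + 0)
+ Threads 41900 (41348 + 552)
"""

# name, total, and an optional "(constant + variable)" breakdown
LINE = re.compile(
    r"^(?P<name>.+?) (?P<total>\d+)(?: \((?P<const>\d+) \+ (?P<var>\d+)\))?$"
)


def parse_components(text):
    """Return {name: (total, constant, variable)}; parts are None if absent."""
    components = {}
    for raw in text.splitlines():
        match = LINE.match(raw.lstrip("+ ").rstrip())
        if not match:
            continue
        const, var = match["const"], match["var"]
        components[match["name"]] = (
            int(match["total"]),
            int(const) if const is not None else None,
            int(var) if var is not None else None,
        )
    return components


def inconsistent(components):
    """Names whose total does not equal constant + variable."""
    return [
        name
        for name, (total, const, var) in components.items()
        if const is not None and total != const + var
    ]


components = parse_components(SAMPLE)
print(inconsistent(components))  # [] means every breakdown adds up
print(268435456 // 16384)        # 16384 bytes per buffer pool page in this sample
```

The last line relates the two buffer pool figures: 16384 pages occupying 268435456 bytes works out to 16384 bytes (16KB) per page.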
27. Show locks held
---TRANSACTION 0 163390, ACTIVE 0 sec, process no 15571, OS thread id 1159485776 inserting
mysql tables in use 1, locked 1
7 lock struct(s), heap size 1216, undo log entries 4
MySQL thread id 15, query id 15455 127.0.0.1 root update
INSERT INTO history(h_c_d_id, h_c_w_id, h_c_id, h_d_id, h_w_id, h_date, h_amount, h_data) VALUES(?, ?, ?, ?, ?, ?, ?, ?)
Trx read view will not see trx with id >= 0 163391, sees < 0 163086
TABLE LOCK table `test/warehouse` trx id 0 163390 lock mode IX
RECORD LOCKS space id 10 page no 3 n bits 168 index `PRIMARY` of table `test/warehouse` trx id 0 163390 lock_mode X locks rec but not gap
TABLE LOCK table `test/district` trx id 0 163390 lock mode IX
RECORD LOCKS space id 18 page no 7 n bits 216 index `PRIMARY` of table `test/district` trx id 0 163390 lock_mode X locks rec but not gap
TABLE LOCK table `test/customer` trx id 0 163390 lock mode IX
RECORD LOCKS space id 19 page no 57918 n bits 96 index `PRIMARY` of table `test/customer` trx id 0 163390 lock_mode X locks rec but not gap
TABLE LOCK table `test/history` trx id 0 163390 lock mode IX
28. Extra undo slots
• By default there are 1024 slots for transaction undo information, which may limit the number of concurrent transactions to 512
• We increase this to 4072
– Only on 5.1 XtraDB
– Use it only if you need it: it breaks compatibility with InnoDB
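The arithmetic behind the 512 figure can be written out as below. The slide does not say why the limit is half the slot count; the sketch assumes the usual explanation that, in the worst case, one transaction occupies two undo slots (one for insert undo and one for update undo). The function name and the `slots_per_transaction` parameter are hypothetical.

```python
def max_concurrent_transactions(undo_slots, slots_per_transaction=2):
    """Worst-case number of concurrent transactions for a given slot count.

    Assumes each transaction may need `slots_per_transaction` undo slots
    (an assumption for illustration, not stated on the slide).
    """
    return undo_slots // slots_per_transaction


print(max_concurrent_transactions(1024))  # 512, the default limit on the slide
print(max_concurrent_transactions(4072))  # 2036 with the extended slot count
```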
29. Transactional Replication
• Similar to Google’s patch
• Information in relay-log.info is not consistent with InnoDB state.
– When the server crashes, MySQL will repeat several transactions
• You are lucky if replication fails on “Duplicate key error”
• In the worst case you will have several transactions executed twice
• Our solution: store the binary log name and position and the relay log name and position in the InnoDB transaction log file
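The failure mode and the fix can be illustrated with a toy simulation. Everything here is a simplified model under stated assumptions: the `Slave` class, its fields, and the `replay` helper are hypothetical, the "engine" is a Python list, and positions are event indexes rather than real log file names and offsets.

```python
class Slave:
    """Toy replication slave (hypothetical model, not MySQL code)."""

    def __init__(self, transactional_position):
        self.transactional_position = transactional_position
        self.applied = []         # transactions committed in the engine
        self.engine_position = 0  # position committed together with the data
        self.file_position = 0    # position kept in a separate file (relay-log.info analogue)

    def apply(self, index, event, crash_before_file_update=False):
        # Data and engine-side position are committed as one unit.
        self.applied.append(event)
        self.engine_position = index + 1
        if crash_before_file_update:
            return  # crash: the separate file still holds the old position
        self.file_position = index + 1

    def restart_position(self):
        if self.transactional_position:
            return self.engine_position
        return self.file_position


def replay(events, transactional_position, crash_at):
    """Apply events, crash once at `crash_at`, restart, and finish."""
    slave = Slave(transactional_position)
    for index, event in enumerate(events):
        crashed = index == crash_at
        slave.apply(index, event, crash_before_file_update=crashed)
        if crashed:
            break
    for index in range(slave.restart_position(), len(events)):
        slave.apply(index, events[index])
    return slave.applied


events = ["trx1", "trx2", "trx3"]
print(replay(events, transactional_position=False, crash_at=1))
# ['trx1', 'trx2', 'trx2', 'trx3']: trx2 is executed twice
print(replay(events, transactional_position=True, crash_at=1))
# ['trx1', 'trx2', 'trx3']
```

In the first run the position file lags behind the committed data, so the restart re-executes a transaction. In the second run the position is read back from the same place the data was committed to, so the restart resumes exactly after the last committed transaction.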
30. Plans
• Continue hunting for performance improvements
• Operational tasks:
– Fast recovery
• There is a reported bug: http://bugs.mysql.com/bug.php?id=29847
– Preload a table / index into the buffer_pool
– Copy a single .ibd table from one server to another
– Open InnoDB tables in parallel
• Currently serialized
– Various improvements to statistics
• Some patches already published (not by us)
31. To finalize
• Most of the patches are not rocket science
– Could have been developed or included in the official tree a long time ago
• Moreover, for some patches we only uncommented a few lines of code
– Expect most of them in MariaDB 5.1