MySQL Storage Engines Landscape
1. The MySQL Storage
Engines Landscape
Colin Charles, Monty Program Ab
colin@montyprogram.com
http://montyprogram.com / http://mariadb.org/
http://bytebot.net/blog / @bytebot on Twitter
SCALE10x, Los Angeles, CA, USA
20 January 2012
2. whoami
• Chief Evangelist, MariaDB
• Formerly of MySQL AB/Sun Microsystems
• Past lives included FESCo (Fedora
Project), OpenOffice.org
3. Agenda
• What is a storage engine?
• What makes storage engines different?
• What storage engines are available?
• Usage examples
• The Storage Engine API
7. Value Proposition
• No other database offers this capability
• Unmatched flexibility + customisation potential
• MEMORY engine for performance/routine
lookup data
• Right storage engine can improve performance in
many applications
• ARCHIVE compresses data, up to 80%
• Partners & community benefit from this
8. What makes storage
engines different?
• Storage: how the data is stored on disk
• Indexes: improve search operations
• Memory usage: improves data access for
speed
• Transactions: protects the integrity of your
data (ACID)
9. Native engines in
MySQL 5
mysql> show engines;
+------------+---------+----------------------------------------------------------------+
| Engine | Support | Comment |
+------------+---------+----------------------------------------------------------------+
| MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables |
| InnoDB | YES | Supports transactions, row-level locking, and foreign keys |
| BerkeleyDB | YES | Supports transactions and page-level locking |
| BLACKHOLE | NO | /dev/null storage engine (anything you write to it disappears) |
| EXAMPLE | NO | Example storage engine |
| ARCHIVE | NO | Archive storage engine |
| CSV | NO | CSV storage engine |
| ndbcluster | NO | Clustered, fault-tolerant, memory-based tables |
| FEDERATED | NO | Federated MySQL storage engine |
| MRG_MYISAM | YES | Collection of identical MyISAM tables |
| ISAM | NO | Obsolete storage engine |
+------------+---------+----------------------------------------------------------------+
12 rows in set (0.01 sec)
10. Native engines in
MariaDB 5.1
MariaDB [(none)]> show engines;
+------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine | Support | Comment | Transactions | XA | Savepoints |
+------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+
| BLACKHOLE | YES | /dev/null storage engine (anything you write to it disappears) | NO | NO | NO |
| MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO |
| FEDERATED | YES | FederatedX pluggable storage engine | YES | NO | YES |
| MARIA | YES | Crash-safe tables with MyISAM heritage | YES | NO | NO |
| CSV | YES | CSV storage engine | NO | NO | NO |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO |
| ARCHIVE | YES | Archive storage engine | NO | NO | NO |
| MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance | NO | NO | NO |
| InnoDB | YES | XtraDB engine based on InnoDB plugin. Supports transactions, row-level locking, and foreign keys | YES | YES | YES |
| PBXT | YES | High performance, multi-versioning transactional engine | YES | YES | NO |
+------------+---------+--------------------------------------------------------------------------------------------------+--------------+------+------------+
10 rows in set (0.00 sec)
11. Some notes post-5.1
• dynamically loadable storage engines - i.e.
they are pluggable
• Embedded InnoDB (now, see HailDB)
• e.g. you can load SphinxSE for FTS
• Federated disabled (FederatedX in MariaDB)
• Native partitioning (RANGE, LIST, HASH,
KEY) available (no need for MERGE tables)
12. Commercial engines in
the ecosystem
• InfoBright - data warehousing applications
• TokuDB - uses Fractal Trees, for Big Data
analysis, online schema changes, hot
indexes/columns
• ScaleDB - cloud-friendly scalability, with
load balancing+high availability
13. Using different engines
• Create a table with a specified engine
• CREATE TABLE t1 (..) ENGINE=XtraDB;
• Changing existing tables
• ALTER TABLE t1 ENGINE=Aria;
• SHOW ENGINES;
14. Transactional vs. non-transactional
• Transaction-safe tables (InnoDB) have advantages over non-transaction-safe tables (MyISAM):
• Server crash? Automatic recovery, or a backup + transaction log
• ROLLBACK can be executed to ignore changes
• Update fails? Changes reverted
• Concurrency - tables w/ many updates + concurrent reads
• Disadvantages (transaction overhead = slower, more disk space requirements, more memory to perform updates) don’t seem like they apply any longer in today’s environments
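A minimal sketch of the "Update fails? Changes reverted" point, in Python. It uses the standard-library sqlite3 module as a stand-in for a transactional engine such as InnoDB (an assumption, so the example runs without a MySQL server); the accounts table and the transfer() helper are hypothetical.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; undo both updates if either one fails."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # This update violates the CHECK constraint when src lacks the funds.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # ROLLBACK also discards the first update, which had already succeeded.
        conn.execute("ROLLBACK")
        return False


# isolation_level=None: transactions are controlled explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])

failed = transfer(conn, 1, 2, 500)   # not enough funds in account 1
after_failure = dict(conn.execute("SELECT id, balance FROM accounts"))

succeeded = transfer(conn, 1, 2, 30)
after_success = dict(conn.execute("SELECT id, balance FROM accounts"))
```

On a non-transactional engine the first UPDATE would stay applied after the second one failed, leaving the two rows inconsistent.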
15. Indexes
• Tree Indexes
• B-Trees
• B+Trees (InnoDB)
• T-Trees (NDB)
• Red-black binary trees (MEMORY)
• R-Trees (MyISAM for spatial indexes)
• Hash Indexes (MEMORY, NDB, InnoDB)
• If table fits entirely in memory, fastest way to perform queries is a hash index
• InnoDB has an internal adaptive hash index. InnoDB monitors index searches, and if it notices that it will benefit from a hash index, InnoDB automatically builds one. (5.1.24 and greater)
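A rough Python illustration of the trade-off between the two index families. It is not how any engine implements them: a dict stands in for a hash index and a sorted list searched with bisect stands in for the ordered leaf level of a tree index. The sample rows are made up.

```python
import bisect

# Hypothetical table rows: (id, name)
rows = [(17, "carol"), (3, "alice"), (42, "dave"), (8, "bob")]

# Hash index: key -> row value. One probe per equality lookup, no key ordering.
hash_index = {row_id: name for row_id, name in rows}

# Ordered index: keys kept sorted, so range scans are possible.
sorted_keys = sorted(row_id for row_id, _ in rows)


def lookup_eq(row_id):
    """Equality lookup (WHERE id = ?), served by the hash index."""
    return hash_index.get(row_id)


def lookup_range(low, high):
    """Range lookup (WHERE id BETWEEN ? AND ?), served by the ordered index."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


equal_hit = lookup_eq(8)
range_hit = lookup_range(5, 20)
```

The hash structure answers equality lookups with a single probe but cannot serve the range query, which is why the tree variants remain the general-purpose choice.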
16. Does the storage engine
really make a difference?
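+-----------+---------------------------+---------------------------+----------------------------+
| User Load | MyISAM Inserts Per Second | XtraDB Inserts Per Second | ARCHIVE Inserts Per Second |
+-----------+---------------------------+---------------------------+----------------------------+
|         1 |                  3,203.00 |                  2,670.00 |                   3,576.00 |
|         4 |                  9,123.00 |                  5,280.00 |                  11,038.00 |
|         8 |                  9,361.00 |                  5,044.00 |                  13,202.00 |
|        16 |                  8,957.00 |                  4,424.00 |                  13,066.00 |
|        32 |                  8,470.00 |                  3,934.00 |                  12,921.00 |
|        64 |                  8,382.00 |                  3,541.00 |                  12,571.00 |
+-----------+---------------------------+---------------------------+----------------------------+
Using mysqlslap against MariaDB 5.1.55: at a user load of 64, the ARCHIVE
engine has 50% more INSERT throughput than MyISAM, and 255% more than XtraDB.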
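The quoted percentages can be reproduced from the 64-user row of the table. A small Python check, with the figures copied from the slide; the percent_more() helper is only for illustration.

```python
# Inserts per second by user load, copied from the benchmark table.
results = {
    1: {"MyISAM": 3203, "XtraDB": 2670, "ARCHIVE": 3576},
    4: {"MyISAM": 9123, "XtraDB": 5280, "ARCHIVE": 11038},
    8: {"MyISAM": 9361, "XtraDB": 5044, "ARCHIVE": 13202},
    16: {"MyISAM": 8957, "XtraDB": 4424, "ARCHIVE": 13066},
    32: {"MyISAM": 8470, "XtraDB": 3934, "ARCHIVE": 12921},
    64: {"MyISAM": 8382, "XtraDB": 3541, "ARCHIVE": 12571},
}


def percent_more(a, b):
    """How much higher a is than b, rounded to a whole percentage."""
    return round((a / b - 1) * 100)


at_64 = results[64]
vs_myisam = percent_more(at_64["ARCHIVE"], at_64["MyISAM"])
vs_xtradb = percent_more(at_64["ARCHIVE"], at_64["XtraDB"])
print(f"ARCHIVE vs MyISAM: +{vs_myisam}%  ARCHIVE vs XtraDB: +{vs_xtradb}%")
```

The gap is narrower at lower loads, so the headline numbers describe the most concurrent case rather than an average.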
17. MyISAM
• Pros?
• excellent INSERT performance
• small footprint
• supports full-text search (FTS)
• Cons?
• no transactions
• no foreign key support
• Typical uses
• logging
• auditing
• data warehousing
18. MyISAM II
• In my.cnf, remember to set the key_buffer_size. This is memory*0.40, as MyISAM uses the OS cache for tables
• Using DRBD (which requires transactional engine) but still want FTS? Create a FTS “slave” MyISAM table
• myisam_use_mmap enables MyISAM to use memory mapping (7-40% speed improvement)
• key_cache_segments = 1 (which enables segmented key caches in MariaDB - ~250% improvements, as it mitigates thread contention for key cache lock)
19. InnoDB
• Maintains its own buffer pool (does
aggressive memory caching)
• Uses tablespaces (several files on disk, raw
disk support)
• Typically used for OLTP operations
20. InnoDB: Tuning
• innodb_file_per_table - split InnoDB data into a file per table, rather than one contiguous file
• allows optimize table `table` to run
• innodb_buffer_pool_size=(memory*0.8)
• innodb_flush_log_at_trx_commit=1 - logs flushed to disk at each transaction commit. Guarantees ACID.
• innodb_log_file_size - keep it high (64-512MB), though recovery time increases (largest=4GB)
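The sizing rules of thumb from the MyISAM and InnoDB tuning slides (key_buffer_size = memory*0.40, innodb_buffer_pool_size = memory*0.8) can be turned into a my.cnf fragment. A Python sketch: the helper names and the 8192 MB server are hypothetical, and it assumes a machine dedicated to the database and to one engine.

```python
def suggested_settings(total_ram_mb, engine):
    """Apply the slides' rules of thumb; sizes are in whole megabytes."""
    if engine == "myisam":
        # 40% of RAM for index blocks; the OS cache holds the table data.
        return {"key_buffer_size": f"{total_ram_mb * 40 // 100}M"}
    if engine == "innodb":
        return {
            # 80% of RAM for InnoDB's own buffer pool.
            "innodb_buffer_pool_size": f"{total_ram_mb * 80 // 100}M",
            "innodb_file_per_table": "1",
            "innodb_flush_log_at_trx_commit": "1",
        }
    raise ValueError(f"no rule of thumb for engine {engine!r}")


def to_my_cnf(settings):
    """Render the settings as a [mysqld] section."""
    lines = ["[mysqld]"] + [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines)


fragment = to_my_cnf(suggested_settings(8192, "innodb"))
print(fragment)
```

These are starting points only; a server that mixes engines or shares the machine with other services needs smaller fractions.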
21. InnoDB: Tuning II
• SHOW ENGINE INNODB STATUS;
• InnoDB supports both row & statement
based replication
• Assign a primary key, otherwise InnoDB
assigns one for you. Also keep it small - it's
appended to every record in every secondary
index
22. XtraDB
• InnoDB data dictionary as I_S tables
• Enhanced InnoDB data statistics & SHOW
ENGINE INNODB STATUS
• Lots more... see: http://www.percona.com/software/percona-server/feature-comparison/
23. ARCHIVE
• Store large amounts of data without
indexes, in small disk footprint
• SELECT and INSERT operations only
• Good for data audit use
• Uses AZIO (zlib) compression
24. FederatedX
• Create logical pointers to tables that exist on other MySQL servers;
these can then be linked together to form one logical database
• A federated table pointing to an InnoDB table on another server, will
have transaction support (in 5.1)
• Capabilities limited to underlying engine on remote server
• CREATE TABLE t1 (...) ENGINE=FEDERATED
CONNECTION='mysql://username:pwd@myhost:3306/db_name/tbl_name';
• Can also be used for synchronous replication
• Federated table on master server pointing to slave; triggers on
master table to write all changes to remote table once applied to
the master
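The CONNECTION string follows the layout scheme://user:password@host:port/database/table. A small Python sketch that splits the slide's example into its parts with the standard-library urlsplit; parse_connection() is a hypothetical helper, not part of MySQL or MariaDB.

```python
from urllib.parse import urlsplit


def parse_connection(connection):
    """Split a FederatedX-style CONNECTION string into its components."""
    parts = urlsplit(connection)
    # The path carries exactly two segments: /database/table
    database, table = parts.path.strip("/").split("/")
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": database,
        "table": table,
    }


target = parse_connection("mysql://username:pwd@myhost:3306/db_name/tbl_name")
print(target["host"], target["port"], target["database"], target["table"])
```

Note that the password sits in the table definition in clear text, so anyone who can run SHOW CREATE TABLE on the federated table can read it.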
25. Memory
• Previously known as HEAP tables
• In-memory engine
• Hash index used by default, B-Tree available too
• Typical uses?
• Lookup tables
• Session data
• Temporary tables
• Calculation tables
• Cons?
• Server dies, all rows are lost
26. Aria
• Based off the 5.1 code
• 1.0 – crash-safe MyISAM, with cacheable row format
• 1.5 – concurrent INSERT/SELECT
• Soon to be merged, then...
• 2.0: transactional + ACID compliance
• 3.0: high concurrency, online backup
• Goal: ACID compliant, MVCC transactional storage engine, based on
MyISAM
• Target? Data warehousing
• Uses big log files (1GB by default)
• 8K pages used by default (MyISAM uses 1K pages)
27. PBXT
• MVCC, transactional, ACID compliant, foreign key
support
• row-level locking for updates, so maximum concurrency
• immediate notification if client processes are
deadlocked
• write-once, as it uses a log-based architecture (write
data to DB without first writing to transaction log)
• support for BLOB streaming with Blob Streaming
engine
28. SphinxSE
• CREATE TABLE t1 (..) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/test";
• SELECT * from t1 WHERE query='test it;mode=any';
• matching modes, limits, filters, ranges supported
• monitor it - SHOW ENGINE SPHINX STATUS;
• can JOIN a SphinxSE search table and tables using other engines as well
• https://kb.askmonty.org/v/about-sphinxse
29. Storage Engine API
• http://forge.mysql.com/wiki/MySQL_Internals_Custom_Engine
• http://dev.mysql.com/tech-resources/articles/creating-new-storage-engine.html
• SHOW PLUGINS;
• https://kb.askmonty.org/v/extending-create-table
• storage/example/ha_example.cc and storage/example/ha_example.h
30. Writing your own
• Find the plugin path - SHOW VARIABLES LIKE '%plugin%';
+---------------+-----------------------------------+
| Variable_name | Value |
+---------------+-----------------------------------+
| plugin_dir | /opt/maria/5.2.6/lib/mysql/plugin |
+---------------+-----------------------------------+
• note that this is also where you store UDFs
• Copy the relevant engine (e.g. myengine.so) into that directory
• INSTALL PLUGIN myengine SONAME 'myengine.so';
• Server registers the plugin in the mysql.plugin table, and now ENGINE=myengine will
work
31. Things to think about
• Backup is not engine-independent
• MyISAM, InnoDB, PBXT
• LVM/ZFS snapshots solve this
• Different engines have different monitoring
options
• Mix and match; use summary tables
32. What others are using
• WordPress (blog): uses default engine, MyISAM is fine
• MediaWiki (wiki): prefers InnoDB, except for
“searchindex” table, which is MyISAM
• http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=markup
• vBulletin (forum): MyISAM
• SugarCRM (CRM): MyISAM (with conversion script to
InnoDB provided)
• Zimbra Collaboration Suite: InnoDB