Introducing MessageBird
MessageBird is a cloud communications platform
founded in Amsterdam in 2011.
Examples of our messaging and voice SaaS:
SMS in and out, call in (IVR) and out (alert), SIP,
WhatsApp, Facebook, Telegram, Twitter, WeChat, …
Omni-Channel Conversation
Details at www.messagebird.com
225+ Direct-to-Carrier Agreements
With operators from around the world
15,000+ Customers
In 60+ countries
180+ Employees
Engineering office in Amsterdam
Sales and support offices worldwide
We are expanding :
{Software, Front-End, Infrastructure, Data, Security, Telecom, QA} Engineers
{Team, Tech, Product} Leads, Product Owners, Customer Support
{Commercial, Connectivity, Partnership} Managers
www.messagebird.com/careers
Summary
(Demystifying MySQL Replication Crash Safety – HLMoscow2018)
• Helicopter view of – and then Zoom in – Replication and Crash Safety
• MySQL 5.6 solution (and its problems)
• Complexifying things with GTIDs and Multi-Threaded Slave (MTS)
• Impacts of reducing / compromising durability
(sync_binlog != 1 and trx_commit != 1)
• Overview of related subjects: Semi-Sync, MariaDB & Pseudo-GTIDs
• Closing, links, bugs and questions
Overview of MySQL Replication
One master with one or more slaves:
• The master records transactions in a journal (binary logs); each slave:
• Downloads the journal and saves it locally in the relay logs (IO thread)
• Executes the relay logs on its local database (SQL thread)
• Could also produce binary logs to be a master (log-slave-updates – lsu)
Replication Crash Safety
What do I mean by Replication Crash Safety ?
• When a slave crashes, it is able to resume replication after recovery
(OK if it rewinds its state after recovery, as long as it is eventually consistent)
• When a master crashes, slaves are able to resume replicating from it
• All above without sacrificing data consistency
• In other words: ACID is not compromised by a slave or a master crash
(Discussion limited to transactional SE: InnoDB, TokuDB, MyRocks; obviously not MyISAM)
Intermediate masters (IM) qualify both as master and slave
Slaves are potential masters (and IMs) in some failover strategies
(Proving replication crash un-safety is easy, proving safety is hard)
State of the Dolphin and of the Sea Lion
State of the Dolphin in Replication Crash Safety:
• MySQL 5.5 is not crash safe
• MySQL 5.6 can be made crash safe (it is not by default)
• MySQL 5.7 is mostly the same as 5.6
(with complexity added by Logical Lock parallel replication)
• MySQL 8.0 is crash safe by default
(but it can be made unsafe by “tuning” the configuration)
Quick state of the Sea Lion:
• MariaDB 5.5 is not replication crash safe
• MariaDB 10.x can be made crash safe
Zoom in the details [1 of 3]
More details about replication:
• The IO Thread stores its state in master info (also configuration stored there)
• The SQL Thread in relay log info
slave1 [localhost] {msandbox} ((none)) > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
[...]
Master_Log_File: mysql-bin.000001 <-------+-- master info (persisted state)
Read_Master_Log_Pos: 25489 <-------+
Relay_Log_File: mysql-relay.000002 <--+
Relay_Log_Pos: 10788 <--+
Relay_Master_Log_File: mysql-bin.000001 <--+-- relay log info (persisted state)
[...] |
Exec_Master_Log_Pos: 10575 <--+
[...]
1 row in set (0.00 sec)
More parameters: sync_master_info, sync_relay_log and sync_relay_log_info
In MySQL 5.5, master info and relay log info are files:
• No atomicity of “making progress” and “state tracking” for IO & SQL Threads
• Consistency of actual vs registered state is compromised after a crash
➢ This is why replication is not crash-safe in MySQL 5.5
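The missing atomicity can be illustrated with a toy model. This is only a sketch, not MySQL code: the `Slave` class, its fields and the `replay` helper are hypothetical names chosen for the illustration. Applying a transaction and recording the position are two separate steps, so a crash between them leaves the registered position behind the actual state and the transaction is applied twice after restart.

```python
class Slave:
    """Toy slave: 'data' is the database, 'position' plays the role of relay log info."""

    def __init__(self):
        self.data = []      # applied transactions, in order
        self.position = 0   # index of the next transaction to apply

    def apply_non_atomic(self, trx, crash_before_position_update=False):
        # Step 1: the transaction is committed in the storage engine.
        self.data.append(trx)
        if crash_before_position_update:
            return  # crash: the position file was never updated
        # Step 2: the position is updated separately (a file in MySQL 5.5).
        self.position += 1

    def apply_atomic(self, trx):
        # Data and position change together, as if stored in one transaction.
        self.data, self.position = self.data + [trx], self.position + 1


def replay(slave, journal):
    """Resume replication from the registered position."""
    for trx in journal[slave.position:]:
        slave.apply_atomic(trx)


journal = ["A", "B", "C"]

unsafe = Slave()
unsafe.apply_non_atomic("A")
unsafe.apply_non_atomic("B", crash_before_position_update=True)
replay(unsafe, journal)   # restarts from position 1, so B is applied again
print(unsafe.data)        # ['A', 'B', 'B', 'C']

safe = Slave()
safe.apply_atomic("A")
safe.apply_atomic("B")    # a crash can only land before or after this step
replay(safe, journal)
print(safe.data)          # ['A', 'B', 'C']
```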
Zoom in the details [2 of 3]
Zoom in the details [3 of 3]
Even more parameters:
• sync_binlog (and innodb_flush_log_at_trx_commit – trx_commit):
• Binlogs are synchronised to disk after every N writes/transactions
(default 0 in MySQL 5.5 and 5.6; in 5.7 and 8.0 it is 1, which is full ACID)
• trx_commit = 1: logs written and flushed each trx (full ACID and default)
= 0: written and flushed once per second (not crash safe)
= 2: written after each trx and flushed once per second
(mysqld crash safe, not OS crash safe)
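The three trx_commit values can be summarised as a small lookup. This is a sketch of the rules listed above, not of InnoDB internals; the table layout and the function name `committed_trx_survive` are made up for the example.

```python
# What innodb_flush_log_at_trx_commit does at each commit, as listed above.
TRX_COMMIT = {
    0: {"write_at_commit": False, "flush_at_commit": False},
    1: {"write_at_commit": True, "flush_at_commit": True},
    2: {"write_at_commit": True, "flush_at_commit": False},
}


def committed_trx_survive(trx_commit, crash):
    """Return True if every committed transaction survives the given crash."""
    mode = TRX_COMMIT[trx_commit]
    if crash == "mysqld":
        # Logs written to the OS cache survive a mysqld crash.
        return mode["write_at_commit"]
    if crash == "os":
        # Only logs flushed to disk survive an OS crash.
        return mode["flush_at_commit"]
    raise ValueError("crash must be 'mysqld' or 'os'")


for value in (0, 1, 2):
    print(value, committed_trx_survive(value, "mysqld"), committed_trx_survive(value, "os"))
```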
MySQL 5.6 solution [1 of 4]
Reminder: problems making MySQL 5.5 Replication Crash Un-Safe:
• The position of the SQL Thread cannot be trusted
• The position of the IO Thread cannot be trusted
• The content of the Relay Logs cannot be trusted
MySQL 5.6 solution [2 of 4]
The MySQL 5.6 solution
• Atomicity for SQL Thread: relay-log-info-repository = TABLE (default = FILE)
• Useless for crash safety: a parameter to store master info in a table:
• master-info-repository = TABLE (default = FILE)
• Providing a way to “fix” the relay logs: relay-log-recovery = 1 (default = 0)
MySQL 5.6 solution [3 of 4]
More details about Relay Log Recovery:
• relay-log-recovery is only used on mysqld startup (dynamic would be useless)
• If relay-log-recovery = 0, nothing special done (and a new relay log is created)
• If relay-log-recovery = 1:
• The position of the IO Thread is set to the position of the SQL Thread
• The position of the SQL Thread is set to the newly created relay log
• If relay-log-purge = 1: the old relay logs will be deleted on SQL Thread startup
(relay-log-recovery does not delete anything: easy to test with skip-slave-start)
➢ Said otherwise, the previous relay logs are skipped !
(those relay logs are considered improper for SQL Thread consumption)
• This will happen even if MySQL (or the IO Thread) did not crash
OK for 1st implementation but a waste of perfectly good relay logs
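The position changes made by relay log recovery can be sketched as a pure function over the SHOW SLAVE STATUS fields. This is a model of the two rules above, not the server's implementation; the `SlaveState` class, the new relay log name `mysql-relay.000003` and its starting position are assumptions for the example, while the other values come from the SHOW SLAVE STATUS output shown earlier.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SlaveState:
    master_log_file: str        # IO Thread: Master_Log_File
    read_master_log_pos: int    # IO Thread: Read_Master_Log_Pos
    relay_log_file: str         # SQL Thread: Relay_Log_File
    relay_log_pos: int          # SQL Thread: Relay_Log_Pos
    relay_master_log_file: str  # SQL Thread: Relay_Master_Log_File
    exec_master_log_pos: int    # SQL Thread: Exec_Master_Log_Pos


def relay_log_recovery(state, new_relay_log_file, new_relay_log_pos):
    """Model of the position changes done with relay-log-recovery = 1."""
    return replace(
        state,
        # The IO Thread restarts from where the SQL Thread stopped.
        master_log_file=state.relay_master_log_file,
        read_master_log_pos=state.exec_master_log_pos,
        # The SQL Thread is pointed at the newly created relay log.
        relay_log_file=new_relay_log_file,
        relay_log_pos=new_relay_log_pos,
    )


before = SlaveState("mysql-bin.000001", 25489,
                    "mysql-relay.000002", 10788,
                    "mysql-bin.000001", 10575)
# New relay log name and position are placeholders, not real server values.
after = relay_log_recovery(before, "mysql-relay.000003", 0)

# Bytes of binlog already downloaded that will be fetched again
# (a plain subtraction only works because both positions are in the same file).
redownloaded = before.read_master_log_pos - after.read_master_log_pos
print(after.read_master_log_pos, redownloaded)
```

The difference between the two IO Thread positions is the "waste" mentioned above: on a lagging slave it can be much larger than in this small example.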
MySQL 5.6 solution [4 of 4]
In MySQL 5.7:
• No change of defaults (for replication crash safety)
• Relay log recovery still simplistic :-|
In MySQL 8.0:
• Still simplistic relay log recovery :-(
• New defaults:
• relay-log-info-repository = TABLE :-)
• relay-log-recovery = 1 :-)
• master-info-repository = TABLE (not sure this is very useful)
Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave
Bug#74321: Execute relay-log-recovery only when needed
Adding complexity with GTIDs [1 of 2]
Not only did MySQL 5.6 introduce replication crash safety,
it also introduced Global Transaction IDs (GTIDs)
• This tags every transaction with an ID when writing to the binlogs
• The GTID state of the master and slaves is tracked in the binlogs
➢ IO and SQL Thread states are now partially in the binlogs (and relay logs)
• Optionally, slaves can use GTID to replicate (instead of file+position)
• This allows easier repointing of slaves to a new master (including fail over)
• This heavily relies on precise tracking of GTID states on master and slaves
➢ As this tracking is in the binlogs, this is impeded when sync_binlog != 1
Bug#70659: Make crash safe slave with gtid + less durable settings
Adding complexity with GTIDs [2 of 2]
To make replication crash safe with GTIDs in MySQL 5.6:
• relay-log-info-repository = TABLE (default = FILE)
• relay-log-recovery = 1 (default = 0) – (Bug#92093)
• sync_binlog = 1 (default = 0)
• In 5.7, the default is sync_binlog = 1 :-) (the two others are unchanged :-|)
• In 8.0, all the defaults are good for crash safe replication with GTID :-) :-)
• MySQL 5.7 adds a table for storing the GTID state of slaves:
• Allows GTID slaves without log-slave-updates (lsu)
• With lsu, this table (mysql.gtid_executed) is not updated after each trx
➢ Missed opportunity for OS crash safety with sync_binlog != 1 :-( :-( :-(
Bug#92109: Make GTID replication crash safe with less durable setting
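The three parameters and their per-version defaults can be turned into a small checker. This is a sketch built only from the values listed above; the function name `unsafe_settings` and the dictionary layout are assumptions, and it ignores everything else that affects crash safety (MTS, MASTER_AUTO_POSITION, open bugs).

```python
# Values needed for crash safe replication with GTIDs, as listed above.
REQUIRED = {
    "relay_log_info_repository": "TABLE",
    "relay_log_recovery": 1,
    "sync_binlog": 1,
}

# Defaults per version, as listed above.
DEFAULTS = {
    "5.6": {"relay_log_info_repository": "FILE", "relay_log_recovery": 0, "sync_binlog": 0},
    "5.7": {"relay_log_info_repository": "FILE", "relay_log_recovery": 0, "sync_binlog": 1},
    "8.0": {"relay_log_info_repository": "TABLE", "relay_log_recovery": 1, "sync_binlog": 1},
}


def unsafe_settings(version, overrides=None):
    """Return the names of the settings that differ from the required values."""
    effective = {**DEFAULTS[version], **(overrides or {})}
    return sorted(name for name, wanted in REQUIRED.items() if effective[name] != wanted)


print(unsafe_settings("5.6"))
print(unsafe_settings("5.7"))
print(unsafe_settings("8.0"))
print(unsafe_settings("8.0", {"sync_binlog": 0}))  # "tuning" makes 8.0 unsafe again
```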
Master Replication Crash Safety [1 of 5]
Relaxing durability of the binlogs implies losing GTID state (after an OS crash)
• What about the consequence on the master ? With and without GTID ?
• If sync_binlog != 1 on the master, an OS crash will lose binlogs
• With sync_binlog != 1, usually trx_commit != 1 (normally 2, but can be 0)
• trx_commit = 2 preserves data on mysqld crashes, 0 does not (→ 2 is better)
➢ InnoDB will also lose transactions on an OS crash
➢ After an OS crash, InnoDB will be out-of-sync with the binlogs
➢ And we cannot trust the binlogs on such master (trx gap or ghost trx)
The failure mode will be different depending on the configuration
Master Replication Crash Safety [2 of 5]
With file+position
• IO Thread in vanished binlogs
• So slaves executed phantom trx
(ghost in binlogs, maybe not in InnoDB)
• When the master is restarted:
• It records trx in new binlog file
• Most slaves are broken, and they might be out-of-sync with each other
• Some lagging slaves might skip vanished binlogs
➢ Broken slaves have more data than the master (→ data drift)
➢ And different data drift on “lucky” lagging slaves that might not break
Master Replication Crash Safety [3 of 5]
With GTID enabled
• Slaves also executed ghost trx that vanished from the binlogs
• But those are in their GTID state
• A recovered master reuses the GTIDs of the vanished trx
• Slaves magically reconnect to the master (MASTER_AUTO_POSITION = 1)
1. If master has not reused all ghost GTIDs, then the slave breaks
2. If it has, then the slave skips the new transactions → more data drift
(in illustration, the slave will skip new 50 to 58 as it has the old one)
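The two outcomes can be reproduced with plain sets. This is a simplified model, assuming a single server UUID so that a GTID is just a sequence number; the function `auto_position_reconnect` is hypothetical, and the assumption that the master kept GTIDs 1 to 49 is inferred from the 50 to 58 range of the illustration.

```python
def auto_position_reconnect(master_gtids, slave_gtids):
    """Model a slave reconnecting with MASTER_AUTO_POSITION = 1.

    Returns (outcome, gtids_the_slave_will_fetch).
    """
    if slave_gtids - master_gtids:
        # The slave knows GTIDs the master does not have: replication breaks.
        return "breaks", set()
    # Otherwise the slave only asks for what it is missing.
    return "resumes", master_gtids - slave_gtids


# Before the OS crash, the slave executed 1..58; the master lost 50..58.
slave = set(range(1, 59))

# Case 1: the recovered master has only reused 50..55 so far.
print(auto_position_reconnect(set(range(1, 56)), slave))

# Case 2: the recovered master has reused 50..58 and gone on to 60.
outcome, fetched = auto_position_reconnect(set(range(1, 61)), slave)
print(outcome, sorted(fetched))  # the new 50..58 are never fetched by the slave
```

In case 2 nothing fails visibly, which is what makes the resulting data drift hard to notice.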
Master Replication Crash Safety [4 of 5]
With GTID enabled but MASTER_AUTO_POSITION = 0
• Left as an exercise to the audience…
On the consequences of sync_binlog != 1 (part #1)
https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html
(more posts to be published in the series)
Master Replication Crash Safety [5 of 5]
Summary of running with sync_binlog != 1:
• The binlogs – of the master or slave – cannot be trusted after an OS crash
• On a master, having mysqld restart normally after such a crash leads to data drift
➢ After an OS crash, make sure no slaves reconnect to the recovered master
(OFFLINE_MODE = ON in config file – failing-over to a slave is the way forward)
• On slaves, having mysqld restart after such a crash leads to truncated binlogs
➢ After an OS crash, consider purging all binlogs on the recovered slave
• Intermediate Masters (IM) are both master and slaves
➢ After an OS crash make sure no slaves reconnect to the recovered IM
➢ And consider purging all binary logs on it
• Remember: GTID state corrupted on slaves after OS crash (Bug#92109)
Adding complexity with MTS [1 of 4]
Multi-Threaded Slave (MTS) in MySQL 5.6 is doing out-of-order committing
• Same for MySQL 5.7 with DATABASE and LOGICAL_CLOCK types
• LOGICAL_CLOCK also has the slave_preserve_commit_order option
(OFF by default in 5.7 and 8.0 :-| and ON requires log-slave-updates :-( )
(Bug#75396: Allow slave_preserve_commit_order without log-slave-updates)
Example: transactions A, B, C, D, E on the master
• On a slave, SHOW SLAVE STATUS points to B, so A is committed
• C and E are also committed, B is running and D is pending scheduling
(maybe B and D are in the same schema with DATABASE type)
With out-of-order commit, a file+position in relay log info is not enough
• GTID allows tracking complex position (generating temporary holes on slaves)
• And there is the mysql.slave_worker_info table
(https://dev.mysql.com/worklog/task/?id=5599: for more details)
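The A to E example can be worked through in a few lines. This is only a sketch of the bookkeeping problem, not of how mysql.slave_worker_info is laid out; the function `mts_position` and the term "low-water mark" as used here are assumptions for the illustration.

```python
def mts_position(order, committed):
    """Return (low_water_mark, gaps) for an out-of-order committing slave.

    low_water_mark: first transaction not yet committed (what a single
                    file+position can point to).
    gaps: uncommitted transactions located before the last committed one.
    """
    low_water_mark = next((trx for trx in order if trx not in committed), None)
    last_committed = max(
        (index for index, trx in enumerate(order) if trx in committed),
        default=-1,
    )
    gaps = [trx for trx in order[: last_committed + 1] if trx not in committed]
    return low_water_mark, gaps


order = ["A", "B", "C", "D", "E"]   # commit order on the master
committed = {"A", "C", "E"}         # B is running, D is pending scheduling

print(mts_position(order, committed))
```

The position alone says "resume at B", which would run C and E a second time; the gap list (B and D) is the extra state that has to be tracked somewhere.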
Adding complexity with MTS [2 of 4]
Without GTID, resuming replication after a crash needs filling the gap in trx
• Manual, error-prone, and not always possible before 5.6.31 and 5.7.13 (Bug#77496)
• Now, automated by doing START SLAVE UNTIL SQL_AFTER_MTS_GAPS
• But this needs relay logs, which might have vanished after an OS crash (Bug#81840)
Adding complexity with MTS [3 of 4]
• Bug#81840 makes MTS with File+Position OS crash unsafe (safe for mysqld crash)
• Hard to accept workaround: sync_relay_log = 1 (performance killer)
• Full state in mysql.slave_worker_info → recovery possible with a lot of effort
• The good solution would be a better relay log recovery (Bug#93081)
Adding complexity with MTS [4 of 4]
With GTID, MTS in MySQL 5.6, 5.7 & 8.0 is replication crash safe:
• But it needs MASTER_AUTO_POSITION = 1 (and relay log recovery Bug#92093)
• And it comes with all the GTID “goodies” (rogue transactions, lsu for 5.6, …)
• Also needs sync_binlog = 1 (if 5.7+, also works without binlogs or lsu off)
• And care with sync_binlog != 1 on the master (need to fail over if OS crash)
(sync_binlog != 1 should not be needed in 95% of cases)
(Group Commit and MTS make this optimisation almost obsolete)
Example: A, B, C, D, E on the master with GTID 10, 11, 12, 13, 14:
• GTID executed on the slave is 1-10:12:14 before a crash
• Replication resumes by fetching 11:13:15… (after relay log recovery)
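The example above can be checked by expanding the interval notation. This is a sketch that handles only the numeric part of a GTID set (real sets are prefixed with a server UUID); the helper `parse_intervals` and the assumption that the master has reached GTID 15 are made up for the illustration.

```python
def parse_intervals(text):
    """Expand '1-10:12:14' into the set {1, ..., 10, 12, 14}."""
    numbers = set()
    for part in text.split(":"):
        first, _, last = part.partition("-")
        numbers.update(range(int(first), int(last or first) + 1))
    return numbers


slave_executed = parse_intervals("1-10:12:14")
master_executed = parse_intervals("1-15")   # assumed state of the master

to_fetch = sorted(master_executed - slave_executed)
print(to_fetch)  # [11, 13, 15]
```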
In practice, MTS with GTID in MySQL 5.6, 5.7 & 8.0 only should be crash safe, even with all the requirements above:
Bug#92882: MTS not replication crash-safe with GTID and all the right parameters
(Only applies to Operating System crashes)
Example: A, B, C, D, E on the master with GTID 10, 11, 12, 13, 14:
• GTID executed on the slave is 1-10:12:14 before Operating System crash
• Relay log recovery tries to “fill the gaps” but fails because relay logs are gone
(This might be a regression from the fix of Bug#77496)
(Easy workaround: stop slave; reset slave; start slave;)
Related subjects – Semi-Sync
In this talk, we did not cover master failover explicitly,
when a master crashes in an unrecoverable way, failover needs to happen
When failing-over to a slave, committed transactions can be lost
(Some transactions on the crashed master might not have reached slaves)
→ violation of durability (ACID) in the replication topology (distributed system)
Except if lossless semi-sync is used, more details in:
Question about Semi-Synchronous Replication: the Answer with All the Details
https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
Related subjects – MariaDB
MariaDB still stores its master info and relay log info in files
• But it stores GTID state of slaves in the mysql.gtid_slave_pos table
➢ MariaDB is replication crash safe when using GTID slave positioning
Also, it has an interesting feature:
• If using more than one storage engine, a single state table is not optimal
• Having one such table per storage engine could be better
Improving replication with multiple storage engines (MariaDB 10.3)
https://kristiannielsen.livejournal.com/19223.html
Related subjects – Pseudo GTIDs
Pseudo-GTIDs:
• A way to get GTID-like features without GTIDs
• They work with any version of MySQL/MariaDB (even 5.5)
• But they assume in-order-commit → does not work with MTS
They can provide slave replication crash safety:
• With log-slave-updates and sync_binlog = 1
• Even on MySQL 5.5 or MariaDB 5.5
https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md
Conclusion [1 of 5]
• It is complicated and it depends…
• It has many edge cases
• It might still change as bugs are fixed
• And hopefully improvements will be made
• So sorry: there is no short version
Conclusion [2 of 5]
Some parameters never impact/improve Replication Crash Safety:
• master-info-repository, sync_master_info, sync_relay_log_info
Some parameters are always needed for Replication Crash Safety:
• relay-log-info-repository = TABLE
• relay-log-recovery = 1
Conclusion [3 of 5]
MySQL 5.6 with GTID (with and without MTS) → crash safe slave if:
• All above with sync_binlog = 1 (not default) and MASTER_AUTO_POSITION = 1
and maybe a “stop slave; reset slave; start slave;” (Bug#92882)
MySQL 5.6 without GTID and with MTS → not always crash safe slaves:
• OK for MySQL crashes as relay logs are not lost
• For OS crashes, losing the relay logs leads to replication breakage (Bug#81840)
• Possible to recover with some voodoo and dark magic (Bug#93081)
For master and slaves, binlogs cannot be trusted if sync_binlog != 1
Conclusion [4 of 5]
MySQL 5.7 is mostly the same as 5.6:
• sync_binlog = 1 is the default :-)
• Will be crash safe with GTID and sync_binlog != 1 when Bug#92109 is fixed
• LOGICAL_CLOCK with slave_preserve_commit_order like single-threaded
• Without slave_preserve_commit_order, same as MTS in 5.6
MySQL 8.0 is mostly the same as 5.7 with safer defaults:
• relay-log-info-repository = TABLE :-)
• relay-log-recovery = 1 :-)
• But default for slave_preserve_commit_order is still 0 :-|
Conclusion [5 of 5]
Care with MTS as it has many traps
And in all cases:
• Relay log recovery needs to re-download relay logs from the master
• High load in case of lagging (or delayed) slaves :-(
• Will fail if the binary logs were purged from the master :-(
• Relay log recovery also fails for MTS and OS crashes (vanished relay logs) :-( :-( :-(
We need a better Relay Log Recovery !
Bug#74321, Bug#74323, Bug#74324, Bug#81840
Bug#92882, Bug#93081
Links [1 of 3]
Crash-Safe MySQL Replication - A Visual Guide
https://hackmongo.com/post/crash-safe-mysql-replication-a-visual-guide/
(diagrams in this talk are inspired by this post)
Jean-François’s blog posts about Replication Crash Safety:
• Better Crash-safe replication for MySQL
https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336a69b317f
• Replication crash safety with MTS in MySQL 5.6 and 5.7: reality or illusion?
https://jfg-mysql.blogspot.com/2016/01/replication-crash-safety-with-mts.html
• A discussion about sync-master-info and other replication parameters
https://jfg-mysql.blogspot.com/2016/08/discussion-about-sync-master-info-and-replication-parameters.html
• On the consequences of sync_binlog != 1 (part #1)
https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html
Links [2 of 3]
Directly related bugs:
• Bug#70669: Slave can't continue repl. after master's recovery (old – 5.6.14, and fixed – 5.6.17)
• Bug#70659: Make crash safe slave work with gtid + less durable settings
• Bug#74321: Execute relay-log-recovery only when needed
• Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave
• Bug#74324: Make keeping relay logs (relay_log_purge = 0) crash safe
• Bug#77496: Replication position lost after crash on MTS configured slave (really fixed ?)
• Bug#81840: Automatic Replication Recovery Does Not Handle Lost Relay Log Events
• Bug#92093: Replication crash safety needs relay_log_recovery even with GTID
• Bug#92109: Please make replication crash safe with GITD and less durable setting (bis)
• Bug#92882: MTS not replication crash-safe with GTID and all the right parameters
• Bug#93081: Please implement a better relay log recovery
Somehow related bugs:
• Bug#75396: Allow slave_preserve_commit_order without log-slave-updates
• Bug#92891: Please make relay_log_space_limit dynamic
Links [3 of 3]
• Question about Semi-Synchronous Replication: the Answer with All the Details
https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
• Improving replication with multiple storage engines (in MariaDB 10.3)
https://kristiannielsen.livejournal.com/19223.html
• Pseudo-GTID and Orchestrator:
https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md
https://speakerdeck.com/shlominoach/pseudo-gtid-and-easy-mysql-replication-topology-management
• The Full MySQL and MariaDB Parallel Replication Tutorial
https://www.slideshare.net/JeanFranoisGagn/the-full-mysql-and-mariadb-parallel-replication-tutorial
• Arg: relay_log_space_limit is (still) not dynamic !
https://jfg-mysql.blogspot.com/2018/10/arg-relay-log-space-limit-is-still-not-dynamic.html
• Evaluating MySQL Parallel Replication Part 2: Slave Group Commit
https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-2-slave-group-commit-459026a141d2
• Evaluating MySQL Parallel Replication Part 4: More Benchmarks in Production
https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-4-more-benchmarks-in-production-49ee255043ab
Thanks !
Presented at HighLoad++ 2018 in Moscow
by Jean-François Gagné
Senior Infrastructure Engineer / System and MySQL Expert
jeanfrancois AT messagebird DOT com

Más contenido relacionado

La actualidad más candente

MySQL Multi-Source Replication for PL2016
MySQL Multi-Source Replication for PL2016MySQL Multi-Source Replication for PL2016
MySQL Multi-Source Replication for PL2016
Wagner Bianchi
 

La actualidad más candente (20)

MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...
MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...
MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...
 
MySQL 5.6 GTID in a nutshell
MySQL 5.6 GTID in a nutshellMySQL 5.6 GTID in a nutshell
MySQL 5.6 GTID in a nutshell
 
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsMySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
How Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagHow Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lag
 
Yahoo: Experiences with MySQL GTID and Multi Threaded Replication
Yahoo: Experiences with MySQL GTID and Multi Threaded ReplicationYahoo: Experiences with MySQL GTID and Multi Threaded Replication
Yahoo: Experiences with MySQL GTID and Multi Threaded Replication
 
MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting
 
Best practices for MySQL High Availability
Best practices for MySQL High AvailabilityBest practices for MySQL High Availability
Best practices for MySQL High Availability
 
MySQL 5.6 Replication Webinar
MySQL 5.6 Replication WebinarMySQL 5.6 Replication Webinar
MySQL 5.6 Replication Webinar
 
MySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsMySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitations
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
M|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera ClusterM|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera Cluster
 
High Availability with Galera Cluster - SkySQL Road Show 2013 in Berlin
High Availability with Galera Cluster - SkySQL Road Show 2013 in BerlinHigh Availability with Galera Cluster - SkySQL Road Show 2013 in Berlin
High Availability with Galera Cluster - SkySQL Road Show 2013 in Berlin
 
Riding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication StreamRiding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication Stream
 
MySQL Multi-Source Replication for PL2016
MySQL Multi-Source Replication for PL2016MySQL Multi-Source Replication for PL2016
MySQL Multi-Source Replication for PL2016
 
Running gtid replication in production
Running gtid replication in productionRunning gtid replication in production
Running gtid replication in production
 
M|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change MethodsM|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change Methods
 
What you wanted to know about MySQL, but could not find using inernal instrum...
What you wanted to know about MySQL, but could not find using inernal instrum...What you wanted to know about MySQL, but could not find using inernal instrum...
What you wanted to know about MySQL, but could not find using inernal instrum...
 
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014
 
Galera Cluster 3.0 Features
Galera Cluster 3.0 FeaturesGalera Cluster 3.0 Features
Galera Cluster 3.0 Features
 

Similar a Demystifying MySQL Replication Crash Safety

Profiling the logwriter and database writer
Profiling the logwriter and database writerProfiling the logwriter and database writer
Profiling the logwriter and database writer
Enkitec
 
Replication features, technologies and 3rd party Extinction
Replication features, technologies and 3rd party ExtinctionReplication features, technologies and 3rd party Extinction
Replication features, technologies and 3rd party Extinction
Ben Mildren
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemaker
kuchinskaya
 
2012 replication
2012 replication2012 replication
2012 replication
sqlhjalp
 
MySQL replication best practices 105-232-931
MySQL replication best practices 105-232-931MySQL replication best practices 105-232-931
MySQL replication best practices 105-232-931
Baruch Osoveskiy
 
Webinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterWebinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera Cluster
Severalnines
 

Similar a Demystifying MySQL Replication Crash Safety (20)

MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
 
M|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksM|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocks
 
MySQL highav Availability
MySQL highav AvailabilityMySQL highav Availability
MySQL highav Availability
 
Errant GTIDs breaking replication @ Percona Live 2019
Errant GTIDs breaking replication @ Percona Live 2019Errant GTIDs breaking replication @ Percona Live 2019
Errant GTIDs breaking replication @ Percona Live 2019
 
Scalar DB: A library that makes non-ACID databases ACID-compliant
Scalar DB: A library that makes non-ACID databases ACID-compliantScalar DB: A library that makes non-ACID databases ACID-compliant
Scalar DB: A library that makes non-ACID databases ACID-compliant
 
Profiling the logwriter and database writer
Profiling the logwriter and database writerProfiling the logwriter and database writer
Profiling the logwriter and database writer
 
Demystifying MySQL Replication Crash Safety

  • 1.
  • 2. 2 Introducing MessageBird MessageBird is a cloud communications platform founded in Amsterdam 2011. Examples of our messaging and voice SaaS: SMS in and out, call in (IVR) and out (alert), SIP, WhatsApp, Facebook, Telegram, Twitter, WeChat, … Omni-Channel Conversation Details at www.messagebird.com 225+ Direct-to-Carrier Agreements With operators from around the world 15,000+ Customers In over 60+ countries 180+ Employees Engineering office in Amsterdam Sales and support offices worldwide We are expanding : {Software, Front-End, Infrastructure, Data, Security, Telecom, QA} Engineers {Team, Tech, Product} Leads, Product Owners, Customer Support {Commercial, Connectivity, Partnership} Managers www.messagebird.com/careers
  • 3. 3 Summary (Demystifying MySQL Replication Crash Safety – HLMoscow2018) • Helicopter view of – and then Zoom in – Replication and Crash Safety • MySQL 5.6 solution (and its problems) • Complexifying things with GTIDs and Multi-Threaded Slave (MTS) • Impacts of reducing / compromising durability (sync_binlog != 1 and trx_commit != 1) • Overview of related subjects: Semi-Sync, MariaDB & Pseudo-GTIDs • Closing, links, bugs and questions
  • 4. Overview of MySQL Replication (Demystifying MySQL Replication Crash Safety – HLMoscow2018) One master with one or more slaves: • The master records transactions in a journal (binary logs); each slave: • Downloads the journal and saves it locally in the relay logs (IO thread) • Executes the relay logs on its local database (SQL thread) • Could also produce binary logs to be a master (log-slave-updates – lsu)
  • 5. Replication Crash Safety (Demystifying MySQL Replication Crash Safety – HLMoscow2018) What do I mean by Replication Crash Safety ? • When a slave crashes, it is able to resume replication after recovery (OK if rewinds its state after recovery, as long as it is eventually consistent) • When a master crashes, slaves are able to resume replicating from it • All above without sacrificing data consistency • In other words: ACID is not compromised by a slave or a master crash (Discussion limited to transactional SE: InnoDB, TokuDB, MyRocks; obviously not MyISAM) Intermediate masters (IM) qualify both as master and slave Slaves are potential master (and IM) in some failover strategy (Proving replication crash un-safety is easy, proving safety is hard) 5
  • 6. 6 State of the Dolphin and of the Sea Lion (Demystifying MySQL Replication Crash Safety – HLMoscow2018) State of the Dolphin in Replication Crash Safety: • MySQL 5.5 is not crash safe • MySQL 5.6 can be made crash safe (it is not by default) • MySQL 5.7 is mostly the same as 5.6 (with complexity added by Logical Clock parallel replication) • MySQL 8.0 is crash safe by default (but it can be made unsafe by “tuning” the configuration) Quick state of the Sea Lion: • MariaDB 5.5 is not replication crash safe • MariaDB 10.x can be made crash safe
  • 7. 7 Zoom in the details [1 of 3] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) More details about replication: • The IO Thread stores its state in master info (its configuration is also stored there) • The SQL Thread stores its state in relay log info
    slave1 [localhost] {msandbox} ((none)) > show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
    [...]
                  Master_Log_File: mysql-bin.000001    <-------+-- master info (persisted state)
              Read_Master_Log_Pos: 25489               <-------+
                   Relay_Log_File: mysql-relay.000002  <--+
                    Relay_Log_Pos: 10788               <--+
            Relay_Master_Log_File: mysql-bin.000001    <--+-- relay log info (persisted state)
    [...]                                                 |
              Exec_Master_Log_Pos: 10575               <--+
    [...]
    1 row in set (0.00 sec)
  • 8. 8 Zoom in the details [2 of 3] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) More parameters: sync_master_info, sync_relay_log and sync_relay_log_info In MySQL 5.5, master info and relay log info are files: • No atomicity of “making progress” and “state tracking” for IO & SQL Threads • Consistency of actual vs registered state is compromised after a crash ➢ This is why replication is not crash-safe in MySQL 5.5
  • 9. 9 Zoom in the details [3 of 3] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Even more parameters: • sync_binlog (and innodb_flush_log_at_trx_commit – trx_commit): • Binlogs are synchronised to disk after every N writes/transactions (default 0 in MySQL 5.5 and 5.6; in 5.7 and 8.0 it is 1, which is full ACID) • trx_commit = 1: logs written and flushed each trx (full ACID and default) • trx_commit = 0: logs written and flushed once per second (not crash safe) • trx_commit = 2: logs written after each trx and flushed once per second (mysqld crash safe, not OS crash safe)
  • 10. MySQL 5.6 solution [1 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Reminder: problems making MySQL 5.5 Replication Crash Un-Safe: • The position of the SQL Thread cannot be trusted • The position of the IO Thread cannot be trusted • The content of the Relay Logs cannot be trusted 10
  • 11. MySQL 5.6 solution [2 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) The MySQL 5.6 solution • Atomicity for SQL Thread: relay-log-info-repository = TABLE (default = FILE) • Useless for crash safety: a parameter to store master info in a table: • master-info-repository = TABLE (default = FILE) • Providing a way to “fix” the relay logs: relay-log-recovery = 1 (default = 0)
  • 12. 12 MySQL 5.6 solution [3 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) More details about Relay Log Recovery: • relay-log-recovery is only used on mysqld startup (dynamic would be useless) • If relay-log-recovery = 0, nothing special is done (and a new relay log is created) • If relay-log-recovery = 1: • The position of the IO Thread is set to the position of the SQL Thread • The position of the SQL Thread is set to the newly created relay log • If relay-log-purge = 1: the old relay logs will be deleted on SQL Thread startup (relay-log-recovery does not delete anything: easy to test with skip-slave-start) ➢ In other words, the previous relay logs are skipped ! (those relay logs are considered improper for SQL Thread consumption) • This will happen even if MySQL (or the IO Thread) did not crash OK for a 1st implementation, but a waste of perfectly good relay logs
  • 13. 13 MySQL 5.6 solution [4 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) In MySQL 5.7: • No change of defaults (for replication crash safety) • Relay log recovery still simplistic :-| In MySQL 8.0: • Still simplistic relay log recovery :-( • New defaults: • relay-log-info-repository = TABLE :-) • relay-log-recovery = 1 :-) • master-info-repository = TABLE (not sure this is very useful) Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave Bug#74321: Execute relay-log-recovery only when needed
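
The relay log recovery steps described in slides 12 and 13 can be sketched as a small Python model. This is only an illustration of the position bookkeeping, not MySQL code: the `SlaveState` class, its field names (borrowed from the SHOW SLAVE STATUS output of slide 7), the `relay_log_recovery` function and the `FIRST_EVENT_POS` constant are all assumptions made for this sketch.

```python
from dataclasses import dataclass, replace

# Assumption for this sketch: the first event of a log file sits right
# after a 4-byte header, so a fresh relay log is read from position 4.
FIRST_EVENT_POS = 4


@dataclass(frozen=True)
class SlaveState:
    # IO Thread state (master info): how far the master binlog was downloaded.
    master_log_file: str
    read_master_log_pos: int
    # SQL Thread state (relay log info): how far the master binlog was executed.
    relay_master_log_file: str
    exec_master_log_pos: int
    # Relay log currently consumed by the SQL Thread.
    relay_log_file: str
    relay_log_pos: int


def relay_log_recovery(state: SlaveState, new_relay_log: str) -> SlaveState:
    """Hypothetical model of relay-log-recovery = 1 at mysqld startup."""
    return replace(
        state,
        # The IO Thread position is rewound to the SQL Thread position...
        master_log_file=state.relay_master_log_file,
        read_master_log_pos=state.exec_master_log_pos,
        # ...and the SQL Thread is pointed at the newly created relay log.
        relay_log_file=new_relay_log,
        relay_log_pos=FIRST_EVENT_POS,
    )


before = SlaveState("mysql-bin.000001", 25489,
                    "mysql-bin.000001", 10575,
                    "mysql-relay.000002", 10788)
after = relay_log_recovery(before, "mysql-relay.000003")
# Bytes of master binlog that will be downloaded a second time.
redownloaded = before.read_master_log_pos - after.read_master_log_pos
```

With the positions of slide 7, everything between 10575 and 25489 is fetched again from the master even though it was already sitting in the old relay logs, which is the waste the two bug reports above complain about.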
  • 14. Adding complexity with GTIDs [1 of 2] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) MySQL 5.6 not only introduced replication crash safety, it also introduced Global Transaction IDs (GTIDs) • This tags every transaction with an ID when writing to the binlogs • The GTID states of the master and slaves are tracked in the binlogs ➢ IO and SQL Thread states are now partially in the binlogs (and relay logs) • Optionally, slaves can use GTID to replicate (instead of file+position) • This allows easier repointing of slaves to a new master (including fail over) • This heavily relies on precise tracking of GTID states on master and slaves ➢ As this tracking is in the binlogs, this is impeded when sync_binlog != 1 Bug#70659: Make crash safe slave with gtid + less durable settings 14
  • 15. Adding complexity with GTIDs [2 of 2] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) To make replication crash safe with GTIDs in MySQL 5.6: • relay-log-info-repository = TABLE (default = FILE) • relay-log-recovery = 1 (default = 0) – (Bug#92093) • sync_binlog = 1 (default = 0) • In 5.7, the default is sync_binlog = 1 :-) (the two others are unchanged :-|) • In 8.0, all the defaults are good for crash safe replication with GTID :-) :-) • MySQL 5.7 adds a table for storing the GTID state of slaves: • Allows GTID slaves without log-slave-updates (lsu) • With lsu, this table (mysql.gtid_executed) is not updated after each trx ➢ Missed opportunity for OS crash safety with sync_binlog != 1 :-( :-( :-( Bug#92109: Make GTID replication crash safe with less durable setting 15
  • 16. 16 Master Replication Crash Safety [1 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Relaxing durability of the binlogs implies losing GTID state (after an OS crash) • What about the consequence on the master ? With and without GTID ? • If sync_binlog != 1 on the master, an OS crash will lose binlogs • With sync_binlog != 1, usually trx_commit != 1 (normally 2, but can be 0) • trx_commit = 2 preserves data on mysqld crashes, 0 does not (→ 2 is better) ➢ InnoDB will also lose transactions on an OS crash ➢ After an OS crash, InnoDB will be out-of-sync with the binlogs ➢ And we cannot trust the binlogs on such master (trx gap or ghost trx) The failure mode will be different depending on the configuration
  • 17. 17 Master Replication Crash Safety [2 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) With file+position • IO Thread in vanished binlogs • So slaves executed phantom trx (ghost in binlogs, maybe not in InnoDB) • When the master is restarted: • It records trx in a new binlog file • Most slaves are broken, and they might be out-of-sync with each other • Some lagging slaves might skip vanished binlogs
  • 18. 18 Master Replication Crash Safety [2 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) With file+position • IO Thread in vanished binlogs • So slaves executed phantom trx (ghost in binlogs, maybe not in InnoDB) • When the master is restarted: • It records trx in a new binlog file • Most slaves are broken, and they might be out-of-sync with each other • Some lagging slaves might skip vanished binlogs ➢ Broken slaves have more data than the master (→ data drift) ➢ And different data drift on “lucky” lagging slaves that might not break
  • 19. 19 Master Replication Crash Safety [3 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) With GTID enabled • Slaves also executed ghost trx vanished from binlogs • But those are in their GTID state • A recovered master reuses GTIDs of the vanished trx • Slaves magically reconnect to the master (MASTER_AUTO_POSITION = 1) 1. If master has not reused all ghost GTIDs, then the slave breaks 2. If it has, then the slave skips the new transactions → more data drift (in illustration, the slave will skip new 50 to 58 as it has the old ones)
  • 20. 20 Master Replication Crash Safety [4 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) With GTID enabled but MASTER_AUTO_POSITION = 0 • Left as an exercise to the audience… On the consequences of sync_binlog != 1 (part #1) https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html (more posts to be published in the series)
  • 21. Master Replication Crash Safety [5 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Summary of running with sync_binlog != 1: • The binlogs – of the master or slave – cannot be trusted after an OS crash • On a master, having mysqld restart normally after such a crash leads to data drift ➢ After an OS crash, make sure no slaves reconnect to the recovered master (OFFLINE_MODE = ON in config file – failing-over to a slave is the way forward) • On slaves, having mysqld restart after such a crash leads to truncated binlogs ➢ After an OS crash, consider purging all binlogs on the recovered slave • Intermediate Masters (IM) are both master and slaves ➢ After an OS crash make sure no slaves reconnect to the recovered IM ➢ And consider purging all binary logs on it • Remember: GTID state corrupted on slaves after OS crash (Bug#92109) 21
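
The two GTID failure modes of slide 19 (the slave breaks, or the slave silently skips new transactions) can be illustrated with plain set arithmetic. The sketch below is a toy model, not MySQL behaviour reproduced in detail: the function name `after_master_os_crash` and its parameters are hypothetical, GTIDs are reduced to integers for a single server UUID, and the numbers 49 and 58 are chosen to match the "50 to 58" illustration.

```python
def after_master_os_crash(slave_last: int, survived_last: int, new_trx: int):
    """Toy model of a slave reconnecting with MASTER_AUTO_POSITION = 1.

    slave_last:    highest GTID the slave executed before the master crashed
    survived_last: highest GTID still in the master binlogs after the crash
    new_trx:       transactions committed on the recovered master so far
    """
    slave = set(range(1, slave_last + 1))
    # The recovered master hands out the vanished GTIDs again.
    reused = set(range(survived_last + 1, survived_last + new_trx + 1))
    master = set(range(1, survived_last + new_trx + 1))

    ahead = slave - master
    if ahead:
        # Case 1: the slave still knows GTIDs the master has not regenerated.
        return "breaks", sorted(ahead)
    # Case 2: the slave believes it already has these GTIDs and never
    # receives the new transactions carrying them (data drift).
    return "resumes", sorted(reused & slave)


few_new_trx = after_master_os_crash(slave_last=58, survived_last=49, new_trx=5)
many_new_trx = after_master_os_crash(slave_last=58, survived_last=49, new_trx=12)
```

In the first call the master has only regenerated 50 to 54, so the slave is ahead by 55 to 58 and breaks. In the second call the master is already at 61, the slave resumes without error, and the new transactions 50 to 58 are never applied on it.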
  • 22. 22 Adding complexity with MTS [1 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Multi-Threaded Slave (MTS) in MySQL 5.6 is doing out-of-order committing • Same for MySQL 5.7 with DATABASE and LOGICAL_CLOCK types • LOGICAL_CLOCK also has the slave_preserve_commit_order option (OFF by default in 5.7 and 8.0 :-|, with ON requiring log-slave-updates :-( ) (Bug#75396: Allow slave_preserve_commit_order without log-slave-updates) Example: transactions A, B, C, D, E on the master • On a slave, SHOW SLAVE STATUS points to B, so A is committed • C and E are also committed, B is running and D is pending scheduling (maybe B and D are in the same schema with DATABASE type) With out-of-order commit, a file+position in relay log info is not enough • GTID allows tracking complex position (generating temporary holes on slaves) • And there is the mysql.slave_worker_info table (https://dev.mysql.com/worklog/task/?id=5599: for more details)
  • 23. 23 Adding complexity with MTS [2 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Without GTID, resuming replication after a crash needs filling the gap in trx • Manual, error-prone, and not always possible before 5.6.31 and 5.7.13 (Bug#77496) • Now, automated by doing START SLAVE UNTIL SQL_AFTER_MTS_GAPS • But this needs relay logs, which might have vanished after an OS crash (Bug#81840)
  • 24. 24 Adding complexity with MTS [3 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) • Bug#81840 makes MTS with File+Position OS crash unsafe (safe for mysqld crash) • Hard to accept workaround: sync_relay_log = 1 (performance killer) • Full state in mysql.slave_worker_info → recovery possible with a lot of effort • The good solution would be a better relay log recovery (Bug#93081)
  • 25. 25 Adding complexity with MTS [4 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) With GTID, MTS in MySQL 5.6, 5.7 & 8.0 is replication crash safe: • But it needs MASTER_AUTO_POSITION = 1 (and relay log recovery Bug#92093) • And it comes with all the GTID “goodies” (rogue transactions, lsu for 5.6, …) • Also needs sync_binlog = 1 (if 5.7+, also works without binlogs or lsu off) • And care with sync_binlog != 1 on the master (need to fail over if OS crash) (sync_binlog != 1 should not be needed in 95% of cases) (Group Commit and MTS make this optimisation almost obsolete) Example: A, B, C, D, E on the master with GTID 10, 11, 12, 13, 14: • GTID executed on the slave is 1-10:12:14 before a crash • Replication resumes by fetching 11:13:15… (after relay log recovery)
  • 26. 26 Adding complexity with MTS [4 of 4] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) With GTID, MTS in MySQL 5.6, 5.7 & 8.0 should be crash safe: • But it needs MASTER_AUTO_POSITION = 1 (and relay log recovery Bug#92093) • And it comes with all the GTID “goodies” (rogue transactions, lsu for 5.6, …) • Also needs sync_binlog = 1 (if 5.7+, also works without binlogs or lsu off) • And care with sync_binlog != 1 on the master (need to fail over if OS crash) Bug#92882: MTS not replication crash-safe with GTID and all the right parameters (Only applies to Operating System crashes) Example: A, B, C, D, E on the master with GTID 10, 11, 12, 13, 14: • GTID executed on the slave is 1-10:12:14 before Operating System crash • Relay log recovery tries to “fill the gaps” but fails because relay logs are gone (This might be a regression from the fix of Bug#77496) (Easy workaround: stop slave; reset slave; start slave;)
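
The "1-10:12:14" example of slides 25 and 26 can be worked through in a few lines of Python. This is a minimal sketch under simplifying assumptions: real GTID sets are prefixed with a server UUID, which the slides omit and which is omitted here too, and the helper names `parse_intervals` and `gaps` are made up for the illustration.

```python
def parse_intervals(spec: str) -> list[tuple[int, int]]:
    """Parse the interval part of a GTID set, e.g. "1-10:12:14"."""
    intervals = []
    for part in spec.split(":"):
        first, _, last = part.partition("-")
        # A single number such as "12" is the interval 12-12.
        intervals.append((int(first), int(last or first)))
    return sorted(intervals)


def gaps(spec: str) -> list[int]:
    """Transaction numbers missing below the highest executed one."""
    missing = []
    expected = 1
    for first, last in parse_intervals(spec):
        missing.extend(range(expected, first))
        expected = max(expected, last + 1)
    return missing


# Out-of-order commit left holes at 11 and 13 when the slave crashed.
holes = gaps("1-10:12:14")
```

With MASTER_AUTO_POSITION = 1 the slave asks the master for exactly these holes plus everything after 14, which is why replication can resume from the master binlogs even when the relay logs are gone.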
  • 27. 27 Related subjects – Semi-Sync (Demystifying MySQL Replication Crash Safety – HLMoscow2018) In this talk, we did not cover master failover explicitly. When a master crashes in an unrecoverable way, failover needs to happen. When failing-over to a slave, committed transactions can be lost (Some transactions on the crashed master might not have reached slaves) → violation of durability (ACID) in the replication topology (distributed system) Except if lossless semi-sync is used, more details in: Question about Semi-Synchronous Replication: the Answer with All the Details https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
  • 28. 28 Related subjects – MariaDB (Demystifying MySQL Replication Crash Safety – HLMoscow2018) MariaDB still stores its master info and relay log info in files • But it stores GTID state of slaves in the mysql.gtid_slave_pos table ➢ MariaDB is replication crash safe when using GTID slave positioning Also, it has an interesting feature: • If using more than one storage engine, a single state table is not optimal • Having one such table per storage engine could be better Improving replication with multiple storage engines (MariaDB 10.3) https://kristiannielsen.livejournal.com/19223.html
  • 29. 29 Related subjects – Pseudo GTIDs (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Pseudo-GTIDs: • A way to get GTID-like features without GTIDs • They work with any version of MySQL/MariaDB (even 5.5) • But they assume in-order-commit → does not work with MTS They can provide slave replication crash safety: • With log-slave-updates and sync_binlog = 1 • Even on MySQL 5.5 or MariaDB 5.5 https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md
  • 30. Conclusion (Demystifying MySQL Replication Crash Safety – HLMoscow2018) • It is complicated and it depends… • It has many edge cases • It might still change as bugs are fixed • And hopefully improvements will be made • So sorry: there is no short version
  • 31. Conclusion [2 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Some parameters never impact/improve Replication Crash Safety: • master-info-repository, sync_master_info, sync_relay_log_info Some parameters are always needed for Replication Crash Safety: • relay-log-info-repository = TABLE • relay-log-recovery = 1
  • 32. 32 Conclusion [3 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) MySQL 5.6 with GTID (with and without MTS) → crash safe slave if: • All above with sync_binlog = 1 (not default) and MASTER_AUTO_POSITION = 1 and maybe a “stop slave; reset slave; start slave;” (Bug#92882) MySQL 5.6 without GTID and with MTS → not always crash safe slaves: • OK for MySQL crashes as relay logs are not lost • For OS crashes, losing the relay logs leads to replication breakage (Bug#81840) • Possible to recover with some voodoo and dark magic (Bug#93081) For master and slaves, binlogs cannot be trusted if sync_binlog != 1
  • 33. 33 Conclusion [4 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) MySQL 5.7 is mostly the same as 5.6: • sync_binlog = 1 is the default :-) • Will be crash safe with GTID and sync_binlog != 1 when Bug#92109 is fixed • LOGICAL_CLOCK with slave_preserve_commit_order behaves like single-threaded • Without slave_preserve_commit_order, same as MTS in 5.6 MySQL 8.0 is mostly the same as 5.7 with safer defaults: • relay-log-info-repository = TABLE :-) • relay-log-recovery = 1 :-) • But default for slave_preserve_commit_order is still 0 :-|
  • 34. 34 Conclusion [5 of 5] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Care with MTS as it has many traps And in all cases: • Relay log recovery needs to re-download relay logs from the master • High load in case of lagging (or delayed) slaves :-( • Will fail if the binary logs were purged from the master :-( • Relay log recovery also fails for MTS and OS crashes (vanished relay logs) :-( :-( :-( We need a better Relay Log Recovery ! Bug#74321, Bug#74323, Bug#74324, Bug#81840 Bug#92882, Bug#93081
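
The parameter checklist of the conclusion slides can be turned into a small checker. The sketch below is an assumption-laden illustration, not an official tool: the function name `crash_safety_problems` is invented, the settings are passed as a plain dict (for instance built by hand from the server configuration), and missing keys fall back to the MySQL 5.6 defaults quoted in the slides. It does not cover the MTS and relay log recovery bugs listed above.

```python
def crash_safety_problems(cfg: dict, gtid: bool = False) -> list[str]:
    """Return the checklist items from the conclusion that cfg violates."""
    problems = []
    # Always needed: SQL Thread state stored atomically with the data.
    if str(cfg.get("relay_log_info_repository", "FILE")).upper() != "TABLE":
        problems.append("relay-log-info-repository should be TABLE")
    # Always needed: untrusted relay logs are discarded at startup.
    if int(cfg.get("relay_log_recovery", 0)) != 1:
        problems.append("relay-log-recovery should be 1")
    # With GTID, the GTID state lives in the binlogs, so they must be durable.
    if gtid and int(cfg.get("sync_binlog", 0)) != 1:
        problems.append("sync_binlog should be 1 when replicating with GTID")
    return problems


# MySQL 5.6 defaults, replicating with GTID: all three items are flagged.
defaults_56 = crash_safety_problems({}, gtid=True)

# The settings recommended in the slides: nothing is flagged.
recommended = crash_safety_problems(
    {"relay_log_info_repository": "TABLE",
     "relay_log_recovery": 1,
     "sync_binlog": 1},
    gtid=True,
)
```

master-info-repository, sync_master_info and sync_relay_log_info are deliberately not checked, since the slides state they never improve replication crash safety.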
  • 35. 35 Links [1 of 3] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Crash-Safe MySQL Replication - A Visual Guide https://hackmongo.com/post/crash-safe-mysql-replication-a-visual-guide/ (diagrams in this talk are inspired by this post) Jean-François’s blog posts about Replication Crash Safety: • Better Crash-safe replication for MySQL https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336a69b317f • Replication crash safety with MTS in MySQL 5.6 and 5.7: reality or illusion? https://jfg-mysql.blogspot.com/2016/01/replication-crash-safety-with-mts.html • A discussion about sync-master-info and other replication parameters https://jfg-mysql.blogspot.com/2016/08/discussion-about-sync-master-info-and-replication-parameters.html • On the consequences of sync_binlog != 1 (part #1) https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html
  • 36. 36 Links [2 of 3] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) Directly related bugs: • Bug#70669: Slave can't continue repl. after master's recovery (old – 5.6.14, and fixed – 5.6.17) • Bug#70659: Make crash safe slave work with gtid + less durable settings • Bug#74321: Execute relay-log-recovery only when needed • Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave • Bug#74324: Make keeping relay logs (relay_log_purge = 0) crash safe • Bug#77496: Replication position lost after crash on MTS configured slave (really fixed ?) • Bug#81840: Automatic Replication Recovery Does Not Handle Lost Relay Log Events • Bug#92093: Replication crash safety needs relay_log_recovery even with GTID • Bug#92109: Please make replication crash safe with GITD and less durable setting (bis) • Bug#92882: MTS not replication crash-safe with GTID and all the right parameters • Bug#93081: Please implement a better relay log recovery Somehow related bugs: • Bug#75396: Allow slave_preserve_commit_order without log-slave-updates • Bug#92891: Please make relay_log_space_limit dynamic
  • 37. Links [3 of 3] (Demystifying MySQL Replication Crash Safety – HLMoscow2018) • Question about Semi-Synchronous Replication: the Answer with All the Details https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/ • Improving replication with multiple storage engines (in MariaDB 10.3) https://kristiannielsen.livejournal.com/19223.html • Pseudo-GTID and Orchestrator: https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md https://speakerdeck.com/shlominoach/pseudo-gtid-and-easy-mysql-replication-topology-management • The Full MySQL and MariaDB Parallel Replication Tutorial https://www.slideshare.net/JeanFranoisGagn/the-full-mysql-and-mariadb-parallel-replication-tutorial • Arg: relay_log_space_limit is (still) not dynamic ! https://jfg-mysql.blogspot.com/2018/10/arg-relay-log-space-limit-is-still-not-dynamic.html • Evaluating MySQL Parallel Replication Part 2: Slave Group Commit https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-2-slave-group-commit-459026a141d2 • Evaluating MySQL Parallel Replication Part 4: More Benchmarks in Production https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-4-more-benchmarks-in-production-49ee255043ab
  • 38. Thanks ! Presented at HighLoad++ 2018 in Moscow by Jean-François Gagné Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com