How B.com avoids and deals with replication lag
Jean-François Gagné – Friday, February 3, 2017
Pre-FOSDEM MySQL day
Booking.com
● Based in Amsterdam since 1996
● Online Hotel/Accommodation/Travel Agent (OTA):
● 1,340,000+ properties in 225 countries
● 1,200,000+ room nights reserved daily
● 40+ languages (website and customer service)
● 13,000+ people working in 187 offices worldwide
● Part of the Priceline Group
● And we use MySQL:
● Thousands (1000s) of servers, ~90% replicating
● >150 masters: ~30 with >50 slaves and ~10 with >100 slaves
Booking.com’
● And we are hiring!
● MySQL Engineer / DBA
● System Administrator
● System Engineer
● Site Reliability Engineer
● Developer / Designer
● Technical Team Lead
● Product Owner
● Data Scientist
● And many more…
● https://workingatbooking.com/
Session Summary
1. MySQL replication at Booking.com
2. Replication lag: what/how/why
3. Bad solutions to cope with lag
4. Booking.com solution to cope with lag
5. Improving Booking.com solution
MySQL replication at Booking.com
● Typical Booking.com MySQL replication deployment:
                     +---+
                     | M |
                     +---+
                       |
      +------+-- ... --+---------------+-------- ...
      |      |         |               |
    +---+  +---+     +---+           +---+
    | S1|  | S2|     | Sn|           | M1|
    +---+  +---+     +---+           +---+
                                       |
                                       +-- ... --+
                                       |         |
                                     +---+     +---+
                                     | T1|     | Tm|
                                     +---+     +---+
Why does lag happen?
● Under which conditions can lag be experienced?
● Too many transactions for replication to keep up:
capacity problem, fix by scaling (sharding, parallel replication, …)
● Long transactions:
self-induced, to be fixed by a developer in the application
● Too aggressive “batch” workload on the master:
optimize the batches or slow them down
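A back-of-the-envelope model of the capacity case helps here. It is an illustrative sketch, not from the talk, and the function name and numbers are hypothetical. Suppose the master commits transactions at a higher constant rate than the slave can apply them. Then lag grows linearly with time. The model assumes constant rates and servers that start in sync.

```python
def lag_after(seconds: float, master_tps: float, slave_tps: float) -> float:
    """Replication lag (s) of the transaction committed on the master at `seconds`.

    Toy model: both servers start in sync and run at constant rates; the slave
    applies transactions one after the other at `slave_tps`.
    """
    if slave_tps >= master_tps:
        return 0.0                      # the slave keeps up: no lag builds up
    committed = master_tps * seconds    # transactions committed on the master so far
    applied_at = committed / slave_tps  # when the slave finishes applying the last one
    return applied_at - seconds         # apply time minus commit time

# A 20% capacity shortfall sustained for one minute already means 12 s of lag.
print(lag_after(60, master_tps=1200, slave_tps=1000))  # 12.0
```

The same arithmetic explains the batch case. A burst that exceeds the slave's apply rate builds a backlog that only drains once the burst stops, which is why batches should be slowed down.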
Lag consequences
● What are the consequences of lag?
● Stale reads on slaves (but this is not necessarily a problem)
● When do stale reads become a problem?
● A user changes his email address but still sees the old one
● A hotel changes its inventory but still sees old availability
● A user books a hotel but does not see it in his reservations
Bad solution to cope with lag
● Bad solution #1: falling back to reading from the master
● If slaves are lagging, maybe we should read from the master
● This looks like an attractive solution to avoid stale reads
● But this does not scale (if the master could serve all the reads, why read from slaves at all?)
● This will cause a sudden load on the master (in case of lag)
● And it might cause an outage on the master (and this would be bad)
● It might be better to fail a read than to fall back to (and kill) the master
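Counting where reads land when every slave lags makes the argument concrete. The sketch below is hypothetical: the function name, parameters and numbers are illustrative only. It assumes each slave normally serves a fixed share of the read traffic.

```python
from typing import Dict, List

def route_reads(slave_lags: List[float], reads_per_slave: int,
                max_lag: float, fallback_to_master: bool) -> Dict[str, int]:
    """Count where reads end up, given the lag (in seconds) of each slave.

    Reads meant for a slave lagging more than `max_lag` either go to the master
    (fallback_to_master=True) or are failed and reported to the caller.
    """
    landed = {"slaves": 0, "master": 0, "failed": 0}
    for lag in slave_lags:
        if lag <= max_lag:
            landed["slaves"] += reads_per_slave
        elif fallback_to_master:
            landed["master"] += reads_per_slave   # the master absorbs the whole share
        else:
            landed["failed"] += reads_per_slave   # degraded, but the master is untouched
    return landed

# 50 slaves all lagging by 30 s, each normally serving 1000 reads/s:
print(route_reads([30.0] * 50, 1000, max_lag=1.0, fallback_to_master=True))
# {'slaves': 0, 'master': 50000, 'failed': 0}
print(route_reads([30.0] * 50, 1000, max_lag=1.0, fallback_to_master=False))
# {'slaves': 0, 'master': 0, 'failed': 50000}
```

With the fallback, the master suddenly receives 50,000 extra reads per second on top of its write load. Failing the reads keeps the incident contained to the reads themselves.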
Bad solution to cope with lag (bis)
● Bad solution #2: retry on another slave
● When reading from a slave: if lag, then retry on another slave
● This scales better and is OK-ish (when few slaves are lagging)
● But what happens if all slaves are lagging?
● Increased load (retries) can slow down replication
● This might overload the slaves and cause a good slave to start lagging
● In the worst case, this might kill slaves and cause a domino effect
● Again: probably better to fail a read than to cause a bigger problem
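Retries look cheap when few slaves lag but multiply the load when all of them do. A simple model gives the expected number of queries per logical read. The model assumes that each attempt picks a slave uniformly at random, that a lagging slave triggers a retry, and that at most `max_attempts` attempts are made. The function name is hypothetical.

```python
def queries_per_read(fraction_lagging: float, max_attempts: int) -> float:
    """Expected slave queries issued per logical read under retry-on-lag.

    Attempt i+1 happens only if the first i attempts all hit a lagging slave,
    which (picking slaves independently at random) has probability p**i.
    """
    p = fraction_lagging
    return sum(p ** i for i in range(max_attempts))

print(queries_per_read(0.1, 3))   # few lagging slaves: about 1.11 queries per read
print(queries_per_read(1.0, 3))   # all slaves lagging: 3.0, i.e. triple the load
```

In the all-lagging case, every read costs three queries and still fails. That extra load is what can push the good slaves into lag as well.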
Coping with lag @ Booking.com
● Booking.com solution: “waypoint”
● Creating a waypoint is similar to creating a “read view”
● Waiting for a waypoint is similar to waiting for a slave to catch-up
● Booking.com implementation:
● Table: db_waypoint (a waypoint is a row in that table)
● API function: commit_wait(timeout) → (err_code, waypoint)
● INSERTs a waypoint and waits – until timeout – for its arrival on a slave
● This is the same as creating a “read view” and “forcing” it on a slave
● API function: waypoint_wait(waypoint, timeout) → err_code
● Waits for a waypoint – until timeout – on a slave
● This is the same as “waiting for a slave to catch-up”
● Garbage collection: cleanup job that DELETEs old waypoints
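A toy, in-memory model of the waypoint API may help make the two functions concrete. Everything here is hypothetical:

- The classes stand in for the master, its binary log and one slave.
- "Ticks" replace wall-clock time.
- The error codes are made up.

The real implementation INSERTs a row into db_waypoint on the master and looks for it on the slave. It does not use Python lists. The cleanup job is left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OK, TIMEOUT = 0, 1  # hypothetical error codes

@dataclass
class Master:
    binlog: List[Tuple[str, int]] = field(default_factory=list)  # events in commit order
    next_waypoint: int = 1

    def write(self, row_id: int) -> None:
        self.binlog.append(("row", row_id))       # any application write

    def insert_waypoint(self) -> int:
        # Stands in for: INSERT INTO db_waypoint ...; COMMIT
        wp = self.next_waypoint
        self.next_waypoint += 1
        self.binlog.append(("waypoint", wp))
        return wp

@dataclass
class Slave:
    master: Master
    applied: int = 0          # number of binlog events executed so far
    events_per_tick: int = 1  # replication speed in this toy model

    def tick(self) -> None:
        self.applied = min(self.applied + self.events_per_tick,
                           len(self.master.binlog))

    def has_waypoint(self, wp: int) -> bool:
        # Stands in for: SELECT 1 FROM db_waypoint WHERE id = ? (on the slave)
        return ("waypoint", wp) in self.master.binlog[:self.applied]

def waypoint_wait(slave: Slave, wp: int, timeout_ticks: int) -> int:
    """Wait, at most timeout_ticks, for waypoint wp to reach the slave."""
    waited = 0
    while not slave.has_waypoint(wp):
        if waited >= timeout_ticks:
            return TIMEOUT
        slave.tick()          # in real life: sleep a little and poll again
        waited += 1
    return OK

def commit_wait(master: Master, slave: Slave, timeout_ticks: int) -> Tuple[int, int]:
    """Insert a waypoint, then wait for it on the slave; return (err_code, waypoint)."""
    wp = master.insert_waypoint()
    return waypoint_wait(slave, wp, timeout_ticks), wp

master = Master()
slave = Slave(master)
for row_id in range(5):
    master.write(row_id)
err, wp = commit_wait(master, slave, timeout_ticks=0)  # zero timeout: (TIMEOUT, 1)
print(err, wp)                                         # 1 1
print(waypoint_wait(slave, wp, timeout_ticks=10))      # 0 (OK): 6 events applied
```

A zero timeout still returns the waypoint. This is what allows a caller to store it and wait for it later, possibly on another slave.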
Coping with lag @ Booking.com’
● Booking.com deployment:
● Throttling batches:
● use commit_wait with a high timeout
● use “small” transactions (chunks of 100 to 1000 rows)
● and sleep between chunks
● Protect from stale reads after writing:
● commit_wait with zero timeout
● store the waypoint in web session
● and waypoint_wait when reading
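Here is a sketch of how the two deployment patterns might look in application code. `commit_wait` and `waypoint_wait` are passed in as callables with the signatures described above. An error code of 0 is assumed to mean success. The helper names, chunk size, timeouts and pause are illustrative assumptions, not Booking.com's code.

```python
import time
from typing import Callable, Iterator, Sequence, Tuple

CommitWait = Callable[[float], Tuple[int, int]]  # timeout -> (err_code, waypoint)
WaypointWait = Callable[[int, float], int]       # (waypoint, timeout) -> err_code

def chunks(rows: Sequence, size: int) -> Iterator[Sequence]:
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def run_throttled_batch(rows: Sequence, apply_chunk: Callable[[Sequence], None],
                        commit_wait: CommitWait, chunk_size: int = 500,
                        wait_timeout: float = 60.0, pause: float = 0.05,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Throttle a batch: small transactions, wait for a slave, sleep between chunks."""
    done = 0
    for chunk in chunks(rows, chunk_size):
        apply_chunk(chunk)                        # one small transaction on the master
        err, _ = commit_wait(wait_timeout)        # high timeout: let the slave catch up
        if err != 0:
            raise RuntimeError("slave did not catch up in time; stopping the batch")
        done += len(chunk)
        sleep(pause)
    return done

def after_write(session: dict, commit_wait: CommitWait) -> None:
    """Zero timeout: never block the web request, just remember the waypoint."""
    _, waypoint = commit_wait(0.0)
    session["waypoint"] = waypoint

def before_read(session: dict, waypoint_wait: WaypointWait, timeout: float = 1.0) -> int:
    """Wait for this user's last write (if any) before reading from a slave."""
    waypoint = session.get("waypoint")
    return 0 if waypoint is None else waypoint_wait(waypoint, timeout)
```

The slides do not specify two policy choices: whether a batch should stop or retry when `commit_wait` times out, and what a page should do when `before_read` times out. The choices above are only one option.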
Improving B.com waypoints
● The waypoint design and implementation still suits us.
● Sometimes, we have a “fast” slave problem:
● Throttling batches against a fast slave is sub-optimal (slower slaves can still lag)
● But this does not happen often in practice
● And it would be easy to fix: “find the slowest slave (or a slow slave)”
● But starting from scratch, we might do things differently:
● Inserting, deleting and purging waypoints could be simplified
● And we could get rid of the waypoint table
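The "find the slowest slave" fix could be as simple as the sketch below. It picks the throttling target from a map of slave name to lag in seconds. Two things are assumptions, not something the slides prescribe:

- How lag is measured. For example, it could come from Seconds_Behind_Master, which can be NULL when replication threads are stopped.
- The choice to skip slaves whose lag is unknown.

The function name is hypothetical.

```python
from typing import Dict, Optional

def slowest_slave(lags: Dict[str, Optional[float]]) -> str:
    """Return the slave furthest behind, to throttle batches against it.

    Slaves with an unknown lag (None) are skipped: waiting on a slave whose
    replication is stopped would block the batch indefinitely.
    """
    known = {name: lag for name, lag in lags.items() if lag is not None}
    if not known:
        raise ValueError("no slave with a known lag")
    return max(known, key=lambda name: known[name])

print(slowest_slave({"s1": 0.2, "s2": 3.5, "s3": None}))  # s2
```

Picking "a slow slave" instead, such as one of the few slowest, would avoid always loading the same host. Both variants are mentioned on the slide.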
Improving B.com waypoints’
● GTIDs as waypoints
● Get the GTID of the last transaction:
● last_gtid session variable in MariaDB Server
From https://mariadb.com/kb/en/mariadb/master_gtid_wait/:
MASTER_GTID_WAIT() can also be used in client applications together with the last_gtid session variable. This is
useful in a read-scaleout replication setup, where the application writes to a single master but divides the reads out to
a number of slaves to distribute the load. In such a setup, there is a risk that an application could first do an update on
the master, and then a bit later do a read on a slave, and if the slave is not fast enough, the data read from the slave
might not include the update just made, possibly confusing the application and/or the end-user. One way to avoid this
is to request the value of last_gtid on the master just after the update. Then before doing the read on the slave, do a
MASTER_GTID_WAIT() on the value obtained from the master; this will ensure that the read is not performed until the
slave has replicated sufficiently far for the update to have become visible.
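On MariaDB, the flow described above boils down to two statements:

- `SELECT @@last_gtid` on the master, right after the write.
- `SELECT MASTER_GTID_WAIT(<gtid>, <timeout>)` on the slave, before the read.

The server does the waiting. The Python sketch below only illustrates the condition being waited for, under a simplifying assumption. A position is a comma-separated list of `domain-server-seqno` GTIDs, one per replication domain. It counts as reached when, for every domain in the target, the slave's sequence number is at least as large. The function names are hypothetical.

```python
from typing import Dict

def parse_position(pos: str) -> Dict[int, int]:
    """Map domain_id -> seq_no for a MariaDB GTID position such as '0-1-100,1-2-7'."""
    result: Dict[int, int] = {}
    for gtid in (part.strip() for part in pos.split(",")):
        if not gtid:
            continue
        domain, _server_id, seq_no = (int(x) for x in gtid.split("-"))
        result[domain] = seq_no
    return result

def position_reached(slave_pos: str, target: str) -> bool:
    """True if, for every domain in target, the slave is at or past that seq_no."""
    have = parse_position(slave_pos)
    return all(have.get(domain, 0) >= seq_no
               for domain, seq_no in parse_position(target).items())

last_gtid = "0-1-1042"                                 # e.g. @@last_gtid read on the master
print(position_reached("0-1-1041", last_gtid))         # False: keep waiting
print(position_reached("0-1-1042,1-2-7", last_gtid))   # True: safe to read
```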
Improving B.com waypoints’
● GTIDs as waypoints:
● Get the GTID of the last transaction:
● last_gtid session variable in MariaDB Server
● gtid_executed global variable in Oracle MySQL (get all executed GTIDs)
● the last GTID can also be requested in the OK packet (only Oracle MySQL)
(session_track_gtids variable and mysql_session_track_get_{first,next} API functions)
● Waiting for GTID:
● MASTER_GTID_WAIT in MariaDB Server
● WAIT_FOR_EXECUTED_GTID_SET in Oracle MySQL
● But this is not portable (e.g. when replicating from MySQL to MariaDB or vice versa)
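On Oracle MySQL, the waypoint would be a GTID set. `WAIT_FOR_EXECUTED_GTID_SET(<gtid_set>, <timeout>)` on the slave blocks until that set is contained in the slave's `gtid_executed`. The sketch below reproduces that containment check in Python, only to illustrate the semantics. It assumes untagged GTIDs of the form `uuid:1-5:7`, and the function names are hypothetical. It also hints at the portability problem: this format and MariaDB's `domain-server-seqno` format are unrelated.

```python
from typing import Dict, List, Tuple

Intervals = List[Tuple[int, int]]

def parse_gtid_set(text: str) -> Dict[str, Intervals]:
    """Parse 'uuid:1-5:7,uuid2:1-3' (newlines allowed, as gtid_executed prints them)."""
    result: Dict[str, Intervals] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for r in ranges:
            lo, _, hi = r.partition("-")
            result.setdefault(uuid.strip().lower(), []).append((int(lo), int(hi or lo)))
    return result

def merge(intervals: Intervals) -> Intervals:
    """Merge overlapping or adjacent intervals, e.g. [(1, 7), (8, 20)] -> [(1, 20)]."""
    merged: Intervals = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def gtid_set_contained(target: str, executed: str) -> bool:
    """True if every GTID in target is also in executed (what the slave waits for)."""
    have = {uuid: merge(iv) for uuid, iv in parse_gtid_set(executed).items()}
    for uuid, intervals in parse_gtid_set(target).items():
        for lo, hi in intervals:
            # After merging, a contiguous range is covered only if one interval holds it.
            if not any(a <= lo and hi <= b for a, b in have.get(uuid, [])):
                return False
    return True

U = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
print(gtid_set_contained(f"{U}:23", f"{U}:1-57"))  # True
print(gtid_set_contained(f"{U}:58", f"{U}:1-57"))  # False
```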
Improving B.com waypoints’’
● Binary log file and position as waypoint:
● MASTER_POS_WAIT
● However, this breaks when using intermediate masters
● But it is OK with Binlog Servers[1]
(in a Binlog Server deployment, the binlog file and position act as a GTID)
● But there is currently no way of getting the file and position after committing
[1]: https://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
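With file and position as the waypoint, `MASTER_POS_WAIT(<file>, <position>, <timeout>)` on the slave waits until the slave has executed the master's binary log up to those coordinates. The comparison this relies on is an ordering on (file sequence number, position), as in the hypothetical helper below. The helper assumes the usual `basename.NNNNNN` file naming.

The sketch also shows why intermediate masters are a problem. Coordinates are only comparable within one server's binary logs, and a slave of M1 tracks M1's files, not M's.

```python
from typing import Tuple

def binlog_coords(file: str, pos: int) -> Tuple[str, int, int]:
    """Split ('binlog.000123', 4567) into ('binlog', 123, 4567)."""
    base, _, number = file.rpartition(".")
    return base, int(number), pos

def pos_reached(slave_file: str, slave_pos: int, target_file: str, target_pos: int) -> bool:
    """True if the slave has executed the master's binlog up to the target coordinates.

    The slave side must be expressed in the *master's* binlog coordinates
    (Relay_Master_Log_File / Exec_Master_Log_Pos in SHOW SLAVE STATUS).
    """
    s_base, s_num, s_pos = binlog_coords(slave_file, slave_pos)
    t_base, t_num, t_pos = binlog_coords(target_file, target_pos)
    # A different base name shows the coordinates come from different binlog
    # series; identical names do not prove they come from the same server.
    if s_base != t_base:
        raise ValueError("coordinates from different binlog series are not comparable")
    return (s_num, s_pos) >= (t_num, t_pos)

print(pos_reached("binlog.000124", 4, "binlog.000123", 999999))  # True: next file
```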
Improving B.com waypoints’’’
● Feature requests:
● Bug#84747: Expose last transaction GTID in a session variable.
● Bug#84748: Request transaction GTID in OK packet on COMMIT
(without needing a round-trip).
● MDEV-11956: Get last_gtid in OK packet.
● Bug#84779: Expose binlog file and position of last transaction.
● MDEV-11970: Expose binlog file and position of last transaction.
Improving B.com waypoints’’’’
● Better solution for throttling:
● Connecting to a (the right) slave is a hurdle
● Having the information about slave state on the master would be great
● A plugin exists for something close to that: semi-sync
● Using this to track transaction execution on slaves would be great
● This is the No-Slave-Left-Behind MariaDB Server Patch
No-Slave-Left-Behind
● No-Slave-Left-Behind MariaDB Server patch[1]:
● the semi-sync reply also reports SQL-thread position
● transactions are kept in the master plugin until executed by one slave
● the slave lag can be estimated from these tracked transactions
● client-threads wait before commit until lag is acceptable
● (Thanks Jonas Oreland and Google)
● This could easily be modified to implement commit_wait
[1]: https://jira.mariadb.org/browse/MDEV-8112
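The toy model below is written only to make the idea concrete. It is not the patch's code, and the class and method names are hypothetical. It works in three steps:

1. The master remembers the commit time of each transaction until a semi-sync reply reports that one slave has executed it.
2. It estimates lag as the age of the oldest such transaction.
3. It lets client threads commit only while that estimate is acceptable.

```python
from collections import deque
from typing import Deque, Tuple

class LagTracker:
    """Toy model of master-side lag tracking; not the MDEV-8112 implementation."""

    def __init__(self) -> None:
        # (transaction sequence number, commit time), oldest first
        self.pending: Deque[Tuple[int, float]] = deque()

    def committed(self, seq: int, now: float) -> None:
        self.pending.append((seq, now))

    def slave_executed(self, seq: int) -> None:
        """A semi-sync reply says one slave's SQL thread executed everything up to seq."""
        while self.pending and self.pending[0][0] <= seq:
            self.pending.popleft()

    def estimated_lag(self, now: float) -> float:
        """Age of the oldest transaction no slave has executed yet (0 if none)."""
        return now - self.pending[0][1] if self.pending else 0.0

    def may_commit(self, now: float, max_lag: float) -> bool:
        """Client threads would wait before commit while this returns False."""
        return self.estimated_lag(now) <= max_lag

tracker = LagTracker()
for seq, t in [(1, 10.0), (2, 11.0), (3, 12.5)]:
    tracker.committed(seq, t)
print(tracker.estimated_lag(15.0))             # 5.0: transaction 1 is still pending
tracker.slave_executed(2)
print(tracker.may_commit(15.0, max_lag=3.0))   # True: lag is now 2.5
```

The slide suggests this could be turned into commit_wait. In this model, that variant would instead wait until the committing transaction's own sequence number has been reported as executed.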
Links
● Booking.com:
● https://blog.booking.com/
● https://workingatbooking.com/
● MariaDB Server last_gtid (thanks Kristian Nielsen for implementing this):
● https://mariadb.com/kb/en/mariadb/master_gtid_wait/
● Binlog Server:
● https://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
● No-Slave-Left-Behind MariaDB Server patch:
● https://jira.mariadb.org/browse/MDEV-8112 (thanks Jonas Oreland and Google)
Links’
● Pull request to extend DBD::mysql (the Perl DBI driver) for reading the GTID from the OK packet:
● https://github.com/perl5-dbi/DBD-mysql/pull/77 (thanks Daniël van Eeden)
● Bug reports/Feature requests:
● Bug#84747: Expose last transaction GTID in a session variable.
● Bug#84748: Request transaction GTID in OK packet on COMMIT
(without needing a round-trip).
● Bug#84779: Expose binlog file and position of last transaction.
● MDEV-11956: Get last_gtid in OK packet.
● MDEV-11970: Expose binlog file and position of last transaction.
Thanks
Jean-François Gagné
jeanfrancois DOT gagne AT booking.com

How Booking.com avoids and deals with replication lag

  • 1. How B.com avoids and deals with replication lag Jean-François Gagné – Friday, February 3, 2017 Pre-FOSDEM MySQL day
  • 2. Booking.com ● Based in Amsterdam since 1996 ● Online Hotel/Accommodation/Travel Agent (OTA): ● +1.340.000 properties in 225 countries ● +1.200.000 room nights reserved daily ● +40 languages (website and customer service) ● +13.000 people working in 187 offices worldwide ● Part of the Priceline Group ● And we use MySQL: ● Thousands (1000s) of servers, ~90% replicating ● >150 masters: ~30 >50 slaves & ~10 >100 slaves 2
  • 3. Booking.com’ ● And we are hiring ! ● MySQL Engineer / DBA ● System Administrator ● System Engineer ● Site Reliability Engineer ● Developer / Designer ● Technical Team Lead ● Product Owner ● Data Scientist ● And many more… ● https://workingatbooking.com/ 3
  • 4. Session Summary 1. MySQL replication at Booking.com 2. Replication lag: what/how/why 3. Bad solutions to cope with lag 4. Booking.com solution to cope with lag 5. Improving Booking.com solution 4
  • 5. MySQL replication at Booking.com ● Typical Booking.com MySQL replication deployment: +---+ | M | +---+ | +------+-- ... --+---------------+-------- ... | | | | +---+ +---+ +---+ +---+ | S1| | S2| | Sn| | M1| +---+ +---+ +---+ +---+ | +-- ... --+ | | +---+ +---+ | T1| | Tm| +---+ +---+ 5
  • 6. Why does lag happen ? ● In which condition can lag be experienced ? ● Too many transactions for replication to keep up: capacity problem, fix by scaling (sharding, parallel replication, …) ● Long transactions: self induced, to fix by a developer in the application ● Too aggressive “batch” workload on the master: optimize the batches or slow them down 6
  • 7. Lag consequences ● What are the consequences of lag ? ● Stale reads on slaves (but this is not necessarily a problem) ● When do stale reads become a problem ? ● A user changes his email address but still sees the old one ● A hotel changes its inventory but still sees old availability ● A user books a hotel but does not see it in his reservations 7
  • 8. Bad solution to cope with lag ● Bad solution #1: falling back to reading from the master ● If slaves are lagging, maybe we should read from the master ● This looks like an attractive solution to avoid stale reads ● But this does not scale (why are you reading from slaves…) ● This will cause a sudden load on the master (in case of lag) ● And it might cause an outage on the master (and this would be bad) ● It might be better to fail a read than to fallback to (and kill) the master 8
  • 9. Bad solution to cope with lag (bis) ● Bad solution #2: retry on another slave ● When reading from a slave: if lag, then retry on another slave ● This scales better and is OK-ish (when few slaves are lagging) ● But what happens if all slaves are lagging ? ● Increased load (retries) can slowdown replication ● This might overload the slaves and cause a good slave to start lagging ● In the worst case, this might kill slaves and cause a domino effect ● Again: probably better to fail a read than to cause a bigger problem 9
  • 10. Coping with lag @ Booking.com ● Booking.com solution: “waypoint” ● Creating a waypoint is similar to creating a “read view” ● Waiting for a waypoint is similar to waiting for a slave to catch-up ● Booking.com implementation: ● Table: db_waypoint (a waypoint is a row in that table) ● API function: commit_wait(timeout)  (err_code, waypoint) ● INSERTs a waypoint and waits – until timeout – for its arrival on a slave ● This is the same a creating a “read view” and “forcing” it on a slave ● API function: waypoint_wait(waypoint, timeout)  err_code ● Waits for a waypoint – until timeout – on a slave ● This is the same as “waiting for a slave to catch-up” ● Garbage collection: cleanup job that DELETEs old waypoints 10
  • 11. Coping with lag @ Booking.com’ ● Booking.com deployment: ● Throttling batches: ● use commit_wait with a high timeout ● use “small” transactions (chunks of 100 to 1000 rows) ● and sleep between chunks ● Protect from stale reads after writing: ● commit_wait with zero timeout ● store the waypoint in web session ● and waypoint_wait when reading 11
• 12. Improving B.com waypoints
● The waypoint design and implementation still suit us
● Sometimes, we have a “fast” slave problem:
● Throttling batches on a fast slave is sub-optimal (slower slaves can still lag behind)
● This does not happen often in practice, though
● And it would be easy to fix: “find the slowest slave (or a slow slave)”
● But starting from scratch, we might do things differently:
● Inserting, deleting and purging waypoints could be simplified
● And we could get rid of the waypoint table
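The “find the slowest slave (or a slow slave)” fix could look like the helper below. It is a hypothetical sketch: `lag_of` is whatever lag measure the application already has. Sampling a few slaves and taking the slowest of them is one possible reading of “a slow slave”, and it is cheaper than checking the whole fleet.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")


def pick_slow_slave(slaves: Sequence[S], lag_of: Callable[[S], float],
                    sample_size: Optional[int] = None,
                    rng: Optional[random.Random] = None) -> S:
    """Return the most-lagging slave, among all slaves or a random sample of them.

    Waiting on this slave (instead of a fast one) throttles a batch to the
    pace of the slow part of the fleet.
    """
    rng = rng or random.Random()
    pool = list(slaves)
    if sample_size is not None:
        pool = rng.sample(pool, min(sample_size, len(pool)))
    return max(pool, key=lag_of)
```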
• 13. Improving B.com waypoints’
● GTIDs as waypoint
● Get the GTID of the last transaction:
● last_gtid session variable in MariaDB Server
From https://mariadb.com/kb/en/mariadb/master_gtid_wait/:
MASTER_GTID_WAIT() can also be used in client applications together with the last_gtid session variable. This is useful in a read-scaleout replication setup, where the application writes to a single master but divides the reads out to a number of slaves to distribute the load. In such a setup, there is a risk that an application could first do an update on the master, and then a bit later do a read on a slave, and if the slave is not fast enough, the data read from the slave might not include the update just made, possibly confusing the application and/or the end-user. One way to avoid this is to request the value of last_gtid on the master just after the update. Then before doing the read on the slave, do a MASTER_GTID_WAIT() on the value obtained from the master; this will ensure that the read is not performed until the slave has replicated sufficiently far for the update to have become visible.
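A minimal sketch of that MariaDB recipe is shown below. It assumes a PEP 249 (DB-API) driver with the `%s` parameter style, such as PyMySQL or mysqlclient, and separate connections to the master and to a slave. The two function names are hypothetical. `SELECT @@last_gtid` and `MASTER_GTID_WAIT(gtid, timeout)` are the MariaDB features quoted above; MASTER_GTID_WAIT returns 0 when the slave has reached the position and -1 on timeout.

```python
from typing import Any, Sequence


def write_then_get_gtid(master_conn: Any, sql: str, params: Sequence = ()) -> str:
    """Run a write on the master, commit, then read the GTID it was given."""
    cur = master_conn.cursor()
    cur.execute(sql, params)
    master_conn.commit()
    cur.execute("SELECT @@last_gtid")   # MariaDB session variable
    (gtid,) = cur.fetchone()
    return gtid


def wait_for_gtid(slave_conn: Any, gtid: str, timeout: float) -> bool:
    """Block on the slave until it has applied `gtid`, or until timeout."""
    cur = slave_conn.cursor()
    cur.execute("SELECT MASTER_GTID_WAIT(%s, %s)", (gtid, timeout))
    (result,) = cur.fetchone()
    return result == 0                  # 0: reached; -1: timed out
```

Unlike the waypoint table, nothing is inserted and nothing needs garbage collection: the GTID itself is the waypoint.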
• 14. Improving B.com waypoints’
● GTIDs as waypoint:
● Get the GTID of the last transaction:
● last_gtid session variable in MariaDB Server
● gtid_executed global variable in Oracle MySQL (gets all executed GTIDs)
● the last GTID can also be requested in the OK packet (only Oracle MySQL) (session_track_gtids variable and mysql_session_track_get_{first,next} API functions)
● Waiting for GTID:
● MASTER_GTID_WAIT in MariaDB Server
● WAIT_FOR_EXECUTED_GTID_SET in Oracle MySQL
● But this is not portable (replicating from MySQL to MariaDB or vice versa)
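One practical difference on Oracle MySQL is that gtid_executed is a whole GTID set, not a single GTID. The hypothetical helpers below parse a set written in MySQL's `uuid:interval[:interval],...` text form and check whether one GTID belongs to it. That membership is the condition WAIT_FOR_EXECUTED_GTID_SET waits for on the slave. This sketch only handles plain `uuid:number` GTIDs and is not part of any MySQL client library.

```python
from typing import Dict, List, Tuple


def parse_gtid_set(text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(1, 5), (7, 7)], uuid2: [(3, 3)]}."""
    result: Dict[str, List[Tuple[int, int]]] = {}
    for member in text.replace("\n", "").split(","):
        member = member.strip()
        if not member:
            continue
        uuid, *intervals = member.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            lo, _, hi = interval.partition("-")
            ranges.append((int(lo), int(hi or lo)))  # "7" means the interval 7-7
    return result


def gtid_in_set(gtid: str, gtid_set_text: str) -> bool:
    """True if e.g. 'uuid:6' is contained in a gtid_executed-style set."""
    uuid, number = gtid.rsplit(":", 1)
    n = int(number)
    ranges = parse_gtid_set(gtid_set_text).get(uuid.lower(), [])
    return any(lo <= n <= hi for lo, hi in ranges)
```

Reading gtid_executed right after a commit returns every GTID the master has executed, so it works as a (conservative) waypoint, but it is larger than the single GTID MariaDB exposes in last_gtid.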
• 15. Improving B.com waypoints’’
● Binary log file and position as waypoint:
● MASTER_POS_WAIT
● However, this breaks when using intermediate masters
● But it is OK with Binlog Servers[1] (in a Binlog Server deployment, the binlog file and position are a GTID)
● But there is currently no way of getting the file and position after committing
[1]: https://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
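MASTER_POS_WAIT(log_name, log_pos, timeout) waits until a slave has executed up to given coordinates in its master's binary log. The ordering it relies on can be sketched as follows (hypothetical helper names): coordinates compare first by the numeric suffix of the file name, then by position. Parsing the suffix as a number, rather than comparing file names as strings, keeps the order correct if the suffix ever gains a digit. The comparison is only meaningful within one binlog stream, which is why it breaks with intermediate masters: a slave of M1 tracks coordinates in M1's binlog, not in M's.

```python
from typing import Tuple


def parse_coordinates(file_name: str, position: int) -> Tuple[int, int]:
    """('mysql-bin.000012', 345) -> (12, 345); binlog files roll over in numeric order."""
    _base, _, seq = file_name.rpartition(".")
    return int(seq), int(position)


def position_reached(executed: Tuple[str, int], waypoint: Tuple[str, int]) -> bool:
    """True if the executed coordinates are at or past the waypoint (same binlog stream)."""
    return parse_coordinates(*executed) >= parse_coordinates(*waypoint)
```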
• 16. Improving B.com waypoints’’’
● Feature requests:
● Bug#84747: Expose last transaction GTID in a session variable.
● Bug#84748: Request transaction GTID in OK packet on COMMIT (without needing a round-trip).
● MDEV-11956: Get last_gtid in OK packet.
● Bug#84779: Expose binlog file and position of last transaction.
● MDEV-11970: Expose binlog file and position of last transaction.
• 17. Improving B.com waypoints’’’’
● Better solution for throttling:
● Connecting to a (the right) slave is a hurdle
● Having the information about slave state on the master would be great
● A plugin already does something close to that: semi-sync
● Using it to track transaction execution on slaves would be great
● This is the No-Slave-Left-Behind MariaDB Server patch
• 18. No-Slave-Left-Behind
● No-Slave-Left-Behind MariaDB Server patch[1]:
● the semi-sync reply also reports the SQL-thread position
● transactions are kept in the master plugin until executed by one slave
● the slave lag can be estimated from the above
● client threads wait before commit until lag is acceptable
● (Thanks Jonas Oreland and Google)
● This could easily be modified to implement commit_wait
[1]: https://jira.mariadb.org/browse/MDEV-8112
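The bookkeeping on this slide can be modelled with a toy tracker. `LagTracker` is a hypothetical class, not the MDEV-8112 code, and binlog positions are simplified to increasing integers. The master remembers the commit time of every transaction that no slave has executed yet. It drops those transactions as semi-sync replies report SQL-thread positions, and it estimates lag as the age of the oldest one still pending. The clock and sleep are injectable so the behaviour can be checked without real waiting.

```python
import collections
import time
from typing import Callable, Deque, Tuple


class LagTracker:
    """Toy master-side tracker of transactions not yet executed by any slave."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.pending: Deque[Tuple[int, float]] = collections.deque()  # oldest first

    def on_commit(self, position: int) -> None:
        self.pending.append((position, self.clock()))

    def on_slave_reply(self, executed_position: int) -> None:
        # One slave having executed a transaction is enough to forget it.
        while self.pending and self.pending[0][0] <= executed_position:
            self.pending.popleft()

    def estimated_lag(self) -> float:
        return self.clock() - self.pending[0][1] if self.pending else 0.0

    def wait_until_acceptable(self, max_lag: float, timeout: float,
                              sleep: Callable[[float], None] = time.sleep,
                              poll: float = 0.01) -> bool:
        """Client-side wait: True once lag <= max_lag, False on timeout."""
        deadline = self.clock() + timeout
        while self.estimated_lag() > max_lag:
            if self.clock() >= deadline:
                return False
            sleep(poll)
        return True
```

Here `wait_until_acceptable` plays the role of the pre-commit wait. Calling it after the commit with the caller's timeout would give something close to commit_wait, without the application ever connecting to a slave.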
• 19. Links
● Booking.com:
● https://blog.booking.com/
● https://workingatbooking.com/
● MariaDB Server last_gtid (thanks Kristian Nielsen for implementing this):
● https://mariadb.com/kb/en/mariadb/master_gtid_wait/
● Binlog Server:
● https://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
● No-Slave-Left-Behind MariaDB Server patch:
● https://jira.mariadb.org/browse/MDEV-8112 (thanks Jonas Oreland and Google)
• 20. Links’
● Pull request to extend Perl DBI (DBD-mysql) for reading the GTID in the OK packet:
● https://github.com/perl5-dbi/DBD-mysql/pull/77 (thanks Daniël van Eeden)
● Bug reports/feature requests:
● Bug#84747: Expose last transaction GTID in a session variable.
● Bug#84748: Request transaction GTID in OK packet on COMMIT (without needing a round-trip).
● Bug#84779: Expose binlog file and position of last transaction.
● MDEV-11956: Get last_gtid in OK packet.
● MDEV-11970: Expose binlog file and position of last transaction.