The recorded webinar can be found at https://galeracluster.com/videos/running-galera-cluster-on-amazon-ec2-and-comparing-it-to-rds-and-aurora/
Do you want to run Galera Cluster in the cloud? Why not learn to set up a 3-node Galera Cluster using Amazon Web Services (AWS) Elastic Compute Cloud (EC2), and run it yourself?
In this webinar, we will cover the steps to do this, with a demonstration of how easy it is for you to do. We will also cover how you may want to load balance traffic to your Galera Cluster using a proxy like ProxySQL.
In addition, we will cover why you may want to run a Galera Cluster of three or more nodes (a multi-master synchronous cluster) instead of Amazon Relational Database Service (RDS) for MySQL, which offers asynchronous replication (Read Replicas) and High Availability provided by DRBD (Multi-AZ). We will also cover where Galera Cluster differs from Amazon Aurora RDS for MySQL.
Learn about storage options, backup & recovery, as well as monitoring & metrics options for the “roll your own Galera Cluster” in EC2.
1. Running Galera Cluster effectively
on Amazon Web Services (AWS)
and comparing it to RDS and Aurora
Colin Charles, colin.charles@galeracluster.com
30 July 2019
https://twitter.com/galeracluster | www.galeracluster.com
Codership Webinar
2. Agenda
• Running Galera Cluster in the Amazon Web Services cloud
• Using a proxy (e.g. ProxySQL) to load balance traffic and more
• Galera Cluster vs Amazon AWS RDS for MySQL / MariaDB Server
• Galera Cluster vs Amazon AWS RDS Aurora
3. Codership
• The developers and experts of Galera Cluster
• Established in 2007, 3 founders, all engineers
• Seppo Jaakola, CEO, Teemu Ollakka, CTO, Alex Yurchenko, architect
• Services business model, producing 100% open source software
• Thousands of users in various industries: e-commerce, betting/gambling,
telecoms, banking, insurance, gaming, healthcare, media, marketing,
advertising, travel, education, software as a service, PaaS, IaaS, etc.
6. Galera Cluster is all about multi-master replication
• Can be described as virtually synchronous replication
• High Availability with no data loss, and consistent data across all nodes — no Single Point of Failure (SPoF)
• Quorum based failure handling
• Optimistic concurrency control
• 100% multi-master cluster (all nodes are equal in terms of having the data, so no lagging secondaries, 24/7
availability, etc.)
• This is a core feature of the product by design, has automatic transaction conflict detection and management,
and your application can issue any transaction to any Galera Cluster node. Works well in WAN/Clouds
• Parallel replication
• You do not need automatic failovers via a framework, no need to designate single nodes for writes and the rest for
reads, configuration is simple, easier handling of scheduled downtime
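The "quorum based failure handling" bullet can be illustrated with a minimal sketch. It assumes every node carries equal weight and that a partition stays primary only with a strict majority of the cluster; the function names are hypothetical, and real Galera behaviour also depends on node weights (the pc.weight option) and on whether nodes left gracefully, which this sketch ignores.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if a partition holds a strict majority of the cluster.

    Simplification: all nodes have equal weight (an assumption).
    """
    if cluster_size <= 0 or not 0 <= reachable_nodes <= cluster_size:
        raise ValueError("reachable_nodes must be between 0 and cluster_size")
    return 2 * reachable_nodes > cluster_size


def tolerated_failures(cluster_size: int) -> int:
    """Largest number of simultaneous node losses that still leaves a majority."""
    return (cluster_size - 1) // 2


if __name__ == "__main__":
    for size in (2, 3, 5):
        print(f"{size} nodes: survives {tolerated_failures(size)} failure(s)")
```

Under these assumptions a 2-node cluster cannot lose a node and keep quorum, which is one reason a minimum of three nodes is usually recommended.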
7. Galera Cluster optimised for the cloud
• Optimised network protocol as packets are only exchanged over the WAN at
transaction commit time
• Topology aware replication, so each transaction is sent to the data centre only
once
• Detection and automatic eviction of unreliable nodes
• A node evicted due to network flaps or node failure will not be able to rejoin
without manual intervention
• Split brain recovery/management
• Traffic encryption (key in the cloud)
8. Regions & Availability Zones
• Region: a data centre location, containing multiple Availability Zones
• Availability Zone (AZ): isolated from failures in other AZs, with low-latency
network connectivity to other zones in the same region
9. RDS: Multi-AZ
• Provides enhanced durability (synchronous data replication)
• Increased availability (automatic failover)
• Warning: can be slow with large database size
• Easy GUI administration
• Doesn’t give you another usable “read-replica” though
10. Running Galera Cluster in the AWS EC2 cloud
• The reason tends to be simple: Galera Cluster is not available as a service on
Amazon Web Services
• For regular MySQL, you tend to get more configurability out of EC2 since
it is a base Linux OS
• You may however have to pay a little more than just a standard RDS
instance
11. Other considerations
• Location, location, location: AWS RDS: US East (N. Virginia, Ohio), US
West (Oregon, Northern California, California), EU (Ireland, Frankfurt,
London, Paris), APAC (Singapore, Tokyo, Sydney, Seoul, Mumbai),
South America (São Paulo), GovCloud, Canada (Central), China (Beijing)
• SLAs: at least 99.95% uptime in a calendar month; below that, a 10% service credit
• Support: active forums; phone support from $100+ (or a % of AWS usage)
• Management: self-management, Enterprise ($15k+)
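To make the SLA percentage concrete, here is a small sketch that converts an uptime percentage into the downtime it permits. The 30-day month is an assumption, and the function name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime SLA over `days` days."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.95, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per 30 days")
```

With these assumptions, 99.95% works out to roughly 21.6 minutes per 30-day month.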
12. Other considerations II
• Backups:
• Amazon has automated backups (with point-in-time recovery), with full
daily snapshots (has a backup window).
• Multi-AZ? Backup taken from the standby!
• Backup retention default? 1 day. Increase it
• Monitoring: AWS CloudWatch
13. Costs!
• RDS M5 — db.m5.xlarge, 4 vCPUs, 16 GB RAM
• $0.171/hour, and $0.684/hour for multi-AZ
• EC2 M5 - m5.xlarge, 4 vCPUs, 16GB RAM
• $0.192/hour
• But for a fairer Aurora comparison, we should look at the memory-optimised R5 instances
(r5.large) — 2 vCPUs, 16GB RAM
• EC2: $0.126/hour
• RDS: $0.24/hour, and $0.48/hour for multi-AZ
• Aurora: $0.58/hour
14. Costs II
• Monthly cost of running 3 EC2 r5.large instances with 100GB of storage:
$303.72 (minimal Galera Cluster, no proxies, etc.)
• Monthly cost of running 3 RDS db.r5.large instances with 100GB of
storage: $559.24 (one master, two secondaries)
• Monthly cost of running 1 RDS db.r5.large instance with 100GB of storage
with Multi-AZ: $374.36 (one master, one passive failover target)
• Monthly cost of running 1 RDS Aurora db.r5.large instance with 100GB of
storage: $219.98
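As a rough cross-check of the compute portion of these figures, the sketch below multiplies the hourly r5.large rates quoted on the previous slide by an assumed 730 hours per month. It covers instance hours only, so the results will not match the monthly totals above, which also include 100GB of storage and whatever assumptions the AWS calculator applies. The function and constant names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month

# Hourly on-demand rates as quoted on the "Costs!" slide (USD).
HOURLY_RATES = {
    "EC2 r5.large": 0.126,
    "RDS db.r5.large": 0.24,
    "RDS db.r5.large Multi-AZ": 0.48,
}


def monthly_compute_cost(hourly_rate: float, instances: int = 1,
                         hours: int = HOURS_PER_MONTH) -> float:
    """Compute-only monthly cost in USD; storage and data transfer are excluded."""
    return round(hourly_rate * instances * hours, 2)


if __name__ == "__main__":
    print("3 x EC2 r5.large:", monthly_compute_cost(HOURLY_RATES["EC2 r5.large"], 3))
    print("3 x RDS db.r5.large:", monthly_compute_cost(HOURLY_RATES["RDS db.r5.large"], 3))
    print("1 x RDS Multi-AZ:", monthly_compute_cost(HOURLY_RATES["RDS db.r5.large Multi-AZ"], 1))
```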
15. So…
• Who runs/manages Galera Cluster in an EC2 instance? You do.
• What does it take to run Galera Cluster in an EC2 instance? Not much
beyond the usual setup.
• Where do you run Galera Cluster in AWS? In an EC2 instance.
• When do you run Galera Cluster in AWS? When you feel the need for virtually
synchronous replication, automatic node management, etc.
• Why do you run Galera Cluster in AWS? For the features, of course!
• How do you run Galera Cluster in AWS? We’ll show you now
17. Security Groups
• SSH for login
• TCP port for MySQL is 3306
• TCP ports for Galera Cluster are 4444 (state snapshot transfers), 4567
(communications), 4568 (incremental state transfers)
• UDP port for Galera Cluster is 4567 (communications)
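The port list can be expressed as data. The sketch below builds ingress rules in the dictionary shape that boto3's EC2 client accepts for the IpPermissions argument of authorize_security_group_ingress; the actual API call is left out so the example runs without AWS credentials. The CIDR range 10.0.0.0/16 and the function name are hypothetical, and port 22 is assumed for SSH.

```python
from typing import Dict, List, Tuple

# (protocol, port, purpose); port roles follow the Galera Cluster documentation.
GALERA_PORTS: List[Tuple[str, int, str]] = [
    ("tcp", 22, "SSH login"),
    ("tcp", 3306, "MySQL client connections"),
    ("tcp", 4444, "State Snapshot Transfer (SST)"),
    ("tcp", 4567, "Galera group communication"),
    ("udp", 4567, "Galera group communication"),
    ("tcp", 4568, "Incremental State Transfer (IST)"),
]


def build_ip_permissions(cidr: str) -> List[Dict]:
    """Build one ingress rule per port, restricted to the given CIDR range."""
    return [
        {
            "IpProtocol": protocol,
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": purpose}],
        }
        for protocol, port, purpose in GALERA_PORTS
    ]


if __name__ == "__main__":
    # Hypothetical VPC range; replace with the range your nodes actually use.
    for rule in build_ip_permissions("10.0.0.0/16"):
        print(rule["IpProtocol"], rule["FromPort"], rule["IpRanges"][0]["Description"])
```

Restricting the Galera ports to the cluster's own address range, rather than opening them to the world, is a design choice worth considering; only SSH and 3306 normally need to be reachable from outside the cluster.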
19. SELinux & Firewalls
• setenforce 0 (puts SELinux into permissive mode)
• However, you can instead open ports for Galera Cluster:
• semanage port -a -t mysqld_port_t -p tcp 3306 / semanage permissive -a mysqld_t
• Similar for the firewall
• firewall-cmd --zone=public --add-service=mysql --permanent (add the
ports via --add-port=3306/tcp) / firewall-cmd --reload
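The firewall step can be spelled out for every port the cluster uses. This sketch only prints the firewall-cmd invocations (a dry run) rather than executing them; the zone name "public" comes from the slide, while the function name and the decision to print instead of run are assumptions for illustration.

```python
from typing import List, Tuple

# Ports used by MySQL and Galera Cluster: (port, protocol).
PORTS: List[Tuple[int, str]] = [
    (3306, "tcp"),  # MySQL client connections
    (4444, "tcp"),  # State Snapshot Transfer (SST)
    (4567, "tcp"),  # Galera group communication
    (4567, "udp"),  # Galera group communication
    (4568, "tcp"),  # Incremental State Transfer (IST)
]


def firewalld_commands(zone: str = "public") -> List[str]:
    """Return the firewall-cmd lines that open each port, then reload the rules."""
    commands = [
        f"firewall-cmd --zone={zone} --add-port={port}/{protocol} --permanent"
        for port, protocol in PORTS
    ]
    commands.append("firewall-cmd --reload")
    return commands


if __name__ == "__main__":
    # Dry run: review the output, then run the commands as root on each node.
    for command in firewalld_commands():
        print(command)
```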
20. Proxies like ProxySQL
• This will take another instance
• Has native Galera Cluster hostgroup support
• Works with RDS & Aurora too
• https://aws.amazon.com/blogs/database/how-to-use-proxysql-with-open-source-platforms-to-split-sql-reads-and-writes-on-amazon-aurora-clusters/
• https://aws.amazon.com/blogs/database/supercharge-your-amazon-rds-for-mysql-deployment-with-proxysql-and-percona-monitoring-and-management/
21. Amazon RDS
• Offers MySQL and MariaDB Server
• Limited feature set, e.g. you don’t get the MySQL 8 X Protocol or mysqlsh; in
MariaDB Server you don’t get the encryption, storage engines, etc.
• It is however fully managed for you
• It has an automatic upgrade window
• Everything can be done easily within a GUI… you trade off control of the
database for ease of use & management
22. AWS Aurora
• Bigger instances work better
• Zero-downtime migration from RDS
• Metrics via CloudWatch, Connectors via
MariaDB
• 99.99% uptime
• MySQL 5.6.10 “fork”, no optimiser, not
traditional replication (but Aurora <->MySQL
works of course)
• MySQL 5.7.12 Aurora launched Feb
2018, with JSON support, spatial
indexes, generated columns, etc.
• Auto scaling - compute, memory, storage
• Replicas (15) for reads
• Automated backups in S3, DB snapshots
• Encryption with key server being Amazon
KMS
• Spatial data support - like InnoDB 5.7!
• Lab mode (hash joins, scan batching, etc.)
23. Amazon RDS Aurora
• A little more compelling considering it replaces the replication layer
• Has Serverless function support
• Can do parallel query
• Beware that not all features exist in the 5.6 and 5.7 releases
• https://mariadb.com/resources/blog/four-things-you-didnt-know-about-amazon-aurora/
• ageing outdated database, required downtime & interruption, lack of
enterprise security, least common denominator
25. Resources
• AWS Cost Calculator: https://calculator.s3.amazonaws.com/index.html
• 1 hour tutorial video on running Galera Cluster on AWS:
https://galeracluster.com/library-media/videos/galera-on-aws.mp4
26. Things we should think about in the future…
• AWS Marketplace image to roll this out easier?
• Kubernetes support?
• What else would you like?
• Benchmarks? (we got questions about RDS vs Multi-AZ vs Aurora vs
roll-your-own Galera Cluster)