Amazon Aurora_Deep Dive

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Kevin Jernigan
kmj@amazon.com
Senior Product Manager, Aurora
SRV308
Deep Dive on Amazon Aurora

What is Amazon Aurora?
Datab ase reimag in ed for th e c lou d
 Speed and availability of high-end commercial databases
 Simplicity and cost-effectiveness of open-source databases
 Drop-in compatibility with MySQL and PostgreSQL
 Simple pay-as-you-go pricing
Delivered as a managed service

Re-imagining the relational database
1
2
3
Scale-out, distributed architecture
Service-oriented architecture leveraging AWS services
Automate administrative tasks — fully managed service

Scale-out, distributed architecture
▪ Purpose-built log-structured distributed
storage system designed for databases
▪ Storage volume is striped across hundreds
of storage nodes distributed over 3
different Availability Zones
▪ Six copies of data, two copies in each
Availability Zone to protect against AZ+1
failures
▪ Plan to apply same principles to other
layers of the stack
Master Replica
Availability
Zone 1
Shared storage volume
Availability
Zone 2
Availability
Zone 3
Storage nodes with SSDs
SQL
Transactions
Caching
SQL
Transactions
Caching
SQL
Transactions
Caching

Leveraging AWS services
AWS Lambda
Amazon S3
IAM
Amazon
CloudWatch
Invoke Lambda events from stored procedures/triggers
Load data from Amazon S3, store snapshots and
backups in Amazon S3
Use IAM roles to manage database access control
Upload systems metrics and audit logs to CloudWatch

Automate administrative tasks
Schema design
Query construction
Query optimization
Automatic failover
Backup & recovery
Isolation & security
Industry compliance
Push-button scaling
Automated patching
Advanced monitoring
Routine maintenance
Takes care of your time-consuming database management tasks,
freeing you to focus on your applications and business
You
AWS

Aurora is used by
¾ of the top 100
AWS customers
Fastest growing
service in AWS history
Aurora customer adoption

Who is moving to Aurora and why?
Customers using
open-source engines
▪ Higher performance – up to 5x
▪ Better availability and durability
▪ Reduces cost – up to 60%
▪ Easy migration; no application change
▪ One tenth of the cost; no licenses
▪ Integration with cloud ecosystem
▪ Comparable performance and availability
▪ Migration tooling and services
Customers using
Commercial engines

AURORA PERFORMANCE
5x faster than MySQL; 3x faster than PostgreSQL

Aurora MySQL performance
WRITE PERFORMANCE READ PERFORMANCE
MySQL SysBench results; R4.16XL: 64 cores / 488-GB RAM
Aurora read/write throughput compared to MySQL 5.6
based on industry standard benchmarks.
Aurora MySQL 5.6
0
100000
200000
300000
400000
500000
600000
700000
0
50000
100000
150000
200000
250000

Aurora PostgreSQL performance
0
5000
10000
15000
20000
25000
30000
35000
40000
45000
10 15 20 25 30 35 40 45 50 55 60
Throughput,tps
Minutes
pgbench throughput over time, 150 GiB, 1024 clients
PostgreSQL (Single AZ) Amazon Aurora (Three AZs)
W h i l e r u n n i n g p g b e n c h a t l o a d , t h ro u g h p u t i s 3 x
m o re c o n s i s t e n t t h a n Po s t g re S Q L

How did we achieve this?
Do fewer I/Os
Minimize network packets
Cache prior results
Offload the database engine
DO LESS WORK
Process asynchronously
Reduce latency path
Use lock-free data structures
Batch operations together
BE MORE EFFICIENT
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING IS ALL ABOUT CONTEXT SWITCHES

Aurora I/O profile
BINLOG DATA DOUBLE-WRITELOG FRM FILES
TYPE OF WRITE
MYSQL WITH REPLICA
EBS mirrorEBS mirror
AZ 1 AZ 2
Amazon S3
EBS
Amazon Elastic
Block Store (EBS)
Primary
Instance
Replica
Instance
1
2
3
4
5
AZ 1 AZ 3
Primary
Instance
Amazon S3
AZ 2
Replica
Instance
ASYNC
4/6 QUORUM
DISTRIBUT
ED WRITES
Replica
Instance
AMAZON AURORA
780K transactions
7,388K I/Os per million txns (excludes mirroring, standby)
Average 7.4 I/Os per transaction
MySQL I/O profile for 30 min Sysbench run
27,378K transactions 35X MORE
0.95 I/Os per transaction (6X amplification) 7.7X LESS
Aurora I/O profile for 30 min Sysbench run

I/O traffic in Aurora (storage node)
LOG RECORDS
Primary
Instance
INCOMING QUEUE
STORAGE NODE
S3 BACKUP
1
2
3
4
5
6
7
8
UPDATE
QUEUE
ACK
HOT
LOG
DATA
BLOCKS
POINT IN TIME
SNAPSHOT
GC
SCRUB
COALESCE
SORT
GROUP
PEER TO PEER GOSSIPPeer
Storage
Nodes
All steps are asynchronous
Only steps 1 and 2 are in foreground latency path
Input queue is 46X less than MySQL (unamplified, per node)
Favor latency-sensitive operations
Use disk space to buffer against spikes in activity
OBSERVATIONS
I/O FLOW
① Receive record and add to in-memory queue
② Persist record and ACK
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks

Aurora lock management
Scan
Delete
Scan
Delete
Insert
Scan Scan
Insert
Delete
Scan
Insert
Insert
MySQL lock manager Aurora lock manager
▪ Same locking semantics as MySQL
▪ Concurrent access to lock chains
Needed to support many concurrent sessions, high update throughput
▪ Multiple scanners in individual lock chains
▪ Lock-free deadlock detection

Asynchronous key prefetch
Latency improvement factor vs. Batched Key Access (BKA) join
algorithm. Decision support benchmark, R3.8xlarge
14.57
-
2.00
4.00
6.00
8.00
10.00
12.00
14.00
16.00
Query-1
Query-2
Query-3
Query-4
Query-5
Query-6
Query-7
Query-8
Query-9
Query-10
Query-11
Query-12
Query-13
Query-14
Query-15
Query-16
Query-17
Query-18
Query-19
Query-20
Query-21
Query-22
Cold buffer
Works for queries using Batched Key
Access (BKA) join algorithm + Multi-Range
Read (MRR) Optimization
Performs a secondary to primary index
lookup during JOIN evaluation
Uses background thread to
asynchronously load pages into memory
AKP used in queries 2, 5, 8, 9, 11, 13, 17, 18, 19, 20, 21

Batched scans
Standard MySQL processes rows one-at-a-time
resulting in high overhead due to:
▪ Repeated function calls
▪ Locking and latching
▪ cursor store and restore
▪ InnoDB to MySQL format conversion
Amazon Aurora scan tuples from the InnoDB
buffer pool in batches for
▪ Table full scans
▪ Index full scans
▪ Index range scans Latency improvement factor vs. Batched Key Access (BKA) join
algorithm Decision support benchmark, R3.8xlarge
1.78X
-
0.20
0.40
0.60
0.80
1.00
1.20
1.40
1.60
1.80
2.00
Query-1
Query-2
Query-3
Query-4
Query-5
Query-6
Query-7
Query-8
Query-9
Query-10
Query-11
Query-12
Query-13
Query-14
Query-15
Query-16
Query-17
Query-18
Query-19
Query-20
Query-21
Query-22

WHAT ABOUT AVAILABILITY?
“Performance only matters if your database is up”

Six-way replicated storage
S u r vives catastrop h ic failu res
Six copies across three Availability Zones
4 out 6 write quorum; 3 out of 6 read quorum
Peer-to-peer replication for repairs
Volume striped across hundreds of storage nodes
SQL
Transaction
AZ 1 AZ 2 AZ 3
Caching
SQL
Transaction
AZ 1 AZ 2 AZ 3
Caching
Read and write availabilityRead availability

Up to 15 promotable read replicas
Master
Read
Replica
Read
Replica
Read
Replica
Shared distributed storage volume
Reader endpoint
► Up to 15 promotable read replicas across multiple Availability Zones
► Re-do log-based replication leads to low replica lag – typically < 10ms
► Reader endpoint with load balancing and auto-scaling * NEW *

Instant crash recovery
TRADITIONAL DATABASE
Have to replay logs since the last checkpoint
Typically 5 minutes between checkpoints
Single-threaded in MySQL; requires a large
number of disk accesses
AMAZON AURORA
Underlying storage replays redo records on
demand as part of a disk read
Parallel, distributed, asynchronous
No replay for startup
Checkpointed Data Redo Log
Crash at T0 requires
a re-application of the
SQL in the redo log since
last checkpoint
T0 T0
Crash at T0 will result in redo logs being applied
to each segment on demand, in parallel,
asynchronously

Crash recovery in Aurora
CRASH
Log records Gaps
Volume Complete
LSN (VCL)
AT CRASH
IMMEDIATELY AFTER CRASH RECOVERY
Consistency Point
LSN (CPL)
Consistency Point
LSN (CPL)
▪ Instance opens all storage nodes for the
volume
▪ Volume Complete LSN (VCL) is the highest
point where all records have met quorum
▪ Consistency Point LSN (CPL) is the highest
commit record below VCL.
▪ Everything past CPL is deleted at crash recovery
▪ Removes the need for 2PC at each commit spanning
storage nodes.
▪ No redo or undo processing is required before the
database is opened for processing

Database failover time
0.00%
5.00%
10.00%
15.00%
20.00%
25.00%
30.00%
35.00%
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
0 – 5s: 30% of failovers
0.00%
5.00%
10.00%
15.00%
20.00%
25.00%
30.00%
35.00%
40.00%
45.00%
50.00%
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
0%
10%
20%
30%
40%
50%
60%
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
0%
5%
10%
15%
20%
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35

MONITORING DATABASE PERFORMANCE

Performance Insights
Dashboard showing load on DB
• Easy
• Powerful
Identifies source of bottlenecks
• Top SQL
Adjustable time frame
• Hour, day, week , month
• Up to 35 days of data
Max CPU

Performance Insights
What if you could find out what
your databases were doing, or
had been doing?
Performance Insights is the
answer:
▪ Lets you see your overall
instance load
▪ Drill down by SQL statement,
by time, or by calling host

RECENT INNOVATIONS

Availability is about more than HW failures
A u rora solu tion s for availab ility d isru ption s
1. Patching your database software
→ Zero downtime patching
2. Large-scale copying and reorganizations
→ Fast cloning
3. DBA errors requiring database restores
→ Backtrack
4. Disasters
→ Global replication

Zero downtime patching
Networking
state
Application
state
Storage Service
App
state
Net
state
App
state
Net
state
BeforeZDP
New DB
Engine
Old DB
Engine
New DB
Engine
Old DB
Engine
WithZDP
User sessions terminate
during patching
User sessions remain
active through patching
Storage Service

Fast database cloning
Clone database without copying data
▪ Creation of a clone is nearly instantaneous
▪ Data copy happens only on write – when original
and cloned volume data differ
Example use cases
▪ Clone a production DB to run tests
▪ Reorganize a database
▪ Save a point in time snapshot for analysis without
impacting production system.
PRODUCTION DATABASE
CLONE CLONE
CLONE
DEV/TEST
APPLICATIONS
BENCHMARKS
PRODUCTION
APPLICATIONS
PRODUCTION
APPLICATIONS

Database backtrack
Backtrack brings the database to a point in time without requiring restore from backups
• Backtracking from an unintentional DML or DDL operation
• Backtrack is not destructive. You can backtrack multiple times to find the right point in time
t0 t1 t2
t0 t1
t2
t3 t4
t3
t4
Rewind to t1
Rewind to t3
Invisible Invisible

How does backtrack work?
We keep periodic snapshot of each segment; we also preserve the redo logs
For backtrack, we identify the appropriate segment snapshots
Apply log streams to segment snapshots in parallel and asynchronously
SEGMENT
SNAPSHOT
LOG
RECORDS
RECOVERY
POINT
SEGMENT 1
SEGMENT 2
SEGMENT 3
TIME

Global replication - logical
Faster d isaster recover y an d en h an c ed d ata locality
Promote read-replica to a master
for faster recovery in the event of
disaster
Bring data close to your
customer’s applications in
different regions
Promote to a master for easy
migration

Global replication – physical
AZ 1 AZ 3
Primary
Instance
AZ 2
Aurora
Replica
Aurora
Replica
ReplicationServer
ReplicationAgentREDO LOG FRM FILES
TYPE OF WRITE
Aurora
Replica
(optional)
Region 1: Primary Aurora Cluster Region 2: Read Replica
AZ 1
Consistently fast, low-lag, high-performance replication for global relational databases
• Global-scale replication in seconds or less
• Dedicated replication infrastructure ensures unconstrained performance
• Local reads, faster recovery, tighter DR objectives, and seamless cross-region migration
Async.

Online DDL: MySQL vs. Aurora
▪ Full Table copy in the background
▪ Rebuilds all indexes – can take hours or days
▪ DDL operation impacts DML throughput
▪ Table lock applied to apply DML changes
INDEX
LEAFLEAFLEAF LEAF
INDEX
ROOT
table name operation column-name time-stamp
Table 1
Table 2
Table 3
add-col
add-col
add-col
column-abc
column-qpr
column-xyz
t1
t2
t3
▪ Use schema versioning to decode the block.
▪ Modify-on-write primitive to upgrade to latest schema
▪ Currently support add NULLable column at end of table
▪ Add column anywhere and with default coming soon.
MySQL Aurora

Online DDL performance
On r3.large
On r3.8xlarge
Aurora MySQL 5.6 MySQL 5.7
10-GB table 0.27 sec 3,960 sec 1,600 sec
Aurora MySQL 5.6 MySQL 5.7
10-GB table 0.06 sec 900 sec 1,080 sec

AURORA SERVERLESS

Aurora Serverless use cases
• Infrequently used applications (e.g., a low-volume blog
site)
• Applications with intermittent or unpredictable load (e.g.,
a news site)
• Development or test databases not needed on nights or
weekends
• Consolidated fleets of multi-tenant SaaS applications

Aurora Serverless
• Starts up on demand, shuts
down when not in use
• Scales up & down
automatically
• No application impact when
scaling
• Pay per second, 1 minute
minimum
WARM POOL
OF INSTANCES
APPLICATION
DATABASE STORAGE
SCALABLE DB CAPACITY
REQUEST ROUTER
DATABASE ENDPOINT

Database endpoint provisioning
• When you provision a database,
Aurora Serverless:
▪ Provisions VPC endpoints for the
application connectors
▪ Initializes request routers to accept
connections
▪ Creates an Aurora storage volume
• A database instance is only
provisioned when the first request
arrives
APPLICATION
CUSTOMER VPC
VPC
ENDPOINTS
VPC
ENDPOINTS
NETWORK LOAD BALANCER
STORAGE
VOLUME
REQUEST
ROUTERS

Instance provisioning and scaling
• First request triggers instance
provisioning. Usually 1 – 3 seconds
• Instance auto-scales up and down as
workload changes. Usually 1 – 3
seconds
• Instances hibernate after user-
defined period of inactivity
• Scaling operations are transparent to
the application – user sessions are
not terminated
• Database storage is persisted until
explicitly deleted by user
DATABASE STORAGE
WARM POOL
APPLICATION
REQUEST
ROUTER
CURRENT
INSTANCE
NEW
INSTANCE

Scaling to and from zero
• If wanted, instances are removed
after user-defined period of
inactivity
• While database is stopped, you
only pay for storage
• A database request triggers
instance provisioning (usually
under 5 seconds)
• Scaling to zero is more expensive as
there is no running server from
which to copy the buffer cache
WARM POOL
INSTANCE
DATABASE STORAGE
REQUEST
ROUTER
APPLICATION
PROXY ENDPOINT

AURORA MULTI-MASTER

Distributed Lock Manager
GLOBAL
RESOURCE
MANAGER
SQL
TRANSACTIONS
CACHING
LOGGING
SQL
TRANSACTIONS
CACHING
LOGGING
SHARED DISK CLUSTER
STORAGE
APPLICATION
LOCKING PROTOCOL MESSAGES
SHARED STORAGE
M1 M2 M3
M1 M1 M1M2 M3 M2
Cons
Heavyweight cache coherency traffic, on per-lock basis
Networking can be expensive
Negative scaling when hot blocks
Pros
All data available to all nodes
Easy to build applications
Similar cache coherency as in multi-processors

Consensus with two phase or Paxos commit
DATA
RANGE #1
DATA
RANGE #2
DATA
RANGE #4
DATA
RANGE #3
DATA
RANGE #5
L
L L
L
L
SHARED NOTHING
SQL
TRANSACTIONS
CACHING
LOGGING
SQL
TRANSACTIONS
CACHING
LOGGING
APPLICATION
STORAGE STORAGE
Cons
Heavyweight commit and membership change protocols
Range partitioning can result in hot partitions, not just hot
blocks. Re-partitioning expensive.
Cross partition operations expensive. Better at small requests
Pros
Query broken up and sent to data node
Less coherence traffic – just for commits
Can scale to many nodes

Conflict resolution using distributed ledgers
• There are many “oases” of
consistency in Aurora
• The database nodes know
transaction orders from that
node
• The storage nodes know
transactions orders applied
at that node
• Only have conflicts when
data changed at both
multiple database nodes
AND multiple storage nodes
• Much less coordination
required
2 3 4 5 61
BT1 [P1]
BT2 [P1]
BT3 [P1]
BT4 [P1]
BT1
BT2
BT3
BT4
Page1
Quorum
OT1
OT2
OT3
OT4
Page 1 Page 2
2 3 4 5 61
OT1[P2]
OT2[P2]
OT3[P2]
OT4[P2]
PAGE1 PAGE2
MASTER
MASTER
Page 2

Hierarchical conflict resolution
• Both masters are writing to
two pages P1 and P2
• BLUE master wins the
quorum at page P1;
ORANGE master wins
quorum at P2
• Both masters recognize the
conflict and have two
choices: (1) roll back the
transactions or (2) escalate
to regional resolver
• Regional arbitrator decides
who wins the tie breaker .
2 3 4 5 61
BT1 [P1]
OT1 [P1]
2 3 4 5 61
OT1[P2]
BT1[P2]
PAGE1 PAGE2
MASTER
MASTER
BT1 OT1
Regional
resolver
Page 1 Page 2 Page 1 Page 2
Quorum
X X

Crash recovery in multi-master
CRASH
MULTI-MASTERSINGLE MASTER
Log records Gaps
Volume Complete
LSN (VCL)
AT CRASH
IMMEDIATELY AFTER RECOVERY
MASTER 1
CRASHES
GAPS
AT CRASH
Consistency Point
LSN(CPL)
VCLVCL
CPL CPL
MASTER 1
MASTER 2
IMMEDIATELY AFTER RECOVERY
Gap filled New LSNs
and Gaps
Master 1
Recovery Point
Consistency Point
LSN(CPL)

Multi-region Multi-Master
Write accepted locally
Optimistic concurrency control – no distributed lock
manager, no chatty lock management protocol
REGION 1 REGION 2
HEAD NODES HEAD NODES
MULTI-AZ STORAGE VOLUME MULTI-AZ STORAGE VOLUME
LOCAL PARTITION LOCAL PARTITIONREMOTE PARTITION REMOTE PARTITION
Conflicts handled hierarchically – at head nodes,
at storage nodes, at AZ and region level arbitrators
Near-linear performance scaling when there is no
or low levels of conflicts

AURORA PARALLEL QUERY

Parallel query processing
Aurora storage has thousands of CPUs
▪ Presents opportunity to push down and parallelize query
processing using the storage fleet
▪ Moving processing close to data reduces network traffic and
latency
However, there are significant challenges
▪ Data stored in storage node is not range partitioned –
require full scans
▪ Data may be in-flight
▪ Read views may not allow viewing most recent data
▪ Not all functions can be pushed down to storage nodes
DATABASE NODE
STORAGE NODES
PUSH DOWN
PREDICATES
AGGREGATE
RESULTS

Processing at head node
STORAGE NODES
OPTIMIZER
EXECUTOR
INNODB
NETWORK STORAGE DRIVER
AGGREGATOR
APPLICATION
PARTIAL
RESULTS
STREAM
RESULTS
IN-FLIGHT
DATA
PQ CONTEXT
PQ PLAN
Query Optimizer produces PQ Plan and
creates PQ context based on leaf page
discovery
PQ request is sent to storage node along with
PQ context
Storage node produces
▪ Partial results streams with processed
stable rows
▪ Raw stream of unprocessed rows with
pending undos
Head node aggregates these data streams to
produce final results

Processing at storage node
Each storage node runs up to 16 PQ
processes, each associated with a parallel
query
PQ process receives PQ context
▪ List of pages to scan
▪ Read view and projections
▪ Expression evaluation byte code
PQ process makes two passes through the
page list
▪ Pass 1: Filter evaluation on InnoDB
formatted raw data
▪ Pass 2: Expression evaluation on
MySQL formatted data
PQ PROCESS
PQ PROCESS
Up to 16
TO/FROM HEAD NODE
STORAGE
NODE PROCESS
PAGE LISTS

Example: Parallel hash join
Head node sends hash join context to storage
node consisting of:
▪ List of pages – Page ID and LSNs
▪ Join expression byte code
▪ Read view – what rows to include
▪ Projections – what columns to select
PQ process performs the following steps:
▪ Scans through the lists of pages
▪ Builds partial bloom filters; collect stats
▪ Sends two streams to head node
▪ Rows with matching hashes
▪ Rows with pending undo processing
Head node performs the Join function after
merging data streams from different storage
nodes
JOIN CONTEXT
PAGE SCAN
STABLE ROWS
ROWS WITH
PENDING UNDO
BLOOM
FILTER
FILTERED AND PROJECTED
ROW STREAM
UNPROCESSED
ROW STREAM

Parallel query performance
1.71 1.31
100.16
142.4
41.39
11.58
117.06
1.77
290.66
1.15
5.61
48.87
29.95
4.29
2.76
83.47
14.35
2.27
1.27
1
2
4
8
16
32
64
128
256

Enjoy trying new features?
AURORA SERVERLESS AURORA MULTI-MASTER
AURORA PARALLEL QUERY
PERFORMANCE
INSIGHTS
Scales DB capacity
to match app needs
Scales out
writes across
multiple AZs
Improves
performance of
large queries
PERFORMANCE INSIGHTS
Assess DB load &
tune
performance
Sign up for previews: aws.amazon.com/ rds/aurora/

Submit Session Feedback
1. Tap the Schedule icon.
2. Select the session you
attended.
3. Tap Session Evaluation to
submit your feedback.

Thank you!

Amazon Aurora_Deep Dive

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Amazon Aurora_Deep Dive

Similar a Amazon Aurora_Deep Dive (20)

Más de Amazon Web Services

Más de Amazon Web Services (20)

Amazon Aurora_Deep Dive