AWS reinvent 2019 recap - Riyadh - Database and Analytics - Assif Abbasi

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
re:Invent recap:
Databases & Analytics
Asif Abbasi, Sr. Specialist Solutions Architect

What do these companies have in common?

Data
“The world’s most valuable resource is no longer oil, but data.”*
*Copyright: The Economist, 2017, David Parkins

*Source: Forbes Online; New Vantage Partners - Big Data Executive Survey
85% of businesses want to be data driven
but only 37% have been successful.

How do you build momentum?
Data
1. Break free from legacy databases
✓ Increase scale
✓ Improve performance
✓ Lower cost
2. Move to managed
✓ Save time and cost
✓ Remove undifferentiated heavy lifting
3. Modernize your data warehouse
✓ Better and faster insights
✓ Broader access to analytics
4. Build data-driven apps
✓ Agility
✓ Global distribution
✓ Performance at scale
5. Turn data to insights
✓ Better experiences
✓ Deeper engagement
✓ Efficient processes

The Data Flywheel
1. Break free from legacy databases
2. Move to managed
3. Modernize your data warehouse
4. Build data-driven apps

The Data Flywheel
Modernize your data infrastructure
Get the most value from your data

Old-guard database providers
• Very expensive
• Proprietary lock-in
• Punitive licensing
• You’ve got mail

Customers are moving to open databases
+ Commercial-grade performance and reliability?

Amazon Aurora
MySQL- and PostgreSQL-compatible relational database built for the cloud
Performance and availability of commercial-grade databases at 1/10th the cost
Performance and scalability
• 5x throughput of MySQL
• 3x throughput of PostgreSQL
• Up to 15 read replicas
• Scale out reads and writes across multiple data centers
Availability and durability
• Fault-tolerant, self-healing storage
• Six copies of data across three AZs
• Continuous backup to S3
• Single global database with cross-region replication
Highly secure
• Network isolation
• Encryption at rest and in transit
Fully managed
• Managed by RDS: no hardware provisioning, software patching, setup, configuration, or backups

Challenges with integrating ML with your database
Typical steps of incorporating ML into an application
1. Select and train the model
2. Write application code to read data from the database
3. Query and format the data for the ML algorithm
4. Call an ML service to run the algorithm
5. Format the output
6. Retrieve the results back to the application

ML in Amazon Aurora and Athena
Bringing machine learning to data developers and data analysts
• Generate predictions directly from Aurora queries
• Models run in SageMaker & Comprehend
• Use standard SQL, no ML expertise required
• Suitable for low-latency, high-volume use cases
[Diagram labels: Amazon SageMaker (ML); Aurora (database); Athena (interactive analytics); SQL (SELECT, FROM, WHERE)]
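The idea of calling a model from plain SQL can be sketched locally. The example below is only an analogy, not the Aurora or Athena integration itself: it uses Python's built-in sqlite3 module and registers a hypothetical scoring function, churn_score, as a SQL function so that a query can filter on a prediction. The table, columns, and scoring formula are all assumptions made for illustration; in the service described on the slide the model would run in SageMaker or Comprehend.

```python
import sqlite3


def churn_score(support_calls: int, months_active: int) -> float:
    """Hypothetical stand-in for a trained model (not a real SageMaker endpoint)."""
    raw = 0.15 * support_calls - 0.02 * months_active
    return round(min(1.0, max(0.0, raw)), 2)


def predict_in_sql() -> list:
    conn = sqlite3.connect(":memory:")
    # Expose the model to SQL so it can be called like any scalar function.
    conn.create_function("churn_score", 2, churn_score)
    conn.execute(
        "CREATE TABLE customers (id INTEGER, support_calls INTEGER, months_active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 6, 3), (2, 0, 24), (3, 3, 10)],
    )
    # Standard SELECT / FROM / WHERE; the prediction is just another expression.
    rows = conn.execute(
        "SELECT id, churn_score(support_calls, months_active) AS score "
        "FROM customers "
        "WHERE churn_score(support_calls, months_active) > 0.2 "
        "ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer_id, score in predict_in_sql():
        print(customer_id, score)
```

The point of the design is that the analyst never leaves SQL: steps 2 to 6 of a hand-written integration collapse into one query.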

>200,000 databases migrated with DMS
More in 2019 than all of 2016-2018 combined

Managing software on-premises is time-consuming and complex
• Hardware and software installation
• Configuration, patching, and backups
• Cluster setup and data replication for high availability
• Capacity planning, and scaling clusters for compute and storage

Customers moving to fully managed services
Relational databases
• Aurora
• RDS
Non-relational databases
• DynamoDB
• DocumentDB
• ElastiCache
• Managed Cassandra Service
• EMR: Hadoop and Spark
• Elasticsearch Service: Operational analytics
• Managed Streaming for Kafka: Real-time analytics

Amazon RDS
Managed relational database service with a choice of popular databases
Easy to administer
• No infrastructure provisioning
• No software installation and patching
• Built-in monitoring
Performant & scalable
• Scale with an API call or a few clicks
• Read replicas for increased throughput
Available & durable
• Automatic Multi-AZ data replication
• Automated backup, snapshots, and failover
Secure and compliant
• Encryption at rest and in transit
• Network isolation and resource-level permissions

How do you scale your relational database to support tens of thousands of connections?
• Serverless applications open and close tens of thousands of connections within seconds
• This leads to longer query response times that limit application scalability
• Database proxy servers are difficult to deploy, patch, and manage

Amazon RDS Proxy
Fully managed, highly available database proxy
• Supports the connection scale of serverless applications
• Pools and shares database connections
• Preserves connections during database failovers
• Manages DB credentials with Secrets Manager and IAM
• Fully managed: no provisioning, patching, or management
[Diagram: Applications, RDS Proxy (connection pooling), RDS database instance]
NEW (PREVIEW)
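Connection pooling, the core mechanism named on this slide, can be illustrated with a few lines of code. The sketch below is not how RDS Proxy is implemented; it is a minimal pool written with Python's standard library, using sqlite3 in-memory connections as a stand-in for real database connections. ConnectionPool, handle_request, and run_burst are hypothetical names chosen for the example.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keeps a fixed number of open connections and lends them out."""

    def __init__(self, size: int, database: str = ":memory:") -> None:
        self.opened = 0
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(sqlite3.connect(database, check_same_thread=False))
            self.opened += 1

    @contextmanager
    def connection(self, timeout: float = 5.0):
        # Blocks (up to `timeout` seconds) while every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back instead of closing it.
            self._idle.put(conn)

    def close(self) -> None:
        while not self._idle.empty():
            self._idle.get_nowait().close()


def handle_request(pool: ConnectionPool, n: int) -> int:
    """Simulates one short-lived invocation that needs the database briefly."""
    with pool.connection() as conn:
        return conn.execute("SELECT ? * 2", (n,)).fetchone()[0]


def run_burst(requests: int = 1000, pool_size: int = 4):
    pool = ConnectionPool(pool_size)
    results = [handle_request(pool, n) for n in range(requests)]
    opened = pool.opened
    pool.close()
    return results, opened


if __name__ == "__main__":
    results, opened = run_burst()
    print(f"{len(results)} requests served over {opened} connections")
```

A thousand requests are served while the database only ever sees four connections, which is the effect a proxy aims for when many short-lived functions connect at once.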

Amazon RDS on AWS Outposts
• Launch RDS in your data centers with AWS Outposts
• Integrate with on-premises databases and applications
• Deploy secure, managed RDS in minutes
• Store data without moving to the cloud
• Automates provisioning, patching, backup, restoring, scaling, and failover
[Diagram labels: RDS (MySQL, PostgreSQL); AWS Outposts]
NEW (PREVIEW)

Operational Analytics: Amazon Elasticsearch Service
Fully managed, scalable, secure Elasticsearch service
Open source Elasticsearch APIs, Kibana, and Logstash
• Open-source Elasticsearch APIs
• Managed Kibana
• Integration with Logstash
Scalable, secure, and compliant
• Scale clusters up/down via a single API call or a few clicks
• Secured network isolation with VPC, encrypt data at rest and in transit
• Compliant: HIPAA, PCI DSS, and ISO
Cost-optimized workloads
• Pay only for what you use
• No upfront fee or usage requirement
• Critical features built-in: encryption, VPC support, 24x7 monitoring
Fully managed
• Deploy Elasticsearch clusters in minutes: simplified hardware provisioning, software installation/patching, failure recovery, backups, and monitoring

Challenges with analyzing high volumes of data in real time
• Storing data is expensive at scale
• Limits the amount of data retained for analysis
• Miss out on valuable insights

UltraWarm for Amazon Elasticsearch Service
A new warm storage tier for Elasticsearch Service
• Seamlessly extends Elasticsearch Service
• Reduces cost by 90% to store the same amount of data
• Scale up to 3 PB of log data per cluster
• Analyze years of operational data
[Diagram labels: Queries; Kibana dashboard; Application Load Balancer; Amazon Elasticsearch Service domain (active master node, UltraWarm nodes); Amazon S3]
NEW (PREVIEW)

Data Warehouse: Amazon Redshift

Most widely used Cloud Data Warehouse
Tens of thousands of customers use Redshift & process over 2EB of data per day

Amazon Redshift has been innovating quickly
200+ new features in the past 18 months
• Robust result set caching
• Large # of tables support (~20000)
• Copy command support for ORC, Parquet
• IAM role chaining
• Elastic resize
• Groups
• Redshift Spectrum: date formats, scalar JSON and ION file formats support, region expansion, predicate filtering
• Auto analyze
• Health and performance monitoring w/Amazon CloudWatch
• Automatic table distribution style
• CloudWatch support for WLM queues
• Performance enhancements: hash join, vacuum, window functions, resize ops, aggregations, console, union all, efficient compile code cache
• Unload to CSV
• Auto WLM
• ~25 Query Monitoring Rules (QMR) support
• AQUA
• Concurrency Scaling
• DC1 migration to DC2
• Resiliency of ROLLBACK processing
• Manage multi-part query in AWS console
• Auto analyze for incremental changes on table
• Spectrum Request Accelerator
• Apply new distribution key
• Redshift Spectrum: row group filtering in Parquet and ORC, nested data support, Enhanced VPC Routing, multiple partitions
• Faster Classic resize with optimized data transfer protocol
• Performance: Bloom filters in joins, complex queries that create internal table, communication layer
• Redshift Spectrum: Concurrency scaling
• AWS Lake Formation integration
• Auto-Vacuum sort, Auto-Analyze and Auto Table Sort
• Auto WLM with query priorities
• Snapshot scheduler
• Performance: join pushdowns to subquery, mixed workloads temporary tables, rank functions, null handling in join, single row insert
• Advisor recommendations for distribution keys
• AZ64 compression encoding
• Console redesign
• Stored procedures
• Spatial Processing
• Column level access control with AWS Lake Formation
• RA3
• Performance of Inter-Region Snapshot Transfers
• Federated Query
• Materialized Views
• Manual Pause and Resume

Amazon Redshift Materialized Views
• Defined by a SQL query, precomputed results, incrementally refreshed
• Orders-of-magnitude query acceleration
• Recommended for predictable and repeated queries used in dashboarding and interactive analysis
[Diagram: several base tables (columns C1..C4, rows R1..R9) combined into a precomputed "Materialized Views" table]
NEW (PREVIEW)

Amazon Redshift Data Lake Export
Export data directly to Amazon S3 in Apache Parquet
Save results of data transformation into S3 data lake
Export with the UNLOAD command and specify Parquet
Redshift formats, partitions, and moves data into S3
Analyze with Amazon SageMaker, Athena, and EMR
S3
Redshift
NEW
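As a rough illustration of "export with the UNLOAD command and specify Parquet", the helper below assembles such a statement as a string. It is a sketch based on my reading of the UNLOAD syntax (FORMAT AS PARQUET, optional PARTITION BY) and should be checked against the current Redshift documentation before use. The function name build_unload and the bucket, role ARN, table, and column in the usage example are hypothetical.

```python
def build_unload(query: str, s3_prefix: str, iam_role_arn: str, partition_by=()) -> str:
    """Assemble a Redshift UNLOAD statement that writes Parquet files to S3."""
    # The query is passed to UNLOAD as a quoted literal, so single quotes
    # inside it are doubled.
    escaped = query.replace("'", "''")
    parts = [
        f"UNLOAD ('{escaped}')",
        f"TO '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS PARQUET",
    ]
    if partition_by:
        parts.append("PARTITION BY (" + ", ".join(partition_by) + ")")
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    print(
        build_unload(
            "SELECT * FROM sales WHERE region = 'east'",
            "s3://example-bucket/sales/",
            "arn:aws:iam::123456789012:role/ExampleRole",
            partition_by=["sale_date"],
        )
    )
```

The resulting files can then be read by the services listed on the slide without passing through the warehouse again.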

Amazon Redshift Federated Query
Analyze data across data warehouse, data lakes, and operational databases
• Query across multiple systems from Redshift
• Combine data warehouse and transactional data
• Compatible with Amazon RDS and Aurora (PostgreSQL)
[Diagram labels: SQL; Amazon RDS; Amazon Aurora; Amazon Redshift; S3 data lake]
NEW (PREVIEW)
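Querying across systems from one SQL engine can be mimicked locally to show the shape of such a query. The sketch below is an analogy only: it uses sqlite3's ATTACH DATABASE to join a "warehouse" table with a table that lives in a second, separate database standing in for an operational store. Redshift's actual mechanism (external schemas pointing at RDS or Aurora PostgreSQL) is not shown, and every table and column name here is hypothetical.

```python
import sqlite3


def federated_join() -> list:
    warehouse = sqlite3.connect(":memory:")
    # A second, independent database that plays the operational system.
    warehouse.execute("ATTACH DATABASE ':memory:' AS ops")
    warehouse.executescript(
        """
        CREATE TABLE main.sales_history (customer_id INTEGER, lifetime_total INTEGER);
        INSERT INTO main.sales_history VALUES (1, 900), (2, 150);
        CREATE TABLE ops.open_orders (customer_id INTEGER, amount INTEGER);
        INSERT INTO ops.open_orders VALUES (1, 40), (1, 60), (3, 25);
        """
    )
    # One statement combines historical (warehouse) and live (operational) data.
    rows = warehouse.execute(
        """
        SELECT h.customer_id, h.lifetime_total, SUM(o.amount) AS open_total
        FROM main.sales_history AS h
        JOIN ops.open_orders AS o ON o.customer_id = h.customer_id
        GROUP BY h.customer_id, h.lifetime_total
        """
    ).fetchall()
    warehouse.close()
    return rows


if __name__ == "__main__":
    print(federated_join())
```

The join runs without first copying the operational table into the warehouse, which is the property the slide highlights.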

How do you scale cost-effectively for
diverse data warehouse workloads?

Amazon Redshift on RA3 instances
Optimize your data warehouse by paying for compute and storage separately
• Delivers 3x the performance of existing cloud DWs
• Automatically scales your DW storage capacity
• DS2 customers can migrate and get 2x performance and 2x storage for the same cost
• Supports workloads up to 8 PB (compressed)
[Diagram: four RA3 compute nodes, each with an SSD cache, priced in $/node/hour, on top of managed storage in S3, priced in $/TB/month]
NEW (GA)
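The two billing dimensions in the diagram ($/node/hour for compute, $/TB/month for managed storage) can be turned into a short worked calculation. The rates in the usage example are made-up round numbers chosen for easy arithmetic, not AWS prices, and the 730 hours per month is an assumed average.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(nodes, node_rate_per_hour, managed_tb, storage_rate_per_tb_month,
                 hours=HOURS_PER_MONTH):
    """Split a monthly bill into independently scaling compute and storage parts."""
    compute = nodes * node_rate_per_hour * hours
    storage = managed_tb * storage_rate_per_tb_month
    return {"compute": compute, "storage": storage, "total": compute + storage}


if __name__ == "__main__":
    # Hypothetical rates: 10 per node-hour, 25 per TB-month.
    before = monthly_cost(nodes=2, node_rate_per_hour=10, managed_tb=100,
                          storage_rate_per_tb_month=25)
    # Data doubles, compute needs stay the same: only the storage line grows.
    after = monthly_cost(nodes=2, node_rate_per_hour=10, managed_tb=200,
                         storage_rate_per_tb_month=25)
    print(before)
    print(after)
```

Because the two terms are independent, growing data volume does not force the purchase of extra compute nodes, which is the argument the slide makes for separating them.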

It’s hard to cost-effectively scale without compromising performance
[Diagram: DW clusters 1, 2, and 3 connected over the network to central shared storage]

Taking advantage of growing storage throughput
2012 to today:
• Storage (SSDs) throughput: 12x improvement
• CPU-DRAM throughput: 2x improvement
Processing more data closer to SSD storage could dramatically change how fast large amounts of data can be processed compared with today’s existing systems.

AQUA
(Advanced Query Accelerator)
for Amazon Redshift
An innovative new hardware-accelerated cache that delivers up
to 10x better query performance than other cloud data
warehouses
NVMe SSDs
CUSTOM ANALYTICS PROCESSORS
AWS NITRO SYSTEM
COMING IN
2020
NEW

AQUA – Advanced Query Accelerator
Redshift runs 10x faster than any other cloud data warehouse without increasing cost
AQUA brings compute to the storage layer so data
doesn’t have to move back and forth
High-speed cache on top of S3 scales out to process data
in parallel across many nodes
AWS custom-designed analytics processors accelerate data
compression, encryption, and data processing
100% compatible with the current version of Redshift
[Diagram labels: RA3 compute cluster; AQUA (Advanced Query Accelerator); S3 storage]
NEW (coming in 2020)

Data warehousing: Amazon Redshift
First and most popular cloud data warehouse
Best performance, most scalable
• 3x faster with RA3*
• 10x faster with AQUA*
• Adds unlimited compute capacity on demand to meet unlimited concurrent access
Lowest cost
• Cost-optimized workloads by paying for compute and storage separately
• 1/10th the cost of a traditional DW at $1000/TB/year
• Up to 75% less than other cloud data warehouses & predictable costs
Data lake & AWS integration
• Analyze exabytes of data across data warehouse, data lakes, and operational databases
• Query data across various analytics services
Most secure & compliant
• AWS-grade security (e.g. VPC, encryption with KMS, CloudTrail)
• All major certifications such as SOC, PCI DSS, ISO, FedRAMP, HIPAA
*vs other cloud DWs
Characteristics of modern applications
Internet-scale and transactional
Users: 1M+
Data volume: TB–PB–EB
Locality: Global
Performance: Milliseconds–microseconds
Request Rate: Millions
Access: Web, Mobile, IoT, devices
Scale: Up-down, Out-in
Economics: Pay for what you use
Developer access: Instant API access
Examples: social media, ride hailing, media streaming, dating

Break complex apps into smaller pieces and pick the
best tool to solve each problem
This ensures that the apps are well architected and
scale effectively
Developers are now building highly distributed apps using
purpose-built databases and micro-services architecture
Developers are doing what they do best

AWS Databases – for all your application use cases

It’s challenging to manage large Cassandra clusters
Specialized expertise needed to set up, configure, and maintain infrastructure and software
Scaling clusters is time-consuming, manual, and prone to over-provisioning
Manual backups and error-prone restore process to maintain integrity
Unreliable upgrades with clunky rollback and debugging capabilities

Amazon Managed (Apache) Cassandra Service
Scalable, highly available, and managed Cassandra-compatible database service
No servers to manage
• No need to provision, configure, and operate large Cassandra clusters or add and remove nodes manually
Single-digit millisecond performance
• Scale tables up and down automatically based on application traffic
• Virtually unlimited throughput and storage
• Single-digit millisecond performance at scale
Apache Cassandra-compatible
• Use the same application code, licensed drivers, and tools built on Cassandra
Simple migration
• Simple migration to Managed Cassandra Service for Cassandra databases on premises or on EC2
NEW (PREVIEW)

Common data categories and use cases

Traditional data warehousing approaches don’t scale
Data silos to
[Diagram labels: OLTP, ERP, CRM, LOB; DW Silo 1; Business Intelligence; Devices, Web, Sensors, Social; DW Silo 2; Business Intelligence; Machine learning; BI + analytics; Data warehousing; Data lakes; Open formats; Central catalog]

Customers moving to data lake architectures
Bringing together the best of both worlds
Extends or evolves DW architectures
Store any data in any format
Durable, available, and exabyte scale
Secure, compliant, auditable
Run any type of analytics from DW to Predictive
[Diagram labels: Data lake; Data Warehousing; Analytics; Machine Learning]

Any type of analytics on the data lake
Data lake
• Data Warehousing
• Big Data Processing
• Interactive Query
• Operational Analytics
• Real-time Analytics
• Predictive Analytics
• Recommendations
• Visualizations
• Data Exchange

Any type of analytics on the data lake
Most comprehensive analytics platform
Data lake: Amazon S3 | AWS Glue | Lake Formation
• Data Warehousing: Amazon Redshift
• Big Data Processing: Amazon EMR
• Interactive Query: Amazon Athena
• Operational Analytics: Amazon Elasticsearch Service
• Real-time Analytics: Amazon Kinesis, Amazon MSK
• Predictive Analytics: Amazon SageMaker
• Recommendations: Amazon Personalize
• Visualizations: Amazon QuickSight
• Data Exchange: AWS Data Exchange

Amazon EMR
Easily run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS
Low cost
• 50–80% reduction in costs with EC2 Spot and Reserved Instances
• Per-second billing for flexibility
Use S3 storage
• Process data in S3 securely with high performance using the EMRFS connector
Latest versions
• Updated with latest open source frameworks within 30 days
Easy
• Fully managed: no cluster setup, node provisioning, or cluster tuning

Performance Improvements in Spark for Amazon EMR
Performance-optimized runtime for Apache Spark: 2.6x faster performance at 1/10th the cost
Runtime total on 104 queries (seconds; lower is better)
• Spark with EMR (with runtime): 10,164
• 3rd party Managed Spark (with their runtime): 16,478
• Spark with EMR (without runtime): 26,478
*Based on TPC-DS 3 TB benchmarking running 6-node c4.8xlarge clusters and EMR 5.28, Spark 2.4
Runtime optimized for Apache Spark performance
• 100% compliant with Apache Spark APIs
Best performance
• 2.6x faster than Spark with EMR without runtime
• 1.6x faster than 3rd party Managed Spark (with their runtime)
Lowest price
• 1/10th the cost of 3rd party Managed Spark (with their runtime)
NEW
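The "2.6x" and "1.6x" figures follow directly from the three runtime totals in the chart, as the short calculation below shows. Which total belongs to which configuration is read off the order of the chart labels in the slide and is an assumption of this sketch; the resulting ratios agree with the slide's claims.

```python
# Total runtime on 104 queries, in seconds, as shown in the chart.
RUNTIME_SECONDS = {
    "emr_with_runtime": 10_164,
    "third_party_managed_spark": 16_478,
    "emr_without_runtime": 26_478,
}


def speedup(baseline: str, candidate: str = "emr_with_runtime") -> float:
    """How many times faster `candidate` is than `baseline` (ratio of runtimes)."""
    return RUNTIME_SECONDS[baseline] / RUNTIME_SECONDS[candidate]


if __name__ == "__main__":
    for name in ("emr_without_runtime", "third_party_managed_spark"):
        print(f"{name}: {speedup(name):.1f}x")
```

A lower total runtime means a proportionally higher speedup, so 26,478 / 10,164 rounds to 2.6 and 16,478 / 10,164 rounds to 1.6.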

Amazon EMR on AWS Outposts
• Launch EMR in your data centers with AWS Outposts
• Integrate with existing on-premises Hadoop deployments
• Deploy secure, managed EMR clusters in minutes
• Process and analyze data on-premises on AWS Outposts
[Diagram labels: EMR (Hadoop + Spark); AWS Outposts; on-premises Hadoop/Spark]
NEW (GA)
Amazon Athena
Serverless, interactive query service
Query instantly
• Zero setup cost
• Point to S3 and start querying
Pay per query
• Pay only for queries run
• Save 30–90% on per-query costs through compression
• Use S3 storage
SQL
• ANSI SQL
• JDBC/ODBC drivers
• Multiple formats, compression types, and complex joins and data types
Easy
• Serverless: zero infrastructure, zero administration
• Integrated with QuickSight
Amazon Athena Federated Query
Run SQL queries on data spanning multiple data stores
• Redshift: Data warehousing
• ElastiCache: Redis
• Aurora: MySQL, PostgreSQL
• DynamoDB: Key value, Document
• DocumentDB: Document
• S3/Glacier
Run connectors in AWS Lambda: no servers to manage
Run SQL queries on relational, non-relational, object, or custom data sources, in the cloud or on-premises
Open source connectors for common data sources
Build connectors to custom data sources
NEW (PREVIEW)

Amazon QuickSight
First BI service built for the cloud with pay-per-session pricing & ML insights for everyone
Elastic scaling
• Auto-scale 10 to 10K+ users in minutes
• Pay-as-you-go
Serverless
• Create dashboards in minutes
• Deploy globally without provisioning a single server
Deeply integrated with AWS services
• Secure, private access to AWS data
• Integrated S3 data lake permissions through AWS IAM
API support
• Programmatically onboard users and manage content
• Easily embed in your apps
NEW

ML predictions in Amazon QuickSight (preview)
1. Connect to any data: data lakes, SQL engines, 3rd party applications, and on-premises databases
AWS/on-premises data sources
• Excel
• CSV
• MySQL
• PostgreSQL
• MariaDB
• Presto
• Spark
• SQL Server
• Amazon Redshift
• RDS
• S3
• Athena
• Aurora
• EMR
• Snowflake
• Teradata
• Salesforce
• Square
• Adobe Analytics
• Jira
• ServiceNow
• Twitter
• GitHub
2. Select an ML model: create models with Amazon SageMaker Autopilot, or use existing custom models and packaged models from AWS Marketplace
[Diagram labels: custom models; Amazon SageMaker Autopilot models; AWS Marketplace; QuickSight]
3. Visualize and share: analyze results, create visualizations, build dashboards and email reports, and share with business stakeholders
NEW

Build predictive dashboards in hours with
point-and-click, no coding required

Easily embed analytics in your own tools
Powered by QuickSight APIs and flexible customization. Entirely serverless.
Deploy and manage dashboards + data via APIs
Match your application UI with QuickSight Themes
Embed dashboards in apps without servers
• Fast, consistent performance
• Pay-per-session
Automatically scale to 10s of 1000s of users
• No server management
• No scripting
NEW

Data exchange: AWS Data Exchange
Easily find and subscribe to 3rd-party data in the cloud
Quickly find diverse data in one place
• >1,000 data products
• >80 data providers, including Dow Jones, Change Healthcare, Foursquare, Dun & Bradstreet, Thomson Reuters, Pitney Bowes, LexisNexis, and Deloitte
Efficiently access 3rd party data
• Simplifies access to data: no need to receive physical media, manage FTP credentials, or integrate with different APIs
• Minimize legal reviews and negotiations
Easily analyze data
• Download or copy data to S3
• Combine, analyze, and model with existing data
• Analyze data with EMR, Redshift, Athena, and AWS Glue
NEW (GA)

Our portfolio
Broad and deep portfolio, purpose-built for builders
Business Intelligence & Machine Learning
• QuickSight: Visualizations
• SageMaker: ML
• Comprehend: NLP
• Transcribe: Speech-to-text
• Textract: Extract text
• Personalize: Recommendation
• Forecast: Forecasts
• Translate: Translation
• CodeGuru: Code reviews (NEW)
• Kendra: Enterprise search (NEW)
Data Exchange
• Data Exchange: Data exchange (NEW)
Databases
• RDS: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware
• Aurora: MySQL, PostgreSQL
• DynamoDB: Key value, Document
• DocumentDB: Document
• ElastiCache: Redis, Memcached
• Neptune: Graph
• Timestream: Time Series
• QLDB: Ledger Database
• Managed Apache Cassandra Service: Wide column (NEW)
• RDS Proxy (NEW)
• RDS on Outposts (NEW)
Analytics
• Redshift: Data warehousing
• EMR: Hadoop + Spark
• Kinesis Data Analytics: Real time
• Elasticsearch Service: Operational Analytics
• Athena: Interactive analytics
• AQUA (NEW)
• EMR on Outposts (NEW)
• UltraWarm (NEW)
Blockchain
• Managed Blockchain: Blockchain
• Blockchain Templates: Blockchain
Data Lake
• S3/Glacier
• Glue: ETL & Data Catalog
• Lake Formation: Data Lakes
Data Movement
Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
