© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
re:Invent recap:
Databases & Analytics
Asif Abbasi, Sr. Specialist Solutions Architect
What do these companies have in common?
Data
“The world’s most valuable resource is no longer oil, but data.”*
*Copyright: The Economist, 2017, David Parkins
*Source: Forbes Online; New Vantage Partners - Big Data Executive Survey
85% of businesses want to be data-driven, but only 37% have been successful.
Data
1. Break free from legacy databases
✓ Increase scale
✓ Improve performance
✓ Lower cost
2. Move to managed
✓ Save time and cost
✓ Remove undifferentiated heavy lifting
3. Modernize your data warehouse
✓ Better and faster insights
✓ Broader access to analytics
4. Build data-driven apps
✓ Agility
✓ Global distribution
✓ Performance at scale
5. Turn data to insights
✓ Better experiences
✓ Deeper engagement
✓ Efficient processes
How do you build momentum?
The Data Flywheel
Modernize your data infrastructure
Get the most value from your data
The Data Flywheel, step 1: Break free from legacy databases
Old-guard database providers
Very expensive
Proprietary
Lock-in
Punitive licensing
You’ve got mail
Customers are moving to open databases
Open databases + commercial-grade performance and reliability?
Amazon Aurora
MySQL- and PostgreSQL-compatible relational database built for the cloud
Performance and availability of commercial-grade databases at 1/10th the cost
Performance and scalability
• 5x throughput of MySQL
• 3x throughput of PostgreSQL
• Up to 15 read replicas
• Scale out reads and writes across multiple data centers
Fully managed
• Managed by RDS: no hardware provisioning, software patching, setup, configuration, or backups
Availability and durability
• Fault-tolerant, self-healing storage
• Six copies of data across three AZs
• Continuous backup to S3
• Single global database with cross-region replication
Highly secure
• Network isolation
• Encryption at rest and in transit
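The “six copies of data across three AZs” line is easier to reason about with quorum arithmetic. The sketch below is a minimal counting model. It assumes the quorum sizes from AWS’s published Aurora storage design: writes need 4 of 6 copies, reads need 3 of 6, and each AZ holds two copies. The class and method names are hypothetical, and the model does not reproduce Aurora’s actual storage protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumModel:
    """Counting model of a 6-copy, 3-AZ quorum (illustrative, not Aurora's code)."""
    copies: int = 6        # copies of each piece of data
    azs: int = 3           # copies spread evenly, two per AZ
    write_quorum: int = 4  # acknowledgements needed for a write
    read_quorum: int = 3   # responses needed for a read

    def quorums_overlap(self) -> bool:
        # A read quorum always intersects the latest write quorum, and any two
        # write quorums intersect, so two conflicting writes cannot both succeed.
        return (self.read_quorum + self.write_quorum > self.copies
                and 2 * self.write_quorum > self.copies)

    def surviving(self, failed_azs: int, extra_failed_copies: int = 0) -> int:
        per_az = self.copies // self.azs
        return self.copies - failed_azs * per_az - extra_failed_copies

    def can_write(self, failed_azs: int, extra_failed_copies: int = 0) -> bool:
        return self.surviving(failed_azs, extra_failed_copies) >= self.write_quorum

    def can_read(self, failed_azs: int, extra_failed_copies: int = 0) -> bool:
        return self.surviving(failed_azs, extra_failed_copies) >= self.read_quorum


model = QuorumModel()
print(model.quorums_overlap())                               # True
print(model.can_write(failed_azs=1))                         # True: 4 copies left
print(model.can_write(failed_azs=1, extra_failed_copies=1))  # False: 3 copies left
print(model.can_read(failed_azs=1, extra_failed_copies=1))   # True: 3 copies left
```

Under these assumptions, the loss of an entire AZ still leaves writes available. The loss of an AZ plus one more copy still leaves reads available.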
Challenges with integrating ML with your database
Typical steps of incorporating ML into an application:
1. Select and train the model
2. Write application code to read data from the database
3. Query and format the data for the ML algorithm
4. Call an ML service to run the algorithm
5. Format the output
6. Retrieve the results back to the application
ML in Amazon Aurora and Athena
Bringing machine learning to data developers and data analysts
• Generate predictions directly from Aurora queries
• Models run in SageMaker and Comprehend
• Use standard SQL; no ML expertise required
• Suitable for low-latency, high-volume use cases
[Diagram: SQL queries (SELECT … FROM … WHERE) from Aurora (database) and Athena (interactive analytics) call models in Amazon SageMaker (ML)]
>200,000 databases migrated with DMS
More in 2019 than all of 2016-2018 combined
The Data Flywheel, step 2: Move to managed
Managing software on-premises is time-consuming and complex
• Hardware and software installation
• Configuration, patching, and backups
• Cluster setup and data replication for high availability
• Capacity planning and scaling clusters for compute and storage
Customers are moving to fully managed services
• Relational databases: Aurora, RDS
• Hadoop and Spark: EMR
• Operational analytics: Elasticsearch Service
• Real-time analytics: Managed Streaming for Kafka
• Non-relational databases: DynamoDB, DocumentDB, ElastiCache, Managed Cassandra Service
Amazon RDS
Managed relational database service with a choice of popular databases
Easy to administer
• No infrastructure provisioning
• No software installation and patching
• Built-in monitoring
Performant & scalable
• Scale with an API call or a few clicks
• Read replicas for increased throughput
Available & durable
• Automatic Multi-AZ data replication
• Automated backup, snapshots, and failover
Secure and compliant
• Encryption at rest and in transit
• Network isolation and resource-level permissions
How do you scale your relational database to support tens of thousands of connections?
• Serverless applications open and close tens of thousands of connections within seconds
• This leads to longer query response times that limit application scalability
• Database proxy servers are difficult to deploy, patch, and manage
Amazon RDS Proxy
Fully managed, highly available database proxy
• Supports a new scale of serverless application connections
• Pools and shares database connections
• Preserves connections during database failovers
• Manages DB credentials with Secrets Manager and IAM
• Fully managed: no provisioning, patching, or management
[Diagram: Applications → RDS Proxy (connection pooling) → RDS database instance]
PREVIEW
NEW
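The central idea behind the proxy is connection pooling: many short-lived clients share a small set of long-lived database connections. The sketch below is a generic, in-process illustration built on a few assumptions. It is not RDS Proxy’s implementation. `FakeDBConnection` is a hypothetical stand-in for a real driver connection, and the pool size and request counts are arbitrary.

```python
from __future__ import annotations

import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from typing import Iterator, Tuple


class FakeDBConnection:
    """Hypothetical stand-in for a real database driver connection."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id

    def execute(self, sql: str) -> Tuple[int, str]:
        return self.conn_id, sql


class ConnectionPool:
    """Fixed-size pool: many short-lived clients share a few backend connections."""

    def __init__(self, size: int) -> None:
        self._idle: queue.Queue = queue.Queue()
        for i in range(size):
            self._idle.put(FakeDBConnection(i))
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak_in_use = 0

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[FakeDBConnection]:
        conn = self._idle.get(timeout=timeout)  # waits while every connection is busy
        with self._lock:
            self._in_use += 1
            self.peak_in_use = max(self.peak_in_use, self._in_use)
        try:
            yield conn
        finally:
            with self._lock:
                self._in_use -= 1
            self._idle.put(conn)  # return for reuse instead of closing


def handle_request(pool: ConnectionPool, n: int) -> Tuple[int, str]:
    # Each short-lived "invocation" borrows a connection only for its query.
    with pool.connection() as conn:
        return conn.execute(f"SELECT {n}")


pool = ConnectionPool(size=10)
with ThreadPoolExecutor(max_workers=100) as executor:
    results = list(executor.map(lambda n: handle_request(pool, n), range(2000)))

backend_ids = {conn_id for conn_id, _ in results}
print(len(results), len(backend_ids) <= 10, pool.peak_in_use <= 10)  # 2000 True True
```

In this sketch, 2,000 requests from 100 concurrent workers are served by no more than 10 backend connections. A managed proxy offers the same benefit to serverless callers without anyone running the pool.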
Amazon RDS on AWS Outposts
• Launch RDS in your data centers with AWS Outposts
• Integrate with on-premises databases and applications
• Deploy secure, managed RDS in minutes
• Store data without moving it to the cloud
• Automates provisioning, patching, backup, restoring, scaling, and failover
[Diagram: RDS (MySQL, PostgreSQL) running on AWS Outposts]
PREVIEW
NEW
Operational Analytics: Amazon Elasticsearch Service
Fully managed, scalable, secure Elasticsearch service
Open-source Elasticsearch APIs, Kibana, and Logstash
• Open-source Elasticsearch APIs
• Managed Kibana
• Integration with Logstash
Scalable, secure, and compliant
• Scale clusters up/down via a single API call or a few clicks
• Secured network isolation with VPC; encrypt data at rest and in transit
• Compliant: HIPAA, PCI DSS, and ISO
Pay only for what you use
• Cost-optimized workloads
• No upfront fee or usage requirement
• Critical features built in: encryption, VPC support, 24x7 monitoring
Fully managed
• Deploy Elasticsearch clusters in minutes: simplified hardware provisioning, software installation/patching, failure recovery, backups, and monitoring
Challenges with analyzing high volumes of data in real time
• Storing data is expensive at scale
• This limits the amount of data retained for analysis
• As a result, customers miss out on valuable insights
UltraWarm for Amazon Elasticsearch Service
A new warm storage tier for Elasticsearch Service
• Seamlessly extends Elasticsearch Service
• Reduces cost by 90% to store the same amount of data
• Scale up to 3 PB of log data per cluster
• Analyze years of operational data
[Diagram: Kibana dashboard, application, Application Load Balancer, and an Amazon Elasticsearch Service domain with an active master node and UltraWarm nodes backed by Amazon S3; queries flow through the domain]
PREVIEW
NEW
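A warm tier pays off when older, rarely queried data moves off the hot nodes. The sketch assumes a hypothetical age-based policy and daily indices with names like `logs-2019.12.01`. The seven-day threshold and the function names are illustrative. They are not UltraWarm settings or APIs.

```python
from datetime import date
from typing import Dict, Iterable


def assign_tier(index_day: date, today: date, hot_days: int = 7) -> str:
    """Hypothetical policy: indices younger than `hot_days` stay hot, the rest go warm."""
    age_days = (today - index_day).days
    return "hot" if age_days < hot_days else "warm"


def plan_tiers(index_names: Iterable[str], today: date, hot_days: int = 7) -> Dict[str, str]:
    plan = {}
    for name in index_names:
        # Assumes names like "logs-2019.12.01" (prefix, dash, YYYY.MM.DD).
        year, month, day = (int(part) for part in name.split("-", 1)[1].split("."))
        plan[name] = assign_tier(date(year, month, day), today, hot_days)
    return plan


indices = ["logs-2019.12.10", "logs-2019.12.04", "logs-2019.11.01", "logs-2018.12.10"]
for name, tier in plan_tiers(indices, today=date(2019, 12, 10)).items():
    print(name, tier)
```

Only the most recent week stays on hot storage. The month-old and year-old indices land in the cheaper tier but remain queryable.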
The Data Flywheel, step 3: Modernize your data warehouse
Data Warehouse: Amazon Redshift
Most widely used cloud data warehouse
Tens of thousands of customers use Redshift and process over 2 EB of data per day
Amazon Redshift has been innovating quickly: 200+ new features in the past 18 months, including:
• Robust result set caching
• Support for a large number of tables (~20,000)
• COPY command support for ORC and Parquet
• IAM role chaining
• Elastic resize
• Groups
• Redshift Spectrum: date formats, scalar JSON and Ion file format support, region expansion, predicate filtering
• Auto analyze
• Health and performance monitoring with Amazon CloudWatch
• Automatic table distribution style
• CloudWatch support for WLM queues
• Performance enhancements: hash join, vacuum, window functions, resize ops, aggregations, console, UNION ALL, efficient compile code cache
• UNLOAD to CSV
• Auto WLM
• ~25 Query Monitoring Rules (QMR) support
• AQUA
• Concurrency Scaling
• DC1 migration to DC2
• Resiliency of ROLLBACK processing
• Manage multi-part queries in the AWS console
• Auto analyze for incremental changes on tables
• Spectrum Request Accelerator
• Apply new distribution key
• Redshift Spectrum: row group filtering in Parquet and ORC, nested data support, Enhanced VPC Routing, multiple partitions
• Faster classic resize with optimized data transfer protocol
• Performance: Bloom filters in joins, complex queries that create internal tables, communication layer
• Redshift Spectrum: Concurrency Scaling, AWS Lake Formation integration
• Auto-vacuum sort, auto-analyze, and auto table sort
• Auto WLM with query priorities
• Snapshot scheduler
• Performance: join pushdowns to subquery, mixed workloads, temporary tables, rank functions, null handling in join, single-row insert
• Advisor recommendations for distribution keys
• AZ64 compression encoding
• Console redesign
• Stored procedures
• Spatial processing
• Column-level access control with AWS Lake Formation
• RA3
• Performance of inter-region snapshot transfers
• Federated Query
• Materialized Views
• Manual pause and resume
Amazon Redshift Materialized Views
Defined by a SQL query; precomputed results; incrementally refreshed
• Orders-of-magnitude query acceleration
• Recommended for predictable and repeated queries used in dashboarding and interactive analysis
[Diagram: base tables (columns C1–C4, rows R1–R9) combined by a query into a smaller, precomputed materialized view]
PREVIEW
NEW
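“Incrementally refreshed” means the view folds in only the changes made since the last refresh, rather than re-running the whole query. In Redshift, the view is declared with `CREATE MATERIALIZED VIEW … AS SELECT …` and updated with `REFRESH MATERIALIZED VIEW`. The Python below only illustrates the idea for an insert-only SUM/COUNT aggregate. It is a sketch, not how Redshift implements refresh, and the table contents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]           # (region, amount)
View = Dict[str, Tuple[int, int]]  # region -> (SUM(amount), COUNT(*))


def full_refresh(rows: List[Row]) -> View:
    """Recompute SUM(amount), COUNT(*) GROUP BY region from every base row."""
    totals = defaultdict(lambda: [0, 0])
    for region, amount in rows:
        totals[region][0] += amount
        totals[region][1] += 1
    return {k: (v[0], v[1]) for k, v in totals.items()}


def incremental_refresh(view: View, new_rows: List[Row]) -> View:
    """Fold only newly inserted rows into the precomputed results."""
    updated = {k: [v[0], v[1]] for k, v in view.items()}
    for region, amount in new_rows:
        totals = updated.setdefault(region, [0, 0])
        totals[0] += amount
        totals[1] += 1
    return {k: (v[0], v[1]) for k, v in updated.items()}


base = [("us", 10), ("eu", 5), ("us", 7)]
mv = full_refresh(base)
inserts = [("eu", 3), ("apac", 4)]
mv = incremental_refresh(mv, inserts)
print(mv)  # {'us': (17, 2), 'eu': (8, 2), 'apac': (4, 1)}
```

The incremental result matches a full recomputation while touching only the two new rows. Updates and deletes need more bookkeeping, such as subtracting old values, and this sketch omits it.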
Amazon Redshift Data Lake Export
Export data directly to Amazon S3 in Apache Parquet
• Save results of data transformation into the S3 data lake
• Export with the UNLOAD command and specify Parquet
• Redshift formats, partitions, and moves data into S3
• Analyze with Amazon SageMaker, Athena, and EMR
[Diagram: Redshift → S3]
NEW
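When UNLOAD writes partitioned Parquet (`FORMAT AS PARQUET` together with `PARTITION BY`), the output is organized under Hive-style `column=value/` key prefixes. Athena, EMR, and SageMaker jobs can use these prefixes to skip irrelevant partitions. The sketch only shows how rows map to such prefixes. The bucket name is hypothetical, and Redshift's actual file naming is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List


def partition_layout(rows: List[dict], partition_cols: List[str],
                     base: str = "s3://example-bucket/sales/") -> Dict[str, List[dict]]:
    """Group rows under Hive-style prefixes such as year=2019/month=12/."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        prefix = base + "".join(f"{col}={row[col]}/" for col in partition_cols)
        # The partition values live in the path, so this sketch drops them from the row.
        groups[prefix].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(groups)


rows = [
    {"year": 2019, "month": 11, "sku": "a", "qty": 3},
    {"year": 2019, "month": 12, "sku": "b", "qty": 1},
    {"year": 2019, "month": 12, "sku": "c", "qty": 5},
]
for prefix, part in partition_layout(rows, ["year", "month"]).items():
    print(prefix, part)
```

A downstream query that filters on `month = 12` only needs to read objects under the `month=12/` prefix.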
Amazon Redshift Federated Query
Analyze data across data warehouse, data lakes, and operational databases
• Query across multiple systems from Redshift
• Combine data warehouse and transactional data
• Compatible with Amazon RDS and Aurora (PostgreSQL)
[Diagram: SQL issued in Amazon Redshift spans Amazon RDS, Amazon Aurora, and the S3 data lake]
PREVIEW
NEW
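Federated Query lets a single SQL statement join warehouse tables with live tables in an operational PostgreSQL database. Redshift exposes that database through an external schema (`CREATE EXTERNAL SCHEMA … FROM POSTGRES`). The sketch below is a self-contained analogy only. It uses SQLite's `ATTACH DATABASE` so that one query joins tables stored in two separate databases. The table names and data are hypothetical.

```python
import sqlite3

# Analogy only: one SQL statement spanning two separate databases.
conn = sqlite3.connect(":memory:")                 # plays the "warehouse"
conn.execute("ATTACH DATABASE ':memory:' AS ops")  # plays the "operational DB"

conn.execute("CREATE TABLE sales_history (customer_id INTEGER, lifetime_spend REAL)")
conn.execute("CREATE TABLE ops.orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales_history VALUES (?, ?)", [(1, 900.0), (2, 150.0)])
conn.executemany("INSERT INTO ops.orders VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.0)])

# Join historical warehouse data with live operational orders in one query.
rows = conn.execute("""
    SELECT h.customer_id, h.lifetime_spend, SUM(o.amount) AS open_order_total
    FROM sales_history AS h
    JOIN ops.orders AS o ON o.customer_id = h.customer_id
    GROUP BY h.customer_id, h.lifetime_spend
    ORDER BY h.customer_id
""").fetchall()
print(rows)  # [(1, 900.0, 30.0), (2, 150.0, 40.0)]
```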
How do you scale cost-effectively for diverse data warehouse workloads?
Amazon Redshift on RA3 instances
Optimize your data warehouse by paying for compute and storage separately
• Delivers 3x the performance of existing cloud DWs
• Automatically scales your DW storage capacity
• DS2 customers can migrate and get 2x performance and 2x storage for the same cost
• Supports workloads up to 8 PB (compressed)
[Diagram: RA3 compute nodes, each with an SSD cache, over managed storage in S3; compute is billed in $/node/hour and managed storage in $/TB/month]
GA
NEW
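RA3 bills compute ($/node/hour) and managed storage ($/TB/month) separately, so growing storage no longer forces you to add nodes. The sketch shows the two meters. Both prices are placeholders, not AWS list prices, and the 730-hour month is an assumption.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def ra3_monthly_cost(nodes: int, price_per_node_hour: float,
                     tb_stored: float, price_per_tb_month: float) -> dict:
    """Separate compute and storage meters (prices are caller-supplied placeholders)."""
    compute = nodes * price_per_node_hour * HOURS_PER_MONTH
    storage = tb_stored * price_per_tb_month
    return {"compute": compute, "storage": storage, "total": compute + storage}


# Placeholder prices for illustration only.
small = ra3_monthly_cost(nodes=2, price_per_node_hour=3.0, tb_stored=10, price_per_tb_month=25.0)
large = ra3_monthly_cost(nodes=2, price_per_node_hour=3.0, tb_stored=20, price_per_tb_month=25.0)
print(small)  # {'compute': 4380.0, 'storage': 250.0, 'total': 4630.0}
print(large["compute"] == small["compute"], large["storage"])  # True 500.0
```

Doubling the stored data doubles only the storage line. The compute charge stays the same because the node count did not change.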
It’s hard to cost-effectively scale without compromising performance
[Diagram: three DW clusters connected over the network to central shared storage]
Taking advantage of growing storage throughput
[Chart, 2012 to today: storage (SSD) throughput improved 12x, while CPU-DRAM throughput improved 2x]
Processing more data closer to SSD storage could dramatically change how fast large amounts of data can be processed compared with today’s systems.
AQUA (Advanced Query Accelerator) for Amazon Redshift
An innovative new hardware-accelerated cache that delivers up to 10x better query performance than other cloud data warehouses
[Diagram components: NVMe SSDs, custom analytics processors, AWS Nitro System]
COMING IN
2020
NEW
AQUA: Advanced Query Accelerator
Redshift runs 10x faster than any other cloud data warehouse without increasing cost
• AQUA brings compute to the storage layer so data doesn’t have to move back and forth
• High-speed cache on top of S3 scales out to process data in parallel across many nodes
• AWS custom-designed analytics processors accelerate data compression, encryption, and data processing
• 100% compatible with the current version of Redshift
[Diagram: RA3 compute cluster → AQUA (Advanced Query Accelerator) → S3 storage]
COMING IN
2020
NEW
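The premise of AQUA is to filter and process data where it is stored, so less of it crosses the network. The sketch below is a generic predicate-pushdown illustration, not AQUA's design. `StorageNode` is a hypothetical in-memory stand-in, and "rows transferred" serves as a proxy for data moved to the compute cluster.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Predicate = Callable[[int], bool]


@dataclass
class StorageNode:
    rows: List[int]

    def scan(self, predicate: Optional[Predicate] = None) -> List[int]:
        # With pushdown, the filter runs here, next to the data.
        if predicate is None:
            return list(self.rows)
        return [r for r in self.rows if predicate(r)]


def run_query(nodes: List[StorageNode], predicate: Predicate,
              pushdown: bool) -> Tuple[List[int], int]:
    result: List[int] = []
    transferred = 0
    for node in nodes:
        batch = node.scan(predicate if pushdown else None)
        transferred += len(batch)                        # rows shipped to compute
        result.extend(r for r in batch if predicate(r))  # compute-side filter
    return result, transferred


def is_match(r: int) -> bool:
    return r % 100 == 0


nodes = [StorageNode(list(range(i * 1000, (i + 1) * 1000))) for i in range(4)]
plain, moved_plain = run_query(nodes, is_match, pushdown=False)
pushed, moved_pushed = run_query(nodes, is_match, pushdown=True)
print(plain == pushed, moved_plain, moved_pushed)  # True 4000 40
```

Both strategies return the same answer. With pushdown, however, only the 40 matching rows leave the storage layer instead of all 4,000.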
Data warehousing: Amazon Redshift
First and most popular cloud data warehouse
Best performance, most scalable
• 3x faster with RA3*
• 10x faster with AQUA*
• Adds unlimited compute capacity on demand to meet unlimited concurrent access
Lowest cost
• Cost-optimized workloads by paying for compute and storage separately
• 1/10th the cost of a traditional DW, at $1,000/TB/year
• Up to 75% less than other cloud data warehouses, with predictable costs
Data lake & AWS integration
• Analyze exabytes of data across data warehouse, data lakes, and operational databases
• Query data across various analytics services
Most secure & compliant
• AWS-grade security (e.g., VPC, encryption with KMS, CloudTrail)
• All major certifications, such as SOC, PCI DSS, ISO, FedRAMP, HIPAA
*vs. other cloud DWs
The Data Flywheel, step 4: Build data-driven apps
Characteristics of modern applications
Internet-scale and transactional
• Users: 1M+
• Data volume: TB–PB–EB
• Locality: Global
• Performance: Milliseconds–microseconds
• Request rate: Millions
• Access: Web, mobile, IoT, devices
• Scale: Up-down, out-in
• Economics: Pay for what you use
• Developer access: Instant API access
Examples: social media, ride hailing, media streaming, dating
Developers are doing what they do best
Developers are now building highly distributed apps using purpose-built databases and a microservices architecture
• Break complex apps into smaller pieces and pick the best tool to solve each problem
• This ensures that the apps are well architected and scale effectively
AWS Databases – for all your application use cases
It’s challenging to manage large Cassandra clusters
Specialized expertise to set up, configure, and maintain infrastructure and software
Scaling clusters is time-consuming, manual, and prone to over-provisioning
Manual backups and error-prone restore process to maintain integrity
Unreliable upgrades with clunky rollback and debugging capabilities
Amazon Managed (Apache) Cassandra Service
Scalable, highly available, and managed Cassandra-compatible database service
No servers to manage
• No need to provision, configure, and operate large Cassandra clusters or add and remove nodes manually
Single-digit millisecond performance at scale
• Scale tables up and down automatically based on application traffic
• Virtually unlimited throughput and storage
• Single-digit millisecond performance
Apache Cassandra-compatible
• Use the same application code, licensed drivers, and tools built on Cassandra
Simple migration
• Simple migration to Managed Cassandra Service for Cassandra databases on premises or on EC2
PREVIEW
NEW
Common data categories and use cases
The Data Flywheel, step 5: Turn data to insights
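Data silos to data lakes
[Diagram: OLTP, ERP, CRM, and LOB data feed DW Silo 1 for business intelligence; devices, web, sensors, and social data feed DW Silo 2 for business intelligence and machine learning; a data lake with open formats and a central catalog brings this together for data warehousing, BI + analytics, and machine learning]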
Traditional data warehousing approaches don’t scale
Customers are moving to data lake architectures
Bringing together the best of both worlds
• Extends or evolves DW architectures
• Store any data in any format
• Durable, available, and exabyte scale
• Secure, compliant, auditable
• Run any type of analytics, from DW to predictive
[Diagram: a data lake serving data warehousing, analytics, and machine learning]
Any type of analytics on the data lake
Data warehousing | Big data processing | Interactive query | Operational analytics | Real-time analytics | Predictive analytics | Recommendations | Visualizations | Data exchange
[Diagram: each of these analytics types running on a central data lake]
Most comprehensive analytics platform
• Data lake: Amazon S3 | AWS Glue | Lake Formation
• Data warehousing: Amazon Redshift
• Big data processing: Amazon EMR
• Interactive query: Amazon Athena
• Operational analytics: Amazon Elasticsearch Service
• Real-time analytics: Amazon Kinesis, Amazon MSK
• Predictive analytics: Amazon SageMaker
• Recommendations: Amazon Personalize
• Visualizations: Amazon QuickSight
• Data exchange: AWS Data Exchange
Amazon EMR
Easily run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS
Low cost
• 50–80% reduction in costs with EC2 Spot and Reserved Instances
• Per-second billing for flexibility
Use S3 storage
• Process data in S3 securely with high performance using the EMRFS connector
Latest versions
• Updated with the latest open-source frameworks within 30 days
Easy
• Fully managed: no cluster setup, node provisioning, or cluster tuning
Performance Improvements in Spark for Amazon EMR
Performance-optimized runtime for Apache Spark: 2.6x faster performance at 1/10th the cost
Runtime total on 104 queries (seconds, lower is better)*:
• Spark with EMR (with runtime): 10,164
• 3rd-party managed Spark (with their runtime): 16,478
• Spark with EMR (without runtime): 26,478
*Based on TPC-DS 3 TB benchmarking running 6-node c4.8xlarge clusters and EMR 5.28, Spark 2.4
Runtime optimized for Apache Spark performance
• 100% compliant with Apache Spark APIs
Best performance
• 2.6x faster than Spark with EMR without runtime
• 1.6x faster than 3rd-party managed Spark (with their runtime)
Lowest price
• 1/10th the cost of 3rd-party managed Spark (with their runtime)
NEW
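The 2.6x and 1.6x figures follow directly from the runtime totals on this slide. Here is a quick check in Python; the dictionary keys are just labels.

```python
# Total runtime (seconds) on the 104 TPC-DS queries, as reported on the slide.
runtimes_s = {
    "emr_with_runtime": 10_164,
    "third_party_managed_spark": 16_478,
    "emr_without_runtime": 26_478,
}


def speedup(baseline: str, optimized: str) -> float:
    """How many times faster `optimized` finished than `baseline`."""
    return runtimes_s[baseline] / runtimes_s[optimized]


print(round(speedup("emr_without_runtime", "emr_with_runtime"), 2))        # 2.61
print(round(speedup("third_party_managed_spark", "emr_with_runtime"), 2))  # 1.62
```

Both results round to the slide's 2.6x and 1.6x. The 1/10th cost claim cannot be derived from these numbers alone.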
Amazon EMR on AWS Outposts
• Launch EMR in your data centers with AWS Outposts
• Integrate with existing on-premises Hadoop deployments
• Deploy secure, managed EMR clusters in minutes
• Process and analyze data on premises on AWS Outposts
[Diagram: EMR (Hadoop + Spark) on AWS Outposts, alongside on-premises Hadoop/Spark]
GA
NEW
Amazon Athena
Serverless, interactive query service
Query instantly
• Zero setup cost
• Point to S3 and start querying
Pay per query
• Pay only for queries run
• Save 30–90% on per-query costs through compression
• Use S3 storage
SQL
• ANSI SQL
• JDBC/ODBC drivers
• Multiple formats, compression types, and complex joins and data types
Easy
• Serverless: zero infrastructure, zero administration
• Integrated with QuickSight
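Athena charges each query by the amount of data it scans. Compressing data in S3 therefore lowers per-query cost roughly in proportion to the bytes that no longer need to be scanned. The sketch assumes a placeholder price per TB and a hypothetical 4:1 compression ratio; neither is an AWS figure.

```python
TB = 1024 ** 4  # bytes per terabyte (binary convention, an assumption)


def query_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Cost of one query billed by data scanned."""
    return bytes_scanned / TB * price_per_tb


def savings_fraction(raw_bytes: int, compressed_bytes: int) -> float:
    return 1 - compressed_bytes / raw_bytes


raw = 2 * TB
compressed = raw // 4   # hypothetical 4:1 compression
price = 1.0             # placeholder price per TB scanned
print(query_cost(raw, price), query_cost(compressed, price))  # 2.0 0.5
print(savings_fraction(raw, compressed))                      # 0.75
```

A 4:1 ratio gives 75% savings, which falls inside the slide's 30–90% range. Real ratios depend on the data and the file format.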
Amazon Athena Federated Query
Run SQL queries on data spanning multiple data stores
• Run SQL queries on relational, non-relational, object, or custom data sources, in the cloud or on premises
• Run connectors in AWS Lambda: no servers to manage
• Open-source connectors for common data sources
• Build connectors to custom data sources
[Diagram data sources: Redshift (data warehousing), ElastiCache (Redis), Aurora (MySQL, PostgreSQL), DynamoDB (key value, document), DocumentDB (document), S3/Glacier]
PREVIEW
NEW
Amazon QuickSight
First BI service built for the cloud, with pay-per-session pricing and ML insights for everyone
Elastic scaling
• Auto-scale from 10 to 10K+ users in minutes
• Pay-as-you-go
Serverless
• Create dashboards in minutes
• Deploy globally without provisioning a single server
Deeply integrated with AWS services
• Secure, private access to AWS data
• Integrated S3 data lake permissions through AWS IAM
API support
• Programmatically onboard users and manage content
• Easily embed in your apps
NEW
ML predictions in Amazon QuickSight (preview)
1. Connect to any data: data lakes, SQL engines, 3rd-party applications, and on-premises databases
AWS/on-premises data sources:
• Excel
• CSV
• MySQL
• PostgreSQL
• MariaDB
• Presto
• Spark
• SQL Server
• Amazon Redshift
• RDS
• S3
• Athena
• Aurora
• EMR
• Snowflake
• Teradata
• Salesforce
• Square
• Adobe Analytics
• Jira
• ServiceNow
• Twitter
• GitHub
2. Select an ML model: create models with Amazon SageMaker Autopilot, existing custom models, and packaged models from AWS Marketplace
[Diagram: custom models, Amazon SageMaker Autopilot models, and AWS Marketplace models feed QuickSight]
3. Visualize and share: analyze results, create visualizations, build dashboards or email reports, and share them with business stakeholders
NEW
Build predictive dashboards in hours with point-and-click, no coding required
Easily embed analytics in your own tools
Powered by QuickSight APIs and flexible customization. Entirely serverless.
Deploy and manage dashboards + data via APIs
Match your application UI with QuickSight Themes
Embed dashboards in apps without servers
• Fast, consistent performance
• Pay-per-session
• Automatically scale to tens of thousands of users
• No server management
• No scripting
NEW
Data exchange: AWS Data Exchange
Easily find and subscribe to 3rd-party data in the cloud
Efficiently access 3rd-party data
• Simplifies access to data: no need to receive physical media, manage FTP credentials, or integrate with different APIs
• Minimize legal reviews and negotiations
Quickly find diverse data in one place
• >1,000 data products
• >80 data providers, including Dow Jones, Change Healthcare, Foursquare, Dun & Bradstreet, Thomson Reuters, Pitney Bowes, LexisNexis, and Deloitte
Easily analyze data
• Download or copy data to S3
• Combine, analyze, and model with existing data
• Analyze data with EMR, Redshift, Athena, and AWS Glue
GA
NEW
Our portfolio
Broad and deep portfolio, purpose-built for builders
• Data movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
• Data lake: S3/Glacier | Glue (ETL & Data Catalog) | Lake Formation (data lakes)
• Data exchange: AWS Data Exchange (NEW)
• Business intelligence & machine learning: QuickSight (visualizations) | SageMaker (ML) | Comprehend (NLP) | Transcribe (speech-to-text) | Textract (extract text) | Personalize (recommendation) | Forecast (forecasts) | Translate (translation) | CodeGuru (code reviews, NEW) | Kendra (enterprise search, NEW)
• Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware) | Aurora (MySQL, PostgreSQL) | DynamoDB (key value, document) | ElastiCache (Redis, Memcached) | Neptune (graph) | Timestream (time series) | QLDB (ledger database) | Managed Blockchain (blockchain) | Blockchain Templates (blockchain) | Managed Apache Cassandra Service (wide column, NEW) | DocumentDB (document)
• Analytics: Redshift (data warehousing) | EMR (Hadoop + Spark) | Kinesis Data Analytics (real time) | Elasticsearch Service (operational analytics) | Athena (interactive analytics)
• NEW: AQUA | EMR on Outposts | UltraWarm | RDS Proxy | RDS on Outposts
Thank you!
Asif Abbasi – Sr. Specialist SA Analytics

 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

AWS re:Invent 2019 recap - Riyadh - Database and Analytics - Asif Abbasi

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. re:Invent recaps: Databases & Analytics. Asif Abbasi, Sr. Specialist Solutions Architect
  • 2. What do these companies have in common?
  • 3. Data: “The world’s most valuable resource is no longer oil, but data.”* (*Copyright: The Economist, 2017, David Parkins)
  • 4. 85% of businesses want to be data driven, but only 37% have been successful.* (*Source: Forbes Online; New Vantage Partners, Big Data Executive Survey)
  • 5. How do you build momentum? Five steps around your data: 1 Break free from legacy databases; 2 Move to managed; 3 Modernize your data warehouse; 4 Build data-driven apps; 5 Turn data to insights. Benefits called out around the cycle: save time and cost; remove undifferentiated heavy lifting; better experiences; deeper engagement; efficient processes; agility; global distribution; performance at scale; increase scale; improve performance; lower cost; better and faster insights; broader access to analytics.
  • 6. The Data Flywheel, with data at the center: 1 Break free from legacy databases; 2 Move to managed; 3 Modernize your data warehouse; 4 Build data-driven apps; 5 Turn data to insights.
  • 7. The Data Flywheel: 1 Break free from legacy databases; 2 Move to managed; 3 Modernize your data warehouse; 4 Build data-driven apps; 5 Turn data to insights. Modernize your data infrastructure; get the most value from your data.
  • 8. The Data Flywheel: 1 Break free from legacy databases; 2 Move to managed; 3 Modernize your data warehouse; 4 Build data-driven apps; 5 Turn data to insights. Modernize your data infrastructure; get the most value from your data.
  • 9. Old-guard database providers: very expensive; proprietary; lock-in; punitive licensing. “You’ve got mail.”
  • 10. Customers are moving to open databases
  • 11. Customers are moving to open databases: open databases + commercial-grade performance and reliability?
  • 12. Amazon Aurora: MySQL- and PostgreSQL-compatible relational database built for the cloud, with the performance and availability of commercial-grade databases at 1/10th the cost. Performance and scalability: 5x the throughput of MySQL and 3x the throughput of PostgreSQL; up to 15 read replicas; scale out reads and writes across multiple data centers. Fully managed: managed by RDS, with no hardware provisioning, software patching, setup, configuration, or backups. Availability and durability: fault-tolerant, self-healing storage; six copies of data across three AZs; continuous backup to S3; single global database with cross-region replication. Highly secure: network isolation; encryption at rest and in transit.
  • 13. Challenges with integrating ML with your database. Typical steps of incorporating ML into an application: (1) select and train the model; (2) write application code to read data from the database; (3) query and format the data for the ML algorithm; (4) call an ML service to run the algorithm; (5) format the output; (6) retrieve the results back to the application.
  • 14. ML in Amazon Aurora and Athena: bringing machine learning to data developers and data analysts. Generate predictions directly from Aurora queries; models run in Amazon SageMaker and Amazon Comprehend; use standard SQL (SELECT ... FROM ... WHERE), with no ML expertise required; suitable for low-latency, high-volume use cases. Diagram: Aurora (database) and Athena (interactive analytics) calling Amazon SageMaker (ML).
  • 15. More than 200,000 databases migrated with AWS DMS: more in 2019 than in all of 2016-2018 combined.
  • 16. The Data Flywheel: 1 Break free from legacy databases; 2 Move to managed; 3 Modernize your data warehouse; 4 Build data-driven apps; 5 Turn data to insights. Modernize your data infrastructure; get the most value from your data.
  • 17. Managing software on premises is time consuming and complex: hardware and software installation; configuration, patching, and backups; cluster setup and data replication for high availability; capacity planning and scaling clusters for compute and storage.
  • 18. Customers moving to fully managed services. Relational databases: Aurora, RDS. Non-relational databases: DynamoDB, DocumentDB, ElastiCache, Managed Cassandra Service. Also: EMR (Hadoop and Spark), Elasticsearch Service (operational analytics), Managed Streaming for Kafka (real-time analytics).
  • 19. Amazon RDS: managed relational database service with a choice of popular databases. Easy to administer: no infrastructure provisioning; no software installation and patching; built-in monitoring. Performant and scalable: scale with an API call or a few clicks; read replicas for increased throughput. Available and durable: automatic Multi-AZ data replication; automated backup, snapshots, and failover. Secure and compliant: encryption at rest and in transit; network isolation and resource-level permissions.
  • 20. How do you scale your relational database to support tens of thousands of connections? Serverless applications open and close tens of thousands of connections within seconds, which leads to longer query response times that limit application scalability. Database proxy servers are difficult to deploy, patch, and manage.
  • 21. Amazon RDS Proxy (PREVIEW, NEW): fully managed, highly available database proxy. Supports a new scale of serverless application connections; pools and shares database connections; preserves connections during database failovers; manages DB credentials with Secrets Manager and IAM; fully managed, with no provisioning, patching, or management. Diagram: applications connect through RDS Proxy (connection pooling) to an RDS database instance.
  • 22. Amazon RDS on AWS Outposts (PREVIEW, NEW): RDS MySQL and PostgreSQL on AWS Outposts. Launch RDS in your data centers with AWS Outposts; integrate with on-premises databases and applications; deploy secure, managed RDS in minutes; store data without moving it to the cloud; automates provisioning, patching, backup, restoring, scaling, and failover.
  • 23. Operational analytics: Amazon Elasticsearch Service, a fully managed, scalable, and secure Elasticsearch service. Open source: open-source Elasticsearch APIs, managed Kibana, integration with Logstash. Scalable, secure, and compliant: scale clusters up or down via a single API call or a few clicks; network isolation with VPC; encrypt data at rest and in transit; compliant with HIPAA, PCI DSS, and ISO. Cost-optimized workloads: pay only for what you use; no upfront fee or usage requirement. Fully managed: deploy Elasticsearch clusters in minutes, with simplified hardware provisioning, software installation and patching, failure recovery, backups, and monitoring; critical features built in, including encryption, VPC support, and 24x7 monitoring.
  • 24. Challenges with analyzing high volumes of data in real time: storing data is expensive at scale, which limits the amount of data retained for analysis, so you miss out on valuable insights.
  • 25. UltraWarm for Amazon Elasticsearch Service (PREVIEW, NEW): a new warm storage tier for Elasticsearch Service. Seamlessly extends Elasticsearch Service; reduces cost by 90% to store the same amount of data; scales up to 3 PB of log data per cluster; analyze years of operational data. Diagram: Kibana dashboard queries pass through an Application Load Balancer to an Amazon Elasticsearch Service domain with an active master node and UltraWarm nodes backed by Amazon S3.
  • 26. The Data Flywheel: 1 Break free from legacy databases; 2 Move to managed; 3 Modernize your data warehouse; 4 Build data-driven apps; 5 Turn data to insights. Modernize your data infrastructure; get the most value from your data.
  • 28. Most widely used cloud data warehouse: tens of thousands of customers use Redshift and process over 2 EB of data per day.
  • 29. Amazon Redshift has been innovating quickly: 200+ new features in the past 18 months, including robust result set caching; support for a large number of tables (~20,000); COPY command support for ORC and Parquet; IAM role chaining; elastic resize; Groups; Redshift Spectrum date formats, scalar JSON and Ion file format support, region expansion, and predicate filtering; auto analyze; health and performance monitoring with Amazon CloudWatch; automatic table distribution style; CloudWatch support for WLM queues; performance enhancements for hash join, vacuum, window functions, resize operations, aggregations, console, UNION ALL, and an efficient compiled-code cache; UNLOAD to CSV; Auto WLM; ~25 Query Monitoring Rules (QMR) support; AQUA; Concurrency Scaling; DC1 migration to DC2; resiliency of ROLLBACK processing; managing multi-part queries in the AWS console; auto analyze for incremental changes on tables; Spectrum Request Accelerator; applying a new distribution key; Redshift Spectrum row group filtering in Parquet and ORC, nested data support, Enhanced VPC Routing, and multiple partitions; faster classic resize with an optimized data transfer protocol; performance improvements for Bloom filters in joins, complex queries that create internal tables, and the communication layer; Redshift Spectrum concurrency scaling; AWS Lake Formation integration; auto vacuum sort, auto analyze, and auto table sort; Auto WLM with query priorities; snapshot scheduler; performance improvements for join pushdowns to subqueries, mixed workloads, temporary tables, rank functions, null handling in joins, and single-row inserts; Advisor recommendations for distribution keys; AZ64 compression encoding; console redesign; stored procedures; spatial processing; column-level access control with AWS Lake Formation; RA3; faster inter-region snapshot transfers; Federated Query; Materialized Views; manual pause and resume.
  • 30. Amazon Redshift Materialized Views (PREVIEW, NEW): defined by a SQL query; precomputed results, incrementally refreshed; orders-of-magnitude query acceleration; recommended for predictable and repeated queries used in dashboarding and interactive analysis.
  • 31. Amazon Redshift Data Lake Export (NEW): export data directly to Amazon S3 in Apache Parquet. Save results of data transformation into the S3 data lake; export with the UNLOAD command and specify Parquet; Redshift formats, partitions, and moves data into S3; analyze with Amazon SageMaker, Athena, and EMR.
  • 32. Amazon Redshift Federated Query (PREVIEW, NEW): analyze data across the data warehouse, data lakes, and operational databases. Query across multiple systems from Redshift; combine data warehouse and transactional data; compatible with Amazon RDS and Aurora (PostgreSQL). Diagram: SQL from Amazon Redshift spanning Amazon RDS, Amazon Aurora, and the S3 data lake.
  • 33. How do you scale cost-effectively for diverse data warehouse workloads?
  • 34. Amazon Redshift on RA3 instances (GA, NEW): optimize your data warehouse by paying for compute and storage separately. Delivers 3x the performance of existing cloud DWs; automatically scales your DW storage capacity; DS2 customers can migrate and get 2x performance and 2x storage for the same cost; supports workloads up to 8 PB (compressed). Diagram: RA3 compute nodes, each with an SSD cache, over managed S3 storage; compute is billed in $/node/hour and managed storage in $/TB/month.
  • 35. It’s hard to cost-effectively scale without compromising performance. Diagram: DW clusters 1, 2, and 3 connected over the network to central shared storage.
  • 36. Taking advantage of growing storage throughput. Chart: from 2012 to today, storage (SSD) throughput improved 12x while CPU-DRAM throughput improved 2x. Processing more data closer to SSD storage could dramatically change how fast large amounts of data can be processed compared with today’s existing systems.
  • 37. AQUA (Advanced Query Accelerator) for Amazon Redshift (coming in 2020, NEW): an innovative new hardware-accelerated cache that delivers up to 10x better query performance than other cloud data warehouses. Built from NVMe SSDs, custom analytics processors, and the AWS Nitro System.
  • 38. AQUA (Advanced Query Accelerator), coming in 2020 (NEW): Redshift runs 10x faster than any other cloud data warehouse without increasing cost. AQUA brings compute to the storage layer so data doesn’t have to move back and forth; a high-speed cache on top of S3 scales out to process data in parallel across many nodes; AWS custom-designed analytics processors accelerate data compression, encryption, and data processing; 100% compatible with the current version of Redshift. Diagram: RA3 compute cluster, AQUA advanced query accelerator, S3 storage.
  • 39. Data warehousing: Amazon Redshift, the first and most popular cloud data warehouse. Best performance, most scalable: 3x faster with RA3*; 10x faster with AQUA*; adds unlimited compute capacity on demand to meet unlimited concurrent access. Lowest cost: cost-optimized workloads by paying for compute and storage separately; 1/10th the cost of a traditional DW at $1,000/TB/year; up to 75% less than other cloud data warehouses, with predictable costs. Data lake and AWS integration: analyze exabytes of data across the data warehouse, data lakes, and operational databases; query data across various analytics services. Most secure and compliant: AWS-grade security (e.g., VPC, encryption with KMS, CloudTrail); all major certifications such as SOC, PCI DSS, ISO, FedRAMP, and HIPAA. (*vs. other cloud DWs)
  • 40. The Data Flywheel: 1 Break free from legacy databases; 2 Move to managed; 3 Modernize your data warehouse; 4 Build data-driven apps; 5 Turn data to insights. Modernize your data infrastructure; get the most value from your data.
  • 41. Characteristics of modern applications (internet-scale and transactional): users: 1M+; data volume: TB to PB to EB; locality: global; performance: milliseconds to microseconds; request rate: millions; access: web, mobile, IoT, devices; scale: up-down, out-in; economics: pay for what you use; developer access: instant API access. Examples: social media, ride hailing, media streaming, dating.
  • 42. Developers are doing what they do best: they break complex apps into smaller pieces and pick the best tool to solve each problem, building highly distributed apps with purpose-built databases and a microservices architecture. This ensures that the apps are well architected and scale effectively.
  • 43. AWS Databases – for all your application use cases
  • 44. It’s challenging to manage large Cassandra clusters: setting up, configuring, and maintaining the infrastructure and software requires specialized expertise; scaling clusters is time consuming, manual, and prone to over-provisioning; manual backups and an error-prone restore process make it hard to maintain integrity; upgrades are unreliable, with clunky rollback and debugging capabilities.
  • 45. Amazon Managed (Apache) Cassandra Service (PREVIEW, NEW): a scalable, highly available, managed Cassandra-compatible database service. No servers to manage: no need to provision, configure, and operate large Cassandra clusters or add and remove nodes manually. Single-digit millisecond performance at scale: scale tables up and down automatically based on application traffic; virtually unlimited throughput and storage. Apache Cassandra-compatible: use the same application code, licensed drivers, and tools built on Cassandra. Simple migration: migrate Cassandra databases running on premises or on EC2 to Managed Cassandra Service.
  • 46. Common data categories and use cases
  • 47. The Data Flywheel: 1 Break free from legacy databases; 2 Move to managed; 3 Modernize your data warehouse; 4 Build data-driven apps; 5 Turn data to insights. Modernize your data infrastructure; get the most value from your data.
  • 48. Data silos to data lakes: traditional data warehousing approaches don’t scale. In the silo model, OLTP, ERP, CRM, and LOB sources feed DW silo 1 for business intelligence, while devices, web, sensors, and social sources feed DW silo 2 for business intelligence. A data lake with open formats and a central catalog serves data warehousing, BI + analytics, and machine learning.
  • 49. Customers moving to data lake architectures, bringing together the best of both worlds: extends or evolves DW architectures; store any data in any format; durable, available, and exabyte scale; secure, compliant, auditable; run any type of analytics, from DW to predictive. Diagram: data warehousing, analytics, and machine learning on the data lake.
  • 50. Any type of analytics on the data lake: data warehousing, big data processing, interactive query, operational analytics, real-time analytics, predictive analytics, recommendations, visualizations, and data exchange.
  • 51. Any type of analytics on the data lake, with the most comprehensive analytics platform. Data lake: Amazon S3, AWS Glue, Lake Formation. Data warehousing: Amazon Redshift. Big data processing: Amazon EMR. Interactive query: Amazon Athena. Operational analytics: Amazon Elasticsearch Service. Real-time analytics: Amazon Kinesis, Amazon MSK. Predictive analytics: Amazon SageMaker. Recommendations: Amazon Personalize. Visualizations: Amazon QuickSight. Data exchange: AWS Data Exchange.
  • 52. Amazon EMR: easily run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS. Low cost: 50-80% reduction in costs with EC2 Spot and Reserved Instances; per-second billing for flexibility. Use S3 storage: process data in S3 securely with high performance using the EMRFS connector. Latest versions: updated with the latest open source frameworks within 30 days. Easy: fully managed, with no cluster setup, node provisioning, or cluster tuning.
  • 53. Performance improvements in Spark for Amazon EMR (NEW): a performance-optimized runtime for Apache Spark, with 2.6x faster performance at 1/10th the cost. Runtime optimized for Apache Spark performance; 100% compliant with Apache Spark APIs. Best performance: 2.6x faster than Spark with EMR without the runtime; 1.6x faster than 3rd party managed Spark (with their runtime). Lowest price: 1/10th the cost of 3rd party managed Spark (with their runtime). Chart, total runtime on 104 queries (seconds, lower is better): Spark with EMR (with runtime) 10,164; 3rd party managed Spark (with their runtime) 16,478; Spark with EMR (without runtime) 26,478. *Based on a TPC-DS 3 TB benchmark running 6-node c4.8xlarge clusters with EMR 5.28 and Spark 2.4.
  • 54. Amazon EMR on AWS Outposts (GA, NEW): launch EMR in your data centers with AWS Outposts. Integrate with existing on-premises Hadoop deployments; deploy secure, managed EMR clusters in minutes; process and analyze data on premises on AWS Outposts. Diagram: EMR (Hadoop + Spark) on AWS Outposts alongside on-premises Hadoop/Spark.
  • 55. Amazon Athena: serverless, interactive query service. Easy: query instantly; zero setup cost; point to S3 and start querying. Pay per query: pay only for queries run; save 30-90% on per-query costs through compression. Use S3 storage. SQL: ANSI SQL; JDBC/ODBC drivers; multiple formats, compression types, complex joins, and data types. Serverless: zero infrastructure, zero administration. Integrated with QuickSight.
  • 56. Amazon Athena Federated Query (PREVIEW, NEW): run SQL queries on data spanning multiple data stores, including Redshift (data warehousing), ElastiCache (Redis), Aurora (MySQL, PostgreSQL), DynamoDB (key value, document), DocumentDB (document), and S3/Glacier. Run SQL queries on relational, non-relational, object, or custom data sources, in the cloud or on premises; run connectors in AWS Lambda, with no servers to manage; open source connectors for common data sources; build connectors to custom data sources.
  • 57. Amazon QuickSight (NEW): the first BI service built for the cloud, with pay-per-session pricing and ML insights for everyone. Elastic scaling: auto-scale from 10 to 10K+ users in minutes. Pay-as-you-go. Serverless: create dashboards in minutes; deploy globally without provisioning a single server. Deeply integrated with AWS services: secure, private access to AWS data; integrated S3 data lake permissions through AWS IAM. API support: programmatically onboard users and manage content; easily embed in your apps.
  • 58. ML predictions in Amazon QuickSight (preview, NEW). 1 Connect to any data: data lakes, SQL engines, 3rd party applications, and on-premises databases (AWS and on-premises data sources include Excel, CSV, MySQL, PostgreSQL, MariaDB, Presto, Spark, SQL Server, Amazon Redshift, RDS, S3, Athena, Aurora, EMR, Snowflake, Teradata, Salesforce, Square, Adobe Analytics, Jira, ServiceNow, Twitter, and GitHub). 2 Select an ML model: create models with Amazon SageMaker Autopilot, or use existing custom models and packaged models from AWS Marketplace. 3 Visualize and share: analyze results, create visualizations, build dashboards and email reports, and share them with business stakeholders.
  • 59. Build predictive dashboards in hours with point-and-click, no coding required
  • 60. Easily embed analytics in your own tools (NEW), powered by QuickSight APIs and flexible customization; entirely serverless. Deploy and manage dashboards and data via APIs; match your application UI with QuickSight themes; embed dashboards in apps without servers: fast, consistent performance; pay-per-session; automatically scale to tens of thousands of users; no server management; no scripting.
  • 61. Data exchange: AWS Data Exchange (GA, NEW). Easily find and subscribe to 3rd-party data in the cloud. Efficiently access 3rd party data: simplifies access to data, with no need to receive physical media, manage FTP credentials, or integrate with different APIs; minimize legal reviews and negotiations. Quickly find diverse data in one place: more than 1,000 data products from more than 80 data providers, including Dow Jones, Change Healthcare, Foursquare, Dun & Bradstreet, Thomson Reuters, Pitney Bowes, LexisNexis, and Deloitte. Easily analyze data: download or copy data to S3; combine, analyze, and model with existing data; analyze data with EMR, Redshift, Athena, and AWS Glue.
  • 62. Our portfolio: broad and deep, purpose-built for builders. Data lake: S3/Glacier; Glue (ETL & Data Catalog); Lake Formation (data lakes). Data movement: Database Migration Service, Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Managed Streaming for Kafka. Data exchange: AWS Data Exchange (NEW). Business intelligence & machine learning: QuickSight (visualizations), SageMaker (ML), Comprehend (NLP), Transcribe (speech-to-text), Textract (extract text), Personalize (recommendations), Forecast (forecasts), Translate (translation), CodeGuru (code reviews, NEW), Kendra (enterprise search, NEW). Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware), Aurora (MySQL, PostgreSQL), DynamoDB (key value, document), ElastiCache (Redis, Memcached), Neptune (graph), Timestream (time series), QLDB (ledger database), Managed Apache Cassandra Service (wide column, NEW), DocumentDB (document). Blockchain: Managed Blockchain, Blockchain Templates. Analytics: Redshift (data warehousing), EMR (Hadoop + Spark), Kinesis Data Analytics (real time), Elasticsearch Service (operational analytics), Athena (interactive analytics). New: AQUA, EMR on Outposts, UltraWarm, RDS Proxy, RDS on Outposts.
  • 63. Thank you! Asif Abbasi – Sr. Specialist SA Analytics
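Slide 12 lists "six copies of data across three AZs" for Aurora storage. The toy model below shows why that layout tolerates the loss of a whole AZ. It is a sketch, not Aurora's implementation, and it assumes the quorum sizes AWS has described publicly for Aurora (a write needs 4 of the 6 copies, a read needs 3), which the slide itself does not state; the AZ labels and function names are hypothetical.

```python
# Toy model of Aurora-style storage: six copies, two in each of three AZs.
# Assumption (not on the slide): writes need 4 of 6 copies, reads need 3 of 6.
COPIES_PER_AZ = 2
AZS = ("az-1", "az-2", "az-3")  # hypothetical AZ labels
TOTAL_COPIES = COPIES_PER_AZ * len(AZS)
WRITE_QUORUM = 4
READ_QUORUM = 3


def surviving_copies(failed_azs=(), extra_failed_copies=0):
    """Copies still reachable after whole-AZ outages plus extra single-copy failures."""
    lost = COPIES_PER_AZ * len(set(failed_azs)) + extra_failed_copies
    return max(TOTAL_COPIES - lost, 0)


def can_write(failed_azs=(), extra_failed_copies=0):
    return surviving_copies(failed_azs, extra_failed_copies) >= WRITE_QUORUM


def can_read(failed_azs=(), extra_failed_copies=0):
    return surviving_copies(failed_azs, extra_failed_copies) >= READ_QUORUM


# In this model every read quorum overlaps every write quorum (4 + 3 > 6), and any
# two write quorums overlap (4 + 4 > 6), so a read reaches at least one up-to-date copy.
assert WRITE_QUORUM + READ_QUORUM > TOTAL_COPIES
assert 2 * WRITE_QUORUM > TOTAL_COPIES

print(can_write(["az-1"]))     # lose one AZ: 4 copies left, writes continue
print(can_read(["az-1"], 1))   # lose an AZ plus one more copy: 3 left, reads continue
print(can_write(["az-1"], 1))  # but with only 3 copies, writes cannot reach quorum
```

The asymmetric quorums are the design point: losing an AZ (two copies) still leaves a write quorum, and losing an AZ plus one more copy still leaves a read quorum from which the lost copies can be rebuilt.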
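Slide 14 describes calling ML models from standard SQL. For Athena, the launch-era documentation placed a `USING EXTERNAL FUNCTION ... RETURNS ... SAGEMAKER '<endpoint>'` clause in front of the SELECT. The helper below is a minimal sketch that assembles such a query string under that assumption (check the current Athena documentation, since the feature was in preview); the function name, SageMaker endpoint, and table are hypothetical placeholders. Aurora's ML integration uses different, engine-specific SQL functions that are not shown here.

```python
def athena_ml_query(function_name, params, return_type, endpoint, select_sql):
    """Prefix a SELECT with an Athena USING EXTERNAL FUNCTION clause bound to a SageMaker endpoint."""
    signature = ", ".join(f"{name} {sql_type}" for name, sql_type in params)
    return (
        f"USING EXTERNAL FUNCTION {function_name}({signature})\n"
        f"    RETURNS {return_type}\n"
        f"    SAGEMAKER '{endpoint}'\n"
        f"{select_sql}"
    )


# All names below are hypothetical placeholders.
query = athena_ml_query(
    function_name="predict_churn",
    params=[("age", "INTEGER"), ("monthly_spend", "DOUBLE")],
    return_type="DOUBLE",
    endpoint="churn-xgboost-endpoint",
    select_sql=(
        "SELECT customer_id, predict_churn(age, monthly_spend) AS churn_score\n"
        "FROM customers\n"
        "WHERE predict_churn(age, monthly_spend) > 0.8"
    ),
)
print(query)
```

The resulting string is an ordinary Athena query, so it can be submitted the same way as any other SQL; the model call happens inside the query engine rather than in application code.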
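Slide 21 says RDS Proxy "pools and shares database connections", so that serverless functions opening and closing connections do not overwhelm the database. RDS Proxy does this as a managed endpoint that applications connect to; the in-process sketch below only illustrates the pooling idea. `FakeDbConnection` and `ConnectionPool` are hypothetical classes, used so the example runs without a database.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeDbConnection:
    """Stand-in for a physical database connection; counts how many get opened."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeDbConnection._lock:
            FakeDbConnection.opened += 1

    def execute(self, sql):
        return f"ok: {sql}"


class ConnectionPool:
    """Share a fixed set of physical connections among many short-lived callers."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeDbConnection())

    def run(self, sql, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # borrow; waits if every connection is busy
        try:
            return conn.execute(sql)
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


pool = ConnectionPool(size=10)

# 10,000 short "invocations" from 50 concurrent workers reuse the same 10 connections.
with ThreadPoolExecutor(max_workers=50) as executor:
    results = list(executor.map(lambda i: pool.run(f"SELECT {i}"), range(10_000)))

print(len(results), FakeDbConnection.opened)
```

The database only ever sees ten connections, no matter how many callers come and go; callers that find every connection busy wait briefly instead of opening a new one.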
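Slide 30 describes Redshift materialized views as precomputed results that are incrementally refreshed. In Redshift itself, that is `CREATE MATERIALIZED VIEW ... AS SELECT ...` followed later by `REFRESH MATERIALIZED VIEW ...`. The Python toy below models only the incremental idea, assuming an append-only base table; the class and data are hypothetical, and Redshift's real refresh logic covers cases this sketch ignores (updates, deletes, joins).

```python
from collections import defaultdict


class SalesByRegionView:
    """Toy materialized view: SUM(amount) GROUP BY region, refreshed incrementally.

    Assumes an append-only base table (no updates or deletes).
    """

    def __init__(self, base_rows):
        self.base_rows = base_rows          # shared, append-only list of (region, amount)
        self.totals = defaultdict(float)    # the precomputed result
        self.applied = 0                    # how many base rows are already folded in
        self.refresh()

    def refresh(self):
        """Fold in only the rows appended since the last refresh."""
        for region, amount in self.base_rows[self.applied:]:
            self.totals[region] += amount
        self.applied = len(self.base_rows)

    def query(self, region):
        """Dashboard reads hit the precomputed totals, not the base table."""
        return self.totals.get(region, 0.0)


sales = [("emea", 100.0), ("apac", 50.0)]
view = SalesByRegionView(sales)
sales.append(("emea", 25.0))   # new data arrives in the base table
print(view.query("emea"))      # still 100.0: the view is stale until refreshed
view.refresh()                 # processes only the 1 new row, not all 3
print(view.query("emea"))      # 125.0
```

This is why the slide recommends materialized views for predictable, repeated queries: the expensive aggregation is paid once, and each refresh costs roughly in proportion to the new data.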
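Slide 31 says the export is done with `UNLOAD` and the Parquet format. The helper below is a sketch that assembles such a statement in the form `UNLOAD ('query') TO 's3://...' IAM_ROLE '...' FORMAT AS PARQUET PARTITION BY (...)`. The bucket, role ARN, and table are hypothetical placeholders, and the function deliberately refuses queries containing single quotes rather than implementing UNLOAD's escaping rules.

```python
def build_unload_parquet(select_sql, s3_prefix, iam_role_arn, partition_cols=()):
    """Build a Redshift UNLOAD statement that exports query results to S3 as Parquet.

    Sketch only: rejects queries containing single quotes instead of
    implementing UNLOAD's quote-escaping rules.
    """
    if "'" in select_sql:
        raise ValueError("quote escaping is not handled in this sketch")
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    statement = (
        f"UNLOAD ('{select_sql}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET"
    )
    if partition_cols:
        statement += f"\nPARTITION BY ({', '.join(partition_cols)})"
    return statement + ";"


# Bucket, role ARN, and table are hypothetical placeholders.
print(build_unload_parquet(
    select_sql="SELECT sale_id, amount, sale_date FROM sales",
    s3_prefix="s3://example-data-lake/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftUnloadRole",
    partition_cols=["sale_date"],
))
```

Partitioning by a column such as a date lays the Parquet files out in per-value prefixes, which lets engines like Athena or EMR skip partitions they do not need.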
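For slide 32, Redshift reaches an RDS or Aurora PostgreSQL database through an external schema created with `CREATE EXTERNAL SCHEMA ... FROM POSTGRES`, authenticating with an IAM role and a Secrets Manager secret. The sketch below builds that statement plus a join mixing live operational rows with warehouse data. Every identifier, host, and ARN is a hypothetical placeholder, and the exact clause set should be checked against the Redshift documentation for your version.

```python
def build_federated_schema(local_schema, database, remote_schema, host, port,
                           iam_role_arn, secret_arn):
    """CREATE EXTERNAL SCHEMA statement mapping a PostgreSQL schema into Redshift."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {local_schema}\n"
        "FROM POSTGRES\n"
        f"DATABASE '{database}' SCHEMA '{remote_schema}'\n"
        f"URI '{host}' PORT {int(port)}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# Every identifier, host, and ARN below is a hypothetical placeholder.
ddl = build_federated_schema(
    local_schema="live_orders",
    database="shop",
    remote_schema="public",
    host="example-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    port=5432,
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleFederatedRole",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-pg-creds",
)

# Once the schema exists, live operational rows can be joined with warehouse tables.
report_sql = (
    "SELECT c.segment, SUM(o.amount) AS open_order_value\n"
    "FROM live_orders.orders o\n"
    "JOIN analytics.customers c ON c.customer_id = o.customer_id\n"
    "GROUP BY c.segment"
)
print(ddl)
print(report_sql)
```

The external schema is defined once; after that, the operational tables appear under `live_orders` and can be queried alongside local tables without first copying them into the warehouse.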
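Slide 34 prices RA3 compute per node-hour and managed storage per TB-month. The arithmetic below shows what that split means in practice: doubling the stored data changes only the storage line of the bill. The prices are placeholders, not published AWS rates, and 730 hours is used as an approximation of one month.

```python
def ra3_monthly_cost(nodes, price_per_node_hour, hours, storage_tb, price_per_tb_month):
    """Split a monthly bill into compute ($/node/hour) and managed storage ($/TB/month)."""
    compute = nodes * price_per_node_hour * hours
    storage = storage_tb * price_per_tb_month
    return compute, storage, compute + storage


# Placeholder prices, NOT published AWS rates.
HOURLY, PER_TB = 3.00, 24.00
print(ra3_monthly_cost(2, HOURLY, 730, 100, PER_TB))  # (4380.0, 2400.0, 6780.0)
print(ra3_monthly_cost(2, HOURLY, 730, 200, PER_TB))  # (4380.0, 4800.0, 9180.0)
```

With compute and storage coupled (as on node types where storage comes with each node), growing data would force adding nodes; with them separated, the compute line stays fixed until query load, not data volume, requires more nodes.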
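The speedups quoted on slide 53 can be checked against the chart totals on the same slide. This small calculation uses only those three numbers; it says nothing about the 1/10th-cost claim, which the chart does not show.

```python
# Total runtime on the 104 TPC-DS queries, in seconds, as printed on slide 53.
runtimes_s = {
    "Spark with EMR (with runtime)": 10_164,
    "3rd party Managed Spark (with their runtime)": 16_478,
    "Spark with EMR (without runtime)": 26_478,
}


def speedup(slower_s, faster_s):
    """How many times faster the second configuration finished."""
    return slower_s / faster_s


optimized = runtimes_s["Spark with EMR (with runtime)"]
for name, seconds in runtimes_s.items():
    print(f"{name}: {seconds:,} s, {speedup(seconds, optimized):.1f}x as long as the optimized runtime")
```

Rounded to one decimal, 26,478 / 10,164 gives 2.6 and 16,478 / 10,164 gives 1.6, matching the "2.6x" and "1.6x" figures on the slide.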
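Slide 55 describes Athena as "point to S3 and start querying". Programmatically, a query is submitted with the `StartQueryExecution` API and its status polled with `GetQueryExecution`, which boto3 exposes as `start_query_execution` and `get_query_execution` on an Athena client. The sketch below assumes such a client is passed in; to stay runnable offline it is exercised with a hypothetical `FakeAthenaClient`, and the database, table, and bucket names are placeholders.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_s3, poll_seconds=1.0, timeout_s=300.0):
    """Start an Athena query and poll until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    """
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]
    deadline = time.monotonic() + timeout_s
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {state} after {timeout_s} s")
        time.sleep(poll_seconds)


class FakeAthenaClient:
    """Offline stand-in returning the two response shapes used above (hypothetical)."""

    def __init__(self, polls_until_done=3):
        self.polls_until_done = polls_until_done
        self.polls = 0
        self.last_request = None

    def start_query_execution(self, **request):
        self.last_request = request
        return {"QueryExecutionId": "example-query-id"}

    def get_query_execution(self, QueryExecutionId):
        self.polls += 1
        state = "SUCCEEDED" if self.polls >= self.polls_until_done else "RUNNING"
        return {"QueryExecution": {"QueryExecutionId": QueryExecutionId,
                                   "Status": {"State": state}}}


fake = FakeAthenaClient()
print(run_athena_query(fake, "SELECT COUNT(*) FROM elb_logs", "sampledb",
                       "s3://example-athena-results/", poll_seconds=0))
```

Passing the client in keeps the polling logic testable; against AWS, the results of a succeeded query land under the `OutputLocation` prefix and can also be read with `get_query_results`.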
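Slide 60 mentions embedding QuickSight dashboards through APIs. At the time of this deck, the relevant call was `GetDashboardEmbedUrl` (boto3 `get_dashboard_embed_url`); newer QuickSight releases add `GenerateEmbedUrlForRegisteredUser`, so treat this as a sketch of the 2019-era flow. It assumes a QuickSight-registered user and a caller-supplied client; the account ID, dashboard ID, and user ARN are placeholders, and a hypothetical fake client stands in for AWS.

```python
def dashboard_embed_url(client, account_id, dashboard_id, user_arn, session_minutes=60):
    """Return a one-time embed URL for a dashboard, for a QuickSight-registered user.

    `client` is expected to behave like boto3.client("quicksight").
    """
    # Assumed documented range for SessionLifetimeInMinutes.
    if not 15 <= session_minutes <= 600:
        raise ValueError("session lifetime must be 15-600 minutes")
    response = client.get_dashboard_embed_url(
        AwsAccountId=account_id,
        DashboardId=dashboard_id,
        IdentityType="QUICKSIGHT",
        UserArn=user_arn,
        SessionLifetimeInMinutes=session_minutes,
    )
    return response["EmbedUrl"]


class FakeQuickSightClient:
    """Offline stand-in that records the request and returns a dummy URL (hypothetical)."""

    def get_dashboard_embed_url(self, **request):
        self.last_request = request
        return {"EmbedUrl": f"https://example.invalid/embed/{request['DashboardId']}"}


fake = FakeQuickSightClient()
url = dashboard_embed_url(
    fake,
    account_id="123456789012",
    dashboard_id="sales-overview",
    user_arn="arn:aws:quicksight:us-east-1:123456789012:user/default/analyst",
)
print(url)
```

The application backend requests the URL per user session and hands it to the browser, which loads the dashboard in an iframe; no BI servers run on the application side.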