Value of Data Beyond Analytics by Darin Briskman
1. Analytics at Amazon
Darin Briskman
Product Manager
AWS Database, Analytics, Machine Learning, & Blockchain
Briskman@amazon.com
2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Traditionally, analytics looked like this
Relational data
GBs-TBs scale [not designed for PB/EBs]
Expensive: Large initial capex + $10K-$50K/TB/year
90% of data was thrown away because of cost
OLTP ERP CRM LOB
Data Warehouse
Business Intelligence
3. Our beliefs
1. The purpose of analytics is to help people make better decisions.
2. All data has value. No data should be thrown away.
3. Everyone should have access to all data (subject to access rules).
4. Data lakes on AWS
Diagram labels: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Kinesis Video Streams, S3, Redshift, EMR, Athena, Kinesis, Elasticsearch Service, AI Services, QuickSight
Exabyte scale
Store and analyze relational and non-relational data
Purpose-built analytics tools
Cost effective
• Store at 2.3 cents per GB-month in Amazon S3
• Query with Amazon Athena at ½ cent per GB scanned
• DW with Amazon Redshift for $1,000/TB/year
Give access to everyone
• Amazon QuickSight: $0.30 for 30 minutes of use
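The prices on this slide can be turned into a quick back-of-the-envelope comparison. The sketch below only does arithmetic on the figures quoted in the deck (2019 list prices, which may have changed since); the function names and the decimal GB-per-TB conversion are assumptions made for illustration.

```python
# Prices as quoted on the slides (2019 figures; treat them as illustrative).
S3_USD_PER_GB_MONTH = 0.023                 # "2.3 cents per GB-month"
ATHENA_USD_PER_GB_SCANNED = 0.005           # "1/2 cent per GB scanned"
REDSHIFT_USD_PER_TB_YEAR = 1_000            # "$1,000/TB/year"
TRADITIONAL_DW_USD_PER_TB_YEAR = (10_000, 50_000)  # "$10K-$50K/TB/year"

GB_PER_TB = 1_000  # assumption: decimal terabytes


def s3_storage_cost_per_year(terabytes: float) -> float:
    """Yearly S3 storage cost in USD for a given data volume."""
    return terabytes * GB_PER_TB * S3_USD_PER_GB_MONTH * 12


def athena_query_cost(gb_scanned: float) -> float:
    """Cost in USD of one Athena query that scans gb_scanned gigabytes."""
    return gb_scanned * ATHENA_USD_PER_GB_SCANNED


def redshift_cost_per_year(terabytes: float) -> float:
    """Yearly Redshift cost in USD at the quoted per-TB rate."""
    return terabytes * REDSHIFT_USD_PER_TB_YEAR


def traditional_dw_cost_range_per_year(terabytes: float) -> tuple:
    """Low and high yearly cost in USD for the traditional warehouse figure."""
    low, high = TRADITIONAL_DW_USD_PER_TB_YEAR
    return terabytes * low, terabytes * high


if __name__ == "__main__":
    tb = 100
    print("S3 per year:", round(s3_storage_cost_per_year(tb)))
    print("Redshift per year:", round(redshift_cost_per_year(tb)))
    print("Traditional DW per year:", traditional_dw_cost_range_per_year(tb))
    print("Athena query over 200 GB:", round(athena_query_cost(200), 2))
```

At these rates, 100 TB costs about $27,600 per year in S3 and $100,000 per year in Redshift, against $1M to $5M per year at the traditional data warehouse figure quoted earlier.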
5. The Flywheel
6. CHALLENGE
Need to create a constant feedback loop for designers.
Gain an up-to-the-minute understanding of gamer satisfaction to guarantee gamers are engaged, resulting in the most popular game played in the world.
Fortnite | 125+ million players
7. Epic Games uses data lakes and analytics
Entire analytics platform running on AWS
Amazon S3 leveraged as a data lake
All telemetry data is collected with Amazon Kinesis
Real-time analytics done through Spark on Amazon EMR, with DynamoDB used to create scoreboards and real-time queries
Use Amazon EMR for large batch data processing
Game designers use data to inform their decisions
Architecture diagram:
Sources: game clients, game servers, launcher, game services; APIs, databases, S3, other sources
Ingestion: Kinesis
Near real-time pipelines: Spark on EMR, DynamoDB, user ETL (metric definition); Grafana, scoreboards API, limited raw data (real-time ad-hoc SQL)
Batch pipelines: S3 (data lake), ETL using EMR; Tableau/BI, ad-hoc SQL
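The telemetry collection step described on this slide could look roughly like the sketch below, which sends one game event to a Kinesis data stream. The stream name, event fields, and helper names are hypothetical; the only AWS call assumed is `put_record` on a boto3 Kinesis client, which is passed in so the logic can be exercised without AWS credentials. This is not Epic Games' actual code.

```python
import json
import time
from typing import Any, Dict

STREAM_NAME = "game-telemetry"  # hypothetical stream name


def build_telemetry_record(player_id: str, event: str,
                           payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build the keyword arguments for a Kinesis PutRecord call."""
    body = {"player_id": player_id, "event": event, "ts": time.time()}
    body.update(payload)
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(body).encode("utf-8"),
        # Partitioning by player keeps one player's events on one shard, in order.
        "PartitionKey": player_id,
    }


def send_event(kinesis_client, player_id: str, event: str,
               payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send one event using a client that exposes put_record(**kwargs)."""
    record = build_telemetry_record(player_id, event, payload)
    return kinesis_client.put_record(**record)


# Usage with real AWS credentials (not executed here):
#   import boto3
#   send_event(boto3.client("kinesis"), "player-42", "match_end", {"score": 10})
```

Downstream consumers such as Spark on EMR would then read from the stream; that side is omitted here.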
8. CHALLENGE
Needed to analyze data to find insights, identify opportunities, and evaluate business performance.
The Oracle DW did not scale, was difficult to maintain, and was costly.
SOLUTION
Deployed a data lake with Amazon S3, and runs analytics with Amazon Redshift, Amazon Redshift Spectrum, and Amazon EMR.
Result: They doubled the data stored (100 PB), lowered costs, and were able to gain insights faster.
50 PB of data
600,000 analytics jobs/day
9. Amazon Data Analytics
10. What is the Goal?
To provide an analytic ecosystem that scales with the Amazon business
To leverage AWS technologies and help improve these technologies for all Amazon customers
To provide choice and options in new analytic technologies
Provide a SQL-based solution
Increasingly focus on enabling new analytic approaches, including machine learning and programmatic data analysis
Enable both “Bring Your Own Cluster” and “Bring Your Own Query” approaches
11.
“Tools #2” by Juan Pablo Olmo. No alterations other than cropping. https://www.flickr.com/photos/juanpol/1562101472/
Image used with permissions under Creative Commons license 2.0, Attribution Generic License (https://creativecommons.org/licenses/by/2.0/)
12.
Amazon EMR (running Hive, Pig, Spark, Presto, etc.)
Amazon DynamoDB
Amazon Machine Learning
Amazon QuickSight
Amazon RDS
Amazon Elasticsearch Service
Amazon Redshift
Amazon Athena
Amazon SQS
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon S3
Amazon Kinesis
Open-source tools (e.g. for ML, data science)
Commercial tools
13. Moving Forward - AWS
• S3 / EDX - Separate storage from compute by leveraging a parallel file system as a global data exchange
• Redshift - Preferred platform for SQL-based analysis and traditional data warehouse data
• Focus is “Business Users”
• EMR - Scalable “Do Everything” platform - Enable teams who have chosen EMR by providing curated data
• Focus is “Programmatic Access”
Amazon
Redshift
14. The Amazon “Data Lake” – Project Name “Andes”
The Goal: “THE” Place for Data at Amazon
• Source teams (Data Producers) put their public data there to give access to analytic teams (Data Consumers) and to share private data within their team
• EMR can directly access the data in parallel from Andes
• Redshift can load the data in parallel from Andes, or it can directly access the data in parallel with Spectrum
15. Putting The Pieces Together
The Analytic Architecture of the Future
Diagram sections: Source Systems; The Data Lake “Andes”; Big Data Systems; Data Warehouses; “Bring Your Own Cluster” and “Bring Your Own Query”; Services and Users
Diagram components: PostgreSQL instance, Amazon Redshift (shown three times), Amazon Kinesis, AWS Glue, Amazon QuickSight, Amazon Athena, Amazon Machine Learning
16. Table Subscriptions - The Vision
17. Data Value Chain
Image credits: Icons from thenounproject.com: “Collect” icon by Ramesh; “Cloud Security” icon by Creative Stall; “Search” icon by Dinosoft Labs; “Shopping Cart” icon by Gregor Cresnar; “Cloud Upload Download” icon by naim; “Data science” icon by Becris
COLLECT STORE DELIVER ANALYZE SUBSCRIBE DISCOVER
18. AWS Lake Formation
Build a secure data lake in days
Move, store, catalog, and clean your data faster with machine learning
Enforce security policies across multiple services
Empower analysts and data scientists to gain and manage new insights
19. How it works
Data lakes and analytics on AWS
Diagram labels: S3, IAM, KMS, Kinesis; sources: OLTP, ERP, CRM, LOB, Devices, Web, Sensors, Social
Build data lakes quickly
• Identify, crawl, and catalog sources
• Ingest and clean data
• Transform into optimal formats
Simplify security management
• Enforce encryption
• Define access policies
• Implement audit logging
Enable self-service and combined analytics
• Analysts discover all data available for analysis from a single data catalog
• Use multiple analytics tools over the same data
Diagram labels: Athena, Redshift, AI Services, EMR, QuickSight, Data catalog
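As an illustration of the "define access policies" step, the sketch below builds a table-level grant request in the shape accepted by the Lake Formation `grant_permissions` API in boto3. The principal ARN, database, and table names in the usage comment are hypothetical, and the client is passed in so the example can run without AWS access.

```python
from typing import Any, Dict, List


def build_grant_request(principal_arn: str, database: str, table: str,
                        permissions: List[str]) -> Dict[str, Any]:
    """Build keyword arguments granting table-level permissions to a principal."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": permissions,
    }


def grant_table_access(lakeformation_client, principal_arn: str, database: str,
                       table: str, permissions: List[str]) -> Dict[str, Any]:
    """Apply the grant using a client that exposes grant_permissions(**kwargs)."""
    request = build_grant_request(principal_arn, database, table, permissions)
    return lakeformation_client.grant_permissions(**request)


# Usage with real AWS credentials (not executed here; all names are hypothetical):
#   import boto3
#   grant_table_access(
#       boto3.client("lakeformation"),
#       "arn:aws:iam::123456789012:role/analyst",
#       "sales_db", "orders", ["SELECT"],
#   )
```

Because the grant is recorded against the catalog table rather than against one engine, the same policy is meant to apply across the analytics tools that read through the data catalog.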
20. How it works
21.
AWS Glue—Serverless Data catalog & ETL service
Data Catalog: Discover data and extract schema
ETL job authoring: Auto-generates customizable ETL code in Python and Spark
Automatically discovers data and stores schema
Data searchable, and available for ETL
Generates customizable code
Schedules and runs your ETL jobs
Serverless
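The "automatically discovers data and stores schema" step is performed by a Glue crawler. A minimal sketch of registering and starting one is shown below, assuming the boto3 Glue client's `create_crawler` and `start_crawler` calls; the crawler name, IAM role, database, and S3 path in the usage comment are hypothetical, and the client is injected so the logic can be checked offline.

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database: str,
                          s3_path: str) -> Dict[str, Any]:
    """Build keyword arguments for a crawler that scans one S3 prefix."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, name: str, role_arn: str,
                             database: str, s3_path: str) -> None:
    """Create the crawler, then start a run that populates the Data Catalog."""
    glue_client.create_crawler(
        **build_crawler_request(name, role_arn, database, s3_path)
    )
    glue_client.start_crawler(Name=name)


# Usage with real AWS credentials (not executed here; all names are hypothetical):
#   import boto3
#   create_and_start_crawler(
#       boto3.client("glue"), "telemetry-crawler",
#       "arn:aws:iam::123456789012:role/glue-crawler",
#       "telemetry_db", "s3://example-bucket/telemetry/",
#   )
```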
22. Amazon EMR
Latest versions: Updated with the latest open source frameworks within 30 days of release
Use S3 storage: Process data directly in the S3 data lake securely with high performance using the EMRFS connector
Easy: Launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, cluster tuning
Low cost: Flexible billing with per-second billing, EC2 spot, reserved instances and auto-scaling to reduce costs 50–80%
Analytics and ML at scale
19 open-source projects: Apache Hadoop, Spark, HBase, Presto, and more
Enterprise-grade security
23. Amazon Elasticsearch Service
Easy to use: Fully managed; deploy production-ready clusters in minutes
Secure: Secure access with VPC to keep all traffic within AWS network
Available: Zone awareness replicates data between two AZs; automatically monitors & replaces failed nodes
Open: Direct access to Elasticsearch open-source APIs; supports Logstash and Kibana
Easy to deploy, secure, operate, and scale Elasticsearch
Customers use Elasticsearch for log analytics, full-text search & application monitoring
24. Amazon Athena
Query instantly: Zero setup cost; just point to S3 and start querying
Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
Easy: Serverless, with zero infrastructure and zero administration; integrated with QuickSight
Pay per query: Pay only for queries run; save 30–90% on per-query costs through compression
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier
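"Point to S3 and start querying" can be sketched as a single API call. The example below assumes the boto3 Athena client's `start_query_execution` call; the database name, SQL text, and results bucket in the usage comment are hypothetical, and the client is injected so the function can be checked without AWS access.

```python
def run_query(athena_client, sql: str, database: str, output_location: str) -> str:
    """Submit a SQL query to Athena and return its query execution ID.

    output_location is the S3 URI where Athena writes the result files.
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Usage with real AWS credentials (not executed here; all names are hypothetical):
#   import boto3
#   query_id = run_query(
#       boto3.client("athena"),
#       "SELECT event, COUNT(*) FROM telemetry GROUP BY event",
#       "telemetry_db",
#       "s3://example-bucket/athena-results/",
#   )
```

The call is asynchronous: it returns an ID immediately, and the caller polls for completion before reading results. Since billing is per byte scanned, storing the data compressed reduces what each query costs.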
25. Amazon QuickSight
First BI service with pay-per-session pricing for everyone in your organization
Serverless, cloud-powered BI service (no servers to manage)
Scale from 10s of users to 100s of thousands of users
Pay only for what you use
• Readers: $0.30/30 min session with a $5/user/month max
• Authors: $18/month/Author
Integrates with S3, Athena, Redshift, RDS, Aurora, & EMR
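The pay-per-session pricing above amounts to a capped charge per reader plus a flat fee per author. The sketch below applies the figures quoted on this slide (2019 prices, which may have changed); it works in whole cents to avoid floating-point rounding, and the function names are made up for illustration.

```python
from typing import List

# Prices as quoted on the slide, in cents.
READER_SESSION_CENTS = 30         # $0.30 per 30-minute session
READER_MONTHLY_CAP_CENTS = 500    # $5 per reader per month maximum
AUTHOR_MONTHLY_CENTS = 1800       # $18 per author per month


def reader_monthly_cents(sessions: int) -> int:
    """Monthly charge for one reader: per-session price, capped at the maximum."""
    return min(sessions * READER_SESSION_CENTS, READER_MONTHLY_CAP_CENTS)


def monthly_bill_cents(sessions_per_reader: List[int], authors: int) -> int:
    """Total monthly charge for a group of readers and authors."""
    readers_total = sum(reader_monthly_cents(s) for s in sessions_per_reader)
    return readers_total + authors * AUTHOR_MONTHLY_CENTS


if __name__ == "__main__":
    # One occasional reader (4 sessions), one heavy reader (40 sessions), one author.
    print(monthly_bill_cents([4, 40], authors=1) / 100)
```

With these numbers the cap takes effect from the 17th session in a month, so an occasional reader pays for usage while a heavy reader never pays more than $5.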
26. Amazon QuickSight has been innovating quickly
100+ new features released since launch
Features shown: AWS Directory Service Microsoft AD; Custom Date Format; Dashboard Save As; Aggregate Calculations; Readers; Groups; Private VPC; 25 GB SPICE tables; Spark and Presto Connector; Scheduled refresh; Just In Time Provisioning; One-click upgrade; Search; Totals; Excel; Custom Range; Federated SSO; Athena connector; Export to CSV; S3 Analytics; Week Aggregation; Aurora PostgreSQL; Calculations in SPICE; Cross Account S3 Access; Aggregate Filters; Hourly refresh; Row level security; 10K Filter Values; On-screen controls; Redshift Spectrum Support; KPI Chart; Spark Connector; AWS Directory Service AD Connector; Tabular Reports; Data labels; URL Actions; Combo Charts; Audit logging with CloudTrail; Geospatial maps; Count Distinct; Parameters; Relative Date Filters; Filter Groups; Table calculations; Snowflake Connector; SaaS Connectors; Teradata Connector; HIPAA; PCI compliance
27.
Amazon QuickSight—embedded dashboards
Supercharge your applications with embedded dashboards
Fully interactive with drill down, filtering, & external links
No servers to manage, no long-term commitments
Pay for usage with pay-per-session reader pricing
Easy embedding with JavaScript SDK
28. Embedded NFL Next Gen Stats Dashboards
“With the Amazon QuickSight Readers and pay-per-session pricing, we are able to extend these secure, customized and easy to use dashboards for each club without having to provision servers or manage infrastructure – all while only paying for actual usage.”
Matt Swensson
Vice President, Emerging Products and Technology
Real-time stats for NFL games
Embedded in NFL Next Gen Stats Portal
Shared with 100s of users across NFL, 32 clubs and broadcast partners
30. Amazon QuickSight is used by customers at the largest scale
One of the world’s largest metals and mining companies deployed Amazon QuickSight with its critical risk management (CRM) solution to ensure employee safety. Thousands of employees use its CRM globally.
Uses Amazon QuickSight embedded in its Converge Platform, a governance, risk, and compliance healthcare solution. Tens of thousands of users across 900 healthcare organizations use this platform.
Amazon.com is using Amazon QuickSight company-wide
31.
Amazon QuickSight—ML Insights
Automated business insights powered by ML and natural language
ML-powered anomaly detection
ML-powered forecasting
Auto-narratives
32.
Discover all the hidden trends and anomalies on millions of metrics
Amazon QuickSight—ML Insights
Example: anomaly detection
33.
“Sales for office supplies in APAC was 15% above expected.”
Amazon QuickSight—ML Insights
Example: anomaly detection
34.
“SMB Segment was the top contributor.”
Amazon QuickSight—ML Insights
Example: anomaly detection
35.
“It’s significant because SMB typically only accounts for 30% of sales.”
Amazon QuickSight—ML Insights
Example: anomaly detection
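The narrative in these three slides boils down to comparing an observed value with an expected one and phrasing the gap in plain language. The sketch below illustrates only that last step with a fixed threshold; it is not QuickSight's ML model, and the expected value, threshold, and function names are assumptions for illustration.

```python
from typing import Optional


def percent_vs_expected(actual: float, expected: float) -> float:
    """Signed deviation of actual from expected, as a percentage of expected."""
    if expected == 0:
        raise ValueError("expected must be non-zero")
    return (actual - expected) / expected * 100.0


def narrate(metric: str, actual: float, expected: float,
            threshold_pct: float = 10.0) -> Optional[str]:
    """Return a one-sentence narrative if the deviation exceeds the threshold.

    In a real system the expected value would come from a forecasting or
    anomaly-detection model; here it is simply passed in by the caller.
    """
    deviation = percent_vs_expected(actual, expected)
    if abs(deviation) < threshold_pct:
        return None  # within the normal range, nothing to report
    direction = "above" if deviation > 0 else "below"
    return f"{metric} was {abs(deviation):.0f}% {direction} expected."


if __name__ == "__main__":
    # Hypothetical numbers chosen to reproduce the sentence on the slide.
    print(narrate("Sales for office supplies in APAC", actual=115.0, expected=100.0))
```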
36. QuickSight ML Insights vs. traditional BI forecasting
QuickSight ML-powered forecasting:
• Captures seasonality and upward trends
• Automatically excludes bad data
• High confidence band
Traditional BI forecasting:
• Captures only seasonality
• Missing upward trend
• Confidence band influenced by bad data
37.
Insights in plain language narrative
Embedded within your dashboard
No more staring at dashboards for hours!
Fully customizable to meet every need
No coding needed. Easy-to-use UI templates.
Amazon QuickSight—ML Insights
Auto-narratives