S U M M I T
Hong Kong
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a Data Lake on AWS
Rahul Bhartia
Principal Big Data Architect - AWS
rbhartia@
There are more people accessing data:
Data Scientists, Analysts, Business Users, Applications
And more requirements for making data available:
Secure, Real time, Flexible, Scalable
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
Building a Data Lake on AWS
Storage: Amazon S3
Data movement: AWS Snowball, AWS DataSync, AWS Database Migration Service, AWS Storage Gateway, Amazon Kinesis
AWS Glue: Crawler, Job, Data Catalog
AWS Lake Formation
Analytics and ML: Amazon Athena, Amazon EMR, Amazon QuickSight, Amazon Redshift, Amazon SageMaker
Data lakes made serverless
Services: Amazon S3, AWS Glue, Amazon Athena, Amazon QuickSight, Amazon Kinesis, Amazon SageMaker
Sources: Devices, Web, Sensors, Social
• Serverless: zero infrastructure, zero administration
• Never pay for idle resources
• Availability and fault tolerance built in
• Automatically scales resources with usage
Amazon S3: Object Storage
Security and compliance
• Three forms of encryption at rest, plus encryption in transit
• Log and monitor with CloudTrail; discover and protect data with Macie
Flexible management
• Classify, report, and visualize data usage trends
• Use tags for consumption, cost, and security
• Build lifecycle policies to automate tiering and retention
Durability, availability & scalability
• Built for eleven nines (99.999999999%) of durability
• Data distributed across 3 facilities within a Region
• Global replication capabilities
Query-in-Place
• Run analytics & ML without moving data
• Retrieve a subset of an object's data with S3 Select, improving performance by up to 400%
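As a rough illustration of query-in-place, the sketch below builds the arguments for an S3 Select request over a CSV object with a header row. The bucket, key, and column names are hypothetical, and the helper only assembles the request; it assumes S3 Select is available in your account.

```python
def build_select_request(bucket, key, columns, where=None):
    """Return keyword arguments for boto3's S3 `select_object_content` call."""
    # Column names are interpolated directly, so they must come from trusted code.
    projection = ", ".join(f's."{column}"' for column in columns)
    expression = f"SELECT {projection} FROM S3Object s"
    if where:
        expression += " WHERE " + where
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Treat the first CSV row as a header so columns can be referenced by name.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical bucket, key, and columns, for illustration only.
request = build_select_request(
    "example-data-lake",
    "raw/orders.csv",
    ["order_id", "total"],
    where='CAST(s."total" AS FLOAT) > 100',
)
print(request["Expression"])
```

With credentials configured, the dictionary can be passed as `boto3.client("s3").select_object_content(**request)`; only the matching rows and columns are returned, rather than the whole object.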
AWS Glue Data Catalog
Central and searchable view of your data assets
• Unified metadata repository for data in Amazon S3, Amazon DynamoDB, and relational databases (Amazon RDS, Amazon Redshift)
• Query your data from Amazon Athena, Amazon Redshift Spectrum, or Amazon EMR
• Augment technical metadata with business metadata for tables
• Schema evolution using versioning
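To make the idea of technical plus business metadata concrete, here is a minimal sketch that assembles a `TableInput` structure for the Glue `create_table` API, describing a Parquet table on S3. The table name, location, columns, and description are hypothetical.

```python
PARQUET_INPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
PARQUET_OUTPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"


def parquet_table_input(name, location, columns, partition_keys=(), description=""):
    """Build a `TableInput` dict for a Parquet table stored on S3."""

    def as_columns(pairs):
        return [{"Name": col_name, "Type": col_type} for col_name, col_type in pairs]

    return {
        "Name": name,
        "Description": description,  # business metadata alongside the schema
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "PartitionKeys": as_columns(partition_keys),
        "StorageDescriptor": {
            "Columns": as_columns(columns),  # technical metadata
            "Location": location,
            "InputFormat": PARQUET_INPUT,
            "OutputFormat": PARQUET_OUTPUT,
            "SerdeInfo": {"SerializationLibrary": PARQUET_SERDE},
        },
    }


# Hypothetical table definition, for illustration only.
table = parquet_table_input(
    "events",
    "s3://example-data-lake/processed/events/",
    columns=[("event_id", "string"), ("amount", "double")],
    partition_keys=[("year", "string")],
    description="Cleaned click events, owned by the analytics team",
)
```

The result could be registered with `boto3.client("glue").create_table(DatabaseName=..., TableInput=table)`, after which Athena, Redshift Spectrum, and EMR can all resolve the same table definition.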
AWS Glue Crawlers
Automatically catalog your data
Crawlers automatically build your Data Catalog and keep it in sync.
• Automatically discover new data and extract schema definitions
• Detect schema changes and version tables
• Detect Hive-style partitions on Amazon S3
• Built-in classifiers for popular types; custom classifiers using Grok expressions
• Run ad hoc or on a schedule; serverless, so you pay only when a crawler runs
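Hive-style partitions are `name=value` folders in the object key. The sketch below shows the general idea of recognizing them; it is an illustration of the convention, not the crawler's actual implementation, and the example key is hypothetical.

```python
def parse_hive_partitions(key):
    """Extract name=value path segments from an S3 object key."""
    partitions = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        name, separator, value = segment.partition("=")
        if separator and name and value:
            partitions[name] = value
    return partitions


# Hypothetical object key, for illustration only.
example = parse_hive_partitions("raw/life/year=2019/month=05/day=07/part-0000.parquet")
print(example)
```

Each recognized segment becomes a partition column, so a query filtering on `year` and `month` only needs to read the matching folders.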
AWS Lake Formation
Build a secure data lake in days
• Identify, ingest, clean, and transform data
• Enforce security policies across multiple services
• Gain and manage new insights
Easily load data to your data lake
AWS Lake Formation blueprints load data into Amazon S3 and the AWS Glue Data Catalog.
• Logs and events: Amazon Kinesis Firehose, AWS CloudTrail
• Databases: Amazon RDS, Amazon Aurora
• Load types: full load, incremental
Blueprints build on AWS Glue
Easily de-duplicate your data with ML transforms
Secure once, access in multiple ways
[Diagram: an admin sets access control in AWS Lake Formation, which governs data in Amazon S3 and the AWS Glue Data Catalog for Amazon Athena, Amazon EMR, Amazon QuickSight, and Amazon Redshift]
Security permissions in AWS Lake Formation
• Control data access with simple grant and revoke permissions
• Specify permissions on tables and columns rather than on buckets and objects
• Easily view policies granted to a particular user
• Audit all data access in one place
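A column-level grant might look like the following sketch, which assembles the arguments for the Lake Formation `grant_permissions` API. The role ARN, database, table, and column names are hypothetical.

```python
def column_select_grant(principal_arn, database, table, columns):
    """Build arguments granting SELECT on specific columns of one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical principal and table, for illustration only.
grant = column_select_grant(
    "arn:aws:iam::111122223333:role/analyst",
    "sales",
    "orders",
    ["order_id", "total"],
)
```

With credentials configured, `boto3.client("lakeformation").grant_permissions(**grant)` applies the grant, and `revoke_permissions` accepts the same shape. The permission is expressed against the table and its columns, not against S3 buckets or objects.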
Data movement from on-premises
• AWS Snowball: petabyte- and exabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud
• AWS DataSync: automates moving data between on-premises storage and Amazon S3 using the Network File System (NFS) protocol, at speeds up to 10 times faster than open-source tools
• AWS Storage Gateway: lets your on-premises applications use AWS for storage; includes a highly optimized data transfer mechanism and bandwidth management, along with a local cache
• AWS Database Migration Service: migrates databases from the most widely used commercial and open-source offerings to AWS quickly and securely, with minimal downtime to applications
Change Data Capture (CDC) to Amazon S3
[Diagram: source database, AWS Database Migration Service, snapshot data, AWS Glue crawlers, Data Catalog, Amazon Athena, Amazon EMR, Amazon Redshift]
New!
• Support for Parquet
• Support for S3 encryption with KMS
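Downstream of the CDC files, something has to fold the changes into the snapshot. The sketch below shows one simple way to do that in memory, assuming each change record carries an `Op` field of `I`, `U`, or `D` (the convention DMS uses in its S3 target files) and that rows have a primary key, here a hypothetical `id` column. A real pipeline would more likely do this in an AWS Glue or EMR job.

```python
def apply_cdc(snapshot, changes, key="id"):
    """Apply ordered CDC records to snapshot rows and return the merged rows."""
    rows = {row[key]: dict(row) for row in snapshot}
    for change in changes:
        operation = change["Op"]
        record = {name: value for name, value in change.items() if name != "Op"}
        if operation in ("I", "U"):
            rows[record[key]] = record  # insert a new row or overwrite the old one
        elif operation == "D":
            rows.pop(record[key], None)  # deleting a missing row is a no-op
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return list(rows.values())


# Hypothetical snapshot and change records, for illustration only.
snapshot = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
changes = [
    {"Op": "U", "id": 1, "name": "A"},
    {"Op": "D", "id": 2},
    {"Op": "I", "id": 3, "name": "c"},
]
merged = apply_cdc(snapshot, changes)
```

Changes must be applied in commit order; applying them out of order can resurrect deleted rows or keep stale values.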
Data movement in real time
• Amazon Kinesis Video Streams: securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing
• Amazon Kinesis Data Firehose: capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools
• Amazon Kinesis Data Streams: build custom, real-time applications that process data streams using popular stream processing frameworks
• Amazon Managed Streaming for Kafka: fully managed open-source platform for building real-time streaming data pipelines and applications
Real-time events to Amazon S3
[Diagram: Kinesis Data Streams, Kinesis Data Firehose with Lambda transformation, aggregated JSON data, crawlers, Data Catalog, save as Parquet, aggregated Parquet data, Amazon Athena]
Prefix: raw/life/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/
Buffer: Up to 128 MB or 15 minutes
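To see what the prefix expression on this slide produces, the sketch below emulates the `!{timestamp:...}` expansion locally for a given delivery time. It is an approximation for illustration, not Firehose's own code, and it supports only the few date tokens listed in `_TOKENS`.

```python
import re
from datetime import datetime, timezone

# Java-style date tokens from the slide's prefix, mapped to strftime codes.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def expand_prefix(template, when):
    """Expand !{timestamp:<token>} expressions in a Firehose-style S3 prefix."""

    def replace(match):
        token = match.group(1)
        if token not in _TOKENS:
            raise ValueError(f"unsupported token: {token}")
        return when.strftime(_TOKENS[token])

    return re.sub(r"!\{timestamp:([^}]+)\}", replace, template)


template = "raw/life/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"
prefix = expand_prefix(template, datetime(2019, 5, 7, tzinfo=timezone.utc))
print(prefix)
```

The resulting `year=/month=/day=` folders are Hive-style partitions, which is what lets the crawlers and Athena treat each day's data as a separate partition.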
AWS Glue ETL
New!
Transforming data using AWS Glue
[Diagram: a file put event on Amazon S3 (raw data) triggers a Lambda function that starts an AWS Glue job writing to Amazon S3 (processed data); a trigger starts a second AWS Glue job writing to Amazon S3 (enriched data); an AWS Glue crawler on each bucket updates the AWS Glue Data Catalog]
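The Lambda step in this pipeline could be sketched as below: for each object in an S3 put-event notification, start a Glue job run and pass the object's location as a job argument. The job name and the `--input_path` argument are hypothetical, and the Glue client is passed in so the function can be exercised without AWS access.

```python
from urllib.parse import unquote_plus


def start_job_for_put_event(event, glue_client, job_name):
    """Start one Glue job run per object in an S3 put-event notification."""
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=job_name,
            # Hypothetical argument name; the Glue job script must read it.
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])
    return run_ids
```

Inside a Lambda handler, this would be called as `start_job_for_put_event(event, boto3.client("glue"), "raw-to-processed")`, where the job name is again an assumption.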
Analytics
• Amazon QuickSight: pay only for what you use; scale to tens of thousands of users; embedded analytics; build end-to-end BI solutions. New: ML Insights
• Amazon Athena: run interactive queries to easily analyze data in Amazon S3 using standard SQL; no infrastructure to set up or manage and no data to load. New: Workgroups
• Amazon EMR: flexible, open-source choice for Hadoop and Spark; lower cost than on-premises with autoscaling; security with encryption, authentication, and authorization. New: Multi-master
• Amazon Redshift: cost-effective and up to 10x faster than traditional data warehouses; easy to set up, deploy, and manage; scale on demand for large data volumes and high query concurrency. New: Concurrency scaling
Automated reports using Amazon Athena
athena.startQueryExecution("SELECT * FROM business_view")
1. Schedule query
2. Track QueryID for status
3. Query results to Amazon S3
4. New file trigger
5. Job complete notification
Components: Lambda functions, Athena query, DynamoDB table, S3 bucket, SNS topic, queue, email notification
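Steps 1 and 2 can be sketched in Python as follows. For brevity this version polls for the query status inline, whereas the slide tracks the QueryID in a DynamoDB table and reacts to the result file landing in S3. The Athena client is passed in; the database and output location are supplied by the caller.

```python
import time


def run_query(athena, sql, database, output_location, poll_seconds=1.0, max_polls=60):
    """Start an Athena query and poll until it reaches a terminal state."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not finish")
```

A caller would pass `boto3.client("athena")` as the first argument. Athena writes the result file to the output location, which is what fires the "new file" trigger in step 4.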
Athena Workgroups
• Isolation: unique query output location per workgroup; encrypt results with a unique AWS KMS key per workgroup
• Metrics: workgroup metrics are published to CloudWatch; build dashboards and alerting based on them
• Cost controls: define a per-query data-scanned threshold, and any query exceeding it will be cancelled; trigger alarms to notify of increasing usage and cost
• Tags: use tags to categorize your AWS resources in different ways, for instance by purpose, owner, or environment
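The four workgroup features map onto a single `create_work_group` request. The sketch below assembles one; the workgroup name, bucket, KMS key ARN, scan limit, and tags are hypothetical values.

```python
def workgroup_request(name, output_location, kms_key_arn, scan_limit_bytes, tags):
    """Build arguments for Athena's `create_work_group` API."""
    return {
        "Name": name,
        "Configuration": {
            # Isolation: dedicated output location and encryption key.
            "ResultConfiguration": {
                "OutputLocation": output_location,
                "EncryptionConfiguration": {
                    "EncryptionOption": "SSE_KMS",
                    "KmsKey": kms_key_arn,
                },
            },
            "EnforceWorkGroupConfiguration": True,
            # Metrics: publish per-workgroup metrics to CloudWatch.
            "PublishCloudWatchMetricsEnabled": True,
            # Cost control: cancel any query that scans more than this.
            "BytesScannedCutoffPerQuery": scan_limit_bytes,
        },
        "Tags": [{"Key": key, "Value": value} for key, value in sorted(tags.items())],
    }


# Hypothetical values, for illustration only.
request = workgroup_request(
    "finance-reports",
    "s3://example-athena-results/finance/",
    "arn:aws:kms:ap-east-1:111122223333:key/example",
    10 * 1024**3,  # 10 GiB per query
    {"owner": "finance", "environment": "prod"},
)
```

With credentials configured, `boto3.client("athena").create_work_group(**request)` creates the workgroup.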
Amazon EMR Notebooks
[Diagram: users open an EMR-managed Jupyter notebook from the AWS Management Console; the notebook attaches to an EMR cluster and auto-saves to an S3 bucket; also shown are Amazon S3, AWS Glue Data Catalog, SageMaker, and Athena]
Amazon EMR – High Availability
[Diagram: an EMR cluster with Master Node 1, Master Node 2, and Master Node 3. Livy, ZooKeeper, HiveServer2, and YARN ResourceManager run on all three master nodes; HDFS NameNode runs on two of them. Each service instance is marked Active or Standby.]
Data warehousing with Amazon Redshift
[Diagram: databases feed AWS Database Migration Service (CDC replication); files are uploaded to S3; events flow through Amazon Kinesis Firehose and are saved as Parquet; crawlers populate the Data Catalog; Amazon Redshift reads the data through Redshift Spectrum]
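Redshift Spectrum reaches data lake tables through an external schema that points at a Data Catalog database. The helper below generates that statement as a string; the schema name, catalog database, and IAM role ARN are hypothetical, and running the SQL against a cluster is left to whichever client you use.

```python
def external_schema_sql(schema, catalog_database, iam_role_arn):
    """Return SQL that maps a Glue Data Catalog database into Redshift."""
    # Reject anything that is not a plain identifier before interpolating it.
    for value in (schema, catalog_database):
        if not value.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {value!r}")
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{catalog_database}' "
        f"IAM_ROLE '{iam_role_arn}';"
    )


# Hypothetical names, for illustration only.
statement = external_schema_sql(
    "lake", "datalake_db", "arn:aws:iam::111122223333:role/spectrum"
)
print(statement)
```

Once the schema exists, tables registered by the crawlers can be queried as `lake.<table>` and joined with tables stored inside Redshift.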
Amazon Redshift: New features
Categories: Speed, Scale, Simplicity, Security
Each feature is marked "New!" or "Coming soon" on the slide:
• Unload to Parquet
• WLM concurrency
• AWS Lake Formation integration
• Auto-Vacuum & Analyze
• Auto Data Distribution
• Deferred Maintenance
• Snapshot Scheduler
• Spectrum Request Accelerator
• 10x average performance improvement
• Elastic resize
• Concurrency Scaling
• Improving short query acceleration
• Stored procedures
ML Insights with Amazon QuickSight
ML Anomaly
detection
ML Forecasting
Auto Narratives
AI & Machine Learning
• AI services that enable developers to plug pre-built AI functionality into their apps
• ML services that make it easy for any developer to get started and get deep with ML
• Frameworks and interfaces for machine learning practitioners
Data labeling workflow, from raw data in Amazon S3:
• Initial training data is annotated by human labelers
• Active learning model is trained from human-labeled data
• Training data the model understands is labeled automatically
• Ambiguous data is sent to human labelers for annotation
• Human-labeled data is then sent back to retrain and improve the machine learning model
• An accurate training data set is ready for use in Amazon SageMaker
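The routing step in this workflow can be illustrated with a small sketch: predictions the model is confident about are labeled automatically, and the rest are queued for human labelers. The confidence threshold and the data shape are assumptions made for illustration; the slide does not describe how the service itself decides.

```python
def route_predictions(predictions, threshold=0.9):
    """Split (item, label, confidence) triples into auto-labeled and human queues."""
    auto_labeled, needs_human = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            auto_labeled.append((item, label))  # the model "understands" this item
        else:
            needs_human.append(item)  # ambiguous: send for human annotation
    return auto_labeled, needs_human


# Hypothetical model output, for illustration only.
predictions = [
    ("img-001.jpg", "cat", 0.97),
    ("img-002.jpg", "dog", 0.55),
    ("img-003.jpg", "cat", 0.90),
]
auto_labeled, needs_human = route_predictions(predictions)
```

Labels returned by the human labelers are added to the training set and the model is retrained, so the share of items that can be labeled automatically should grow with each round.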
AI & Machine Learning
• AI Services: Vision (Rekognition Image, Rekognition Video, Textract); Speech (Polly, Transcribe); Language (Translate, Comprehend & Comprehend Medical); Chatbots (Lex); Forecasting (Forecast); Recommendations (Personalize)
• ML Services: Amazon SageMaker (Build, Train, Deploy): pre-built algorithms & notebooks; data labeling (Ground Truth); algorithms & models (AWS Marketplace for ML); one-click model training & tuning; optimization (Neo); reinforcement learning; one-click deployment & hosting
• Frameworks & Infrastructure: Frameworks; Interfaces; Infrastructure (EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference)
Thank you!
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building-a-Data-Lake-on-AWS

  • 1. AWS Summit, Hong Kong
  • 2. Building a Data Lake on AWS. Rahul Bhartia, Principal Big Data Architect, AWS. rbhartia@ © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 3. There are more people accessing data (data scientists, analysts, business users, applications) and more requirements for making data available (secure, real time, flexible, scalable).
  • 4. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
  • 5. Building a data lake on AWS: Amazon S3; AWS Glue (Crawler, Job, Data Catalog); AWS Lake Formation; data movement with AWS Snowball, AWS DataSync, AWS Database Migration Service, AWS Storage Gateway, and Amazon Kinesis; analytics and ML with Amazon Athena, Amazon EMR, Amazon QuickSight, Amazon Redshift, and Amazon SageMaker.
  • 6. Data lakes made serverless. Sources: devices, web, sensors, social. Services: Amazon Kinesis, Amazon S3, AWS Glue, Amazon Athena, Amazon QuickSight, Amazon SageMaker. Benefits: serverless, with zero infrastructure and zero administration; never pay for idle resources; availability and fault tolerance built in; resources scale automatically with usage.
  • 8. Amazon S3: Object Storage. Security and compliance: three forms of encryption at rest, plus encryption in transit; log and monitor with CloudTrail, and discover and protect data with Macie. Flexible management: classify, report, and visualize data usage trends; use tags for consumption, cost, and security; build lifecycle policies to automate tiering and retention. Durability, availability, and scalability: built for eleven nines (99.999999999%) of durability; data distributed across three facilities within a Region; global replication capabilities. Query-in-place: run analytics and ML without moving data; retrieve a subset of data with S3 Select, improving performance by up to 400%.
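
The lifecycle bullet on the S3 slide can be illustrated with a small sketch. The slide does not give any policy, so the prefix, the day counts, and the helper names below are assumptions chosen for illustration; the request shape follows the boto3 `put_bucket_lifecycle_configuration` call.

```python
def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Return a LifecycleConfiguration dict that tiers and then expires objects.

    All thresholds are illustrative assumptions, not values from the slides.
    """
    if not (0 < ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("expected 0 < ia < glacier < expire")
    return {
        "Rules": [
            {
                "ID": "tier-and-expire-" + (prefix.strip("/") or "all"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Tiering: move to cheaper storage classes as data ages.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Retention: delete objects after the retention window.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle(bucket, config, s3_client=None):
    """Apply the configuration; boto3 is imported lazily so the builder works offline."""
    if s3_client is None:
        import boto3
        s3_client = boto3.client("s3")
    return s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


# Hypothetical policy for the raw zone of a data lake bucket.
raw_zone_policy = build_lifecycle_configuration("raw/", 30, 90, 365)
```

Keeping the builder separate from the API call makes the policy easy to review or unit test before it touches a real bucket.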
  • 9. AWS Glue Data Catalog. Unified metadata repository for data in Amazon S3, Amazon DynamoDB, and relational databases (Amazon RDS, Amazon Redshift). Query your data from Amazon Athena, Amazon Redshift Spectrum, or Amazon EMR. Augment technical metadata with business metadata for tables. Schema evolution using versioning. Central, searchable view of your data assets.
  • 10. AWS Glue Crawlers: automatically catalog your data. Crawlers automatically build your Data Catalog and keep it in sync. They discover new data and extract schema definitions, detect schema changes and version tables, and detect Hive-style partitions on Amazon S3. Built-in classifiers cover popular types; custom classifiers use Grok expressions. Run ad hoc or on a schedule; serverless, so you pay only when a crawler runs.
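
To make "Hive-style partitions" on the crawler slide concrete, here is a minimal sketch of the key layout convention (`name=value` path segments). It illustrates the naming scheme only; it is not how AWS Glue crawlers are implemented, and the function name and example key are hypothetical.

```python
def parse_hive_partitions(key):
    """Split an S3 object key into (partition columns, file name).

    A path segment counts as a partition only if it looks like name=value.
    """
    *directories, file_name = key.split("/")
    partitions = {}
    for segment in directories:
        name, separator, value = segment.partition("=")
        if separator and name and value:
            partitions[name] = value
    return partitions, file_name


# Hypothetical object key following the year/month/day layout.
columns, file_name = parse_hive_partitions(
    "raw/life/year=2019/month=05/day=14/events.parquet"
)
print(columns, file_name)
```

With this layout, query engines can prune whole prefixes when a query filters on `year`, `month`, or `day`.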
  • 11. AWS Lake Formation: build a secure data lake in days. Identify, ingest, clean, and transform data; enforce security policies across multiple services; gain and manage new insights.
  • 12. Easily load data to your data lake. AWS Lake Formation blueprints load data from databases (Amazon RDS, Amazon Aurora; full-load or incremental) and from logs and events (Amazon Kinesis Data Firehose, AWS CloudTrail) into Amazon S3 and the AWS Glue Data Catalog.
  • 13. Blueprints build on AWS Glue
  • 14. Easily de-duplicate your data with ML transforms
  • 15. Secure once, access in multiple ways. An admin defines access control in AWS Lake Formation over Amazon S3 and the AWS Glue Data Catalog; Amazon Athena, Amazon EMR, Amazon QuickSight, and Amazon Redshift access the data under those controls.
  • 16. Security permissions in AWS Lake Formation. Control data access with simple grant and revoke permissions. Specify permissions on tables and columns rather than on buckets and objects. Easily view policies granted to a particular user. Audit all data access in one place.
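
A column-level grant, as described on the permissions slide, might look like the sketch below. It builds the request for the boto3 Lake Formation `grant_permissions` call; the principal ARN, database, table, and column names are hypothetical placeholders, not values from the talk.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Build a SELECT grant on specific columns of a catalog table."""
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # Permissions target tables and columns, not buckets and objects.
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def grant(request, lakeformation_client=None):
    """Send the grant; boto3 is imported lazily so the builder works offline."""
    if lakeformation_client is None:
        import boto3
        lakeformation_client = boto3.client("lakeformation")
    return lakeformation_client.grant_permissions(**request)


# Hypothetical analyst role that may read two non-sensitive columns.
request = build_column_grant(
    "arn:aws:iam::123456789012:role/ExampleAnalyst",
    "sales_db",
    "orders",
    ["order_id", "order_total"],
)
```

The same request shape can be passed to `revoke_permissions` to withdraw the grant.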
  • 18. Data movement from on-premises. AWS Snowball: petabyte- and exabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud. AWS DataSync: automates moving data between on-premises storage and Amazon S3 using the Network File System (NFS) protocol, at speeds up to 10 times faster than open-source tools. AWS Storage Gateway: lets your on-premises applications use AWS for storage; includes a highly optimized data transfer mechanism, bandwidth management, and a local cache. AWS Database Migration Service: migrates databases from the most widely used commercial and open-source offerings to AWS quickly and securely, with minimal downtime to applications.
  • 19. Change Data Capture (CDC) to Amazon S3. AWS Database Migration Service writes snapshot data and ongoing changes from the source database to Amazon S3; AWS Glue crawlers populate the Data Catalog; Amazon Athena, Amazon EMR, and Amazon Redshift query the data. New: support for Parquet and for S3 encryption with KMS.
  • 20. Data movement in real time. Amazon Kinesis Video Streams: securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing. Amazon Kinesis Data Firehose: capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools. Amazon Kinesis Data Streams: build custom, real-time applications that process data streams using popular stream processing frameworks. Managed Streaming for Kafka: fully managed open-source platform for building real-time streaming data pipelines and applications.
  • 21. Real-time events to Amazon S3. Pipeline components: Kinesis Data Streams, Kinesis Data Firehose (with Lambda transformation), aggregated JSON data, save as Parquet, aggregated Parquet data, crawlers, Data Catalog, Amazon Athena. Prefix: raw/life/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/ Buffer: up to 128 MB or 15 minutes.
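
The prefix on the real-time events slide uses `!{timestamp:...}` expressions to produce Hive-style partitions. The sketch below mimics that expansion locally for the three patterns on the slide, assuming timestamps are evaluated in UTC; it is an illustration of the resulting key layout, not the Firehose implementation, and the helper name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Prefix and buffer settings as shown on the slide (15 minutes = 900 seconds).
SLIDE_PREFIX = "raw/life/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"
BUFFERING_HINTS = {"SizeInMBs": 128, "IntervalInSeconds": 900}

# Only the patterns used on the slide are supported in this sketch.
_PATTERN_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}
_TOKEN = re.compile(r"!\{timestamp:([^}]+)\}")


def expand_prefix(prefix, arrival_time):
    """Expand timestamp expressions for a timezone-aware arrival time."""
    if arrival_time.tzinfo is None:
        raise ValueError("arrival_time must be timezone-aware")
    utc_time = arrival_time.astimezone(timezone.utc)

    def replace(match):
        pattern = match.group(1)
        if pattern not in _PATTERN_TO_STRFTIME:
            raise ValueError("unsupported timestamp pattern: " + pattern)
        return utc_time.strftime(_PATTERN_TO_STRFTIME[pattern])

    return _TOKEN.sub(replace, prefix)


print(expand_prefix(SLIDE_PREFIX, datetime(2019, 5, 14, 23, 30, tzinfo=timezone.utc)))
# prints raw/life/year=2019/month=05/day=14/
```

Objects delivered under such prefixes can then be picked up as partitions by a crawler.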
  • 22. AWS Glue ETL (New!)
  • 23. Transforming data using AWS Glue. A file put event on Amazon S3 (raw data) triggers a Lambda function that starts an AWS Glue job; AWS Glue jobs write to Amazon S3 (processed data) and Amazon S3 (enriched data), and an AWS Glue crawler at each stage updates the AWS Glue Data Catalog.
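
The event-triggered step on the Glue transformation slide (file put, Lambda function, Glue job) could be sketched as follows. The job name and the `--input_path` argument are hypothetical; the Glue client is passed in so the logic can be exercised without AWS access, and `start_job_run` is the boto3 Glue call.

```python
from urllib.parse import unquote_plus

JOB_NAME = "raw-to-processed"  # hypothetical Glue job name


def handle_s3_put(event, glue_client):
    """Start one Glue job run per object in an S3 put event; return the run IDs."""
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=JOB_NAME,
            # "--input_path" is an assumed job parameter, read inside the job script.
            Arguments={"--input_path": "s3://" + bucket + "/" + key},
        )
        run_ids.append(response["JobRunId"])
    return run_ids


def lambda_handler(event, context):
    """Lambda entry point; boto3 is imported lazily for offline testing."""
    import boto3
    return handle_s3_put(event, boto3.client("glue"))
```

For high-volume buckets, starting one job run per object may be too fine-grained; batching keys or using a scheduled trigger is a common alternative.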
  • 25. Analytics. Amazon Athena: run interactive queries to easily analyze data in Amazon S3 using standard SQL; no infrastructure to set up or manage and no data to load (New: Workgroups). Amazon EMR: flexible, open-source choice for Hadoop and Spark; lower cost than on-premises with autoscaling; security with encryption, authentication, and authorization (New: Multi-master). Amazon Redshift: cost-effective and up to 10x faster than traditional data warehouses; easy to set up, deploy, and manage; scale on demand for large data volumes and high query concurrency (New: Concurrency scaling). Amazon QuickSight: pay only for what you use; scale to tens of thousands of users; embedded analytics; build end-to-end BI solutions (New: ML Insights).
  • 26. Automated reports using Amazon Athena: athena.startQueryExecution("SELECT * FROM business_view"). Steps: 1. Schedule query; 2. Track QueryID for status; 3. Query results to Amazon S3; 4. New file trigger; 5. Job complete notification. Components: Lambda function, Athena query, DynamoDB table, S3 bucket, Lambda function, SNS topic, email notification.
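
The five steps on the automated reports slide can be sketched as two Lambda-style functions. This is one possible reading of the diagram: the DynamoDB item layout, the status values, and the function names are assumptions. The clients are injected so the flow can be tested offline; the calls used are boto3 `start_query_execution`, the DynamoDB Table `put_item`, and SNS `publish`.

```python
from urllib.parse import unquote_plus

REPORT_QUERY = "SELECT * FROM business_view"  # query shown on the slide


def start_report(athena_client, status_table, output_location):
    """Steps 1-3: start the query, track its ID, let Athena write results to S3."""
    response = athena_client.start_query_execution(
        QueryString=REPORT_QUERY,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]
    # Assumed item layout for tracking status in DynamoDB.
    status_table.put_item(Item={"QueryExecutionId": query_id, "Status": "RUNNING"})
    return query_id


def on_result_file(event, status_table, sns_client, topic_arn):
    """Steps 4-5: react to the new result file and send a completion notice."""
    notified = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(".csv"):
            continue  # skip companion files such as .csv.metadata
        # Athena names the result file after the query execution ID.
        query_id = key.rsplit("/", 1)[-1][: -len(".csv")]
        status_table.put_item(Item={"QueryExecutionId": query_id, "Status": "SUCCEEDED"})
        sns_client.publish(
            TopicArn=topic_arn,
            Subject="Athena report ready",
            Message="Results for query " + query_id + ": s3://" + bucket + "/" + key,
        )
        notified.append(query_id)
    return notified
```

In a deployment, `start_report` would typically run on a schedule and `on_result_file` would be subscribed to object-created events on the results bucket.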
  • 27. Athena Workgroups. Isolation: unique query output location per workgroup; encrypt results with a unique AWS KMS key per workgroup. Metrics: workgroup metrics are published to CloudWatch; build dashboards and alerting on them. Cost controls: define a per-query data-scanned threshold, and any query exceeding it is cancelled; trigger alarms to notify of increasing usage and cost. Tags: use tags to categorize your AWS resources in different ways, for instance by purpose, owner, or environment.
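
The four workgroup features on the slide (isolation, metrics, cost controls, tags) map onto a single `create_work_group` request in boto3. The sketch below builds such a request; the workgroup name, bucket, KMS key ARN, scan limit, and tags are hypothetical values, and the helper names are not part of any AWS API.

```python
MIN_BYTES_CUTOFF = 10_000_000  # Athena's minimum per-query limit (10 MB)


def build_workgroup_request(name, output_location, kms_key_arn, max_bytes_per_query, tags):
    """Build the keyword arguments for athena.create_work_group."""
    if max_bytes_per_query < MIN_BYTES_CUTOFF:
        raise ValueError("per-query cutoff must be at least 10 MB")
    return {
        "Name": name,
        "Configuration": {
            # Isolation: dedicated output location and KMS key per workgroup.
            "ResultConfiguration": {
                "OutputLocation": output_location,
                "EncryptionConfiguration": {
                    "EncryptionOption": "SSE_KMS",
                    "KmsKey": kms_key_arn,
                },
            },
            "EnforceWorkGroupConfiguration": True,
            # Metrics: publish workgroup metrics to CloudWatch.
            "PublishCloudWatchMetricsEnabled": True,
            # Cost control: queries scanning more than this are cancelled.
            "BytesScannedCutoffPerQuery": max_bytes_per_query,
        },
        # Tags: sorted for a stable, reviewable request.
        "Tags": [{"Key": k, "Value": v} for k, v in sorted(tags.items())],
    }


def create_workgroup(request, athena_client=None):
    """Send the request; boto3 is imported lazily so the builder works offline."""
    if athena_client is None:
        import boto3
        athena_client = boto3.client("athena")
    return athena_client.create_work_group(**request)


# Hypothetical workgroup for an analyst team with a 10 GiB per-query limit.
request = build_workgroup_request(
    "analysts",
    "s3://example-athena-results/analysts/",
    "arn:aws:kms:us-east-1:123456789012:key/example-key-id",
    10 * 1024 ** 3,
    {"owner": "bi-team", "environment": "prod"},
)
```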
  • 28. Amazon EMR Notebooks. Users open an EMR-managed Jupyter notebook from the AWS Management Console; the notebook runs against an EMR cluster and auto-saves to an S3 bucket. Also shown: Amazon S3, AWS Glue Data Catalog, SageMaker, Athena.
  • 29. Amazon EMR: High Availability. EMR cluster with three master nodes (Master Node 1, 2, 3). Each runs Livy, Zookeeper, HiveServer2, and YARN ResourceManager; two of them also run an HDFS NameNode. Diagram state labels: Active, Standby, Standby, Active, Active, Active.
  • 30. Data warehousing with Amazon Redshift. Database: CDC replication through AWS Database Migration Service. Files: upload to S3. Events: Amazon Kinesis Data Firehose, saved as Parquet. Crawlers populate the Data Catalog, and Redshift Spectrum queries the data in S3 from Amazon Redshift.
  • 31. Amazon Redshift new features, grouped under Speed, Scale, Simplicity, Security, and WLM Concurrency (each item is tagged "New!" or "Coming soon" on the slide): Unload to Parquet; AWS Lake Formation integration; Auto-Vacuum & Analyze; Auto Data Distribution; Deferred Maintenance; Snapshot Scheduler; Spectrum Request Accelerator; 10x average performance improvement; Elastic resize; Concurrency Scaling; improving short query acceleration; stored procedures.
  • 32. ML Insights with Amazon QuickSight: ML anomaly detection, ML forecasting, auto-narratives.
  • 33. AI & Machine Learning. AI services that enable developers to plug pre-built AI functionality into their apps; ML services that make it easy for any developer to get started and get deep with ML; frameworks and interfaces for machine learning practitioners. Data labeling flow: raw data in Amazon S3; initial training data is annotated by human labelers; an active learning model is trained from the human-labeled data; training data the model understands is labeled automatically; ambiguous data is sent to human labelers for annotation; human-labeled data is then sent back to retrain and improve the machine learning model; an accurate training data set is ready for use in Amazon SageMaker.
  • 34. AI & Machine Learning stack. AI Services: Vision (Rekognition Image, Rekognition Video, Textract); Speech (Polly, Transcribe); Language (Translate, Comprehend & Comprehend Medical); Chatbots (Lex); Forecasting (Forecast); Recommendations (Personalize). ML Services: Amazon SageMaker (Build, Train, Deploy): pre-built algorithms & notebooks; data labeling (Ground Truth); one-click model training & tuning; optimization (Neo); one-click deployment & hosting; reinforcement learning; algorithms & models (AWS Marketplace for ML). Frameworks & Infrastructure: frameworks; interfaces; infrastructure (EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference).
  • 35. Thank you!