SlideShare a Scribd company logo
1 of 36
Download to read offline
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data
every 5 years
There is more data
than people think
15
years
live for
Data platforms need to
1,000x
scale
>10x
grows
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
There are more
people accessing data
And more
requirements for
making data available
Data Scientists
Analysts
Business Users
Applications
Secure Real time
Flexible Scalable
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A data lake is a centralized repository that allows
you to store all your structured and unstructured
data at any scale
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why data lakes?
Data Lakes provide:
Relational and non-relational data
Scale-out to EBs
Diverse set of analytics and machine learning tools
Work on data without any data movement
Designed for low cost storage and analytics
OLTP ERP CRM LOB
Data Warehouse
Business
Intelligence
Data Lake
100110000100101011100
101010111001010100001
011111011010
0011110010110010110
0100011000010
Devices Web Sensors Social
Catalog
Machine
Learning
DW
Queries
Big data
processing
Interactive Real-time
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 | AWS Glue
Any analytic
workload, any scale, at
the lowest possible cost
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
On-premises
Data Movement
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
Analytics
Machine Learning
Real-time
Data Movement
Data Lake
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
More data lakes & analytics on AWS than
anywhere else
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Typical steps of building a data lake
Setup storage1
Move data2
Cleanse, prep,
and catalog data
3
Configure and enforce
security and compliance
policies
4
Make data available
for analytics
5
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Building data lakes can still take months
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data preparation accounts for ~80% of the
work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sample of steps required Find sources
Create Amazon Simple Storage Service (Amazon S3) locations
Configure access policies
Map tables to Amazon S3 locations
ETL jobs to load and clean data
Create metadata access policies
Configure access from analytics services
Rinse and repeat for other:
data sets, users, and end-services
And more:
manage and monitor ETL jobs
update metadata catalog as data changes
update policies across services as users and permissions change
manually maintain cleansing scripts
create audit processes for compliance
…
Manual | Error-prone | Time consuming
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Enforce security policies
across multiple services
Gain and manage new
insights
Identify, ingest, clean, and
transform data
Build a secure data lake in days
AWS Lake Formation
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How it works
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Register existing data or import new
Amazon S3 forms the storage layer for
Lake Formation
Register existing S3 buckets that
contain your data
Ask Lake Formation to create required
S3 buckets and import data into them
Data is stored in your account. You
have direct access to it. No lock-in.
Data Lake Storage
Data
Catalog
Access
Control
Data import
Lake Formation
Crawlers ML-based
data prep
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Easily load data to your data lake
logs
DBs
Blueprints
Data Lake Storage
Data
Catalog
Access
Control
Data import
Lake Formation
Crawlers ML-based
data prep
one-shot
incremental
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
With blueprints
You
1. Point us to the source
2. Tell us the location to load to
in your data lake
3. Specify how often you want to
load the data
Blueprints
1. Discover the source table(s)
schema
2. Automatically convert to the
target data format
3. Automatically partition the
data based on the partitioning
schema
4. Keep track of data that was
already processed
5. You can customize any of the
above
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Blueprints build on AWS Glue
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Easily de-duplicate your data with
ML transformations
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Secure once, access in multiple ways
Data Lake Storage
Data
Catalog
Access
Control
Lake Formation
Admin
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Security permissions in Lake Formation
Control data access with simple
grant and revoke permissions
Specify permissions on tables and
columns rather than on buckets
and objects
Easily view policies granted to a
particular user
Audit all data access at one place
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Security permissions in Lake Formation
Search and view permissions
granted to a user, role, or group in
one place
Verify permissions granted to a
user
Easily revoke policies for a user
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Grant table and column-level permissions
User 1
User 2
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Security – deep dive
User
IAM users, roles
Active Directory Amazon S3
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Search and collaborate across multiple users
Text-based, faceted search
across all metadata
Add attributes like Data
owners, stewards, and other as
table properties
Add data sensitivity level,
column definitions, and others
as column properties
Text-based search and filtering
Query data in Amazon Athena
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Audit and monitor in real time
See detailed alerts in the
console
Download audit logs for
further analytics
Data ingest and catalog
notifications also published to
Amazon CloudWatch events
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Example: a data lake in 3 easy steps
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Step 1: Blueprints to ingest data
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Monitor the import
1
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Imported data as table in the data lake
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Step 2: Grant permissions to securely
share data
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Step 3: Run query in Amazon Athena
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Lake Formation Pricing
No additional charges – Only pay for the
underlying services used.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Customer interest
“We are very excited about the launch of AWS Lake
Formation, which provides a central point of control to
easily load, clean, secure, and catalog data from thousands of
clients to our AWS-based data lake, dramatically reducing
our operational load. … Additionally, AWS Lake Formation
will be HIPAA compliant from day one …”
Aaron Symanski, CTO, Change Healthcare
“I can’t wait for my team to get our hands on AWS Lake
Formation. With an enterprise-ready option like Lake
Formation, we will be able to spend more time deriving
value from our data rather than doing the heavy lifting
involved in manually setting up and managing our data lake.”
Joshua Couch, VP Engineering, Fender Digital
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
https://aws.amazon.com/es/lake-formation/
https://aws.amazon.com/es/lake-formation/features/
https://aws.amazon.com/es/lake-formation/faqs/
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ricardo Serafim
Analytics Specialist Solutions Architect
rserafim@amazon.com

More Related Content

What's hot

AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in awsAWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in awsAWS Riyadh User Group
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSAmazon Web Services
 
Introduction to the Security Perspective of the Cloud Adoption Framework
Introduction to the Security Perspective of the Cloud Adoption FrameworkIntroduction to the Security Perspective of the Cloud Adoption Framework
Introduction to the Security Perspective of the Cloud Adoption FrameworkAmazon Web Services
 
Migrate & Modernize your legacy Microsoft applications with AWS
Migrate & Modernize your legacy Microsoft applications with AWSMigrate & Modernize your legacy Microsoft applications with AWS
Migrate & Modernize your legacy Microsoft applications with AWSAmazon Web Services
 
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-job
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-jobDatabases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-job
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-jobAmazon Web Services
 
Serverless Extract-transform-load (ETL) on AWS Webinar
Serverless Extract-transform-load (ETL) on AWS WebinarServerless Extract-transform-load (ETL) on AWS Webinar
Serverless Extract-transform-load (ETL) on AWS WebinarAmazon Web Services
 
End User Collaboration on AWS - AWS Online Tech Talks
End User Collaboration on AWS - AWS Online Tech TalksEnd User Collaboration on AWS - AWS Online Tech Talks
End User Collaboration on AWS - AWS Online Tech TalksAmazon Web Services
 
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Boaz Ziniman
 
Managed Relational Databases - Amazon RDS
Managed Relational Databases - Amazon RDSManaged Relational Databases - Amazon RDS
Managed Relational Databases - Amazon RDSAmazon Web Services
 
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech Talks
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech TalksEnabling New Retail Customer Experiences with Big Data - AWS Online Tech Talks
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech TalksAmazon Web Services
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 
AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Riyadh User Group
 
Migrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWSMigrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWSAmazon Web Services
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLAmazon Web Services
 

What's hot (20)

Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
 
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in awsAWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
AWS Technical Day Riyadh Nov 2019 - Scaling threat detection and response in aws
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWS
 
Introduction to the Security Perspective of the Cloud Adoption Framework
Introduction to the Security Perspective of the Cloud Adoption FrameworkIntroduction to the Security Perspective of the Cloud Adoption Framework
Introduction to the Security Perspective of the Cloud Adoption Framework
 
Amazon Container Services
Amazon Container ServicesAmazon Container Services
Amazon Container Services
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 
Migrate & Modernize your legacy Microsoft applications with AWS
Migrate & Modernize your legacy Microsoft applications with AWSMigrate & Modernize your legacy Microsoft applications with AWS
Migrate & Modernize your legacy Microsoft applications with AWS
 
Security & Compliance
Security & ComplianceSecurity & Compliance
Security & Compliance
 
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-job
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-jobDatabases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-job
Databases-on-AWS-Purpose-built-databases,-the-right-tool-for-the-right-job
 
Serverless Extract-transform-load (ETL) on AWS Webinar
Serverless Extract-transform-load (ETL) on AWS WebinarServerless Extract-transform-load (ETL) on AWS Webinar
Serverless Extract-transform-load (ETL) on AWS Webinar
 
End User Collaboration on AWS - AWS Online Tech Talks
End User Collaboration on AWS - AWS Online Tech TalksEnd User Collaboration on AWS - AWS Online Tech Talks
End User Collaboration on AWS - AWS Online Tech Talks
 
The Perimeter Is Dead
The Perimeter Is DeadThe Perimeter Is Dead
The Perimeter Is Dead
 
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
 
Managed Relational Databases - Amazon RDS
Managed Relational Databases - Amazon RDSManaged Relational Databases - Amazon RDS
Managed Relational Databases - Amazon RDS
 
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech Talks
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech TalksEnabling New Retail Customer Experiences with Big Data - AWS Online Tech Talks
Enabling New Retail Customer Experiences with Big Data - AWS Online Tech Talks
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 
AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]
 
Migrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWSMigrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWS
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
AWS 資料湖服務
AWS 資料湖服務AWS 資料湖服務
AWS 資料湖服務
 

Similar to Immersion Day - Como construir seu Data Lake em dias na AWS

AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019Amazon Web Services
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Summits
 
How to go from zero to data lakes in days - ADB202 - New York AWS Summit
How to go from zero to data lakes in days - ADB202 - New York AWS SummitHow to go from zero to data lakes in days - ADB202 - New York AWS Summit
How to go from zero to data lakes in days - ADB202 - New York AWS SummitAmazon Web Services
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAmazon Web Services
 
Accelerated Data Lakes Deep Dive Webinar - Paul Macey
Accelerated Data Lakes Deep Dive Webinar - Paul MaceyAccelerated Data Lakes Deep Dive Webinar - Paul Macey
Accelerated Data Lakes Deep Dive Webinar - Paul MaceyAmazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析Amazon Web Services
 
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationSecure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationAmazon Web Services
 
Modern Data Platforms - Thinking Data Flywheel on the Cloud
Modern Data Platforms - Thinking Data Flywheel on the CloudModern Data Platforms - Thinking Data Flywheel on the Cloud
Modern Data Platforms - Thinking Data Flywheel on the CloudAlluxio, Inc.
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWSAmazon Web Services
 
Immersion Day - Democratize o acesso ao dado
Immersion Day - Democratize o acesso ao dadoImmersion Day - Democratize o acesso ao dado
Immersion Day - Democratize o acesso ao dadoAmazon Web Services LATAM
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS SummitBuilding Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS SummitAmazon Web Services
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSSteven Hsieh
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Amazon Web Services
 

Similar to Immersion Day - Como construir seu Data Lake em dias na AWS (20)

AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
How to go from zero to data lakes in days - ADB202 - New York AWS Summit
How to go from zero to data lakes in days - ADB202 - New York AWS SummitHow to go from zero to data lakes in days - ADB202 - New York AWS Summit
How to go from zero to data lakes in days - ADB202 - New York AWS Summit
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
 
Accelerated Data Lakes Deep Dive Webinar - Paul Macey
Accelerated Data Lakes Deep Dive Webinar - Paul MaceyAccelerated Data Lakes Deep Dive Webinar - Paul Macey
Accelerated Data Lakes Deep Dive Webinar - Paul Macey
 
Accelerated Data Lakes Webinar
Accelerated Data Lakes WebinarAccelerated Data Lakes Webinar
Accelerated Data Lakes Webinar
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析在 AWS 上構建無服務器分析
在 AWS 上構建無服務器分析
 
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationSecure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
 
Modern Data Platforms - Thinking Data Flywheel on the Cloud
Modern Data Platforms - Thinking Data Flywheel on the CloudModern Data Platforms - Thinking Data Flywheel on the Cloud
Modern Data Platforms - Thinking Data Flywheel on the Cloud
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWS
 
Immersion Day - Democratize o acesso ao dado
Immersion Day - Democratize o acesso ao dadoImmersion Day - Democratize o acesso ao dado
Immersion Day - Democratize o acesso ao dado
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS SummitBuilding Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 

More from Amazon Web Services LATAM

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAmazon Web Services LATAM
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAmazon Web Services LATAM
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAmazon Web Services LATAM
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSAmazon Web Services LATAM
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSAmazon Web Services LATAM
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAmazon Web Services LATAM
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAmazon Web Services LATAM
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosAmazon Web Services LATAM
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSAmazon Web Services LATAM
 

More from Amazon Web Services LATAM (20)

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWS
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWS
 
Cómo empezar con Amazon EKS
Cómo empezar con Amazon EKSCómo empezar con Amazon EKS
Cómo empezar con Amazon EKS
 
Como começar com Amazon EKS
Como começar com Amazon EKSComo começar com Amazon EKS
Como começar com Amazon EKS
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWS
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWS
 
Ransomware: Estratégias de Mitigação
Ransomware: Estratégias de MitigaçãoRansomware: Estratégias de Mitigação
Ransomware: Estratégias de Mitigação
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWS
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administrados
 
Simplifique su BI con AWS
Simplifique su BI con AWSSimplifique su BI con AWS
Simplifique su BI con AWS
 
Simplifique o seu BI com a AWS
Simplifique o seu BI com a AWSSimplifique o seu BI com a AWS
Simplifique o seu BI com a AWS
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWS
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Immersion Day - Como construir seu Data Lake em dias na AWS

  • 1. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 3. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data every 5 years There is more data than people think 15 years live for Data platforms need to 1,000x scale >10x grows
  • 4. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. There are more people accessing data And more requirements for making data available Data Scientists Analysts Business Users Applications Secure Real time Flexible Scalable
  • 5. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
  • 6. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why data lakes? Data Lakes provide: Relational and non-relational data Scale-out to EBs Diverse set of analytics and machine learning tools Work on data without any data movement Designed for low cost storage and analytics OLTP ERP CRM LOB Data Warehouse Business Intelligence Data Lake 100110000100101011100 101010111001010100001 011111011010 0011110010110010110 0100011000010 Devices Web Sensors Social Catalog Machine Learning DW Queries Big data processing Interactive Real-time
  • 7. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 | AWS Glue Any analytic workload, any scale, at the lowest possible cost AWS Direct Connect AWS Snowball AWS Snowmobile AWS Database Migration Service AWS IoT Core Amazon Kinesis Data Firehose Amazon Kinesis Data Streams Amazon Kinesis Video Streams On-premises Data Movement Amazon SageMaker AWS Deep Learning AMIs Amazon Rekognition Amazon Lex AWS DeepLens Amazon Comprehend Amazon Translate Amazon Transcribe Amazon Polly Amazon Athena Amazon EMR Amazon Redshift Amazon Elasticsearch Service Amazon Kinesis Amazon QuickSight Analytics Machine Learning Real-time Data Movement Data Lake
  • 8. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. More data lakes & analytics on AWS than anywhere else
  • 9. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Typical steps of building a data lake Setup storage1 Move data2 Cleanse, prep, and catalog data 3 Configure and enforce security and compliance policies 4 Make data available for analytics 5
  • 10. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Building data lakes can still take months
  • 11. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data preparation accounts for ~80% of the work Building training sets Cleaning and organizing data Collecting data sets Mining data for patterns Refining algorithms Other
  • 12. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Sample of steps required Find sources Create Amazon Simple Storage Service (Amazon S3) locations Configure access policies Map tables to Amazon S3 locations ETL jobs to load and clean data Create metadata access policies Configure access from analytics services Rinse and repeat for other: data sets, users, and end-services And more: manage and monitor ETL jobs update metadata catalog as data changes update policies across services as users and permissions change manually maintain cleansing scripts create audit processes for compliance … Manual | Error-prone | Time consuming
  • 13. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Enforce security policies across multiple services Gain and manage new insights Identify, ingest, clean, and transform data Build a secure data lake in days AWS Lake Formation
  • 14. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How it works
  • 15. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Register existing data or import new Amazon S3 forms the storage layer for Lake Formation Register existing S3 buckets that contain your data Ask Lake Formation to create required S3 buckets and import data into them Data is stored in your account. You have direct access to it. No lock-in. Data Lake Storage Data Catalog Access Control Data import Lake Formation Crawlers ML-based data prep
  • 16. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Easily load data to your data lake logs DBs Blueprints Data Lake Storage Data Catalog Access Control Data import Lake Formation Crawlers ML-based data prep one-shot incremental
  • 17. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. With blueprints You 1. Point us to the source 2. Tell us the location to load to in your data lake 3. Specify how often you want to load the data Blueprints 1. Discover the source table(s) schema 2. Automatically convert to the target data format 3. Automatically partition the data based on the partitioning schema 4. Keep track of data that was already processed 5. You can customize any of the above
  • 18. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Blueprints build on AWS Glue
  • 19. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Easily de-duplicate your data with ML transformations
  • 20. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Secure once, access in multiple ways Data Lake Storage Data Catalog Access Control Lake Formation Admin
  • 21. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Security permissions in Lake Formation Control data access with simple grant and revoke permissions Specify permissions on tables and columns rather than on buckets and objects Easily view policies granted to a particular user Audit all data access at one place
  • 22. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Security permissions in Lake Formation Search and view permissions granted to a user, role, or group in one place Verify permissions granted to a user Easily revoke policies for a user
  • 23. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Grant table and column-level permissions User 1 User 2
  • 24. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Security – deep dive User IAM users, roles Active Directory Amazon S3
  • 25. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Search and collaborate across multiple users Text-based, faceted search across all metadata Add attributes like Data owners, stewards, and other as table properties Add data sensitivity level, column definitions, and others as column properties Text-based search and filtering Query data in Amazon Athena
  • 26. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Audit and monitor in real time See detailed alerts in the console Download audit logs for further analytics Data ingest and catalog notifications also published to Amazon CloudWatch events
  • 27. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Example: a data lake in 3 easy steps
  • 28. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Step 1: Blueprints to ingest data
  • 29. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Monitor the import 1
  • 30. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Imported data as table in the data lake
  • 31. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Step 2: Grant permissions to securely share data
  • 32. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Step 3: Run query in Amazon Athena
  • 33. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Lake Formation Pricing No additional charges – Only pay for the underlying services used.
  • 34. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Customer interest “We are very excited about the launch of AWS Lake Formation, which provides a central point of control to easily load, clean, secure, and catalog data from thousands of clients to our AWS-based data lake, dramatically reducing our operational load. … Additionally, AWS Lake Formation will be HIPAA compliant from day one …” Aaron Symanski, CTO, Change Healthcare “I can’t wait for my team to get our hands on AWS Lake Formation. With an enterprise-ready option like Lake Formation, we will be able to spend more time deriving value from our data rather than doing the heavy lifting involved in manually setting up and managing our data lake.” Joshua Couch, VP Engineering, Fender Digital
  • 35. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. https://aws.amazon.com/es/lake-formation/ https://aws.amazon.com/es/lake-formation/features/ https://aws.amazon.com/es/lake-formation/faqs/
  • 36. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ricardo Serafim Analytics Specialist Solutions Architect rserafim@amazon.com