SlideShare una empresa de Scribd logo
1 de 19
Descargar para leer sin conexión
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
One Data Lake, Many Uses: Enable Multi-
Tenant Analytics with Amazon EMR
Bruno Faria
A N T 3 4 4
Users
Marketing
Finance
Data
science
Amazon S3 | Amazon Glacier | AWS Glue
Data lake
on AWS
Multi-tenancy motivation
Engineering
Multi-tenant analytics pipeline
COLLECT STORE
PROCESS/
ANALYZE
CONSUME
Authentication | Authorization | Tenant isolation | Monitoring | Metering | . . .
Tenant 1
Tenant 2
Tenant 3
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
User isolation
• Authentication
(User
identification)
Data isolation
• Authorization
• Access
rights/privileges
to resources
• Coarse-grained
• Fine-grained
Resource
isolation
• Queues
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Terminology
• Silo mode
• Tenant analytics (data + processing) is fully isolated from other tenants
• Constructs logically “unique”
• Shared mode
• Tenants share all analytic resources
Silo scenario . . .
Wrangle/manipulate data
Explore/analyze data
Run machine learning models
Processed data
Query results
Deploy
Data engineer
Data analyst
Data scientist
EMR cluster
EMR cluster
EMR cluster
Data engineer bucket
Data analyst bucket
Data scientist bucket
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Multi-tenancy in Amazon EMR
• Silo mode
• Each tenant gets their own Amazon EMR cluster with specific tools
for processing/analyzing
• Data stored in tenant’s Amazon Simple Storage Service (Amazon
S3) bucket or HDFS on the cluster
• Hive metastore on the cluster or externally on Amazon Relational
Database Service (Amazon RDS)
• Pros
• Complete isolation
• Easy to measure usage and resources
• Can be cost effective when using Spot instances
• Cons
• Cannot share data across clusters (especially when using HDFS)
• Can be expensive
Shared scenario . . . /Data engineer bucket
/Data scientist bucket
/Data analyst bucket
Explore/analyze data
Query results / processed data
Data engineer
Data analyst
Data scientist
EMR cluster
Edge node for Amazon EMR
Edge node for Amazon EMR
Edge node for Amazon EMR
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
LDAP
HiveServer2
Presto coordinator
Spark Thrift server
Hue server
Zeppelin server
AWS credentials
EMR Step (EMR API)
Amazon Elastic Compute Cloud
(Amazon EC2) key pair
SSH as “hadoop”
Authentication
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
• Network authentication protocol
• Eliminates the need for
transmission of passwords
across network
• Removes potential threat of an
attacker sniffing the network
Authentication using Kerberos
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
• Storage-based
• EMRFS/Amazon S3
• HDFS
• HiveServer2 and Presto (SQL-based)
• HBase
• Fine-grained access control by cluster tag (AWS
Identity and Access Management (IAM))
• Apache Ranger on Amazon EC2 instance (using
AWS CloudFormation)
Authorization
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
YARN
• Queues
• Share cluster among multiple tenants
• Applications assigned to queues
• ‘root’ – Parent of all queues
• Queues correspond to departments, users, or priorities
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Root
Marketing
Engineering
Data
science
Sales
YARN queue example
{
"Classification": "capacity-scheduler",
"Properties": {
"yarn.scheduler.capacity.maximum-am-resource-percent": "0.6",
"yarn.scheduler.capacity.resource-calculator":
"org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
"yarn.scheduler.capacity.root.queues": "default,engineering,datascience,marketing",
"yarn.scheduler.capacity.root.default.capacity": "10",
"yarn.scheduler.capacity.root.default.user-limit-factor": "2",
"yarn.scheduler.capacity.root.default.maximum-capacity": "40",
"yarn.scheduler.capacity.root.engineering.capacity": "45",
"yarn.scheduler.capacity.root.datascience.capacity": "30",
"yarn.scheduler.capacity.root.marketing.capacity": "15",
"yarn.scheduler.capacity.root.engineering.user-limit-factor": "2",
"yarn.scheduler.capacity.root.datascience.user-limit-factor": "2",
"yarn.scheduler.capacity.root.marketing.user-limit-factor": "2",
"yarn.scheduler.capacity.root.engineering.maximum-capacity": "75",
"yarn.scheduler.capacity.root.datascience.maximum-capacity": "55",
"yarn.scheduler.capacity.root.marketing.maximum-capacity": "50",
"yarn.scheduler.capacity.root.engineering.state": "RUNNING",
"yarn.scheduler.capacity.root.datascience.state": "RUNNING",
"yarn.scheduler.capacity.root.marketing.state": "RUNNING",
"yarn.scheduler.capacity.root.engineering.acl_submit_applications": "*",
"yarn.scheduler.capacity.root.datascience.acl_submit_applications": "*",
"yarn.scheduler.capacity.root.marketing.acl_submit_applications": "*"
}
},
{
"Classification": "yarn-site",
"Properties": {
"yarn.acl.enable": "true",
"yarn.resourcemanager.scheduler.class":
"org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"
}
}
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
YARN scheduler
• Nested queues
• Queue weights –
Control fair share of
apps in the queue
• Manage queue access
through ACLs
root
Engineering
Dev
Data
engineering
Marketing
Advertising Analytics
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Más contenido relacionado

La actualidad más candente

Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Amazon Web Services
 
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Amazon Web Services
 
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Amazon Web Services
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Amazon Web Services
 
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...Amazon Web Services
 
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Amazon Web Services
 
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...Amazon Web Services
 
Enabling a Digital Platform with Microservices Architecture (ARC218-S) - AWS ...
Enabling a Digital Platform with Microservices Architecture (ARC218-S) - AWS ...Enabling a Digital Platform with Microservices Architecture (ARC218-S) - AWS ...
Enabling a Digital Platform with Microservices Architecture (ARC218-S) - AWS ...Amazon Web Services
 
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Amazon Web Services
 
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Amazon Web Services
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Amazon Web Services
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Amazon Web Services
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018Amazon Web Services
 
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Amazon Web Services
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Amazon Web Services
 
Cost Optimization Tooling (ARC301) - AWS re:Invent 2018
Cost Optimization Tooling (ARC301) - AWS re:Invent 2018Cost Optimization Tooling (ARC301) - AWS re:Invent 2018
Cost Optimization Tooling (ARC301) - AWS re:Invent 2018Amazon Web Services
 
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Amazon Web Services
 
How to Build HR Lakes on AWS to Unlock New Business Insights (DAT367) - AWS r...
How to Build HR Lakes on AWS to Unlock New Business Insights (DAT367) - AWS r...How to Build HR Lakes on AWS to Unlock New Business Insights (DAT367) - AWS r...
How to Build HR Lakes on AWS to Unlock New Business Insights (DAT367) - AWS r...Amazon Web Services
 
Working with Relational Databases in AWS Glue ETL (ANT342) - AWS re:Invent 2018
Working with Relational Databases in AWS Glue ETL (ANT342) - AWS re:Invent 2018Working with Relational Databases in AWS Glue ETL (ANT342) - AWS re:Invent 2018
Working with Relational Databases in AWS Glue ETL (ANT342) - AWS re:Invent 2018Amazon Web Services
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Amazon Web Services
 

La actualidad más candente (20)

Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
 
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
 
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
Build a High-Performance, Cloud-Native, Open-Source Platform on AWS & Save Mi...
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
 
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
SaaS Analytics and Metrics: Capturing and Surfacing the Data That's Fundament...
 
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
 
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
 
Enabling a Digital Platform with Microservices Architecture (ARC218-S) - AWS ...
Enabling a Digital Platform with Microservices Architecture (ARC218-S) - AWS ...Enabling a Digital Platform with Microservices Architecture (ARC218-S) - AWS ...
Enabling a Digital Platform with Microservices Architecture (ARC218-S) - AWS ...
 
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
Serverless Data Prep with AWS Glue (ANT313) - AWS re:Invent 2018
 
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
 
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
 
Cost Optimization Tooling (ARC301) - AWS re:Invent 2018
Cost Optimization Tooling (ARC301) - AWS re:Invent 2018Cost Optimization Tooling (ARC301) - AWS re:Invent 2018
Cost Optimization Tooling (ARC301) - AWS re:Invent 2018
 
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018
 
How to Build HR Lakes on AWS to Unlock New Business Insights (DAT367) - AWS r...
How to Build HR Lakes on AWS to Unlock New Business Insights (DAT367) - AWS r...How to Build HR Lakes on AWS to Unlock New Business Insights (DAT367) - AWS r...
How to Build HR Lakes on AWS to Unlock New Business Insights (DAT367) - AWS r...
 
Working with Relational Databases in AWS Glue ETL (ANT342) - AWS re:Invent 2018
Working with Relational Databases in AWS Glue ETL (ANT342) - AWS re:Invent 2018Working with Relational Databases in AWS Glue ETL (ANT342) - AWS re:Invent 2018
Working with Relational Databases in AWS Glue ETL (ANT342) - AWS re:Invent 2018
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
 

Similar a One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT344) - AWS re:Invent 2018

One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...Amazon Web Services
 
Data freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSData freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSAmazon Web Services
 
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018Amazon Web Services
 
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Amazon Web Services
 
AWS SUMMIT TEL AVIV - 2018
AWS SUMMIT TEL AVIV - 2018AWS SUMMIT TEL AVIV - 2018
AWS SUMMIT TEL AVIV - 2018Ayaz Hussain
 
Adding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAdding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAmazon Web Services
 
Using Search with a Database: Database Week SF
Using Search with a Database: Database Week SFUsing Search with a Database: Database Week SF
Using Search with a Database: Database Week SFAmazon Web Services
 
Evolve Your Incident Response Process and Powers for AWS
Evolve Your Incident Response Process and Powers for AWS Evolve Your Incident Response Process and Powers for AWS
Evolve Your Incident Response Process and Powers for AWS Amazon Web Services
 
Lock It Down: How to Secure Your Organization's AWS Account
Lock It Down: How to Secure Your Organization's AWS AccountLock It Down: How to Secure Your Organization's AWS Account
Lock It Down: How to Secure Your Organization's AWS AccountAmazon Web Services
 
Using AWS Key Management Service for Secure Workloads
Using AWS Key Management Service for Secure WorkloadsUsing AWS Key Management Service for Secure Workloads
Using AWS Key Management Service for Secure WorkloadsAmazon Web Services
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczAmazon Web Services
 
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Amazon Web Services
 
Accelerate Data Analytics at Scale with Amazon EMR
Accelerate Data Analytics at Scale with Amazon EMRAccelerate Data Analytics at Scale with Amazon EMR
Accelerate Data Analytics at Scale with Amazon EMRAmazon Web Services
 
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018AWS Germany
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural PatternsAmazon Web Services
 
Serverless use cases with AWS Lambda - More Serverless Event
Serverless use cases with AWS Lambda - More Serverless EventServerless use cases with AWS Lambda - More Serverless Event
Serverless use cases with AWS Lambda - More Serverless EventBoaz Ziniman
 
IVS CTO Night And Day 2018 Winter - [re:Cap] Containers & Microservices
IVS CTO Night And Day 2018 Winter - [re:Cap] Containers & MicroservicesIVS CTO Night And Day 2018 Winter - [re:Cap] Containers & Microservices
IVS CTO Night And Day 2018 Winter - [re:Cap] Containers & MicroservicesAmazon Web Services Japan
 
Mythical Mysfits: Management and Ops with AWS Fargate (CON322-R1) - AWS re:In...
Mythical Mysfits: Management and Ops with AWS Fargate (CON322-R1) - AWS re:In...Mythical Mysfits: Management and Ops with AWS Fargate (CON322-R1) - AWS re:In...
Mythical Mysfits: Management and Ops with AWS Fargate (CON322-R1) - AWS re:In...Amazon Web Services
 
Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS SecurityAmazon Web Services
 

Similar a One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT344) - AWS re:Invent 2018 (20)

One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
 
Data freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSData freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWS
 
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
 
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
Best Practices to Secure Data Lake on AWS (ANT327) - AWS re:Invent 2018
 
AWS 101 - Tel Aviv Summit 2018
AWS 101 - Tel Aviv Summit 2018AWS 101 - Tel Aviv Summit 2018
AWS 101 - Tel Aviv Summit 2018
 
AWS SUMMIT TEL AVIV - 2018
AWS SUMMIT TEL AVIV - 2018AWS SUMMIT TEL AVIV - 2018
AWS SUMMIT TEL AVIV - 2018
 
Adding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAdding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San Francisco
 
Using Search with a Database: Database Week SF
Using Search with a Database: Database Week SFUsing Search with a Database: Database Week SF
Using Search with a Database: Database Week SF
 
Evolve Your Incident Response Process and Powers for AWS
Evolve Your Incident Response Process and Powers for AWS Evolve Your Incident Response Process and Powers for AWS
Evolve Your Incident Response Process and Powers for AWS
 
Lock It Down: How to Secure Your Organization's AWS Account
Lock It Down: How to Secure Your Organization's AWS AccountLock It Down: How to Secure Your Organization's AWS Account
Lock It Down: How to Secure Your Organization's AWS Account
 
Using AWS Key Management Service for Secure Workloads
Using AWS Key Management Service for Secure WorkloadsUsing AWS Key Management Service for Secure Workloads
Using AWS Key Management Service for Secure Workloads
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter Dachnowicz
 
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
 
Accelerate Data Analytics at Scale with Amazon EMR
Accelerate Data Analytics at Scale with Amazon EMRAccelerate Data Analytics at Scale with Amazon EMR
Accelerate Data Analytics at Scale with Amazon EMR
 
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
Deep Dive on Amazon Elastic Container Service (ECS) I AWS Dev Day 2018
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural Patterns
 
Serverless use cases with AWS Lambda - More Serverless Event
Serverless use cases with AWS Lambda - More Serverless EventServerless use cases with AWS Lambda - More Serverless Event
Serverless use cases with AWS Lambda - More Serverless Event
 
IVS CTO Night And Day 2018 Winter - [re:Cap] Containers & Microservices
IVS CTO Night And Day 2018 Winter - [re:Cap] Containers & MicroservicesIVS CTO Night And Day 2018 Winter - [re:Cap] Containers & Microservices
IVS CTO Night And Day 2018 Winter - [re:Cap] Containers & Microservices
 
Mythical Mysfits: Management and Ops with AWS Fargate (CON322-R1) - AWS re:In...
Mythical Mysfits: Management and Ops with AWS Fargate (CON322-R1) - AWS re:In...Mythical Mysfits: Management and Ops with AWS Fargate (CON322-R1) - AWS re:In...
Mythical Mysfits: Management and Ops with AWS Fargate (CON322-R1) - AWS re:In...
 
Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS Security
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

One Data Lake, Many Uses: Enable Multi-Tenant Analytics with Amazon EMR (ANT344) - AWS re:Invent 2018

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. One Data Lake, Many Uses: Enable Multi- Tenant Analytics with Amazon EMR Bruno Faria A N T 3 4 4
  • 2. Users Marketing Finance Data science Amazon S3 | Amazon Glacier | AWS Glue Data lake on AWS Multi-tenancy motivation Engineering
  • 3. Multi-tenant analytics pipeline COLLECT STORE PROCESS/ ANALYZE CONSUME Authentication | Authorization | Tenant isolation | Monitoring | Metering | . . . Tenant 1 Tenant 2 Tenant 3
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. User isolation • Authentication (User identification) Data isolation • Authorization • Access rights/privileges to resources • Coarse-grained • Fine-grained Resource isolation • Queues
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Terminology • Silo mode • Tenant analytics (data + processing) is fully isolated from other tenants • Constructs logically “unique” • Shared mode • Tenants share all analytic resources
  • 7. Silo scenario . . . Wrangle/manipulate data Explore/analyze data Run machine learning models Processed data Query results Deploy Data engineer Data analyst Data scientist EMR cluster EMR cluster EMR cluster Data engineer bucket Data analyst bucket Data scientist bucket
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Multi-tenancy in Amazon EMR • Silo mode • Each tenant gets their own Amazon EMR cluster with specific tools for processing/analyzing • Data stored in tenant’s Amazon Simple Storage Service (Amazon S3) bucket or HDFS on the cluster • Hive metastore on the cluster or externally on Amazon Relational Database Service (Amazon RDS) • Pros • Complete isolation • Easy to measure usage and resources • Can be cost effective when using Spot instances • Cons • Cannot share data across clusters (especially when using HDFS) • Can be expensive
  • 9. Shared scenario . . . /Data engineer bucket /Data scientist bucket /Data analyst bucket Explore/analyze data Query results / processed data Data engineer Data analyst Data scientist EMR cluster Edge node for Amazon EMR Edge node for Amazon EMR Edge node for Amazon EMR
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. LDAP HiveServer2 Presto coordinator Spark Thrift server Hue server Zeppelin server AWS credentials EMR Step (EMR API) Amazon Elastic Compute Cloud (Amazon EC2) key pair SSH as “hadoop” Authentication
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. • Network authentication protocol • Eliminates the need for transmission of passwords across network • Removes potential threat of an attacker sniffing the network Authentication using Kerberos
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. • Storage-based • EMRFS/Amazon S3 • HDFS • HiveServer2 and Presto (SQL-based) • HBase • Fine-grained access control by cluster tag (AWS Identity and Access Management (IAM)) • Apache Ranger on Amazon EC2 instance (using AWS CloudFormation) Authorization
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. YARN • Queues • Share cluster among multiple tenants • Applications assigned to queues • ‘root’ – Parent of all queues • Queues correspond to departments, users, or priorities
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Root Marketing Engineering Data science Sales YARN queue example { "Classification": "capacity-scheduler", "Properties": { "yarn.scheduler.capacity.maximum-am-resource-percent": "0.6", "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator", "yarn.scheduler.capacity.root.queues": "default,engineering,datascience,marketing", "yarn.scheduler.capacity.root.default.capacity": "10", "yarn.scheduler.capacity.root.default.user-limit-factor": "2", "yarn.scheduler.capacity.root.default.maximum-capacity": "40", "yarn.scheduler.capacity.root.engineering.capacity": "45", "yarn.scheduler.capacity.root.datascience.capacity": "30", "yarn.scheduler.capacity.root.marketing.capacity": "15", "yarn.scheduler.capacity.root.engineering.user-limit-factor": "2", "yarn.scheduler.capacity.root.datascience.user-limit-factor": "2", "yarn.scheduler.capacity.root.marketing.user-limit-factor": "2", "yarn.scheduler.capacity.root.engineering.maximum-capacity": "75", "yarn.scheduler.capacity.root.datascience.maximum-capacity": "55", "yarn.scheduler.capacity.root.marketing.maximum-capacity": "50", "yarn.scheduler.capacity.root.engineering.state": "RUNNING", "yarn.scheduler.capacity.root.datascience.state": "RUNNING", "yarn.scheduler.capacity.root.marketing.state": "RUNNING", "yarn.scheduler.capacity.root.engineering.acl_submit_applications": "*", "yarn.scheduler.capacity.root.datascience.acl_submit_applications": "*", "yarn.scheduler.capacity.root.marketing.acl_submit_applications": "*" } }, { "Classification": "yarn-site", "Properties": { "yarn.acl.enable": "true", "yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler" } }
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. YARN scheduler • Nested queues • Queue weights – Control fair share of apps in the queue • Manage queue access through ACLs root Engineering Dev Data engineering Marketing Advertising Analytics
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.