SlideShare una empresa de Scribd logo
1 de 18
Descargar para leer sin conexión
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Brad Staszcuk
Solution Architect – Amazon Web Services
Dr Haydn Read
Head of Infrastructure Programmes - Auckland Council
Accelerate Analytics At Scale With
Amazon EMR
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
What Is Amazon EMR?
Low cost
Pay per-second
Open-source variety
Latest versions of software
Secure
Easy to enable options
Flexible
Full customisation and control
Easy to use
Launch a cluster in minutes
Managed
Spend less time monitoring
Flink
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Open-Source Applications
Amazon EMR
service
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Use The AWS Glue Data Catalog
• Support for Spark,
Hive and Presto
• Auto-generate
schema and
partitions
• Managed table
updates
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Hbase For Random Access At Massive Scale
HDFS
(data node)
Local Disk
HBase region
server
YARN node
manager
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Tips To Lower Your Costs
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Transient Or Long-Running Clusters
• Shut your clusters down when you don’t need them
• Use Amazon Linux AMI with preinstalled customisations for faster startup
• Use Auto Scaling to minimise costs for long-running clusters
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon EC2 Spot And Instance Fleets
• EMR will select optimal EC2 AZ
• Provision across instance types
• Switch to on-demand
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Use Auto Scaling
Scaling options
Threshold
CloudWatch or custom metric
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Auto Scaling
• EMR scales-in at YARN task completion
• Selectively removes nodes with no running tasks
• yarn.resourcemanager.decommissioning.timeout
• Default timeout is one hour
• Spark scale-in contributions
• Spark specific blacklisting of tasks
• Unregistering cached data and shuffle blocks
• Advanced error handling
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Tips To Secure Your Cluster
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Encryption
• Spark
• Tez
• MapReduce
• Presto
• HBase
• Hive
• Pig
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Authentication
LDAP
HiveServer2
Presto Coordinator
Spark Thrift Server
Hue Server
Zeppelin Server
AWS credentials
Amazon EMR Step (EMR API)
EC2 key pair
SSH as “hadoop”
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
New – Authentication With Kerberos
Microsoft
Active Directory
KDC
Users
YARN RMDoAs
Service principals for
all cluster nodes
Master Node
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Authorisation
• Storage-based
• EMRFS/S3
• HDFS
• HiveServer2 and Presto (SQL-based)
• HBase
• YARN queues
• Fine-grained access control by cluster tag (IAM)
• Apache Ranger on edge node (using AWS CloudFormation)
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Configure AWS IAM Roles For EMRFS Requests
Context
User: aduser
Group: analyst
IAM role: analytics_prod
Can map IAM roles to user, group, or S3 prefix
Context
User: aduser2
Group: dev
IAM role: analytics_dev
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Introducing…
Dr Haydn Read, Auckland Council
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Thank You
craigar@amazon.com
https://aws.amazon.com/emr/

Más contenido relacionado

La actualidad más candente

Understanding High Availability on Amazon Aurora
Understanding High Availability on Amazon Aurora Understanding High Availability on Amazon Aurora
Understanding High Availability on Amazon Aurora Amazon Web Services
 
AWS Greengrass, Containers, and Your Dev Process for Edge Apps (GPSWS404) - A...
AWS Greengrass, Containers, and Your Dev Process for Edge Apps (GPSWS404) - A...AWS Greengrass, Containers, and Your Dev Process for Edge Apps (GPSWS404) - A...
AWS Greengrass, Containers, and Your Dev Process for Edge Apps (GPSWS404) - A...Amazon Web Services
 
Real-World Use Cases for Amazon DynamoDB (DAT333) - AWS re:Invent 2018
Real-World Use Cases for Amazon DynamoDB (DAT333) - AWS re:Invent 2018Real-World Use Cases for Amazon DynamoDB (DAT333) - AWS re:Invent 2018
Real-World Use Cases for Amazon DynamoDB (DAT333) - AWS re:Invent 2018Amazon Web Services
 
Hands-On with Amazon ElastiCache for Redis - Workshop (DAT309-R1) - AWS re:In...
Hands-On with Amazon ElastiCache for Redis - Workshop (DAT309-R1) - AWS re:In...Hands-On with Amazon ElastiCache for Redis - Workshop (DAT309-R1) - AWS re:In...
Hands-On with Amazon ElastiCache for Redis - Workshop (DAT309-R1) - AWS re:In...Amazon Web Services
 
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...Amazon Web Services
 
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...Amazon Web Services
 
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Amazon Web Services
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Amazon Web Services
 
Querying Data in Place with AWS Object Storage Features and Analytics Tools (...
Querying Data in Place with AWS Object Storage Features and Analytics Tools (...Querying Data in Place with AWS Object Storage Features and Analytics Tools (...
Querying Data in Place with AWS Object Storage Features and Analytics Tools (...Amazon Web Services
 
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...Amazon Web Services
 
Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...
Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...
Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...Amazon Web Services
 
Tailor-Made SaaS: Multi-Tenant Customizations with AWS Lambda (GPSCT311) - AW...
Tailor-Made SaaS: Multi-Tenant Customizations with AWS Lambda (GPSCT311) - AW...Tailor-Made SaaS: Multi-Tenant Customizations with AWS Lambda (GPSCT311) - AW...
Tailor-Made SaaS: Multi-Tenant Customizations with AWS Lambda (GPSCT311) - AW...Amazon Web Services
 
Cloud Data Migration with Amazon EBS (CMP406-R2) - AWS re:Invent 2018
Cloud Data Migration with Amazon EBS (CMP406-R2) - AWS re:Invent 2018Cloud Data Migration with Amazon EBS (CMP406-R2) - AWS re:Invent 2018
Cloud Data Migration with Amazon EBS (CMP406-R2) - AWS re:Invent 2018Amazon Web Services
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Amazon Web Services
 
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...Amazon Web Services
 
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...Amazon Web Services
 
Migrating Open Source Databases from Amazon EC2 to Aurora PostgreSQL (DAT339)...
Migrating Open Source Databases from Amazon EC2 to Aurora PostgreSQL (DAT339)...Migrating Open Source Databases from Amazon EC2 to Aurora PostgreSQL (DAT339)...
Migrating Open Source Databases from Amazon EC2 to Aurora PostgreSQL (DAT339)...Amazon Web Services
 
Workshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeWorkshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeAmazon Web Services
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Amazon Web Services
 
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Amazon Web Services
 

La actualidad más candente (20)

Understanding High Availability on Amazon Aurora
Understanding High Availability on Amazon Aurora Understanding High Availability on Amazon Aurora
Understanding High Availability on Amazon Aurora
 
AWS Greengrass, Containers, and Your Dev Process for Edge Apps (GPSWS404) - A...
AWS Greengrass, Containers, and Your Dev Process for Edge Apps (GPSWS404) - A...AWS Greengrass, Containers, and Your Dev Process for Edge Apps (GPSWS404) - A...
AWS Greengrass, Containers, and Your Dev Process for Edge Apps (GPSWS404) - A...
 
Real-World Use Cases for Amazon DynamoDB (DAT333) - AWS re:Invent 2018
Real-World Use Cases for Amazon DynamoDB (DAT333) - AWS re:Invent 2018Real-World Use Cases for Amazon DynamoDB (DAT333) - AWS re:Invent 2018
Real-World Use Cases for Amazon DynamoDB (DAT333) - AWS re:Invent 2018
 
Hands-On with Amazon ElastiCache for Redis - Workshop (DAT309-R1) - AWS re:In...
Hands-On with Amazon ElastiCache for Redis - Workshop (DAT309-R1) - AWS re:In...Hands-On with Amazon ElastiCache for Redis - Workshop (DAT309-R1) - AWS re:In...
Hands-On with Amazon ElastiCache for Redis - Workshop (DAT309-R1) - AWS re:In...
 
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
How to Use Jupyter Notebooks with Amazon EMR for Better Productivity (ANT387)...
 
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
Move Data to AWS Faster for Migrations, DR, & Bidirectional Workflows (STG382...
 
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
Discover & Migrate at Scale with AWS Migration Hub & Application Discovery Se...
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
 
Querying Data in Place with AWS Object Storage Features and Analytics Tools (...
Querying Data in Place with AWS Object Storage Features and Analytics Tools (...Querying Data in Place with AWS Object Storage Features and Analytics Tools (...
Querying Data in Place with AWS Object Storage Features and Analytics Tools (...
 
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
Hands-On: Deploy Remote Graphics Desktops for Content Production (CMP422) - A...
 
Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...
Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...
Migrating Financial and Accounting Systems from Oracle to Amazon DynamoDB (DA...
 
Tailor-Made SaaS: Multi-Tenant Customizations with AWS Lambda (GPSCT311) - AW...
Tailor-Made SaaS: Multi-Tenant Customizations with AWS Lambda (GPSCT311) - AW...Tailor-Made SaaS: Multi-Tenant Customizations with AWS Lambda (GPSCT311) - AW...
Tailor-Made SaaS: Multi-Tenant Customizations with AWS Lambda (GPSCT311) - AW...
 
Cloud Data Migration with Amazon EBS (CMP406-R2) - AWS re:Invent 2018
Cloud Data Migration with Amazon EBS (CMP406-R2) - AWS re:Invent 2018Cloud Data Migration with Amazon EBS (CMP406-R2) - AWS re:Invent 2018
Cloud Data Migration with Amazon EBS (CMP406-R2) - AWS re:Invent 2018
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
 
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
Customizing Data Lakes to Work for Your Enterprise with Sysco (STG340) - AWS ...
 
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
Scalable Multi-Node Deep Learning Training in the Cloud (CMP368-R1) - AWS re:...
 
Migrating Open Source Databases from Amazon EC2 to Aurora PostgreSQL (DAT339)...
Migrating Open Source Databases from Amazon EC2 to Aurora PostgreSQL (DAT339)...Migrating Open Source Databases from Amazon EC2 to Aurora PostgreSQL (DAT339)...
Migrating Open Source Databases from Amazon EC2 to Aurora PostgreSQL (DAT339)...
 
Workshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data LakeWorkshop: Architecting a Serverless Data Lake
Workshop: Architecting a Serverless Data Lake
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
 
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
 

Similar a Accelerate Data Analytics at Scale with Amazon EMR

Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018Amazon Web Services
 
Data freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSData freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSAmazon Web Services
 
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Amazon Web Services
 
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...Amazon Web Services
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of usersAmazon Web Services
 
Serverless Architectural Patterns: Collision 2018
Serverless Architectural Patterns: Collision 2018Serverless Architectural Patterns: Collision 2018
Serverless Architectural Patterns: Collision 2018Amazon Web Services
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Amazon Web Services
 
Databases - EBC on the road Brazil Edition [Portuguese]
Databases - EBC on the road Brazil Edition [Portuguese]Databases - EBC on the road Brazil Edition [Portuguese]
Databases - EBC on the road Brazil Edition [Portuguese]Amazon Web Services
 
Running Lean Architectures: How to Optimize for Cost Efficiency (ARC202-R2) -...
Running Lean Architectures: How to Optimize for Cost Efficiency (ARC202-R2) -...Running Lean Architectures: How to Optimize for Cost Efficiency (ARC202-R2) -...
Running Lean Architectures: How to Optimize for Cost Efficiency (ARC202-R2) -...Amazon Web Services
 
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverlessKim Kao
 
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...Amazon Web Services
 
Serverless on AWS: Architectural Patterns and Best Practices
Serverless on AWS: Architectural Patterns and Best PracticesServerless on AWS: Architectural Patterns and Best Practices
Serverless on AWS: Architectural Patterns and Best PracticesVladimir Simek
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Amazon Web Services
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Amazon Web Services
 
AWS Lambda use cases and best practices - Builders Day Israel
AWS Lambda use cases and best practices - Builders Day IsraelAWS Lambda use cases and best practices - Builders Day Israel
AWS Lambda use cases and best practices - Builders Day IsraelAmazon Web Services
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Amazon Web Services
 
Serverless Architectural Patterns and Best Practices
Serverless Architectural Patterns and Best PracticesServerless Architectural Patterns and Best Practices
Serverless Architectural Patterns and Best PracticesAmazon Web Services
 

Similar a Accelerate Data Analytics at Scale with Amazon EMR (20)

Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018
 
Data freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWSData freedom: come migrare i carichi di lavoro Big Data su AWS
Data freedom: come migrare i carichi di lavoro Big Data su AWS
 
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
 
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
One Data Lake, Many Uses: Enabling Multi-Tenant Analytics with Amazon EMR (AN...
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of users
 
Serverless Architectural Patterns: Collision 2018
Serverless Architectural Patterns: Collision 2018Serverless Architectural Patterns: Collision 2018
Serverless Architectural Patterns: Collision 2018
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
 
Databases - EBC on the road Brazil Edition [Portuguese]
Databases - EBC on the road Brazil Edition [Portuguese]Databases - EBC on the road Brazil Edition [Portuguese]
Databases - EBC on the road Brazil Edition [Portuguese]
 
Running Lean Architectures: How to Optimize for Cost Efficiency (ARC202-R2) -...
Running Lean Architectures: How to Optimize for Cost Efficiency (ARC202-R2) -...Running Lean Architectures: How to Optimize for Cost Efficiency (ARC202-R2) -...
Running Lean Architectures: How to Optimize for Cost Efficiency (ARC202-R2) -...
 
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
 
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
 
Serverless on AWS: Architectural Patterns and Best Practices
Serverless on AWS: Architectural Patterns and Best PracticesServerless on AWS: Architectural Patterns and Best Practices
Serverless on AWS: Architectural Patterns and Best Practices
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
 
AWS Lambda use cases and best practices - Builders Day Israel
AWS Lambda use cases and best practices - Builders Day IsraelAWS Lambda use cases and best practices - Builders Day Israel
AWS Lambda use cases and best practices - Builders Day Israel
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
 
Oracle on AWS
Oracle on AWSOracle on AWS
Oracle on AWS
 
Migrating database to cloud
Migrating database to cloudMigrating database to cloud
Migrating database to cloud
 
Oracle on AWS
Oracle on AWSOracle on AWS
Oracle on AWS
 
Serverless Architectural Patterns and Best Practices
Serverless Architectural Patterns and Best PracticesServerless Architectural Patterns and Best Practices
Serverless Architectural Patterns and Best Practices
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Accelerate Data Analytics at Scale with Amazon EMR

  • 1. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Brad Staszcuk Solution Architect – Amazon Web Services Dr Haydn Read Head of Infrastructure Programmes - Auckland Council Accelerate Analytics At Scale With Amazon EMR
  • 2. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. What Is Amazon EMR? Low cost Pay per-second Open-source variety Latest versions of software Secure Easy to enable options Flexible Full customisation and control Easy to use Launch a cluster in minutes Managed Spend less time monitoring Flink
  • 3. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Open-Source Applications Amazon EMR service
  • 4. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Use The AWS Glue Data Catalog • Support for Spark, Hive and Presto • Auto-generate schema and partitions • Managed table updates
  • 5. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Hbase For Random Access At Massive Scale HDFS (data node) Local Disk HBase region server YARN node manager
  • 6. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Tips To Lower Your Costs
  • 7. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Transient Or Long-Running Clusters • Shut your clusters down when you don’t need them • Use Amazon Linux AMI with preinstalled customisations for faster startup • Use Auto Scaling to minimise costs for long-running clusters
  • 8. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Amazon EC2 Spot And Instance Fleets • EMR will select optimal EC2 AZ • Provision across instance types • Switch to on-demand
  • 9. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Use Auto Scaling Scaling options Threshold CloudWatch or custom metric
  • 10. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Auto Scaling • EMR scales-in at YARN task completion • Selectively removes nodes with no running tasks • yarn.resourcemanager.decommissioning.timeout • Default timeout is one hour • Spark scale-in contributions • Spark specific blacklisting of tasks • Unregistering cached data and shuffle blocks • Advanced error handling
  • 11. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Tips To Secure Your Cluster
  • 12. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Encryption • Spark • Tez • MapReduce • Presto • HBase • Hive • Pig
  • 13. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Authentication LDAP HiveServer2 Presto Coordinator Spark Thrift Server Hue Server Zeppelin Server AWS credentials Amazon EMR Step (EMR API) EC2 key pair SSH as “hadoop”
  • 14. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. New – Authentication With Kerberos Microsoft Active Directory KDC Users YARN RMDoAs Service principals for all cluster nodes Master Node
  • 15. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Authorisation • Storage-based • EMRFS/S3 • HDFS • HiveServer2 and Presto (SQL-based) • HBase • YARN queues • Fine-grained access control by cluster tag (IAM) • Apache Ranger on edge node (using AWS CloudFormation)
  • 16. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Configure AWS IAM Roles For EMRFS Requests Context User: aduser Group: analyst IAM role: analytics_prod Can map IAM roles to user, group, or S3 prefix Context User: aduser2 Group: dev IAM role: analytics_dev
  • 17. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Introducing… Dr Haydn Read, Auckland Council
  • 18. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Thank You craigar@amazon.com https://aws.amazon.com/emr/