© Cloudera, Inc. All rights reserved.
Effective Spark on Multi-Tenant Clusters
Kostas Sakellis
Me
• Spark Tech Lead Manager at Cloudera
• Contributed to Apache Spark
• Previously, stint on Cloudera Manager
Challenges
• Predictable execution time of Spark jobs
• Prevent Starvation
• Optimal cluster utilization
• Secure Data access
• Configuration Management
Spark on YARN
Why YARN?
• Spark supports pluggable cluster managers
  • local, Standalone, YARN and Mesos
• YARN has a proper resource manager
• Enables multi-platform jobs
• Spark on YARN is mature, with an active community
Running an application
spark-submit --master yarn-cluster \
  --executor-memory 2g \
  --num-executors 3 \
  --executor-cores 2 \
  --class <your-class> <your-application-jar>
System Architecture
[Diagram: a YARN cluster spanning host-a.mydomain.com, host-b.mydomain.com and host-c.mydomain.com, with a Resource Manager and three Node Managers. Containers on the Node Managers hold the App Master, the Drivers and the executors (Exec1, Exec2, Exec3).]
Gotchas
• Ensure compatible YARN configuration
  • yarn.nodemanager.resource.[memory-mb|cpu-vcores]
  • yarn.scheduler.maximum-allocation-[vcores|mb]
  • ...
• Remember overhead memory
  • spark.yarn.executor.memoryOverhead
  • Default of 10% of executor memory (minimum 384 MB) since Spark 1.4
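The overhead rule can be made concrete with a small calculation. The following is a minimal sketch, assuming the Spark 1.4+ default of 10% of executor memory with a 384 MB floor; the function names are hypothetical and the code only models the arithmetic, not YARN's rounding of requests to its allocation increment.

```python
# Illustrative sizing helper. The constants mirror the default overhead rule
# (10% of executor memory, at least 384 MB); the function names are hypothetical.
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10


def container_request_mb(executor_memory_mb, overhead_mb=None):
    """Memory YARN must grant for one executor: JVM heap plus overhead."""
    if overhead_mb is None:
        overhead_mb = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    return executor_memory_mb + overhead_mb


def fits_in_yarn(executor_memory_mb, max_allocation_mb):
    """True if the request stays within yarn.scheduler.maximum-allocation-mb."""
    return container_request_mb(executor_memory_mb) <= max_allocation_mb


if __name__ == "__main__":
    # --executor-memory 2g asks YARN for more than 2 GB once overhead is added.
    print(container_request_mb(2048))
    # An 8 GB executor does not fit when the YARN maximum is also 8 GB.
    print(fits_in_yarn(8192, 8192))
```

Sizing --executor-memory to exactly match the YARN maximum allocation therefore leaves no room for the overhead, so the request cannot be granted as-is.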
Otherwise…
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
[...]
System Architecture
[Diagram: the same three hosts (host-a.mydomain.com, host-b.mydomain.com, host-c.mydomain.com) with a Resource Manager and three Node Managers, now running several applications at once, each with its own Driver and executors (Exec1, Exec2, Exec3).]
How do we share a common resource?
Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
Resource Management
• YARN can create resource queues
• Priorities can be set per queue
• Preemption is also available
  • Preemption handling fixed in Spark 1.6 (SPARK-8167)
  • yarn.scheduler.fair.preemption
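To illustrate how per-queue priorities translate into resources, here is a minimal sketch that assumes priorities are expressed as Fair Scheduler queue weights and that every queue has enough demand to use its share. The queue names and numbers are hypothetical, and the real scheduler also considers minimum and maximum resources, demand and preemption timeouts.

```python
def fair_shares(total_memory_mb, queue_weights):
    """Split cluster memory between queues in proportion to their weights.

    Simplified model: assumes every queue is busy enough to use its share.
    """
    total_weight = sum(queue_weights.values())
    if total_weight <= 0:
        raise ValueError("at least one queue needs a positive weight")
    return {
        queue: total_memory_mb * weight / total_weight
        for queue, weight in queue_weights.items()
    }


if __name__ == "__main__":
    # Hypothetical queues on a cluster with 120000 MB of schedulable memory.
    weights = {"etl": 3, "my-special-queue": 2, "adhoc": 1}
    for queue, share in fair_shares(120000, weights).items():
        print(f"{queue}: {share:.0f} MB")
```

With preemption enabled, a queue running below its share can reclaim containers from queues running above theirs, which is why Spark has to tolerate executors being killed.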
Running an application
spark-submit --master yarn-cluster \
  --queue my-special-queue \
  --executor-memory 2g \
  --num-executors 3 \
  --executor-cores 2 \
  --class <your-class> <your-application-jar>
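When submissions are scripted, it can help to build the command line programmatically so the queue is never forgotten. This is a minimal sketch: the helper name and the example class and JAR are hypothetical, and it uses --executor-cores, which is spark-submit's flag for cores per executor.

```python
import shlex


def build_spark_submit(main_class, app_jar, queue=None, executor_memory="2g",
                       num_executors=3, executor_cores=2):
    """Return the spark-submit argument list for a YARN cluster-mode job."""
    args = ["spark-submit", "--master", "yarn-cluster"]
    if queue:
        # Route the application to a specific YARN queue.
        args += ["--queue", queue]
    args += [
        "--executor-memory", executor_memory,
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--class", main_class,
        app_jar,
    ]
    return args


if __name__ == "__main__":
    # Hypothetical application; pass the list to subprocess.run to launch it.
    cmd = build_spark_submit("com.example.Main", "app.jar", queue="my-special-queue")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a string avoids shell-quoting problems when the command is handed to subprocess.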
How about locality?
Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
Courtesy of: https://blog.voxbone.com/wp-content/uploads/2015/07/think-global-act-local.jpg
Task Scheduling
[Diagram: a Driver (Spark Context, DAG Scheduler, Task Scheduler) breaking Jobs into Stages and Tasks; Executors with Cores running the Tasks, with Shuffles between stages.]
Locality
[Diagram: host-a.mydomain.com, host-b.mydomain.com and host-c.mydomain.com, with a Resource Manager, three Node Managers and HDFS on each host. The files hdfs://x and hdfs://y are split into blocks B1 to B3 and replicated; the three per-host block sets shown are (x:B1 x:B2 y:B1 y:B3), (x:B3 x:B2 y:B2 y:B3) and (x:B3 x:B1 y:B1 y:B2). The Driver, Exec1 and Exec2 are placed on the hosts.]
Spark creates executors before executing code!
Underutilized Clusters
Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG
Dynamic Allocation
• Spark applications scale the number of executors based on load
• Removes need for: --num-executors
• Idle executors get killed
• First supported in CDH 5.4
• Ideal for:
  • Long ETL jobs with large shuffles
  • Shell applications: Hive and the Spark shell
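The idea of scaling executors with load can be sketched as a target computed from the task backlog. This is a simplified model under stated assumptions, not Spark's actual allocation manager: it ignores timeouts and the gradual ramp-up of requests, and the function and parameter names are hypothetical.

```python
import math


def target_executors(pending_tasks, running_tasks, cores_per_executor,
                     min_executors=0, max_executors=float("inf"), cpus_per_task=1):
    """Executors needed to run every outstanding task at once, within bounds."""
    # Each executor runs this many tasks concurrently.
    tasks_per_executor = max(1, cores_per_executor // cpus_per_task)
    needed = math.ceil((pending_tasks + running_tasks) / tasks_per_executor)
    # Clamp to the configured minimum and maximum.
    return int(max(min_executors, min(needed, max_executors)))


if __name__ == "__main__":
    print(target_executors(100, 0, 2))                    # large backlog
    print(target_executors(100, 0, 2, max_executors=10))  # capped by the maximum
    print(target_executors(0, 0, 2, min_executors=1))     # idle application
```

When the backlog drains, the target drops and idle executors can be released back to the cluster for other tenants.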
Task Scheduling
[Diagram: the Driver (Spark Context, DAG Scheduler, Task Scheduler) splits Jobs into Stages and Tasks. Executors Exec1, Exec2 and Exec3 run under the Node Managers on host-a.mydomain.com, host-b.mydomain.com and host-c.mydomain.com and receive Tasks; the RM (Resource Manager) is shown alongside.]
Dynamic Allocation Configuration
• Many knobs
  • spark.dynamicAllocation.enabled
  • spark.dynamicAllocation.[min|max|initial]Executors
  • spark.dynamicAllocation.executorIdleTimeout
  • spark.dynamicAllocation.cachedExecutorIdleTimeout
  • ...
• --num-executors will disable dynamic allocation
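A minimal sketch of assembling these knobs into --conf arguments follows. The helper names are hypothetical. The sketch also sets spark.shuffle.service.enabled, which is not on the slide but which dynamic allocation on YARN relies on so that shuffle files outlive removed executors; the 60s idle timeout is Spark's documented default.

```python
def dynamic_allocation_conf(min_executors, max_executors, initial_executors=None,
                            executor_idle_timeout="60s"):
    """Build the Spark properties that turn on dynamic allocation."""
    if initial_executors is None:
        initial_executors = min_executors
    if not 0 <= min_executors <= initial_executors <= max_executors:
        raise ValueError("expected 0 <= min <= initial <= max")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumption beyond the slide: the external shuffle service is enabled.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.dynamicAllocation.initialExecutors": str(initial_executors),
        "spark.dynamicAllocation.executorIdleTimeout": executor_idle_timeout,
    }


def as_submit_args(conf):
    """Flatten a property dict into spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(as_submit_args(dynamic_allocation_conf(2, 20))))
```

Note that the generated arguments deliberately omit --num-executors, since passing it would switch dynamic allocation off.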
Dynamic Allocation Limitations
• Still required to specify cores
  • --executor-cores
• Memory
  • --executor-memory
  • Includes JVM overhead
• Caching
  • spark.dynamicAllocation.cachedExecutorIdleTimeout
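Because cores and memory per executor are still fixed by hand, a poor choice can strand capacity on every node. The sketch below is a rough packing estimate under assumed node sizes; the function name and the numbers are hypothetical, and it ignores memory reserved for the OS and other services.

```python
def executors_per_node(node_memory_mb, node_vcores, executor_memory_mb,
                       executor_cores, overhead_mb=384):
    """How many executors of a fixed shape fit on one Node Manager."""
    by_memory = node_memory_mb // (executor_memory_mb + overhead_mb)
    by_cores = node_vcores // executor_cores
    # The scarcer resource decides how many executors fit.
    return min(by_memory, by_cores)


if __name__ == "__main__":
    # Hypothetical 16 GB / 16 vcore node with 8 GB executors: memory-bound,
    # so most of the vcores stay idle.
    fit = executors_per_node(16384, 16, 8192, 2, overhead_mb=819)
    print(fit, "executor(s) per node, using", fit * 2, "of 16 vcores")
```

A per-task size, as opposed to a per-executor shape, would let the scheduler avoid this kind of stranded capacity.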
The Future of Dynamic Allocation
• Only “task size” needed: --task-size
• Eliminates
  • --executor-cores
  • --num-executors
  • --executor-memory
• Leads to better cluster utilization
Dynamic Allocation respects Locality!
Security, oh no!
Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg
Security
• Shared resources -> Shared data
• Security has many facets
  • Encryption
  • Authentication
  • Authorization
• Encryption is interesting for multi-tenant clusters
Encryption
Who’s looking at the data?
Data Flow in Spark
[Diagram: Spark Submit, the Driver, two Executors with their Disks, and the UI, connected by four kinds of traffic: Control Plane, File Distribution, Shuffle Blocks, and Spilled/Shuffle Blocks written to disk.]
Prior to Spark 1.6
• Different channel, different method
  • Control plane: SSL
  • File distribution: SSL
  • Shuffle blocks: SASL encryption
  • User UI / REST API: no encryption
  • Spilled/shuffle blocks: use encrypfs (or equivalent)
What is wrong with SSL?
Why not SSL?
• SSL can be hard to set up
• Need certificates readable on every node
• Sharing certificates not as secure
• Hard to have per-user certificate
Spark 1.6
• Standardize around a common transport library
  • Replaces Akka RPC (SPARK-6028)
  • Replaces HTTP file service (SPARK-11140)
  • Uses Netty transport library with SASL encryption
• But:
  • Web UI still has no encryption
  • Shuffle / spilled blocks still require FS-level encryption
  • SASL in the JVM is restricted to 3DES: slow and not very strong
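As a rough illustration of enabling SASL encryption on the Netty transport, the sketch below renders a spark-defaults.conf fragment. The property names are the ones Spark 1.x documents for authentication and SASL encryption; treating this exact set as sufficient for a given cluster is an assumption, and the helper name is hypothetical.

```python
# Properties assumed for Spark 1.6-era SASL encryption; verify against the
# security documentation of the Spark version actually deployed.
SASL_ENCRYPTION_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.network.sasl.serverAlwaysEncrypt": "true",
}


def render_spark_defaults(conf):
    """Render properties in spark-defaults.conf format: key, whitespace, value."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    print(render_spark_defaults(SASL_ENCRYPTION_CONF))
```

These settings do not cover the web UI or blocks spilled to disk, which is the gap the slide calls out.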
Spark 2.0
• REPL class distribution using transport lib (SPARK-11563)
• HTTPS Support for WebUI (SPARK-2750)
• Encrypting spilled blocks is almost available (SPARK-5682)
• Depends on third party Chimera library for encryption
• Work is being done to add Chimera to Apache Commons
• Future:
• Use Chimera to encrypt over-the-wire data
Gateways: launching Spark Application
Spark Gateway
[Diagram: user Bob connects over SSH to gateway-a.mydomain.com, which holds the Client, the client configs and the Spark install. The Client reaches the cluster (Resource Manager, Node Managers on host-b.mydomain.com and host-c.mydomain.com) over random ports; Drivers and executors (Exec1, Exec2) run on the cluster hosts.]
Gateway Considerations
• Gateway hosts actively managed by administrators
  • Updates to client configurations and Spark installs
• Users need to tunnel into network
  • Difficult to put users behind firewall
• YARN allows different Spark versions
  • spark.yarn.jar or spark.yarn.archive
  • Shared Spark services make this difficult
Shared Services
[Diagram: user Bob connects over SSH to gateway-a.mydomain.com (Client, client configs, Spark install), which reaches the Resource Manager and the Node Managers on host-b.mydomain.com and host-c.mydomain.com over random ports. Drivers and executors run on the cluster hosts alongside shared services (marked S) and the History Service.]
Alternative
Livy: an open source, Apache-licensed REST web service that manages long-running Spark contexts in your cluster
Livy Architecture
[Diagram: a Client talks HTTP to the REST Server, which works with the Cluster Manager of the managed cluster to run Context 1 and Context 2, each with its own Driver and Executors.]
Case 1: Spark Application JAR Submission
• Enables Spark applications to be submitted without needing a Spark installation
• Basically a wrapper around spark-submit
% curl -X POST localhost:8998/batches \
    -H "Content-Type: application/json" \
    -d '{
      "file": "<path_to_file>",
      "className": "com.foo.bar..",
      ...
    }'
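The same batch submission can be prepared from a script. This is a minimal sketch using only the Python standard library: it builds the POST request for Livy's /batches endpoint without sending it. The helper name, server URL, JAR path and class name are hypothetical placeholders.

```python
import json
import urllib.request


def build_batch_request(livy_url, file_path, class_name, args=None):
    """Build (but do not send) a POST /batches request for a Livy server."""
    payload = {"file": file_path, "className": class_name}
    if args:
        payload["args"] = list(args)
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical server and application; urllib.request.urlopen(request)
    # would submit the batch to a running Livy server.
    request = build_batch_request(
        "http://localhost:8998", "/path/to/app.jar", "com.example.Main"
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a live cluster.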
How do you retrieve results?
Case 2: Fine grained Job submission
• Programmatic submission of Spark jobs to a long-running application
• A thin Java (and Scala) client available for easier integration
• Provides automatic serialization/deserialization
• Enables Web/Mobile applications to use Spark as a backend
Case 2: Example
// Create the Livy client
LivyClient client = new LivyClientBuilder(false)
    .setURI(new URI("<uri>"))
    .setAll(<config>)
    .build();
// JobHandle allows monitoring of jobs
JobHandle<Long> handle = client.submit(new YourJob());
// Block until results are returned
Long result = handle.get(TIMEOUT, TimeUnit.SECONDS);
// Close connections and shut down the remote Spark context
client.stop(true);
Case 2: Example
private static class YourJob implements Job<Long> {
  @Override
  public Long call(JobContext jc) {
    // Arrays.asList yields a List<Integer>, matching the JavaRDD<Integer> below
    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
    JavaRDD<Integer> rdd = jc.sc().parallelize(list);
    return rdd.count();
  }
}
// Job interface to implement
public interface Job<T> extends Serializable {
  T call(JobContext jc) throws Exception;
}
Contributions Welcome!
• http://livy.io/
• Code: https://github.com/cloudera/livy
• JIRA: https://issues.cloudera.org/browse/LIVY
• Users: http://groups.google.com/a/cloudera.org/group/livy-user
• Dev: http://groups.google.com/a/cloudera.org/group/livy-dev
Thank you

Effective Spark on Multi-Tenant Clusters

  • 1. 1© Cloudera, Inc. All rights reserved. Effective Spark on Multi-Tenant Clusters Kostas Sakellis
  • 2. 2© Cloudera, Inc. All rights reserved. Me • Spark Tech Lead Manager at Cloudera • Contributed to Apache Spark • Previously, stint on Cloudera Manager
  • 3. 3© Cloudera, Inc. All rights reserved. Challenges • Predictable execution time of Spark jobs • Prevent Starvation • Optimal cluster utilization • Secure Data access • Configuration Management
  • 4. 4© Cloudera, Inc. All rights reserved. Spark on YARN
  • 5. 5© Cloudera, Inc. All rights reserved. Why YARN? • Spark supports pluggable Cluster Managers • local, Standalone, YARN and Mesos • YARN contains proper resource manager • Enables multi-platform jobs • Spark on YARN is mature with active community
  • 6. 6© Cloudera, Inc. All rights reserved. Running an application spark-submit --master yarn-cluster --executor-memory 2g --num-executors 3 --num-cores 2 <your-class>
  • 7. 7© Cloudera, Inc. All rights reserved. Host-b.mydomain.com System Architecture host-a.mydomain.com Resource Manager Node Manager Host-c.mydomain.com Node Manager Node Manager Container App Master Exec2 Exec1 Exec3 Driver Driver Exec1 Exec2
  • 8. 8© Cloudera, Inc. All rights reserved. Gotchas • Ensure compatible YARN configuration • yarn.nodemanager.resource.[memory-mb|cpu-vcores] • yarn.scheduler.maximum-allocation-[vcores|mb] • ... • Remember overhead memory • spark.yarn.executor.memoryOverhead • Default of 10% since Spark 1.4
  • 9. 9© Cloudera, Inc. All rights reserved. Container [pid=63375,containerID=container_1388158490598_0001_01_00 0003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...] Otherwise…
  • 10. 10© Cloudera, Inc. All rights reserved. Container [pid=63375,containerID=container_1388158490598_0001_01_00 0003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...] Otherwise…
  • 11. 11© Cloudera, Inc. All rights reserved. Host-b.mydomain.com System Architecture host-a.mydomain.com Resource Manager Node Manager Host-c.mydomain.com Node Manager Node Manager Exec2 Exec1 Exec3 Driver Driver Exec1 Exec2 Exec3 Exec2 Exec1 Driver
  • 12. 12© Cloudera, Inc. All rights reserved. How do we share a common resource? Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpg
  • 13. 13© Cloudera, Inc. All rights reserved. Resource Management • YARN has ability to create resource queues • Priorities can be set per queues • Preemption is also available • Fixed in Spark 1.6 (SPARK-8167) • yarn.scheduler.fair.preemption
  • 14. 14© Cloudera, Inc. All rights reserved. Running an application spark-submit --master yarn-cluster --queue my-special-queue --executor-memory 2g --num-executors 3 --num-cores 2 <your-class>
  • 15. 15© Cloudera, Inc. All rights reserved. How about locality? Courtesy of: https://radioglobalistic.files.wordpress.com/2011/02/lagos-traffic.jpgCourtesy of: https://blog.voxbone.com/wp-content/uploads/2015/07/think-global-act-local.jpg
  • 16. 16© Cloudera, Inc. All rights reserved. ExecutorExecutor Task Scheduling Driver Executor DAG Scheduler Task Scheduler Core TaskTask Shuffle Shuffle stagestageStage Spark Context JobJobJob
  • 17. 17© Cloudera, Inc. All rights reserved. Host-b.mydomain.com Locality host-a.mydomain.com Resource Manager Node Manager HDFS x:B1 x:B2 y:B1 y:B3 Host-c.mydomain.com Node Manager Node Manager HDFS x:B3 x:B2 y:B2 y:B3 HDFS x:B3 x:B1 y:B1 y:B2 hdfs://x hdfs://y Exec2 Exec1Driver
  • 18. Spark creates executors before executing code!
  • 19. Underutilized Clusters. Courtesy of: http://media.nbclosangeles.com/images/1200*675/60-freeway-repair-dec16-2-empty.JPG
  • 20. Dynamic Allocation • Spark applications scale the number of executors based on load • Removes the need for --num-executors • Idle executors get killed • First supported in CDH 5.4 • Ideal for: • Long ETL jobs with large shuffles • Shell applications: Hive and Spark shell
  • 21. Task Scheduling [diagram: the Driver (Spark Context, DAG Scheduler, Task Scheduler; jobs split into stages) alongside the RM; Exec1, Exec2 and Exec3 run tasks on the Node Managers of host-a, host-b and host-c.mydomain.com]
  • 22. Dynamic Allocation Configuration • Many knobs • spark.dynamicAllocation.enabled • spark.dynamicAllocation.[min|max|initial]Executors • spark.dynamicAllocation.executorIdleTimeout • spark.dynamicAllocation.cachedExecutorIdleTimeout • ... • --num-executors will disable dynamic allocation
  • 23. Dynamic Allocation Limitations • Still required to specify cores • --executor-cores • Memory • --executor-memory • Includes JVM overhead • Caching • spark.dynamicAllocation.cachedExecutorIdleTimeout
  • 24. The Future of Dynamic Allocation • Only “task size” needed: --task-size (proposed) • Eliminates • --executor-cores • --num-executors • --executor-memory • Leads to better cluster utilization
  • 25. Dynamic Allocation respects Locality!
  • 26. Security, oh no! Courtesy of: https://www.iti.illinois.edu/sites/default/files/Cybersecurity_image.jpg
  • 27. Security • Shared resources -> Shared data • Security has many facets • Encryption • Authentication • Authorization • Encryption is interesting for multi-tenant clusters
  • 28. Encryption: Who’s looking at the data?
  • 29. Data Flow in Spark [diagram: channels between Spark Submit, the Driver, the Executors and their disks: Control Plane, File Distribution, Shuffle Blocks, UI, Spilled/Shuffle Blocks]
  • 30. Prior to Spark 1.6 • Different channel, different method • Control plane: SSL • File distribution: SSL • Shuffle blocks: SASL encryption • User UI / REST API: no encryption • Spilled/shuffle blocks: use eCryptfs (or equivalent)
  • 31. What is wrong with SSL?
  • 32. Why not SSL? • SSL can be hard to set up • Certificates must be readable on every node • Sharing certificates is not as secure • Hard to have per-user certificates
  • 33. Spark 1.6 • Standardizes around a common transport library • Replaces Akka RPC (SPARK-6028) • Replaces HTTP file service (SPARK-11140) • Uses Netty transport library with SASL encryption • But… • WebUI still has no encryption • Shuffle / spilled blocks still require FS-level encryption • SASL in the JVM is restricted to 3DES, which is slow and not very strong
  • 34. Spark 2.0 • REPL class distribution using transport lib (SPARK-11563) • HTTPS support for WebUI (SPARK-2750) • Encrypting spilled blocks is almost available (SPARK-5682) • Depends on third-party Chimera library for encryption • Work is being done to add Chimera to Apache Commons • Future: • Use Chimera to encrypt over-the-wire data
  • 35. Gateways: launching Spark Application. Courtesy of:
  • 36. Spark Gateway [diagram: Bob reaches gateway-a.mydomain.com over SSH; the gateway holds the Client, Client Configs and Spark Install; Drivers and executors (Exec1, Exec2) run on the Node Managers of host-b and host-c.mydomain.com next to the Resource Manager, connected over random ports]
  • 37. Gateway Considerations • Gateway hosts must be actively managed by administrators • Updates to client configurations and Spark installs • Users need to tunnel into the network • Difficult to put users behind a firewall • YARN allows different Spark versions • spark.yarn.jar or spark.yarn.archive • Shared Spark services make this difficult
  • 38. Shared Services [diagram: the same gateway layout as before (Bob, SSH, gateway-a.mydomain.com with Client, Client Configs and Spark Install; Drivers and executors on host-b and host-c.mydomain.com; random ports), plus shared services (marked S) and a History Service]
  • 39. Alternative (Livy): an open source, Apache-licensed REST web service that manages long-running Spark contexts in your cluster
  • 40. Livy Architecture [diagram: Clients talk HTTP to the REST Server, which works with the Cluster Manager of the managed cluster; each context (Context 1, Context 2) has its own Driver and Executors]
  • 41. Case 1: Spark Application JAR Submission • Enables Spark applications to be submitted without needing a Spark installation • Basically a wrapper around spark-submit % curl -X POST localhost:8998/batches -d '{ "file": "<path_to_file>", "className": "com.foo.bar.." ... }'
  • 42. How do you retrieve results?
  • 43. Case 2: Fine-grained Job Submission • Programmatic submission of Spark jobs to a long-running application • A thin Java (and Scala) client is available for easier integration • Provides automatic serialization/deserialization • Enables Web/Mobile applications to use Spark as a backend
  • 44. Case 2: Example /* Create Livy client */ LivyClient client = new LivyClientBuilder(false) .setURI(new URI("<uri>")) .setAll(<config>) .build(); /* JobHandle allows monitoring of jobs */ JobHandle<Long> handle = client.submit(new YourJob()); /* Block until results are returned */ handle.get(TIMEOUT, TimeUnit.SECONDS); /* Close connections */ client.stop();
  • 45. Case 2: Example private static class YourJob implements Job<Long> { @Override public Long call(JobContext jc) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5); JavaRDD<Integer> rdd = jc.sc().parallelize(list); return rdd.count(); } } /* Job interface to implement */ public interface Job<T> extends Serializable { T call(JobContext jc) throws Exception; }
  • 46. Contributions Welcome! • http://livy.io/ • Code: https://github.com/cloudera/livy • JIRA: https://issues.cloudera.org/browse/LIVY • Users: http://groups.google.com/a/cloudera.org/group/livy-user • Dev: http://groups.google.com/a/cloudera.org/group/livy-dev
  • 47. Thank you
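
The dynamic allocation knobs from slides 20 to 23 can be put together as one launch command. The following is only a sketch: the class name `DynamicAllocationArgs`, the method `buildArgs` and the example values are hypothetical, and it assumes that dynamic allocation on YARN also needs the external shuffle service (`spark.shuffle.service.enabled`), which the slides do not mention. It leaves out `--num-executors`, since the slides note that passing it disables dynamic allocation, and it uses `--executor-cores`, the spark-submit flag for cores per executor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that assembles a spark-submit command line
// with dynamic allocation turned on. It only builds strings; it
// does not launch anything.
public class DynamicAllocationArgs {

    static List<String> buildArgs(int minExecutors, int maxExecutors, String idleTimeout) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.dynamicAllocation.enabled", "true");
        // Assumption: executors can only be removed safely if their
        // shuffle files are served by the external shuffle service.
        conf.put("spark.shuffle.service.enabled", "true");
        conf.put("spark.dynamicAllocation.minExecutors", String.valueOf(minExecutors));
        conf.put("spark.dynamicAllocation.maxExecutors", String.valueOf(maxExecutors));
        conf.put("spark.dynamicAllocation.executorIdleTimeout", idleTimeout);

        List<String> args = new ArrayList<>();
        args.add("spark-submit");
        args.add("--master");
        args.add("yarn-cluster");
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        // Cores and memory per executor still have to be given by hand.
        args.add("--executor-cores");
        args.add("2");
        args.add("--executor-memory");
        args.add("2g");
        // Deliberately no --num-executors here.
        return args;
    }

    public static void main(String[] args) {
        // Example values only: 1 to 20 executors, 60 second idle timeout.
        System.out.println(String.join(" ", buildArgs(1, 20, "60s")));
    }
}
```

The application class and JAR would still be appended to the printed command, as in the "Running an application" slides.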

Editor's notes

  1. This shows up in the YARN NodeManager logs
  2. Allow multiple groups to access shared resources while ensuring some dedicated share of the resource
  3. Allow multiple groups to access shared resources while ensuring some dedicated share of the resource
  4. Spark makes building a proof of concept with a subset of data relatively easy.
  5. Every connection in the previous slide can transmit sensitive data: input data transmitted via broadcast variables, computed data during shuffles, data in serialized tasks, and files uploaded with the job. How do we prevent other users from seeing this data?
  6. Spark makes building a proof of concept with a subset of data relatively easy.
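
The "running beyond physical memory limits" message on slides 9 and 10 is easier to reason about with the container arithmetic written out: YARN enforces a limit on the whole container, which is the executor heap plus the overhead from `spark.yarn.executor.memoryOverhead`. The sketch below assumes the Spark 1.x-era default of 10% of executor memory with a 384 MB floor; the class and method names are hypothetical, and it ignores any rounding that the YARN scheduler applies to the request.

```java
// Hypothetical calculator for the memory Spark asks YARN for per executor.
public class ExecutorContainerSize {

    // Assumed defaults for spark.yarn.executor.memoryOverhead:
    // 10% of executor memory, but never less than 384 MB.
    static final long MIN_OVERHEAD_MB = 384;
    static final double OVERHEAD_FACTOR = 0.10;

    // Off-heap overhead added on top of --executor-memory.
    static long overheadMb(long executorMemoryMb) {
        return Math.max(MIN_OVERHEAD_MB, (long) (executorMemoryMb * OVERHEAD_FACTOR));
    }

    // Total container size requested: heap plus overhead.
    static long containerRequestMb(long executorMemoryMb) {
        return executorMemoryMb + overheadMb(executorMemoryMb);
    }

    public static void main(String[] args) {
        for (long mb : new long[] {2048, 8192}) {
            System.out.println("--executor-memory " + mb + "m -> overhead "
                    + overheadMb(mb) + " MB, container request "
                    + containerRequestMb(mb) + " MB");
        }
    }
}
```

Under these assumptions a 2 GB executor needs a container larger than 2 GB, so `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb` have to leave room for the overhead as well as the heap.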