SCALABLE MONITORING
USING PROMETHEUS
WITH APACHE SPARK
Diane Feddema, Principal Software Engineer
Zak Hassan, Software Engineer
#Radanalytics
YOUR SPEAKERS
DIANE FEDDEMA
PRINCIPAL SOFTWARE ENGINEER - AI/ML CENTER OF EXCELLENCE, CTO OFFICE
● Currently focused on developing and applying Data Science and Machine Learning techniques for performance
analysis, automating these analyses and displaying data in novel ways.
● Previously worked as a performance engineer at the National Center for Atmospheric Research (NCAR), working on
optimization and tuning of parallel global climate models.
ZAK HASSAN
SOFTWARE ENGINEER - AI/ML CENTER OF EXCELLENCE, CTO OFFICE
● Currently focused on developing an analytics platform on OpenShift, leveraging Apache Spark as the analytics engine.
Also developing data science apps and making metrics observable through cloud-native technology.
● Previously worked as a Software Consultant in the financial services and insurance industry, building end-to-end
software solutions for clients.
OVERVIEW
OBSERVABILITY
● Motivation
● What Is Spark?
● What Is Prometheus?
● Our Story
● Spark Cluster JVM Instrumentation
PERFORMANCE TUNING
● Tuning Spark jobs
● Spark Memory Model
● Prometheus as a performance tool
● Comparing cached vs non-cached dataframes
● Demo
MOTIVATION
● Rapid experimentation with data science apps
● Identify bottlenecks
● Improve performance
● Resolve incidents more quickly
● Improve memory usage to tune Spark jobs
OUR STORY
● Instrumented the Spark JVM to expose metrics in a Kubernetes pod
● Added the ability to monitor Spark with Prometheus
● Experimented with Grafana on top of Prometheus to provide more insight
● Sharing our experiments and our experience using this setup for performance
analysis of Spark jobs
● Demo at the very end
June 1, 2017 - https://github.com/radanalyticsio/openshift-spark/pull/28
- Added an agent to expose a Jolokia metrics endpoint in the Kubernetes pod
Nov 7, 2017 - https://github.com/radanalyticsio/openshift-spark/pull/35
- Added an agent to expose a Prometheus metrics endpoint in the Kubernetes pod
WHAT IS PROMETHEUS
● Open source monitoring system and time series database
● In 2016 Prometheus became the second member project of the CNCF
● Scrapes metrics from an endpoint
● Client libraries in Go, Java, Python, etc.
● Kubernetes comes instrumented out of the box with Prometheus endpoints
● If something lacks native Prometheus integration, there are many community
exporters that can expose its metrics, so most of your infrastructure can be
monitored
WHAT IS APACHE SPARK
Apache Spark is an in-demand data processing engine with a thriving community and
steadily growing install base
● Supports interactive data exploration in addition to apps
● Batch and stream processing
● Machine learning libraries
● Distributed
● Separate storage and compute (in-memory processing)
● New external scheduler: Kubernetes
SPARK FEATURES
• Can run standalone, or with YARN, Mesos, or Kubernetes as the cluster manager
• Has language bindings for Java, Scala, Python, and R
• Accesses data from JDBC sources, HDFS, S3, or a regular filesystem
• Can persist data in different formats: Parquet, Avro, JSON, CSV, etc.
(diagram: SQL, MLlib, Graph, and Streaming libraries on top of Spark Core)
SPARK APPLICATION
(architecture diagram)
SPARK IN CONTAINERS
(deployment diagram)
SPARK CLUSTER INSTRUMENTATION
(diagram: the Spark master and each Spark worker run a Java agent;
Prometheus scrapes metrics from the agents and notifies Alertmanager)
INSTRUMENT JAVA AGENT
(screenshot: Java agent configuration)
PROMETHEUS TARGETS
(screenshot: the Prometheus targets page)
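The deck's actual agent wiring lives in the PRs linked earlier; as a sketch, the Prometheus JMX exporter can be attached to the Spark driver and executors as a -javaagent (the jar path, ports, and config file name here are hypothetical):

```properties
# spark-defaults.conf (illustrative paths and ports)
spark.driver.extraJavaOptions   -javaagent:/opt/jmx_prometheus_javaagent.jar=8080:/opt/jmx-config.yaml
spark.executor.extraJavaOptions -javaagent:/opt/jmx_prometheus_javaagent.jar=8081:/opt/jmx-config.yaml
```

Each JVM then serves its metrics on the given port, ready for Prometheus to scrape.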
PULL METRICS
● Prometheus lets you configure how often to scrape and which endpoints to scrape.
The Prometheus server then pulls in metrics from the configured targets.
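A minimal scrape configuration sketch (the job name, targets, and interval are assumptions, not taken from the deck):

```yaml
scrape_configs:
  - job_name: 'spark'
    scrape_interval: 15s
    static_configs:
      - targets: ['spark-master:8080', 'spark-worker-1:8081']
```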
ALERTMANAGER
• PromQL queries are used to create alerting rules; you are notified when a rule fires.
• Alertmanager receives the notifications and can route them to you via email,
Slack, or other receivers (see the docs for details).
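A sketch of an alerting rule that Prometheus would evaluate and hand off to Alertmanager (the metric name and threshold are assumptions):

```yaml
groups:
  - name: spark-alerts
    rules:
      - alert: SparkWorkerHighHeap
        expr: jvm_memory_bytes_used{area="heap"} > 4e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Spark worker JVM heap above 4 GB for 5 minutes"
```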
PROMQL
● Powerful query language for metrics from your Kubernetes cluster as well as your Spark
clusters.
● What are gauges and counters?
Gauges: the latest value of a metric
Counters: a cumulative count of event occurrences, conventionally suffixed “_total”
A range selector such as prom_metric_total[1m] returns the samples from the last minute
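A couple of PromQL sketches against JVM metrics (the metric names assume the standard Prometheus JMX exporter defaults):

```promql
# per-second rate of GC events over the last minute (counter + range vector)
rate(jvm_gc_collection_seconds_count[1m])

# current JVM heap usage (gauge)
jvm_memory_bytes_used{area="heap"}
```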
PART 2: Tuning Spark jobs with
Prometheus
Things we would like to know when tuning Spark programs:
● How much memory is the driver using?
● How much memory are the workers using?
● How is the JVM being utilized by Spark?
● Is my spark job saturating the network?
● What is the cluster view of network, cpu and memory utilization?
We will demonstrate how Prometheus coupled with Grafana on Kubernetes
can help answer these types of questions.
Our Example Application
Focus on Memory:
Efficient Memory use is Key to good performance in Spark jobs.
How:
We will create Prometheus + Grafana dashboards to evaluate memory
usage under different conditions.
Example:
Our Spark Python example will compare memory usage with and without
caching to illustrate how memory usage and timing change for a
PySpark program performing a cartesian product followed by a groupby
operation
A little Background
Memory allocation in Spark
● Spark is an "in-memory" computing framework
● Memory is a limited resource!
● There is competition for memory
● Caching reusable results can reduce overall memory usage
under certain conditions
● Memory runs out in many large jobs, forcing spills to disk
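For reference, the Spark 2.x settings that govern this competition for memory can be set at submit time; the values below are illustrative only, not the deck's configuration:

```
spark-submit \
  --driver-memory 2g \
  --executor-memory 4g \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.5 \
  my_job.py
```

spark.memory.fraction sets the share of the heap used for execution and storage; spark.memory.storageFraction sets the portion of that region immune to eviction.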
Spark Unified Memory Model
LRU eviction and user-defined memory configuration options
(diagram: the total JVM heap allocated to a Spark job is divided between memory allocated
to EXECUTION and memory allocated to STORAGE; blocks spill to disk when memory fills)
● spark.memory.storageFraction specifies a user-defined unevictable amount of storage memory
● EXECUTION takes precedence over STORAGE up to the user-defined unevictable amount
Using Spark SQL and Spark RDD
API together in a tuning exercise
We want to use Spark SQL to manipulate dataframes
Spark SQL is a component of Spark
● it provides structured data processing
● it is implemented as a library on top of Spark
APIs:
● SQL syntax
● Dataframes
● Datasets
Backend components:
● Catalyst - query optimizer
● Tungsten - off-heap memory management that eliminates the overhead of Java objects
Performance Optimizations with Spark SQL
(diagram: user programs in Python, Scala, or Java and the JDBC console go through
Spark SQL, i.e. the Dataframe API and Catalyst optimizer, which runs on Spark Core as RDDs)
Spark SQL performance benefits:
● Catalyst compiles Spark SQL programs down to an RDD
● Tungsten provides more efficient data storage compared
to Java objects on the heap
● Dataframe API and RDD API can be intermixed
Using Prometheus + Grafana for
performance optimization
Specific code example:
Compare non-cached and cached dataframes that are reused in a
groupBy transformation
When is it a good idea to cache a dataframe?
● when a result of a computation is going to be reused later
● when it is costly to recompute that result
● in cases where algorithms make several passes over the data
Determining memory consumption for
dataframes you want to cache
Example: Code for non-cached run
# imports and parameters assumed; the slide shows only the snippet below them
from pyspark.sql import SparkSession, Row
import pyspark.sql.functions as func
from pyspark.mllib.random import RandomRDDs

spark = SparkSession.builder.appName("noncached-run").getOrCreate()
sc = spark.sparkContext
nRow, nCol, numPartitions = 100000, 4, 8  # illustrative sizes, not from the slide
seed = 2  # initial seed assumed; the slide does not show it
rdd1 = RandomRDDs.normalVectorRDD(sc, nRow, nCol, numPartitions, seed)
seed = 3
rdd2 = RandomRDDs.normalVectorRDD(sc, nRow, nCol, numPartitions, seed)
# convert each vector in the rdd to a Row
randomNumberRdd1 = rdd1.map(lambda x: Row(A=float(x[0]), B=float(x[1]), C=float(x[2]), D=float(x[3])))
randomNumberRdd2 = rdd2.map(lambda x: Row(E=float(x[0]), F=float(x[1]), G=float(x[2]), H=float(x[3])))
# create dataframes from the rdds
schemaRandomNumberDF1 = spark.createDataFrame(randomNumberRdd1)
schemaRandomNumberDF2 = spark.createDataFrame(randomNumberRdd2)
cross_df = schemaRandomNumberDF1.crossJoin(schemaRandomNumberDF2)
# aggregate
results = schemaRandomNumberDF1.groupBy("A").agg(func.max("B"), func.sum("C"))
results.show(n=100)
print("----------Count in cross-join--------------- {0}".format(cross_df.count()))
Example: Code for cached run
# imports and parameters assumed; the slide shows only the snippet below them
from pyspark.sql import SparkSession, Row
import pyspark.sql.functions as func
from pyspark.mllib.random import RandomRDDs

spark = SparkSession.builder.appName("cached-run").getOrCreate()
sc = spark.sparkContext
nRow, nCol, numPartitions = 100000, 4, 8  # illustrative sizes, not from the slide
seed = 2  # initial seed assumed; the slide does not show it
rdd1 = RandomRDDs.normalVectorRDD(sc, nRow, nCol, numPartitions, seed)
seed = 3
rdd2 = RandomRDDs.normalVectorRDD(sc, nRow, nCol, numPartitions, seed)
# convert each vector in the rdd to a Row
randomNumberRdd1 = rdd1.map(lambda x: Row(A=float(x[0]), B=float(x[1]), C=float(x[2]), D=float(x[3])))
randomNumberRdd2 = rdd2.map(lambda x: Row(E=float(x[0]), F=float(x[1]), G=float(x[2]), H=float(x[3])))
# create dataframes from the rdds
schemaRandomNumberDF1 = spark.createDataFrame(randomNumberRdd1)
schemaRandomNumberDF2 = spark.createDataFrame(randomNumberRdd2)
# cache the dataframes
schemaRandomNumberDF1.cache()
schemaRandomNumberDF2.cache()
cross_df = schemaRandomNumberDF1.crossJoin(schemaRandomNumberDF2)
# aggregate
results = schemaRandomNumberDF1.groupBy("A").agg(func.max("B"), func.sum("C"))
results.show(n=100)
print("----------Count in cross-join--------------- {0}".format(cross_df.count()))
Query plan comparison
(screenshots: non-cached vs. cached query plans)
Example: Comparing cached vs non-cached runs
(screenshots: Prometheus dashboards for the non-cached and cached runs)
Example: Comparing cached vs
non-cached runs
Comparing non-cached vs cached runs
RIP (Relative Index of Performance):
0 to 1 = Improvement
0 to -1 = Degradation
% Change: negative values = Improvement
Measured results (one pair per dashboard panel):
● RIP = 0.76, % Change = 76
● RIP = 0.63, % Change = 63
● RIP = 0.62, % Change = 62
● RIP = 0.63, % Change = 63
● RIP = 0.10, % Change = 10
● RIP = 0.00, % Change = 0
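The deck does not spell out the RIP formula. A plausible definition consistent with the numbers above, assuming RIP compares a baseline (non-cached) measurement against the cached one, would be:

```python
def rip(t_baseline, t_new):
    """Relative Index of Performance (assumed formula):
    (t_baseline - t_new) / t_baseline
    Values in 0..1 indicate improvement, 0..-1 indicate degradation."""
    return (t_baseline - t_new) / t_baseline

# e.g. a panel where the non-cached run measured 100 and the cached run 24
print(round(rip(100.0, 24.0), 2))  # 0.76
```

Under this assumption the "% Change" values on the slide are simply RIP scaled by 100.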
SPARK JOB + PROMETHEUS +
GRAFANA
DEMO
Demo Time!
Recap
You learned:
■ Our story of monitoring Spark cluster metrics with Prometheus
■ Spark features
■ How Prometheus can be integrated with Apache Spark
■ Spark applications and how their memory works
■ Spark cluster JVM instrumentation
■ How to deploy a Spark job and monitor it via a Grafana dashboard
■ The performance difference between cached and non-cached dataframes
■ Monitoring tips and tricks
Thank You!
Questions?

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 

Último (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane Feddema and Zak Hassan

Nov 7, 2017 - https://github.com/radanalyticsio/openshift-spark/pull/35
- Added agent to report prometheus metrics endpoint in kubernetes pod
#Radanalytics
WHAT IS PROMETHEUS
● Open source monitoring system
● In 2016 Prometheus became the 2nd member project of the CNCF
● Scrapes metrics from an endpoint
● Client libraries in Go, Java, Python, etc.
● Kubernetes comes instrumented out of the box with prometheus endpoints.
● If you don't have native integration with prometheus, there are many community exporters that let components across your infrastructure expose metrics so they can be monitored.
#Radanalytics
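The endpoint Prometheus scrapes serves plain text in its exposition format: `# HELP`/`# TYPE` comment lines followed by `name{labels} value` samples. As a minimal illustration (the metric names here are made up, not ones exported by the Spark agent), the payload can be built with nothing but string formatting:

```python
# Build a tiny payload in the Prometheus text exposition format.
# Metric names and label values below are illustrative only.
def render_metrics(samples):
    """samples: list of (name, type, help_text, labels, value) tuples."""
    lines = []
    for name, mtype, help_text, labels, value in samples:
        lines.append("# HELP {0} {1}".format(name, help_text))
        lines.append("# TYPE {0} {1}".format(name, mtype))
        label_str = ",".join('{0}="{1}"'.format(k, v)
                             for k, v in sorted(labels.items()))
        lines.append("{0}{{{1}}} {2}".format(name, label_str, value))
    return "\n".join(lines) + "\n"

payload = render_metrics([
    ("spark_worker_heap_bytes", "gauge", "JVM heap in use",
     {"pod": "worker-1"}, 123456),
])
print(payload)
```

Anything that can serve this text over HTTP can be scraped, which is why community exporters are easy to write.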
WHAT IS APACHE SPARK
Apache Spark is an in-demand data processing engine with a thriving community and steadily growing install base
● Supports interactive data exploration in addition to apps
● Batch and stream processing
● Machine learning libraries
● Distributed
● Separates storage and compute (in-memory processing)
● New external scheduler: Kubernetes
#Radanalytics
SPARK FEATURES
● Can run standalone, or with YARN, Mesos or Kubernetes as the cluster manager
● Has language bindings for Java, Scala, Python, and R
● Can access data from JDBC, HDFS, S3 or a regular filesystem
● Can persist data in different data formats: parquet, avro, json, csv, etc.
[Diagram: SQL, MLlib, Graph and Streaming libraries sit on top of SPARK CORE]
#Radanalytics
SPARK CLUSTER INSTRUMENTATION
[Diagram: The Spark master and each Spark worker run a Java agent; Prometheus scrapes metrics from the agents and notifies the Alertmanager]
#Radanalytics
PULL METRICS
● Prometheus lets you configure how often to scrape and which endpoints to scrape. The prometheus server will pull in the metrics that are configured.
#Radanalytics
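A minimal scrape configuration for a setup like this might look as follows (the job name, target addresses and port are placeholders, not the values used in the demo):

```yaml
# prometheus.yml (sketch): scrape the Spark master and workers every 10s.
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'spark-cluster'        # hypothetical job name
    static_configs:
      - targets:
          - 'spark-master:7777'      # placeholder agent endpoints
          - 'spark-worker-1:7777'
          - 'spark-worker-2:7777'
```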
ALERTMANAGER
● A PromQL query is used to create rules that notify you when the rule is triggered.
● Currently Alertmanager receives the notification and is able to notify you via email, Slack or other options (see docs for details).
#Radanalytics
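A rule of this kind is written in PromQL inside a Prometheus rule file; Alertmanager then routes the firing alert. A sketch (the metric name, threshold and durations below are illustrative, not from the talk):

```yaml
# Prometheus alerting rule (sketch). jvm_memory_bytes_used is a standard
# JMX-exporter metric name; the 1.5GB threshold is made up for illustration.
groups:
  - name: spark-alerts
    rules:
      - alert: SparkWorkerHeapHigh
        expr: jvm_memory_bytes_used{area="heap"} > 1.5e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Spark worker JVM heap above 1.5GB for 5 minutes"
```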
PROMQL
● Powerful query language to get metrics on a kubernetes cluster along with spark clusters.
● What are gauges and counters?
Gauges: Latest value of a metric
Counters: Total number of event occurrences. Usually suffixed with "_total".
You can use this format to get the last minute of a metric: prom_metric_total[1m]
#Radanalytics
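Two queries showing the gauge/counter distinction in practice (the metric names are standard JMX-exporter names, used here for illustration):

```
# Gauge: the latest value, queried directly
jvm_memory_bytes_used{area="heap"}

# Counter: always increasing, so query the per-second rate over the
# last minute using the [1m] range-selector form mentioned above
rate(jvm_gc_collection_seconds_count[1m])
```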
PART 2: Tuning Spark jobs with Prometheus
Things we would like to know when tuning Spark programs:
● How much memory is the driver using?
● How much memory are the workers using?
● How is the JVM being utilized by spark?
● Is my spark job saturating the network?
● What is the cluster view of network, cpu and memory utilization?
We will demonstrate how Prometheus coupled with Grafana on Kubernetes can help answer these types of questions.
#Radanalytics
Our Example Application
Focus on Memory: Efficient memory use is key to good performance in Spark jobs.
How: We will create Prometheus + Grafana dashboards to evaluate memory usage under different conditions.
Example: Our Spark Python example will compare memory usage with and without caching, to illustrate how memory usage and timing change for a PySpark program performing a cartesian product followed by a groupBy operation.
#Radanalytics
A little Background
Memory allocation in Spark
● Spark is an "in-memory" computing framework
● Memory is a limited resource!
● There is competition for memory
● Caching reusable results can save overall memory usage under certain conditions
● Memory runs out in many large jobs, forcing spills to disk
#Radanalytics
Spark Unified Memory Model
LRU eviction and user-defined memory configuration options
[Diagram: The total JVM heap memory allocated to a Spark job is divided between EXECUTION and STORAGE regions; blocks spill to disk when memory runs out. spark.memory.storageFraction sets a user-specified unevictable storage amount; EXECUTION takes precedence over STORAGE up to that user-defined unevictable amount.]
#Radanalytics
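The region sizes in the diagram follow directly from Spark's configuration knobs. A small sketch of the arithmetic, using the defaults documented in the Spark tuning guide (300 MB reserved, spark.memory.fraction=0.6, spark.memory.storageFraction=0.5); the 4 GB heap is just an example:

```python
# Sketch of Spark's unified memory accounting with documented defaults.
RESERVED_MB = 300.0  # reserved memory Spark subtracts before splitting

def unified_memory_mb(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return (unified, storage, execution) region sizes in MB."""
    usable = heap_mb - RESERVED_MB      # heap left after reserved memory
    unified = usable * memory_fraction  # shared execution + storage pool
    storage = unified * storage_fraction  # unevictable storage portion
    execution = unified - storage       # execution can borrow beyond this
    return unified, storage, execution

unified, storage, execution = unified_memory_mb(4096)
print(unified, storage, execution)
```

For a 4 GB executor heap this yields roughly a 2.2 GB unified pool split evenly between storage and execution, which is why caching large dataframes quickly competes with execution memory.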
Using Spark SQL and Spark RDD API together in a tuning exercise
We want to use Spark SQL to manipulate dataframes
Spark SQL is a component of Spark
● it provides structured data processing
● it is implemented as a library on top of Spark
APIs:
● SQL syntax
● Dataframes
● Datasets
Backend components:
● Catalyst - query optimizer
● Tungsten - off-heap memory management, eliminates the overhead of Java Objects
#Radanalytics
Performance Optimizations with Spark SQL
[Diagram: JDBC console and user programs (Python, Scala, Java) feed Spark SQL's Dataframe API and Catalyst Optimizer, which compile down to RDDs on Spark Core]
Spark SQL performance benefits:
● Catalyst compiles Spark SQL programs down to an RDD
● Tungsten provides more efficient data storage compared to Java objects on the heap
● Dataframe API and RDD API can be intermixed
#Radanalytics
Using Prometheus + Grafana for performance optimization
Specific code example: Compare non-cached and cached dataframes that are reused in a groupBy transformation
When is it a good idea to cache a dataframe?
● when the result of a computation is going to be reused later
● when it is costly to recompute that result
● in cases where algorithms make several passes over the data
#Radanalytics
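The trade-off in those bullets is the classic cache-versus-recompute decision, which can be shown with a pure-Python analogy (this is memoization, not Spark's cache(), but the reasoning is the same): pay for the computation once, then reuse the stored result on every later pass.

```python
# Pure-Python analogy for caching a costly result that several passes reuse.
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=None)
def costly(n):
    calls["count"] += 1              # track how often we actually recompute
    return sum(i * i for i in range(n))

# Three "passes" over the same input: only the first one computes.
results = [costly(10_000) for _ in range(3)]
print(results, calls["count"])
```

As with Spark, the win only materializes when the result really is reused; caching something read once just spends memory.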
Determining memory consumption for dataframes you want to cache
#Radanalytics
Example: Code for non-cached run

# nRow, nCol, numPartitions, the initial seed and imports are defined earlier (not shown on the slide)
rdd1 = RandomRDDs.normalVectorRDD(spark, nRow, nCol, numPartitions, seed)
seed = 3
rdd2 = RandomRDDs.normalVectorRDD(spark, nRow, nCol, numPartitions, seed)
sc = spark.sparkContext

# convert each tuple in the rdd to a row
randomNumberRdd1 = rdd1.map(lambda x: Row(A=float(x[0]), B=float(x[1]), C=float(x[2]), D=float(x[3])))
randomNumberRdd2 = rdd2.map(lambda x: Row(E=float(x[0]), F=float(x[1]), G=float(x[2]), H=float(x[3])))

# create dataframe from rdd
schemaRandomNumberDF1 = spark.createDataFrame(randomNumberRdd1)
schemaRandomNumberDF2 = spark.createDataFrame(randomNumberRdd2)

cross_df = schemaRandomNumberDF1.crossJoin(schemaRandomNumberDF2)

# aggregate
results = schemaRandomNumberDF1.groupBy("A").agg(func.max("B"), func.sum("C"))
results.show(n=100)
print "----------Count in cross-join--------------- {0}".format(cross_df.count())
#Radanalytics
Example: Code for cached run

# nRow, nCol, numPartitions, the initial seed and imports are defined earlier (not shown on the slide)
rdd1 = RandomRDDs.normalVectorRDD(spark, nRow, nCol, numPartitions, seed)
seed = 3
rdd2 = RandomRDDs.normalVectorRDD(spark, nRow, nCol, numPartitions, seed)
sc = spark.sparkContext

# convert each tuple in the rdd to a row
randomNumberRdd1 = rdd1.map(lambda x: Row(A=float(x[0]), B=float(x[1]), C=float(x[2]), D=float(x[3])))
randomNumberRdd2 = rdd2.map(lambda x: Row(E=float(x[0]), F=float(x[1]), G=float(x[2]), H=float(x[3])))

# create dataframe from rdd
schemaRandomNumberDF1 = spark.createDataFrame(randomNumberRdd1)
schemaRandomNumberDF2 = spark.createDataFrame(randomNumberRdd2)

# cache the dataframe
schemaRandomNumberDF1.cache()
schemaRandomNumberDF2.cache()

cross_df = schemaRandomNumberDF1.crossJoin(schemaRandomNumberDF2)

# aggregate
results = schemaRandomNumberDF1.groupBy("A").agg(func.max("B"), func.sum("C"))
results.show(n=100)
print "----------Count in cross-join--------------- {0}".format(cross_df.count())
#Radanalytics
Query plan comparison
[Figure: query plans for the non-cached vs cached runs]
#Radanalytics
Example: Comparing cached vs non-cached runs
[Screenshots: Prometheus dashboard for the non-cached run vs the cached run]
#Radanalytics
Example: Comparing cached vs non-cached runs
[Screenshots: Prometheus dashboard for the non-cached run vs the cached run]
#Radanalytics
Comparing non-cached vs cached runs
RIP (Relative Index of Performance)
● RIP: 0 to 1 = Improvement; 0 to -1 = Degradation
● % Change: negative values = Improvement
[Dashboard panels: RIP = 0.76 (% Change = 76), RIP = 0.63 (% Change = 63), RIP = 0.62 (% Change = 62), RIP = 0.63 (% Change = 63), RIP = 0.10 (% Change = 10), RIP = 0.00 (% Change = 0)]
#Radanalytics
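The slide does not spell out RIP's formula, but the values shown are consistent with the usual relative-improvement form, RIP = (t_before - t_after) / t_before, with % Change = 100 × RIP. A sketch under that assumption (the sample timings are made up):

```python
# Hedged sketch: assumed definition of RIP, not one stated in the talk.
def rip(t_before, t_after):
    """Relative Index of Performance: 1 = full improvement, <0 = degradation."""
    return (t_before - t_after) / t_before

def pct_change(t_before, t_after):
    return 100.0 * rip(t_before, t_after)

# e.g. a metric that took 100s non-cached and 24s cached gives RIP = 0.76
print(rip(100.0, 24.0), pct_change(100.0, 24.0))
```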
SPARK JOB + PROMETHEUS + GRAFANA DEMO
Demo Time!
#Radanalytics
Recap
You learned:
■ About our story on spark cluster metrics monitoring with prometheus
■ Spark features
■ How prometheus can be integrated with apache spark
■ Spark applications and how memory works
■ Spark Cluster JVM Instrumentation
■ How to deploy a spark job and monitor it via a grafana dashboard
■ The performance difference between cached and non-cached dataframes
■ Monitoring tips and tricks
#Radanalytics