SlideShare una empresa de Scribd logo
1 de 15
Descargar para leer sin conexión
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 1
Auto Scaling Systems With Elastic Spark Streaming
PhuDuc Nguyen
Consulting Engineer @ Oracle Data Cloud
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 2
Auto Scaling Systems With Elastic Spark Streaming
Problem - streaming jobs with variable traffic pattern
● Our traffic pattern has natural peaks and valleys.
● We also experience unpredictable traffic spikes.
○ Breaking world news events - eg Brexit.
○ Some of our partners run batch jobs and send data at unknown times.
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 3
Auto Scaling Systems With Elastic Spark Streaming
Various scenarios that result in “catch up mode”
● Streaming job has fallen behind real-time. In order to catch up, throughput must be > 1x to process backlog +
real-time.
● Catch up from failure: hardware, software, network, etc. Data doesn’t stop just because your streaming job has.
Deal with it. Be ready to catch up.
● Catch up from replay: pushing historical data back through the pipeline to fix a bug or apply new
feature/transformation retroactively.
● Unforeseen spikes in traffic.
● Underestimating the size of a streaming job.
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 4
Auto Scaling Systems With Elastic Spark Streaming
Problems aren’t new - can use traditional methods to cope…
● Statically size the cluster/job to max servers needed to handle peak rate + some percentage for extra wiggle room.
○ Works but wastes resources when not at peak traffic; larger difference between peaks/valleys = larger waste
when idle.
● Manually restart job adjusting spark.cores.max=n
○ Manually babysitting cluster = pain. No one wants to wake up at 3am to grow or shrink a cluster.
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 5
Auto Scaling Systems With Elastic Spark Streaming
What about backpressure?
● Async consumer signalling upstream producer to slow down. Reactive Manifesto is brilliantly simple, elegant, and
powerful API.
● Impl in Spark Streaming is effective and works very well.
● Can use “enough” servers and allow backpressure to kick in during peaks/spikes and will catch up later during valley.
○ But can result in falling behind or a lag in real-time data during backpressure. If job is required to maintain
real-time, buffering upstream may not be an option.
○ Once caught up, during traffic valley, you’ll likely have idle/wasted resources.
○ Prolonged sustained spikes > than “normal peak” for many days may exceed your upstream buffer - eg Brexit
caused a very large spike in our traffic for about 10 days. Our Kafka retention window is only 5 days.
○ Ultimately, max throughput is capped by the amount of servers it has...and sometimes we simply need more
servers.
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 6
Auto Scaling Systems With Elastic Spark Streaming
A solution: Elastic Spark Streaming
● Auto scale - ability to add/remove servers dynamically without restarting the stream.
● Streaming job must automatically adjust to the demands of traffic.
● Maximize resource utilization - use servers when you need them, release them when you don’t. Don’t pay for idle
resources.
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 7
Auto Scaling Systems With Elastic Spark Streaming
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 8
Auto Scaling Systems With Elastic Spark Streaming
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 9
Auto Scaling Systems With Elastic Spark Streaming
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 10
Auto Scaling Systems With Elastic Spark Streaming
Implementation
● We collect “processing %” = totalProcessingTime / microbatchInterval
● If “processing %” > 100%, that microbatch fell behind
● If “processing %” < 100%, that microbatch is stable
● Don’t act on a single datapoint. Collect a sliding window and act on trend.
● Careful to avoid creating a system that constantly scales up and down within seconds or minutes - appears as if
the streaming job is indecisive and thrashing/flapping.
● Each streaming job has a unique workload, traffic pattern, and micro batch interval and thus requires
tunable/configurable thresholds.
○ Max threshold default is 100%; trigger scale up after N datapoints have median > max threshold.
○ Min threshold default is 75%; trigger scale down after N datapoints have medan < min threshold and queued
batches are empty.
○ N is configurable.
○ Scale up and down by configurable percentage of current servers.
● sparkContext.requestExecutors(count) and sparkContext.killExecutor(executorId).
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 11
Auto Scaling Systems With Elastic Spark Streaming
Testing Elastic Streams
● Invariant - elasticity affects performance and resource utilization, it cannot impact data integrity.
● Impl random scaling - WTF why?!?! Before using any heuristics to trigger elasticity, we want to prove that no
matter how good/bad your heuristics are, elasticity can only affect performance. Even with the worst heuristics,
we want to prove that no data loss or corruption occurs. Random scaling intentionally forces the streaming job
into a state of thrashing/flapping. We can observe the performance is much worse than no scaling at all.
● Single command in testing project does the following…
○ Build new mesos cluster.
○ Use 1 static docker image that contains all input/test data.
○ Start 2 spark streaming jobs that ingest same dataset and share same processing logic: one job runs with
no-scaling, other job randomly scales and thrashes.
○ Start last spark job that ingests output from other 2 jobs, perform distinct and diff against the datasets for
every record and field. They must be equivalent otherwise you’ve violated the invariant.
○ Tear down mesos cluster.
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 12
Auto Scaling Systems With Elastic Spark Streaming
Where do the spark jobs add servers from and release to?
● Deploy our streaming jobs on mesos cluster and launched via marathon.
● Each streaming job
○ Expands/contracts independently from one another - it has no knowledge of its neighbor.
○ Can gain more servers from the mesos cluster and releases them back to the mesos cluster on-demand.
● A lightweight app monitors used/unused resources on mesos. If the mesos cluster needs to grow, the app automatically triggers
the automation to add mesos workers from the cloud. As the mesos cluster needs to shrink, the app automatically releases the
servers back to the cloud. Like nested balloons that can expand/contract on demand...
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 13
Auto Scaling Systems With Elastic Spark Streaming
Metrics
● A single ingest streaming job ingests over 5TB/day.
● Output is fed into many other streaming jobs (over 30). Results of processing generates > 2x ingest rate.
● At peak times, single ingest job processes > 130k events/second.
● Roughly 500-800 servers in the cloud for our team alone - elasticity changes this number frequently.
● Our team alone spends $250k/month = $3mln/year on infrastructure.
● 8 engineers are responsible for all of it: infrastructure, development, testing.
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 14
Auto Scaling Systems With Elastic Spark Streaming
Impact of Elastic Spark Streaming
● Operational gains
○ Minimize manual developer intervention
○ Devs focus on features not babysitting
● Financial impact
○ Maximize resource utilization
○ If your cloud bill is in the millions, your savings can be in the millions!
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 15
Auto Scaling Systems With Elastic Spark Streaming
Thanks for listening. Q&A?

Más contenido relacionado

La actualidad más candente

Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
DataWorks Summit
 

La actualidad más candente (20)

Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Web Crawling with Apache Nutch
Web Crawling with Apache NutchWeb Crawling with Apache Nutch
Web Crawling with Apache Nutch
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
 
Why Micro Focus Chose Pulsar for Data Ingestion - Pulsar Summit NA 2021
Why Micro Focus Chose Pulsar for Data Ingestion - Pulsar Summit NA 2021Why Micro Focus Chose Pulsar for Data Ingestion - Pulsar Summit NA 2021
Why Micro Focus Chose Pulsar for Data Ingestion - Pulsar Summit NA 2021
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
 
Oracle RAC features on Exadata
Oracle RAC features on ExadataOracle RAC features on Exadata
Oracle RAC features on Exadata
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 

Destacado

Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark Summit
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Spark Summit
 
Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...
Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...
Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...
Spark Summit
 

Destacado (20)

Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
 
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
Unlocking Value in Device Data Using Spark: Spark Summit East talk by John La...
Unlocking Value in Device Data Using Spark: Spark Summit East talk by John La...Unlocking Value in Device Data Using Spark: Spark Summit East talk by John La...
Unlocking Value in Device Data Using Spark: Spark Summit East talk by John La...
 
Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...
Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...
Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
 
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
 
Apache Spark in Cloud and Hybrid: Why Security and Governance Become More Imp...
Apache Spark in Cloud and Hybrid: Why Security and Governance Become More Imp...Apache Spark in Cloud and Hybrid: Why Security and Governance Become More Imp...
Apache Spark in Cloud and Hybrid: Why Security and Governance Become More Imp...
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
 
Artificial Intelligence: How Enterprises Can Crush It With Apache Spark: Keyn...
Artificial Intelligence: How Enterprises Can Crush It With Apache Spark: Keyn...Artificial Intelligence: How Enterprises Can Crush It With Apache Spark: Keyn...
Artificial Intelligence: How Enterprises Can Crush It With Apache Spark: Keyn...
 
Exceptions are the Norm: Dealing with Bad Actors in ETL
Exceptions are the Norm: Dealing with Bad Actors in ETLExceptions are the Norm: Dealing with Bad Actors in ETL
Exceptions are the Norm: Dealing with Bad Actors in ETL
 
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
 
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
 
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
 

Similar a Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by PhuDuc Nguyen

A5 oracle exadata-the game changer for online transaction processing data w...
A5   oracle exadata-the game changer for online transaction processing data w...A5   oracle exadata-the game changer for online transaction processing data w...
A5 oracle exadata-the game changer for online transaction processing data w...
Dr. Wilfred Lin (Ph.D.)
 
CON6492 - Oracle Database Public Cloud Services v1 1
CON6492 - Oracle Database Public Cloud Services v1 1CON6492 - Oracle Database Public Cloud Services v1 1
CON6492 - Oracle Database Public Cloud Services v1 1
David van Schalkwyk
 

Similar a Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by PhuDuc Nguyen (20)

Oracle super cluster for oracle e business suite
Oracle super cluster for oracle e business suiteOracle super cluster for oracle e business suite
Oracle super cluster for oracle e business suite
 
Alta Disponibilidade no MySQL 5.7
Alta Disponibilidade no MySQL 5.7Alta Disponibilidade no MySQL 5.7
Alta Disponibilidade no MySQL 5.7
 
Enterprise manager 13c
Enterprise manager 13cEnterprise manager 13c
Enterprise manager 13c
 
How Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintHow Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter Footprint
 
GLOC Keynote 2014 - In-memory
GLOC Keynote 2014 - In-memoryGLOC Keynote 2014 - In-memory
GLOC Keynote 2014 - In-memory
 
Ground Breakers Romania: Oracle Autonomous Database
Ground Breakers Romania: Oracle Autonomous DatabaseGround Breakers Romania: Oracle Autonomous Database
Ground Breakers Romania: Oracle Autonomous Database
 
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
 
Oracle Cloud Café hybrid Cloud 19 mai 2016
Oracle Cloud Café hybrid Cloud 19 mai 2016Oracle Cloud Café hybrid Cloud 19 mai 2016
Oracle Cloud Café hybrid Cloud 19 mai 2016
 
Oracle Management Cloud - HybridCloud Café - May 2016
Oracle Management Cloud - HybridCloud Café - May 2016Oracle Management Cloud - HybridCloud Café - May 2016
Oracle Management Cloud - HybridCloud Café - May 2016
 
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
 
Oracle RAC 12c Rel. 2 for Continuous Availability
Oracle RAC 12c Rel. 2 for Continuous AvailabilityOracle RAC 12c Rel. 2 for Continuous Availability
Oracle RAC 12c Rel. 2 for Continuous Availability
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
 
Key considerations in productionizing streaming applications
Key considerations in productionizing streaming applicationsKey considerations in productionizing streaming applications
Key considerations in productionizing streaming applications
 
Exadata x4 for_sap
Exadata x4 for_sapExadata x4 for_sap
Exadata x4 for_sap
 
Oracle ADF Architecture TV - Development - Performance & Tuning
Oracle ADF Architecture TV - Development - Performance & TuningOracle ADF Architecture TV - Development - Performance & Tuning
Oracle ADF Architecture TV - Development - Performance & Tuning
 
A5 oracle exadata-the game changer for online transaction processing data w...
A5   oracle exadata-the game changer for online transaction processing data w...A5   oracle exadata-the game changer for online transaction processing data w...
A5 oracle exadata-the game changer for online transaction processing data w...
 
CON6492 - Oracle Database Public Cloud Services v1 1
CON6492 - Oracle Database Public Cloud Services v1 1CON6492 - Oracle Database Public Cloud Services v1 1
CON6492 - Oracle Database Public Cloud Services v1 1
 
AWR, ASH with EM13 at HotSos 2016
AWR, ASH with EM13 at HotSos 2016AWR, ASH with EM13 at HotSos 2016
AWR, ASH with EM13 at HotSos 2016
 
Netherlands Tech Tour 02 - MySQL Fabric
Netherlands Tech Tour 02 -   MySQL FabricNetherlands Tech Tour 02 -   MySQL Fabric
Netherlands Tech Tour 02 - MySQL Fabric
 

Más de Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

Más de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Último

Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
ptikerjasaptiker
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 

Último (20)

Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 

Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by PhuDuc Nguyen

  • 1. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 1 Auto Scaling Systems With Elastic Spark Streaming PhuDuc Nguyen Consulting Engineer @ Oracle Data Cloud
  • 2. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 2 Auto Scaling Systems With Elastic Spark Streaming Problem - streaming jobs with variable traffic pattern ● Our traffic pattern has natural peaks and valleys. ● We also experience unpredictable traffic spikes. ○ Breaking world news events - eg Brexit. ○ Some of our partners run batch jobs and send data at unknown times.
  • 3. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 3 Auto Scaling Systems With Elastic Spark Streaming Various scenarios that result in “catch up mode” ● Streaming job has fallen behind real-time. In order to catch up, throughput must be > 1x to process backlog + real-time. ● Catch up from failure: hardware, software, network, etc. Data doesn’t stop just because your streaming job has. Deal with it. Be ready to catch up. ● Catch up from replay: pushing historical data back through the pipeline to fix a bug or apply new feature/transformation retroactively. ● Unforeseen spikes in traffic. ● Underestimating the size of a streaming job.
  • 4. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 4 Auto Scaling Systems With Elastic Spark Streaming Problems aren’t new - can use traditional methods to cope… ● Statically size the cluster/job to max servers needed to handle peak rate + some percentage for extra wiggle room. ○ Works but wastes resources when not at peak traffic; larger difference between peaks/valleys = larger waste when idle. ● Manually restart job adjusting spark.cores.max=n ○ Manually babysitting cluster = pain. No one wants to wake up at 3am to grow or shrink a cluster.
  • 5. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 5 Auto Scaling Systems With Elastic Spark Streaming What about backpressure? ● Async consumer signalling upstream producer to slow down. Reactive Manifesto is brilliantly simple, elegant, and powerful API. ● Impl in Spark Streaming is effective and works very well. ● Can use “enough” servers and allow backpressure to kick in during peaks/spikes and will catch up later during valley. ○ But can result in falling behind or a lag in real-time data during backpressure. If job is required to maintain real-time, buffering upstream may not be an option. ○ Once caught up, during traffic valley, you’ll likely have idle/wasted resources. ○ Prolonged sustained spikes > than “normal peak” for many days may exceed your upstream buffer - eg Brexit caused a very large spike in our traffic for about 10 days. Our Kafka retention window is only 5 days. ○ Ultimately, max throughput is capped by the amount of servers it has...and sometimes we simply need more servers.
  • 6. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 6 Auto Scaling Systems With Elastic Spark Streaming A solution: Elastic Spark Streaming ● Auto scale - ability to add/remove servers dynamically without restarting the stream. ● Streaming job must automatically adjust to the demands of traffic. ● Maximize resource utilization - use servers when you need them, release them when you don’t. Don’t pay for idle resources.
  • 7. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 7 Auto Scaling Systems With Elastic Spark Streaming
  • 8. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 8 Auto Scaling Systems With Elastic Spark Streaming
  • 9. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 9 Auto Scaling Systems With Elastic Spark Streaming
  • 10. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 10 Auto Scaling Systems With Elastic Spark Streaming Implementation ● We collect “processing %” = totalProcessingTime / microbatchInterval ● If “processing %” > 100%, that microbatch fell behind ● If “processing %” < 100%, that microbatch is stable ● Don’t act on a single datapoint. Collect a sliding window and act on trend. ● Careful to avoid creating a system that constantly scales up and down within seconds or minutes - appears as if the streaming job is indecisive and thrashing/flapping. ● Each streaming job has a unique workload, traffic pattern, and micro batch interval and thus requires tunable/configurable thresholds. ○ Max threshold default is 100%; trigger scale up after N datapoints have median > max threshold. ○ Min threshold default is 75%; trigger scale down after N datapoints have medan < min threshold and queued batches are empty. ○ N is configurable. ○ Scale up and down by configurable percentage of current servers. ● sparkContext.requestExecutors(count) and sparkContext.killExecutor(executorId).
  • 11. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 11 Auto Scaling Systems With Elastic Spark Streaming Testing Elastic Streams ● Invariant - elasticity affects performance and resource utilization, it cannot impact data integrity. ● Impl random scaling - WTF why?!?! Before using any heuristics to trigger elasticity, we want to prove that no matter how good/bad your heuristics are, elasticity can only affect performance. Even with the worst heuristics, we want to prove that no data loss or corruption occurs. Random scaling intentionally forces the streaming job into a state of thrashing/flapping. We can observe the performance is much worse than no scaling at all. ● Single command in testing project does the following… ○ Build new mesos cluster. ○ Use 1 static docker image that contains all input/test data. ○ Start 2 spark streaming jobs that ingest same dataset and share same processing logic: one job runs with no-scaling, other job randomly scales and thrashes. ○ Start last spark job that ingests output from other 2 jobs, perform distinct and diff against the datasets for every record and field. They must be equivalent otherwise you’ve violated the invariant. ○ Tear down mesos cluster.
  • 12. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 12 Auto Scaling Systems With Elastic Spark Streaming Where do the spark jobs add servers from and release to? ● Deploy our streaming jobs on mesos cluster and launched via marathon. ● Each streaming job ○ Expands/contracts independently from one another - it has no knowledge of its neighbor. ○ Can gain more servers from the mesos cluster and releases them back to the mesos cluster on-demand. ● A lightweight app monitors used/unused resources on mesos. If the mesos cluster needs to grow, the app automatically triggers the automation to add mesos workers from the cloud. As the mesos cluster needs to shrink, the app automatically releases the servers back to the cloud. Like nested balloons that can expand/contract on demand...
  • 13. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 13 Auto Scaling Systems With Elastic Spark Streaming Metrics ● A single ingest streaming job ingests over 5TB/day. ● Output is fed into many other streaming jobs (over 30). Results of processing generates > 2x ingest rate. ● At peak times, single ingest job processes > 130k events/second. ● Roughly 500-800 servers in the cloud for our team alone - elasticity changes this number frequently. ● Our team alone spends $250k/month = $3mln/year on infrastructure. ● 8 engineers are responsible for all of it: infrastructure, development, testing.
  • 14. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 14 Auto Scaling Systems With Elastic Spark Streaming Impact of Elastic Spark Streaming ● Operational gains ○ Minimize manual developer intervention ○ Devs focus on features not babysitting ● Financial impact ○ Maximize resource utilization ○ If your cloud bill is in the millions, your savings can be in the millions!
  • 15. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 15 Auto Scaling Systems With Elastic Spark Streaming Thanks for listening. Q&A?