SlideShare una empresa de Scribd logo
1 de 30
Databricks Clusters in
Autopilot Mode
Prakash Chockalingam
October, 2017
Housekeeping
• Your connection will be muted
• Submit questions via the Q&A panel
• Questions will be answered at the end of the webinar
• Any outstanding questions will be answered in the Databricks Forum
(https://forums.databricks.com)
• Webinar will be recorded and attachments will be made available via
www.databricks.com
2
About Prakash
Prakash Chockalingam
● Product Manager at Databricks
● Works closely with customers
● Deep experience building large scale
distributed systems and machine learning
infrastructure at Netflix and Yahoo
3
4
Agenda
• About Databricks
• Challenges involved in building a big data platform
• How Databricks simplifies devops with lot of ‘automatic’
features in Databricks clusters.
• Demo of the key features that sets up the Databricks
clusters in an autopilot mode.
5
Accelerate innovation by unifying data science,
engineering and business
• Founded by the creators of Apache Spark
• Contributes 75% of the open source code, 10x
more than any other company
• Trained 40k+ Spark users on the Databricks
platform
VISIO
N
WHO WE
ARE
Unified Analytics Platform powered by Apache SparkPRODUC
T
6
Building a big data platform is hard
Availability Reliability Lower cloud cost Simplicity
Throughput Scalability Security
7
Cluster Reliability
Commodity hardware . Good performance/$
• But lot of things can go wrong - bad disks, flaky
instances, network errors, etc.
Heterogenous workloads running user code
• Requires proper isolation and fault tolerance.
Uneven distribution
• Data skew, bursty requests, etc
8
Data Throughput through Cluster
• Read/Write large number of records
efficiently
• Low latency under high throughput
traffic
• Right tradeoff with reliability and
throughput is required.
Instance Disk
Memory
Cloud Storage
HigherThroughput
LowerReliability
Available Size
9
Scalability
Scalability across different dimensions:
• Number of clusters
• Platform must be able to handle 100s of cluster requests.
• # of nodes in a cluster
• Platform must be able to handle 100s of nodes in a single cluster.
• Size of data
• Handle large volumes of data.
10
Cluster Availability
Must be easy to upgrade even with large scale
• Zero downtime upgrades for clusters so that no
production workloads are affected.
• Easy mechanisms to rollback
Instrument monitoring & alerting
• Easily detect & alert on failures
• Track performance & utilization
Fast recoverability from failures
11
Simplicity
Easy to use
• Interface must be intuitive and easy for the
developers to use
Debuggability
• Developers must be able to easily access
metrics and logs to troubleshoot their code.
12
Lowering cloud cost
Optimum resource utilization
• Fine-grained resource sharing
Elasticity
• Autoscaling resources
Leverage cloud features to optimize costs
• Spot vs on-demand
13
Security
• Firewall & Network ACLs to prevent attackers intruding
data flowing through clusters.
• Encryption of data at rest
• Temporary storage
• Permanent storage
• Encryption of data during transit
14
Databricks Clusters
Powered by Databricks Runtime
15
Databricks Clusters
• Resilient to transient cloud failures
• Optimized for high throughput
• High availability
• Scalable to handle thousands of nodes
• Bulletproof security
• Optimized for lowering your cloud costs
16
Databricks Clusters
in Autopilot Mode
17
Cluster in Autopilot mode
Just focus on your data; not the underlying
infrastructure. (#serverless)
18
Cluster in Autopilot mode
• Automatic scaling of compute
• Automatic scaling of instance storage
• Automatic recovery
• Automatic software updates
• Automatic caching
• Automatic start and termination
• Automatic configuration
• Automatic monitoring instrumentation
• Automatic resilience to spot price fluctuations
19
Autoscale compute
• Do not worry about how many machines are required for
your workload.
• Autoscale compute is based on Spark-native task tracking.
• Guarantees maximum utilization.
20
Autoscale local storage
• Spark requires lot of intermediate disk space.
• Coming up with the right disk space is very painful.
• Automatically scale local storage based on Spark’s disk
space requirements for your job.
21
Automatic Recovery
• Automatic recovery of cluster nodes
• If cluster nodes fail, they get automatically replaced with new
ones.
• Automatic recovery of cluster failures
• If the whole cluster becomes unresponsive for some reason, then
the cluster will be automatically recovered.
22
Automatic Software Updates
• Automatic push of latest updates to cluster’s side car
services every 2 weeks.
• Zero downtime of clusters during pushes.
• Automatic rollout of new Databricks runtime versions that
customers can choose when they create clusters.
23
Automatic Caching
• Automatically move parquet data in cloud storage to
instance’s local storage.
• Blazing fast throughput for repeatedly read data.
• Completely transparent to user.
24
Automatic Termination
• Automatically terminate clusters that are idle.
• Idle time is calculated based on fine-grained Spark task
tracking so that it is more accurate.
25
Automatic Start
• Automatically start clusters when you run commands.
• Auto-start and auto-terminate completely eliminate the
need to worry about underlying infrastructure
26
Automatic Monitoring
Instrumentation• Built-in Ganglia integration
• Visibility into Ganglia metrics even for terminated clusters
27
Automatic resilience to spot price
hikes• Leverage spot as much as possible and fallback to on-
demand for reliability
• Combine it with autoscaling and you can tremendously
reduce your cloud cost.
28
Auto Configuration
• Databricks Serverless auto-configures all the
functionalities out of the box for you.
• You can just specify the minimum parameters you care
about.
29
DEMO
Try Apache Spark in Databricks
Sign up for a free 14-day trial of Databricks
https://databricks.com/try-databricks
Additional Questions?
Contact us at http://go.databricks.com/contact-databricks
3
0

Más contenido relacionado

La actualidad más candente

Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Tathagata Das
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraRobbie Strickland
 
Deep dive into spark streaming
Deep dive into spark streamingDeep dive into spark streaming
Deep dive into spark streamingTao Li
 
Deep dive into stateful stream processing in structured streaming by Tathaga...
Deep dive into stateful stream processing in structured streaming  by Tathaga...Deep dive into stateful stream processing in structured streaming  by Tathaga...
Deep dive into stateful stream processing in structured streaming by Tathaga...Databricks
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...InfluxData
 
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data...
Kenneth Knowles -  Apache Beam - A Unified Model for Batch and Streaming Data...Kenneth Knowles -  Apache Beam - A Unified Model for Batch and Streaming Data...
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data...Flink Forward
 
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Alexey Kharlamov
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...Databricks
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...DataWorks Summit
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemC4Media
 
Real-time driving score service using Flink
Real-time driving score service using FlinkReal-time driving score service using Flink
Real-time driving score service using FlinkDongwon Kim
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesDataWorks Summit/Hadoop Summit
 
Productionizing your Streaming Jobs
Productionizing your Streaming JobsProductionizing your Streaming Jobs
Productionizing your Streaming JobsDatabricks
 
Introduction to near real time computing
Introduction to near real time computingIntroduction to near real time computing
Introduction to near real time computingTao Li
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 

La actualidad más candente (20)

Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 
Deep dive into spark streaming
Deep dive into spark streamingDeep dive into spark streaming
Deep dive into spark streaming
 
Deep dive into stateful stream processing in structured streaming by Tathaga...
Deep dive into stateful stream processing in structured streaming  by Tathaga...Deep dive into stateful stream processing in structured streaming  by Tathaga...
Deep dive into stateful stream processing in structured streaming by Tathaga...
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
 
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data...
Kenneth Knowles -  Apache Beam - A Unified Model for Batch and Streaming Data...Kenneth Knowles -  Apache Beam - A Unified Model for Batch and Streaming Data...
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data...
 
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
 
Real-time driving score service using Flink
Real-time driving score service using FlinkReal-time driving score service using Flink
Real-time driving score service using Flink
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with Dependencies
 
Productionizing your Streaming Jobs
Productionizing your Streaming JobsProductionizing your Streaming Jobs
Productionizing your Streaming Jobs
 
Introduction to near real time computing
Introduction to near real time computingIntroduction to near real time computing
Introduction to near real time computing
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 

Similar a Databricks clusters in autopilot mode

HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenParticular Software
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...DataWorks Summit
 
What is Cloud computing?
What is Cloud computing?What is Cloud computing?
What is Cloud computing?Richard Harvey
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithMarkus Eisele
 
Infraestructura oracle
Infraestructura oracleInfraestructura oracle
Infraestructura oracleFran Navarro
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Qubole
 
Nutanix and microsoft_webinar_oct_28
Nutanix and microsoft_webinar_oct_28Nutanix and microsoft_webinar_oct_28
Nutanix and microsoft_webinar_oct_28groberts52
 
Building Scalable Applications with Microsoft Azure
Building Scalable Applications with Microsoft AzureBuilding Scalable Applications with Microsoft Azure
Building Scalable Applications with Microsoft AzureFisnik Doko
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scaleOvais Tariq
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learnJohn D Almon
 
Cloudstack for beginners
Cloudstack for beginnersCloudstack for beginners
Cloudstack for beginnersJoseph Amirani
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Fwdays
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScyllaDB
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandraScyllaDB
 

Similar a Databricks clusters in autopilot mode (20)

HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves Goeleven
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
Running Galera Cluster on Microsoft Azure
Running Galera Cluster on Microsoft AzureRunning Galera Cluster on Microsoft Azure
Running Galera Cluster on Microsoft Azure
 
What is Cloud computing?
What is Cloud computing?What is Cloud computing?
What is Cloud computing?
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
 
Infraestructura oracle
Infraestructura oracleInfraestructura oracle
Infraestructura oracle
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
 
Nutanix and microsoft_webinar_oct_28
Nutanix and microsoft_webinar_oct_28Nutanix and microsoft_webinar_oct_28
Nutanix and microsoft_webinar_oct_28
 
Building Scalable Applications with Microsoft Azure
Building Scalable Applications with Microsoft AzureBuilding Scalable Applications with Microsoft Azure
Building Scalable Applications with Microsoft Azure
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
 
Cloudstack for beginners
Cloudstack for beginnersCloudstack for beginners
Cloudstack for beginners
 
Cloud stack for_beginners
Cloud stack for_beginnersCloud stack for_beginners
Cloud stack for_beginners
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from Cassandra
 

Último

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 

Último (20)

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 

Databricks clusters in autopilot mode

  • 1. Databricks Clusters in Autopilot Mode Prakash Chockalingam October, 2017
  • 2. Housekeeping • Your connection will be muted • Submit questions via the Q&A panel • Questions will be answered at the end of the webinar • Any outstanding questions will be answered in the Databricks Forum (https://forums.databricks.com) • Webinar will be recorded and attachments will be made available via www.databricks.com 2
  • 3. About Prakash Prakash Chockalingam ● Product Manager at Databricks ● Works closely with customers ● Deep experience building large scale distributed systems and machine learning infrastructure at Netflix and Yahoo 3
  • 4. 4 Agenda • About Databricks • Challenges involved in building a big data platform • How Databricks simplifies devops with lot of ‘automatic’ features in Databricks clusters. • Demo of the key features that sets up the Databricks clusters in an autopilot mode.
  • 5. 5 Accelerate innovation by unifying data science, engineering and business • Founded by the creators of Apache Spark • Contributes 75% of the open source code, 10x more than any other company • Trained 40k+ Spark users on the Databricks platform VISIO N WHO WE ARE Unified Analytics Platform powered by Apache SparkPRODUC T
  • 6. 6 Building a big data platform is hard Availability Reliability Lower cloud cost Simplicity Throughput Scalability Security
  • 7. 7 Cluster Reliability Commodity hardware . Good performance/$ • But lot of things can go wrong - bad disks, flaky instances, network errors, etc. Heterogenous workloads running user code • Requires proper isolation and fault tolerance. Uneven distribution • Data skew, bursty requests, etc
  • 8. 8 Data Throughput through Cluster • Read/Write large number of records efficiently • Low latency under high throughput traffic • Right tradeoff with reliability and throughput is required. Instance Disk Memory Cloud Storage HigherThroughput LowerReliability Available Size
  • 9. 9 Scalability Scalability across different dimensions: • Number of clusters • Platform must be able to handle 100s of cluster requests. • # of nodes in a cluster • Platform must be able to handle 100s of nodes in a single cluster. • Size of data • Handle large volumes of data.
  • 10. 10 Cluster Availability Must be easy to upgrade even with large scale • Zero downtime upgrades for clusters so that no production workloads are affected. • Easy mechanisms to rollback Instrument monitoring & alerting • Easily detect & alert on failures • Track performance & utilization Fast recoverability from failures
  • 11. 11 Simplicity Easy to use • Interface must be intuitive and easy for the developers to use Debuggability • Developers must be able to easily access metrics and logs to troubleshoot their code.
  • 12. 12 Lowering cloud cost Optimum resource utilization • Fine-grained resource sharing Elasticity • Autoscaling resources Leverage cloud features to optimize costs • Spot vs on-demand
  • 13. 13 Security • Firewall & Network ACLs to prevent attackers intruding data flowing through clusters. • Encryption of data at rest • Temporary storage • Permanent storage • Encryption of data during transit
  • 14. 14 Databricks Clusters Powered by Databricks Runtime
  • 15. 15 Databricks Clusters • Resilient to transient cloud failures • Optimized for high throughput • High availability • Scalable to handle thousands of nodes • Bulletproof security • Optimized for lowering your cloud costs
  • 17. 17 Cluster in Autopilot mode Just focus on your data; not the underlying infrastructure. (#serverless)
  • 18. 18 Cluster in Autopilot mode • Automatic scaling of compute • Automatic scaling of instance storage • Automatic recovery • Automatic software updates • Automatic caching • Automatic start and termination • Automatic configuration • Automatic monitoring instrumentation • Automatic resilience to spot price fluctuations
  • 19. 19 Autoscale compute • Do not worry about how many machines are required for your workload. • Autoscale compute is based on Spark-native task tracking. • Guarantees maximum utilization.
  • 20. 20 Autoscale local storage • Spark requires lot of intermediate disk space. • Coming up with the right disk space is very painful. • Automatically scale local storage based on Spark’s disk space requirements for your job.
  • 21. 21 Automatic Recovery • Automatic recovery of cluster nodes • If cluster nodes fail, they get automatically replaced with new ones. • Automatic recovery of cluster failures • If the whole cluster becomes unresponsive for some reason, then the cluster will be automatically recovered.
  • 22. 22 Automatic Software Updates • Automatic push of latest updates to cluster’s side car services every 2 weeks. • Zero downtime of clusters during pushes. • Automatic rollout of new Databricks runtime versions that customers can choose when they create clusters.
  • 23. 23 Automatic Caching • Automatically move parquet data in cloud storage to instance’s local storage. • Blazing fast throughput for repeatedly read data. • Completely transparent to user.
  • 24. 24 Automatic Termination • Automatically terminate clusters that are idle. • Idle time is calculated based on fine-grained Spark task tracking so that it is more accurate.
  • 25. 25 Automatic Start • Automatically start clusters when you run commands. • Auto-start and auto-terminate completely eliminate the need to worry about underlying infrastructure
  • 26. 26 Automatic Monitoring Instrumentation• Built-in Ganglia integration • Visibility into Ganglia metrics even for terminated clusters
  • 27. 27 Automatic resilience to spot price hikes• Leverage spot as much as possible and fallback to on- demand for reliability • Combine it with autoscaling and you can tremendously reduce your cloud cost.
  • 28. 28 Auto Configuration • Databricks Serverless auto-configures all the functionalities out of the box for you. • You can just specify the minimum parameters you care about.
  • 30. Try Apache Spark in Databricks Sign up for a free 14-day trial of Databricks https://databricks.com/try-databricks Additional Questions? Contact us at http://go.databricks.com/contact-databricks 3 0