Why is building a big data platform hard? What are the key aspects involved in providing a "Serverless" experience for data folks? And how does Databricks solve these infrastructure problems to provide that "Serverless" experience?
2. Housekeeping
• Your connection will be muted
• Submit questions via the Q&A panel
• Questions will be answered at the end of the webinar
• Any outstanding questions will be answered in the Databricks Forum
(https://forums.databricks.com)
• Webinar will be recorded and attachments will be made available via
www.databricks.com
3. About Prakash
Prakash Chockalingam
● Product Manager at Databricks
● Works closely with customers
● Deep experience building large-scale
distributed systems and machine learning
infrastructure at Netflix and Yahoo
4. Agenda
• About Databricks
• Challenges involved in building a big data platform
• How Databricks simplifies DevOps with the many 'automatic'
features in Databricks clusters
• Demo of the key features that put Databricks clusters in
autopilot mode
5. About Databricks
VISION: Accelerate innovation by unifying data science, engineering and business.
WHO WE ARE:
• Founded by the creators of Apache Spark
• Contributes 75% of the open source code, 10x more than any other company
• Trained 40k+ Spark users on the Databricks platform
PRODUCT: Unified Analytics Platform powered by Apache Spark
6. Building a big data platform is hard
Availability • Reliability • Throughput • Scalability • Security • Lower cloud cost • Simplicity
7. Cluster Reliability
Commodity hardware: good performance/$
• But a lot of things can go wrong: bad disks, flaky
instances, network errors, etc.
Heterogeneous workloads running user code
• Requires proper isolation and fault tolerance.
Uneven distribution
• Data skew, bursty requests, etc
8. Data Throughput through the Cluster
• Read/write large numbers of records efficiently
• Low latency under high-throughput traffic
• The right tradeoff between reliability and
throughput is required.
[Diagram: storage tiers from memory to instance disk to cloud storage, trading higher throughput / lower reliability against available size]
9. Scalability
Scalability across different dimensions:
• Number of clusters: the platform must be able to handle 100s of cluster requests.
• Number of nodes in a cluster: the platform must be able to handle 100s of nodes in a single cluster.
• Size of data: handle large volumes of data.
10. Cluster Availability
Must be easy to upgrade even at large scale
• Zero-downtime upgrades for clusters so that no
production workloads are affected
• Easy mechanisms to roll back
Instrument monitoring & alerting
• Easily detect & alert on failures
• Track performance & utilization
Fast recovery from failures
11. Simplicity
Easy to use
• The interface must be intuitive and easy for
developers to use
Debuggability
• Developers must be able to easily access
metrics and logs to troubleshoot their code.
12. Lowering cloud cost
Optimum resource utilization
• Fine-grained resource sharing
Elasticity
• Autoscaling resources
Leverage cloud features to optimize costs
• Spot vs on-demand
13. Security
• Firewalls & network ACLs to keep attackers from reaching
data flowing through clusters
• Encryption of data at rest
• Temporary storage
• Permanent storage
• Encryption of data during transit
15. Databricks Clusters
• Resilient to transient cloud failures
• Optimized for high throughput
• High availability
• Scalable to handle thousands of nodes
• Bulletproof security
• Optimized for lowering your cloud costs
17. Cluster in Autopilot mode
Just focus on your data, not the underlying
infrastructure. (#serverless)
18. Cluster in Autopilot mode
• Automatic scaling of compute
• Automatic scaling of instance storage
• Automatic recovery
• Automatic software updates
• Automatic caching
• Automatic start and termination
• Automatic configuration
• Automatic monitoring instrumentation
• Automatic resilience to spot price fluctuations
19. Autoscale compute
• Don't worry about how many machines your workload
requires.
• Compute autoscaling is based on Spark-native task tracking.
• Guarantees maximum utilization (see the sketch below).
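A minimal sketch of creating such a cluster through the Databricks Clusters REST API (api/2.0/clusters/create with an "autoscale" range). The workspace URL, token, instance type, and runtime version are placeholders, not values from the webinar:

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                       # placeholder API token
HEADERS = {"Authorization": "Bearer " + TOKEN}

# Cluster that autoscales between 2 and 20 workers; Databricks adds or
# removes workers based on Spark-native task tracking.
cluster_spec = {
    "cluster_name": "autoscaling-demo",
    "spark_version": "<runtime-version>",  # see /api/2.0/clusters/spark-versions
    "node_type_id": "r3.xlarge",           # example AWS instance type
    "autoscale": {"min_workers": 2, "max_workers": 20},
}

resp = requests.post(HOST + "/api/2.0/clusters/create",
                     headers=HEADERS, json=cluster_spec)
resp.raise_for_status()
print("cluster_id:", resp.json()["cluster_id"])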
20. Autoscale local storage
• Spark requires a lot of intermediate disk space.
• Estimating the right disk space up front is very painful.
• Databricks automatically scales local storage based on
Spark's disk space requirements for your job (sketch below).
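Under the same assumptions as the previous sketch, autoscaling local storage is a single flag on the cluster spec (the "enable_elastic_disk" field in the Clusters API):

# With elastic disk enabled, Databricks attaches additional local volumes
# as Spark's shuffle/spill space fills up, instead of the job failing
# with "no space left on device".
cluster_spec["enable_elastic_disk"] = True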
21. Automatic Recovery
• Automatic recovery of cluster nodes
• If cluster nodes fail, they get automatically replaced with new
ones.
• Automatic recovery of cluster failures
• If the whole cluster becomes unresponsive for some reason, then
the cluster will be automatically recovered.
22. Automatic Software Updates
• The latest updates are automatically pushed to each
cluster's sidecar services every 2 weeks.
• Zero downtime for clusters during pushes.
• New Databricks runtime versions are rolled out automatically;
customers choose one when they create clusters (sketch below).
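A sketch of picking a runtime version at cluster-creation time, reusing the placeholder HOST, HEADERS, and cluster_spec from the earlier sketches; the spark-versions endpoint is part of the Clusters API:

# List the runtime versions currently offered in the workspace; newly
# rolled-out Databricks runtimes appear here automatically.
resp = requests.get(HOST + "/api/2.0/clusters/spark-versions", headers=HEADERS)
for version in resp.json()["versions"]:
    print(version["key"], "-", version["name"])

# Pin the cluster to one of the listed versions at creation time.
cluster_spec["spark_version"] = "<key-from-the-list-above>"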
23. Automatic Caching
• Automatically moves Parquet data from cloud storage to
the instances' local storage.
• Blazing-fast throughput for repeatedly read data.
• Completely transparent to the user (sketch below).
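A sketch of what this looks like from a Databricks notebook, assuming the IO cache's documented Spark conf key ("spark.databricks.io.cache.enabled") and a placeholder S3 path; "spark" is the SparkSession a notebook provides:

# Enable the IO cache for this session (it is on by default on
# SSD-backed instance types).
spark.conf.set("spark.databricks.io.cache.enabled", "true")

df = spark.read.parquet("s3a://<bucket>/<path>/")  # placeholder path
df.count()  # first scan reads from cloud storage and populates the cache
df.count()  # repeated scans are served from the instances' local disks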
24. Automatic Termination
• Automatically terminates clusters that are idle.
• Idle time is calculated from fine-grained Spark task
tracking, so it is more accurate (sketch below).
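In the Clusters API this corresponds to the "autotermination_minutes" field; continuing the placeholder spec from the earlier sketches:

# Terminate the cluster after 60 minutes with no Spark activity; idleness
# is judged from Spark task tracking rather than, say, shell activity.
cluster_spec["autotermination_minutes"] = 60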
25. Automatic Start
• Automatically starts clusters when you run commands.
• Auto-start and auto-terminate together eliminate the
need to worry about the underlying infrastructure (sketch below).
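One place auto-start shows up is a scheduled job pinned to an existing cluster: if the cluster was auto-terminated between runs, it is restarted when the run triggers. A sketch against the Jobs API, with placeholder IDs and paths and the same HOST/HEADERS as above:

job_spec = {
    "name": "nightly-etl",                                 # placeholder job name
    "existing_cluster_id": "<cluster-id>",                 # placeholder cluster
    "notebook_task": {"notebook_path": "/Users/me/etl"},   # placeholder notebook
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # run at 2 AM daily
        "timezone_id": "UTC",
    },
}
# If the target cluster is terminated when the schedule fires, Databricks
# starts it, runs the notebook, and auto-termination later stops it again.
requests.post(HOST + "/api/2.0/jobs/create", headers=HEADERS, json=job_spec)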
27. Automatic resilience to spot price hikes
• Leverages spot instances as much as possible and falls back
to on-demand for reliability.
• Combined with autoscaling, this can tremendously
reduce your cloud cost (sketch below).
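On AWS this policy maps to the cluster's "aws_attributes" in the Clusters API; a sketch, again extending the placeholder spec:

# Keep the driver on an on-demand instance for reliability, run the rest
# on spot, and fall back to on-demand capacity when spot instances are
# reclaimed or outbid.
cluster_spec["aws_attributes"] = {
    "first_on_demand": 1,                   # the first node (driver) is on-demand
    "availability": "SPOT_WITH_FALLBACK",
    "spot_bid_price_percent": 100,          # bid up to 100% of the on-demand price
}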
28. Auto Configuration
• Databricks Serverless auto-configures all of these
features out of the box.
• You specify only the minimum parameters you care
about.
30. Try Apache Spark in Databricks
Sign up for a free 14-day trial of Databricks
https://databricks.com/try-databricks
Additional Questions?
Contact us at http://go.databricks.com/contact-databricks