Cost Efficiency Strategies for
Managed Apache Spark Services
Adi Polak
Microsoft
Find me on Social Media – Adi Polak
Twitter @adipolak
Medium – https://medium.com/@adipolak
Dev.to - dev.to/adipolak
LinkedIn - https://www.linkedin.com/in/adi-polak-68548365/
Agenda
▪ Motivation
▪ Tools
▪ Azure Databricks
▪ Cost Optimization Strategies
▪ Wrap-up
Motivation
Start from the Beginning
Business idea ( Product Manager )
Prioritization Process (R&D)
Design & Build (Software Dev HLD)
HLD
▪ Requirements
▪ Features
▪ Architecture
▪ Test Plans
▪ Security
▪ Deployments
▪ Monitoring /Audit Trails
▪ Maintenance
High Level Design
Yeah, but why should I care about costs?!
- Understand how Budget works – P&L
- Be able to influence Technical Decisions
- Culture of Financial Accountability
https://www.insightpartners.com/blog/product-leaders-are-rd-costs-part-of-your-strategic-discussions/
Tools
Many, many Services
Apache Spark & Cloud Computing delivery model
IaaS vs. PaaS vs. SaaS
Cloud Pricing Calculator
Azure Pricing Calculator - https://azure.microsoft.com/pricing/calculator/
AWS Pricing Calculator - https://calculator.aws/
GCP Pricing Calculator - https://cloud.google.com/products/calculator
Organize Resources for Cost Awareness
▪ Reports and billing - Azure Cost Management
▪ Organize – Resource groups and/or subscriptions for control, reporting, and attribution of costs
Subscription and Billing models
• Pay as you go
• Enterprise Agreements
• …
Azure Databricks
Where to run Spark Workloads
Kubernetes/IaaS vs. Azure Databricks (Managed Spark Service)
Kubernetes/IaaS
▪ Bigger team
▪ K8s + Spark expertise
▪ Optimization experts
▪ Resources consumed: machines (VMs), network, storage
Azure Databricks (Managed Spark Service)
▪ Small to mid-size team
▪ Spark expertise
▪ Optimizations
▪ Resources consumed: machines (VMs), network, storage, DBUs
Plan Tiers
Premium vs Standard
Performance
Security
Monitoring
Databricks Units
▪ DATA ENGINEERING LIGHT
▪ DATA ENGINEERING
▪ DATA ANALYTICS
Three levels of service, AWS + Azure have the same levels
Databricks Data Engineering Light supports:
▪ Scheduled JAR, Python, or spark-submit jobs only
Databricks Light does NOT support:
▪ Delta Lake
▪ Autopilot features such as autoscaling
▪ Highly concurrent, all-purpose clusters
▪ Notebooks, dashboards, and collaboration features
▪ Connectors to various data sources and BI tools
▪ Databricks Light is a runtime environment for jobs (or “automated workloads”).
DBU: Standard vs. Premium
DBU price ($ per DBU-hour)   Standard   Premium
Analytics                    0.40       0.55
Engineering                  0.15       0.50
Engineering Light            0.07       0.22
https://bit.ly/2Tp5Zkh
Workloads Examples
▪ Scheduled Job - Data Engineer
▪ On Demand Job – triggered - Data Engineer / BI / Analytics
▪ Exploratory – Interactive - BI/ML
VMs and DBUs
Prosenjit Chakraborty - https://medium.com/@cprosenjit/azure-databricks-cost-optimizations-5e1e17b39125
Scenario Breakdown
▪ # VMs = 400
▪ Hours run = 1
▪ Cores in VM = 4
▪ General workload
Cost - Engineering vs. Engineering Light - Standard
$$$ = (#VMs*VMs $/hour + $RuntimeType*#DBUs)*(1-performance factor)
[Chart: cost ($) vs. 1 - performance factor, Standard tier]
1 - performance factor:  1      0.9     0.8     0.7     0.6    0.5
Engineering:             156.6  140.94  125.28  109.62  93.96  78.3
Engineering Light:       132.6  132.6   132.6   132.6   132.6  132.6
#VMs = 400
VMs $/hour = 0.279
#DBUs = #VMs*0.75
$EngineeringLight = 0.07
$Engineering = 0.15
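The formula and parameters above can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck: the function and constant names are made up for illustration, and it assumes (as the chart suggests) that the performance factor is applied to the Engineering runtime only, with Engineering Light as the fixed baseline.

```python
# Sketch of the slide's formula:
#   cost = (#VMs * VM $/hour + DBU price * #DBUs) * (1 - performance factor)
# Values are copied from the slide (Standard tier, one hour of runtime).
# All names below are illustrative, not from the original deck.

VM_COUNT = 400
VM_PRICE_PER_HOUR = 0.279      # $ per VM-hour
DBUS_PER_VM = 0.75             # #DBUs = #VMs * 0.75
DBU_PRICE = {                  # $ per DBU, Standard tier
    "engineering": 0.15,
    "engineering_light": 0.07,
}


def hourly_cost(runtime: str, performance_factor: float = 0.0) -> float:
    """Cost of one hour on the whole cluster for the given runtime."""
    vm_cost = VM_COUNT * VM_PRICE_PER_HOUR
    dbu_cost = DBU_PRICE[runtime] * VM_COUNT * DBUS_PER_VM
    return (vm_cost + dbu_cost) * (1.0 - performance_factor)


if __name__ == "__main__":
    light = hourly_cost("engineering_light")  # baseline, no speedup assumed
    for factor in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        engineering = hourly_cost("engineering", factor)
        print(f"factor={factor:.1f}  engineering={engineering:.2f}  light={light:.2f}")
```

With a performance factor of 0 the function reproduces the chart's leftmost points (156.6 for Engineering, 132.6 for Engineering Light).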
Cost - Engineering VS. Engineering Light - Premium
$$$ = (#VMs*VMs $/hour + $RuntimeType*#DBUs)*(1-performance factor)
#VMs = 400
VMs $/hour = 0.279
#DBUs = #VMs*0.75
$EngineeringLight = 0.22
$Engineering = 0.5
[Chart: Premium: Engineering vs. Engineering Light, cost ($) vs. 1 - performance factor]
1 - performance factor:  1      0.9     0.8     0.7     0.6     0.5
Engineering:             261.6  235.44  209.28  183.12  156.96  130.8
Engineering Light:       177.6  177.6   177.6   177.6   177.6   177.6
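A natural follow-up question is how much faster the Engineering runtime has to be before it becomes cheaper than Engineering Light. The sketch below solves the slide's formula for that break-even performance factor. The function name and argument names are hypothetical, and it assumes the performance factor applies to Engineering only.

```python
# Break-even sketch: find the performance factor p at which
#   cost_engineering * (1 - p) == cost_light
# using the slide's formula. Names are illustrative, not from the deck.

def break_even_performance_factor(vm_count: int,
                                  vm_price_per_hour: float,
                                  dbus_per_vm: float,
                                  engineering_dbu_price: float,
                                  light_dbu_price: float) -> float:
    """Return the speedup Engineering needs to match Engineering Light's cost."""
    vm_cost = vm_count * vm_price_per_hour
    dbus = vm_count * dbus_per_vm
    cost_engineering = vm_cost + engineering_dbu_price * dbus
    cost_light = vm_cost + light_dbu_price * dbus
    return 1.0 - cost_light / cost_engineering


if __name__ == "__main__":
    # Slide parameters: 400 VMs at $0.279/hour, 0.75 DBU per VM.
    standard = break_even_performance_factor(400, 0.279, 0.75, 0.15, 0.07)
    premium = break_even_performance_factor(400, 0.279, 0.75, 0.5, 0.22)
    print(f"Standard break-even: {standard:.1%}")
    print(f"Premium break-even:  {premium:.1%}")
```

Under these numbers, Engineering needs a speedup of about 15% on Standard and about 32% on Premium before it costs less than Engineering Light, which is consistent with where the two series cross in the charts.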
Cost Optimization Strategies
1 - Pre Purchase Plan – 1 & 3 years
2 – Select the right runtime & frameworks
▪ Delta Lake
▪ PySpark Pandas UDF
▪ Photon Engine
3 – Don’t use tmp/local file system storage
▪ dbutils storage is RA-GRS (read-access geo-redundant storage) - you might not need this type of storage!
https://bit.ly/2TdXsAi
Cost Optimization Tips
Manage Spending limit
▪ Per subscription
▪ Per management group
▪ Per resource group
▪ Enable Quota alerts
Enable AutoScale
▪ Scaling machines up and down automatically
https://bit.ly/3dMOROK
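As an illustration of the autoscale tip, here is a minimal sketch that builds a cluster definition with a worker range instead of a fixed size. The field names (`cluster_name`, `spark_version`, `node_type_id`, `autoscale`, `min_workers`, `max_workers`) follow the Databricks Clusters API request body as I understand it; check them against the current API reference before use. The cluster name, VM size, and helper function are assumptions made for the example, and the runtime version is left as a placeholder.

```python
import json


def autoscaling_cluster_spec(min_workers: int, max_workers: int) -> dict:
    """Build a cluster definition that scales between min and max workers.

    Hypothetical helper: it only assembles and validates the payload,
    it does not call any Databricks endpoint.
    """
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "cluster_name": "cost-aware-etl",        # assumed name
        "spark_version": "<runtime-version>",    # placeholder, fill in your runtime
        "node_type_id": "Standard_DS3_v2",       # assumed Azure VM size
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
    }


if __name__ == "__main__":
    print(json.dumps(autoscaling_cluster_spec(2, 8), indent=2))
```

Keeping `min_workers` low bounds the cost of an idle cluster, while `max_workers` caps the worst-case hourly spend.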
VMs
▪ Think about your needs
Thank You!
@adipolak

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 

Último (20)

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Cost Efficiency Strategies for Managed Apache Spark Services

  • 1. Cost Efficiency Strategies for Managed Apache Spark Services Adi Polak Microsoft
  • 2. Find me on Social Media – Adi Polak Twitter @adipolak Medium – https://medium.com/@adipolak Dev.to - dev.to/adipolak LinkedIn - https://www.linkedin.com/in/adi-polak-68548365/
  • 3. Agenda ▪ Motivation ▪ Tools ▪ Azure Databricks ▪ Cost Optimizations Strategies ▪ Wrap-up
  • 5. Start from the Beginning Business idea ( Product Manager ) Prioritization Process (R&D) Design & Build (Software Dev HLD)
  • 6. High Level Design (HLD) ▪ Requirements ▪ Features ▪ Architecture ▪ Test Plans ▪ Security ▪ Deployments ▪ Monitoring / Audit Trails ▪ Maintenance
  • 7. Yeah, but why should I care about costs?! - Understand how the budget works – P&L - Be able to influence technical decisions - Culture of financial accountability
  • 11. Apache Spark & Cloud Computing delivery model IaaS vs. PaaS vs. SaaS
  • 12. Cloud Pricing Calculator Azure Pricing Calculator - https://azure.microsoft.com/pricing/calculator/ AWS Pricing Calculator - https://calculator.aws/ GCP Pricing Calculator - https://cloud.google.com/products/calculator
  • 13. Organize Resources for Cost Awareness ▪ Reporting and billing - Azure Cost Management ▪ Organize – resource groups and/or subscriptions for control, reporting, and attribution of costs
  • 15. Subscription and Billing models • Pay as you go • Enterprise Agreements • …
  • 17. Where to run Spark Workloads: Kubernetes/IaaS vs. Azure Databricks (Managed Spark Service). Azure Databricks: ▪ Small to mid-size team ▪ Spark expertise ▪ Optimizations ▪ Machines – VMs ▪ Network ▪ Storage ▪ DBU. Kubernetes/IaaS: ▪ Bigger team ▪ K8s + Spark expertise ▪ Optimization experts ▪ Machines – VMs ▪ Network ▪ Storage
  • 23. Databricks Units ▪ DATA ENGINEERING LIGHT ▪ DATA ENGINEERING ▪ DATA ANALYTICS Three levels of service, AWS + Azure have the same levels
  • 24. Databricks Data Engineering Light supports only: ▪ scheduled JAR, Python, or spark-submit jobs
  • 25. Databricks Light does NOT support: ▪ Delta Lake ▪ Autopilot features such as autoscaling ▪ Highly concurrent, all-purpose clusters ▪ Notebooks, dashboards, and collaboration features ▪ Connectors to various data sources and BI tools. Databricks Light is a runtime environment for jobs (or “automated workloads”).
  • 26. DBU: Standard vs. Premium ($ per DBU) ▪ Analytics: Standard 0.4, Premium 0.55 ▪ Engineering: Standard 0.15, Premium 0.5 ▪ Engineering Light: Standard 0.07, Premium 0.22 https://bit.ly/2Tp5Zkh
  • 27. Workload Examples ▪ Scheduled job - Data Engineer ▪ On-demand (triggered) job - Data Engineer / BI / Analytics ▪ Exploratory, interactive - BI/ML
  • 28. VMs and DBUs (source: Prosenjit Chakraborty - https://medium.com/@cprosenjit/azure-databricks-cost-optimizations-5e1e17b39125)
  • 29. Scenario Breakdown ▪ # VMs = 400 ▪ Hours run = 1 ▪ Cores in VM = 4 ▪ General workload
  • 30. Cost - Engineering vs. Engineering Light - Standard: $$$ = (#VMs * VMs $/hour + $RuntimeType * #DBUs) * (1 - performance factor), with #VMs = 400, VMs $/hour = 0.279, #DBUs = #VMs * 0.75, $EngineeringLight = 0.07, $Engineering = 0.15. [Chart: cost vs. 1 - performance factor] For 1 - performance factor = 1, 0.9, 0.8, 0.7, 0.6, 0.5: Engineering = 156.6, 140.94, 125.28, 109.62, 93.96, 78.3; Engineering Light = 132.6 at every point.
  • 31. Cost - Engineering vs. Engineering Light - Premium: $$$ = (#VMs * VMs $/hour + $RuntimeType * #DBUs) * (1 - performance factor), with #VMs = 400, VMs $/hour = 0.279, #DBUs = #VMs * 0.75, $EngineeringLight = 0.22, $Engineering = 0.5. [Chart: cost vs. 1 - performance factor] For 1 - performance factor = 1, 0.9, 0.8, 0.7, 0.6, 0.5: Engineering = 261.6, 235.44, 209.28, 183.12, 156.96, 130.8; Engineering Light = 177.6 at every point.
  • 33. 1 - Pre Purchase Plan – 1 & 3 years
  • 34. 2 – Select the right runtime & frameworks ▪ Delta Lake ▪ PySpark Pandas UDF ▪ Photon Engine
  • 35. 3 – Don’t use tmp/local file system storage ▪ dbutils storage is RA-GRS (read-access geo-redundant storage) - you might not need this type of storage! https://bit.ly/2TdXsAi
  • 37. Manage Spending limit ▪ Per subscription ▪ Per management group ▪ Per resource group ▪ Enable Quota alerts
  • 38. Enable AutoScale ▪ Scaling machines up and down automatically https://bit.ly/3dMOROK
  • 39. VMs ▪ Think about your needs
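
The cost comparison on slides 30 and 31 can be reproduced with a few lines of Python. The following is a minimal sketch, not part of the original deck: the function and variable names are hypothetical, and the prices, the 0.75 DBUs-per-VM ratio, and the 400-VM scenario are taken directly from slides 26 and 29-31. The `hours` parameter is an added assumption (the slides fix the run at 1 hour). As in the charts, the multiplier (1 - performance factor) is applied only to the Engineering runtime, while Engineering Light is treated as the flat baseline.

```python
# Values copied from the slides; everything else here is illustrative.
VM_PRICE_PER_HOUR = 0.279   # "VMs $/hour" on slides 30-31
DBUS_PER_VM = 0.75          # "#DBUs = #VMs * 0.75" on slides 30-31

# $ per DBU, from the "DBU: Standard vs. Premium" table (slide 26).
DBU_PRICE = {
    ("standard", "engineering"): 0.15,
    ("standard", "engineering_light"): 0.07,
    ("premium", "engineering"): 0.50,
    ("premium", "engineering_light"): 0.22,
}


def cluster_cost(n_vms, tier, runtime, cost_multiplier=1.0, hours=1.0):
    """(#VMs * VM $/hour + $RuntimeType * #DBUs) * (1 - performance factor).

    cost_multiplier is the "1 - performance factor" axis of the charts;
    hours is a hypothetical extension (the slides use a 1-hour run).
    """
    vm_cost = n_vms * VM_PRICE_PER_HOUR
    dbu_cost = DBU_PRICE[(tier, runtime)] * n_vms * DBUS_PER_VM
    return (vm_cost + dbu_cost) * cost_multiplier * hours


def break_even_multiplier(n_vms, tier):
    """Multiplier below which Engineering becomes cheaper than Engineering Light."""
    light = cluster_cost(n_vms, tier, "engineering_light")
    full = cluster_cost(n_vms, tier, "engineering")
    return light / full


if __name__ == "__main__":
    for tier in ("standard", "premium"):
        light = cluster_cost(400, tier, "engineering_light")
        print(f"{tier}: Engineering Light = {light:.2f}")
        for m in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
            full = cluster_cost(400, tier, "engineering", cost_multiplier=m)
            print(f"  multiplier {m:.1f}: Engineering = {full:.2f}")
        print(f"  break-even multiplier = {break_even_multiplier(400, tier):.3f}")
```

Under these numbers the break-even multiplier is about 0.85 on the Standard tier and about 0.68 on the Premium tier, which suggests that the Engineering runtime needs a noticeably larger performance gain to pay for itself on Premium than on Standard.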