SlideShare una empresa de Scribd logo
1 de 56
Descargar para leer sin conexión
AI at Scale
Adi Polak | Microsoft
@adipolak
Start from the beginning
• Massive datasets
• Intelligent product
@adipolak
• Apache Spark
• Databricks for Production
• Build machine learning models with
Pipelines
• Cognitive Services and Apache Spark
• Data Engineer vs Data Scientist
• MMLSpark
Overview
@adipolak
What is
Apache Spark
What is Apache Spark?
@adipolak
Motivation
@adipolak
Apache Spark architecture
@adipolak
Apache Spark submit job
storage storage storage
@adipolak
Framework libraries- Software Stracture
@adipolak
Python, R & .NET for Apache Spark
@adipolak
Embedded Service
PySpark
Spark Scala Process
Pyspark
Protocol
Spark Worker
@adipolak
Structured
Streaming
MLlib GraphFrame TensorFrames
SQL SparkSession / DataFrame / Dataset APIs
Data Source
Connectors
Catalyst Optimization & Tungsten Execution
Spark Core (RDD APIs)
SQL
API
More API
@adipolak
Why Apache Spark is fast and efficient?
CATALYST OPTIMIZER IN MEMORY
COMPUTING
TUNGSTEN
@adipolak
CATALYST
@adipolak
Fundamentals of Catalyst Optimizer
SUB
Attribute(x) SUB
some_func(1) some_func(2)
Tree Rules
SUB
Attribute(x) some_func(-1)
@adipolak
Spark SQL Execution Plan
Logical optimization –> Optimization rules
• Constant folding
• Predicate pushdown
• Projection pruning
• …
Catalyst
Frontend Backend
@adipolak
Apache Spark
Connectors
Spark Connector for SQL Server
@adipolak
Spark connector for Cosmos DB
@adipolak
Many more connectors…
@adipolak
How to run
Apache
Spark on
Azure?
HDInsight
Azure Databricks
AKS
Azure Databricks
What is Azure Databricks
@adipolak
Workspace and Notebooks
Build for Production
• - Databricks limits and Azure limits
• - Workspace isolation
• - vnet
• - Security and key vault
• - Enable log analytics for monitoring
Databricks limits and Azure limits
• Key Databricks limits ( per workspace):
• - 150 concurrent jobs
• - 150 max notebooks
• - 1000 max hourly job submission
Key Azure limits :
- Storage accounts per region per
subscription: 250
- VMs per subscription per
region: 25,000
- Resource groups per subscription: 980
@adipolak
Workspace isolation
@adipolak
Cluster modes and their characteristics
@adipolak
Batch vs. Interactive workloads
@adipolak
Data Scientists and Analysts –
Hight concurrency mode
@adipolak
Data Engineers
@adipolak
Troubleshooting with cluster logs
• Azure Databricks provides three kinds of logging of cluster-related
activity:
• Cluster event logs, which capture cluster lifecycle events, like creation,
termination, configuration edits, and so on.
• Apache Spark driver and worker logs, which you can use for
debugging.
• Cluster init-script logs, valuable for debugging init scripts.
@adipolak
Cluster logs
Cluster
Metric
Report
@adipolak
Log Analytics Workspace
• Leverage Azure log analytics and connect it to the workspace
• with init scripts and visualize it using Kibana or other tools.
@adipolak
What
About
Machine
Learning?
Gather Data
ML Process / Basic Life Cycle
Feature Extract, Clean and Normalize
Select algorithm
Evaluate model
Data visualization
4
5
1
2
3
Repeat!
@adipolak
H O W ?
M L P i p e l i n e s !
@adipolak
What are pipelines?
Tools
Spark
Streaming
Spark ML/Other
Spark SQL MLflow
@adipolak
@adipolak
Demo
@adipolak
But in real life:
Accuracy < 0.5
AUC L
Results are far
from being good
Building machine learning models
from scratch is hard!
@adipolak
We need a Data Science
@adipolak
Our Data scientist needs tools
@adipolak
Stand on the
shoulder of giants
@adipolak
Azure Cognitive Services
@adipolak
Cognitive Services capabilities
Infuse your apps, websites, and bots with human-like intelligence
A variety of real-world applications
Vision Speech
Intent: PlayCall
Language Knowledge Search
Let’s Combine them
+
@adipolak
Apache Spark + Cognitive Services
@adipolak
@adipolak
Data Engineer vs Data Scientist
Advanced programming
Distributed systems
Data pipeline
Basic analytics
Data Engineer
Advance math/statistics
ML/AI
Advanced analytics
Basic programming
Data Science
Big Data constrains
Both should understand:
@adipolak
MMLSpark:
Microsoft Machine
Learning for
Apache Spark
@adipolak
aka.ms/mlflow_and_azure_ml
aka.ms/azure_databricks
aka.ms/twitter_sentiment_analysis
Want to learn more?
@adipolak
Thank you! Dank u! Merci!

Más contenido relacionado

La actualidad más candente

Distributed ML/DL with Ignite ML Module Using Apache Spark as Database
Distributed ML/DL with Ignite ML Module Using Apache Spark as DatabaseDistributed ML/DL with Ignite ML Module Using Apache Spark as Database
Distributed ML/DL with Ignite ML Module Using Apache Spark as Database
Databricks
 
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementAn AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
Databricks
 

La actualidad más candente (20)

5 reasons why spark is in demand!
5 reasons why spark is in demand!5 reasons why spark is in demand!
5 reasons why spark is in demand!
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
Distributed ML/DL with Ignite ML Module Using Apache Spark as Database
Distributed ML/DL with Ignite ML Module Using Apache Spark as DatabaseDistributed ML/DL with Ignite ML Module Using Apache Spark as Database
Distributed ML/DL with Ignite ML Module Using Apache Spark as Database
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
 
Internals of Speeding up PySpark with Arrow
 Internals of Speeding up PySpark with Arrow Internals of Speeding up PySpark with Arrow
Internals of Speeding up PySpark with Arrow
 
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementAn AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at Airbnb
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Scaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksScaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber Attacks
 
Spline 2 - Vision and Architecture Overview
Spline 2 - Vision and Architecture OverviewSpline 2 - Vision and Architecture Overview
Spline 2 - Vision and Architecture Overview
 
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
Delight: An Improved Apache Spark UI, Free, and Cross-PlatformDelight: An Improved Apache Spark UI, Free, and Cross-Platform
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-FiHow Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming-(Jon Gr...
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
 
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan ...
 
Adi Polak - Light up the Spark in Catalyst by avoiding UDFs - Codemotion Mila...
Adi Polak - Light up the Spark in Catalyst by avoiding UDFs - Codemotion Mila...Adi Polak - Light up the Spark in Catalyst by avoiding UDFs - Codemotion Mila...
Adi Polak - Light up the Spark in Catalyst by avoiding UDFs - Codemotion Mila...
 

Similar a AI at Scale

Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_Veriticals
Peyman Mohajerian
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
Chester Chen
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
 

Similar a AI at Scale (20)

An Insider’s Guide to Maximizing Spark SQL Performance
 An Insider’s Guide to Maximizing Spark SQL Performance An Insider’s Guide to Maximizing Spark SQL Performance
An Insider’s Guide to Maximizing Spark SQL Performance
 
Fighting Fraud with Apache Spark
Fighting Fraud with Apache SparkFighting Fraud with Apache Spark
Fighting Fraud with Apache Spark
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_Veriticals
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
 
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkEnd-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick WendellApache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
ASPgems - kappa architecture
ASPgems - kappa architectureASPgems - kappa architecture
ASPgems - kappa architecture
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Apache spark 2.4 and beyond
Apache spark 2.4 and beyondApache spark 2.4 and beyond
Apache spark 2.4 and beyond
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 

Más de Adi Polak

Más de Adi Polak (7)

Demystifying Apache Spark
Demystifying Apache SparkDemystifying Apache Spark
Demystifying Apache Spark
 
Burst workloads Cutting costs with Kubernetes and Virtual Kubelet
Burst workloads Cutting costs with Kubernetes and Virtual KubeletBurst workloads Cutting costs with Kubernetes and Virtual Kubelet
Burst workloads Cutting costs with Kubernetes and Virtual Kubelet
 
From desktop to the cloud, cutting costs with Virtual kubelet and ACI
From desktop to the cloud, cutting costs with Virtual kubelet and ACIFrom desktop to the cloud, cutting costs with Virtual kubelet and ACI
From desktop to the cloud, cutting costs with Virtual kubelet and ACI
 
ETL – Everything you need to know
ETL – Everything you need to knowETL – Everything you need to know
ETL – Everything you need to know
 
Evolution of VS code Java ecosystem
Evolution of VS code Java ecosystemEvolution of VS code Java ecosystem
Evolution of VS code Java ecosystem
 
Make it clean - scala clean code
Make it clean - scala clean codeMake it clean - scala clean code
Make it clean - scala clean code
 
Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!
 

Último

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 

Último (20)

Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 

AI at Scale