https://academy.microsoft.com/en-us/professional-program/
https://channel9.msdn.com/
https://www.wintellectnow.com/Home/Instructor?instructorId=Mark_Tabladillo
The opportunity and challenge of data science in
enterprises
Opportunity: 17% had a well-developed Predictive/Prescriptive Analytics
program in place, while 80% planned on implementing such a program within
five years – Dataversity 2015 Survey
Challenge: Only 27% of big data projects are regarded as successful –
Capgemini 2014
Tools & data platforms have matured -
Still a major gap in executing on the potential
One reason: Process challenge in Data Science
Organization
Collaboration
Quality
Knowledge Accumulation
Agility
Global Teams • Geographic Locations
Team Growth • Onboard New Members Rapidly
Varied Use Cases • Industries and Use Cases
Diverse DS Backgrounds • DS have diverse backgrounds, experiences with tools, languages
Data Science lifecycle
Primary stages: Business Understanding, Data Preparation, Modeling, Deployment
[Lifecycle diagram: the stages are revisited iteratively over time, from Envisioning and Scoping through successive science iterations (Science One through Science Four)]
Microsoft | Amplifying Human Ingenuity (Form 10-K 2016)
Enhance traditional line-of-business analytics solutions with Machine Learning
Solve complex business problems with Deep Learning
Engage customers with intelligent automated solutions
Services: Azure AI Services
Infrastructure: Azure Infrastructure
Tools
Fast, easy, and collaborative Apache Spark-based analytics platform
Azure Databricks for deep learning modeling
Tools | Infrastructure | Frameworks
• Leverage powerful GPU-enabled VMs pre-configured for deep neural network training
• Use HorovodEstimator via a native runtime to build deep learning models with a few lines of code
• Full Python and Scala support for transfer learning on images
• Automatically store metadata in Azure Database with geo-replication for fault tolerance
• Use built-in hyperparameter tuning via Spark MLlib to quickly drive model progress
• Simultaneously collaborate within notebook environments to streamline model development
• Load images natively in Spark DataFrames to automatically decode them for manipulation at scale
• Improve performance 10x-100x over traditional Spark deployments with an optimized environment
• Seamlessly use TensorFlow, Microsoft Cognitive Toolkit, Caffe2, Keras, and more
Additional deep learning tools
Batch AI
• Automatically scale virtual machine clusters with GPUs or CPUs
• Develop your models with long-running batch jobs, iterative experimentation, and interactive training
• Support for any deep learning or machine learning framework
Deep Learning Virtual Machine
• Designed and pre-configured specifically for GPU-enabled instances
• Fully integrated with Azure AI training service to provide capacity for parallelized AI training at scale
• Get started in seconds with example scripts and sample data sets
Deploy AI models to devices on the edge
Machine learning | Model management | IoT Edge
Azure Databricks
Azure ML Services
Qualcomm QCS603 Vision AI dev. kit
Pre-trained Solutions
IDEs
AI Frameworks
Deploy AI models to devices on the edge: flow
[Flow diagram labels: Native TensorFlow, Caffe; Message passing; Co-located; Scoring file; Service or app; Trained/retrained model; Additional images; Original NW, e.g. MobileNet]
Machine learning and deep learning, when to use what?
Which experience do you want?
• Code first (on-prem): ML Server
• Code first (cloud): AML (Preview)
• Visual tooling (cloud): AML Studio
What engine(s) do you want to use? Build with Spark or other engines?
• ML Server: on-prem Hadoop, SQL Server
• AML (Preview): SQL Server, Hadoop, Azure Batch, DSVM, Spark
• Azure Databricks (Notebooks, Jobs): Spark
Deployment target
• Python: TensorFlow, Keras, MS Cognitive Toolkit, ONNX, Caffe2
• Spark ML, SparkR, SparklyR
• TensorFlow, Keras, MS Cognitive Toolkit, ONNX, Caffe2
https://azure.microsoft.com/en-us/global-infrastructure/services/
https://portal.azure.com
https://docs.microsoft.com/en-us/azure/batch-ai/quickstart-tensorflow-training-cli
S P A R K : A B R I E F H I S T O R Y
A P A C H E S P A R K
A unified, open source, parallel data processing framework for Big Data Analytics
Spark Core Engine
• Spark SQL: Interactive Queries
• Spark Structured Streaming / Spark Streaming: Stream processing
• Spark MLlib: Machine Learning
• GraphX: Graph Computation
Cluster managers: YARN, Mesos, Standalone Scheduler
Azure Databricks
Databricks Spark as a managed service on Azure
K N O W I N G T H E V A R I O U S B I G D A T A S O L U T I O N S
CONTROL → EASE OF USE (reduced administration)
Big Data Storage: Azure Data Lake Store, Azure Storage
Big Data Analytics:
• IaaS Clusters: any Hadoop technology, any distribution (Azure Marketplace: HDP | CDH | MapR)
• Managed Clusters: workload optimized, managed clusters (Azure HDInsight)
• Frictionless & optimized Spark clusters (Azure Databricks)
• Big Data as-a-service: data engineering in a Job-as-a-service model (Azure Data Lake Analytics)
Azure HDInsight
What It Is
• Hortonworks distribution as a first party service on Azure
• Big Data engines support – Hadoop Projects, Hive on Tez, Hive
LLAP, Spark, HBase, Storm, Kafka, R Server
• Best-in-class developer tooling and Monitoring capabilities
• Enterprise Features
• VNET support (join existing VNETs)
• Ranger support (Kerberos based Security)
• Log Analytics via OMS
• Orchestration via Azure Data Factory
• Available in most Azure Regions (27) including Gov Cloud
and Federal Clouds
Guidance
• Customer needs Hadoop technologies other than, or in addition
to Spark
• Customer prefers Hortonworks Spark distribution to stay closer
to OSS codebase and/or ‘Lift and Shift’ from on-premises
deployments
• Customer has specific project requirements that are only
available on HDInsight
Azure Databricks
What It Is
• Databricks’ Spark service as a first party service on Azure
• Single engine for Batch, Streaming, ML and Graph
• Best-in-class notebooks experience for optimal productivity and
collaboration
• Enterprise Features
• Native Integration with Azure for Security via AAD (OAuth)
• Optimized engine for better performance and scalability
• RBAC for Notebooks and APIs
• Auto-scaling and cluster termination capabilities
• Native integration with SQL DW and other Azure services
• Serverless pools for easier management of resources
Guidance
• Customer needs the best option for Spark on Azure
• Customer teams are comfortable with notebooks and Spark
• Customer needs auto-scaling and cluster termination capabilities
• Customer needs to build integrated and performant data
pipelines
• Customer is comfortable with limited regional availability (3 in
preview, 8 by GA)
Azure ML
What It Is
• Azure first party service for Machine Learning
• Leverage existing ML libraries or extend with Python and R
• Targets emerging data scientists with drag & drop offering
• Targets professional data scientists with
– Experimentation service
– Model management service
– Works with the customer's IDE of choice
Guidance
• Azure Machine Learning Studio is a GUI-based ML tool for
emerging Data Scientists to experiment and operationalize with the
least friction
• Azure Machine Learning Workbench is not a compute engine and
uses external engines for compute, including SQL Server and Spark
• AML deploys models to HDI Spark currently
• AML should be able to deploy to Azure Databricks in the near future
L O O K I N G A C R O S S T H E O F F E R I N G S
Azure Databricks
Core Concepts
P R O V I S I O N I N G A Z U R E D A T A B R I C K S W O R K S P A C E
G E N E R A L S P A R K C L U S T E R A R C H I T E C T U R E
Data Sources (HDFS, SQL, NoSQL, …)
Cluster Manager
Worker Node Worker Node Worker Node
Driver Program
SparkContext
A Z U R E D A T A B R I C K S C L U S T E R A R C H I T E C T U R E
Azure DB
for
PostgreSQL
Webapp
Azure Compute
Cluster
Manager
Databricks' Azure Account | User's Azure Account
Azure Compute
Spark
Driver
Azure Compute
Spark
Worker
Azure Compute
Spark
Worker
Jobs
FileSystem
Service
Spark
History
Server
Log
Daemon
Log
Daemon
C L U S T E R M A N A G E R A R C H I T E C T U R E
Jobs | Webapp
Cluster Manager
Cluster
Azure Compute
Spark
Driver
Azure Compute
Spark
Worker
Azure Compute
Spark
Worker
Database
Instances, Clusters, Libraries, Hive Metastore, …
Cluster
Azure Compute
Spark
Driver
Azure Compute
Spark
Worker
Azure Compute
Spark
Worker
Instance Manager
Container
Management
Library Manager
Secure Collaboration
D A T A B R I C K S A C C E S S C O N T R O L
Access control can be defined at the user level via the Admin Console
Workspace Access
Control
Defines who can view, edit, and run notebooks in their workspace
Cluster Access Control
Allows users to control who can attach to, restart, and manage
(resize/delete) clusters.
Allows Admins to specify which users have permissions to
create clusters
Jobs Access Control
Allows owners of a job to control who can view job results or
manage runs of a job (run now/cancel)
REST API Tokens
Allows users to use personal access tokens instead of
passwords to access the Databricks REST API
Databricks
Access
Control
Clusters
C L U S T E R S
▪ Azure Databricks clusters are the set of Azure Linux VMs that
host the Spark Worker and Driver Nodes
▪ Your Spark application code (i.e. Jobs) runs on the provisioned
clusters.
▪ Azure Databricks clusters are launched in your subscription—but
are managed through the Azure Databricks portal.
▪ Azure Databricks provides a comprehensive set of graphical
wizards to manage the complete lifecycle of clusters—from
creation to termination.
A Z U R E D A T A B R I C K S N O T E B O O K S O V E R V I E W
Notebooks are a popular way to develop and run Spark applications
▪ Notebooks are not only for authoring Spark applications but can also be run/executed directly on clusters
• Shift+Enter
▪ Notebooks support fine-grained permissions, so they can be securely shared with colleagues for collaboration (see following slide for details on permissions and abilities)
▪ Notebooks are well-suited for prototyping, rapid development, exploration, discovery, and iterative development
▪ Notebooks typically consist of code, data, visualizations, comments, and notes
M I X I N G L A N G U A G E S I N N O T E B O O K S
You can mix multiple languages in the same notebook
Normally a notebook is associated with a specific language. However, with Azure Databricks notebooks, you can
mix multiple languages in the same notebook. This is done using the language magic command:
• %python Allows you to execute Python code in a notebook (even if that notebook is not Python)
• %sql Allows you to execute SQL code in a notebook (even if that notebook is not SQL)
• %r Allows you to execute R code in a notebook (even if that notebook is not R)
• %scala Allows you to execute Scala code in a notebook (even if that notebook is not Scala)
• %sh Allows you to execute shell code in your notebook
• %fs Allows you to use Databricks Utilities (dbutils) filesystem commands
• %md Allows you to include rendered Markdown
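As a minimal sketch (assuming a Scala notebook and a hypothetical view name sample_view), cells might look like this:

// Scala cell: the notebook's default language
val df = spark.range(0, 100).toDF("value")
df.createOrReplaceTempView("sample_view")

%sql
-- SQL cell: query the temporary view created in the Scala cell above
SELECT COUNT(*) AS row_count FROM sample_view

%md
This cell renders as formatted Markdown notes.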
L I B R A R I E S O V E R V I E W
Enables external code to be imported and stored in a Workspace
D A T A B R I C K S S P A R K I S F A S T
Benchmarks have shown Databricks to often have better performance than alternatives
SOURCE: Benchmarking Big Data SQL Platforms in the Cloud
S P A R K S Q L O V E R V I E W
Spark SQL is a distributed SQL query engine for processing structured data
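A minimal sketch of querying structured data with Spark SQL in a notebook; the file path, view name, and column names are hypothetical placeholders:

// Load structured data and expose it to SQL as a temporary view
val sales = spark.read.parquet("/mnt/data/sales.parquet")
sales.createOrReplaceTempView("sales")

// Run a distributed SQL query and get the result back as a DataFrame
val topRegions = spark.sql(
  "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC")
topRegions.show(10)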
L O C A L A N D G L O B A L T A B L E S
Databricks registers global
tables to the Hive metastore and
makes them available across all
clusters.
Only global tables are visible in
the Tables pane
Azure Databricks Tables
Databricks does not register local
tables in the Hive metastore; they
are only available within one
cluster.
Also known as temporary tables
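A minimal sketch of the difference, assuming a DataFrame df already exists (table names are hypothetical):

// Local (temporary) table: not registered in the Hive metastore, visible only within this cluster
df.createOrReplaceTempView("sales_temp")

// Global table: registered in the Hive metastore and visible in the Tables pane and from other clusters
df.write.saveAsTable("sales")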
A Z U R E S Q L D W I N T E G R A T I O N
Integration enables structured data from SQL DW to be included in Spark Analytics
Azure SQL Data Warehouse is a SQL-based fully managed, petabyte-scale cloud solution for data warehousing
Azure Databricks Azure SQL DW
▪ You can bring in data from Azure SQL
DW to perform advanced analytics that
require both structured and unstructured
data.
▪ Currently you can access data in Azure
SQL DW via the JDBC driver. From within
your Spark code you can access it just like
any other JDBC data source.
▪ If Azure SQL DW is authenticated via
AAD, then an Azure Databricks user can
seamlessly access Azure SQL DW.
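A minimal sketch of reading Azure SQL DW data over JDBC from a notebook; the server, database, table, and credential values are placeholders you would substitute:

val jdbcUrl = "jdbc:sqlserver://{your-server}.database.windows.net:1433;database={your-database}"
val dwDf = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "dbo.FactSales")   // hypothetical table name
  .option("user", "{your-user}")
  .option("password", "{your-password}")
  .load()
dwDf.printSchema()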
P O W E R B I I N T E G R A T I O N
Enables powerful visualization of data in Spark with Power BI
Power BI Desktop can connect to Azure Databricks
clusters to query data using the JDBC/ODBC server that
runs on the driver node.
• This server listens on port 10000 and it is not accessible
outside the subnet where the cluster is running.
• Azure Databricks uses a public HTTPS gateway
• The JDBC/ODBC connection information can be obtained
from the Cluster UI directly as shown in the figure.
• When establishing the connection, you can use a Personal
Access Token to authenticate to the cluster gateway. Only
users who have attach permissions can access the cluster
via the JDBC/ ODBC endpoint.
• In Power BI Desktop you can set up the connection by
choosing the ODBC data source in the “Get Data” option.
C O S M O S D B I N T E G R A T I O N
The Spark connector enables real-time analytics over globally distributed data in Azure Cosmos DB
▪ With the Spark connector for Azure Cosmos DB, Apache Spark
can now interact with all Azure Cosmos DB data models:
Documents, Tables, and Graphs.
• efficiently exploits the native Azure Cosmos DB managed indexes
and enables updateable columns when performing analytics.
• utilizes push-down predicate filtering against fast-changing
globally-distributed data
▪ Some use-cases for Azure Cosmos DB + Spark include:
• Streaming Extract, Transformation, and Loading of data (ETL)
• Data enrichment
• Trigger event detection
• Complex session analysis and personalization
• Visual data exploration and interactive analysis
• Notebook experience for data exploration, information sharing,
and collaboration
Azure Cosmos DB is Microsoft's globally distributed, multi-model database service for mission-critical applications
The connector uses the Azure DocumentDB Java SDK
and moves data directly between Spark worker nodes
and Cosmos DB data nodes
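A minimal sketch of reading a Cosmos DB collection from Spark; the data source name and option keys follow the azure-cosmosdb-spark connector and should be treated as assumptions to verify against the connector version attached to your cluster:

// Read a Cosmos DB collection into a DataFrame for analytics in Spark
val cosmosDf = spark.read
  .format("com.microsoft.azure.cosmosdb.spark")
  .options(Map(
    "Endpoint"   -> "https://{your-cosmos-account}.documents.azure.com:443/",
    "Masterkey"  -> "{your-cosmos-key}",
    "Database"   -> "{your-database}",
    "Collection" -> "{your-collection}"
  ))
  .load()
cosmosDf.show(10)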
A Z U R E B L O B S T O R A G E I N T E G R A T I O N
Data can be read from Azure Blob Storage using the Hadoop FileSystem interface. Data can be read from public storage accounts
without any additional settings. To read data from a private storage account, you need to set an account key or a Shared Access
Signature (SAS) in your notebook
Setting up an account key:
spark.conf.set("fs.azure.account.key.{Your Storage Account Name}.blob.core.windows.net", "{Your Storage Account Access Key}")
Setting up a SAS for a given container:
spark.conf.set("fs.azure.sas.{Your Container Name}.{Your Storage Account Name}.blob.core.windows.net", "{Your SAS For The Given Container}")
Once an account key or a SAS is set up, you can use standard Spark and Databricks APIs to read from the storage account:
val df = spark.read.parquet("wasbs://{Your Container Name}@{Your Storage Account Name}.blob.core.windows.net/{Your Directory Name}")
dbutils.fs.ls("wasbs://{Your Container Name}@{Your Storage Account Name}.blob.core.windows.net/{Your Directory Name}")
A Z U R E D A T A L A K E I N T E G R A T I O N
To read from your Data Lake Store account, you can configure Spark to use service credentials with the following snippet in
your notebook
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "{YOUR SERVICE CLIENT ID}")
spark.conf.set("dfs.adls.oauth2.credential", "{YOUR SERVICE CREDENTIALS}")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.windows.net/{YOUR DIRECTORY ID}/oauth2/token")
After providing credentials, you can read from Data Lake Store using standard APIs:
val df = spark.read.parquet("adl://{YOUR DATA LAKE STORE ACCOUNT NAME}.azuredatalakestore.net/{YOUR DIRECTORY NAME}")
dbutils.fs.ls("adl://{YOUR DATA LAKE STORE ACCOUNT NAME}.azuredatalakestore.net/{YOUR DIRECTORY NAME}")
S P A R K M L A L G O R I T H M S
Spark ML
Algorithms
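The algorithm catalog is exposed through the DataFrame-based Spark ML Pipeline API. A minimal sketch of training and scoring with it, assuming DataFrames trainingData and testData with a label column and two hypothetical feature columns:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

// Assemble raw columns into the single feature vector Spark ML expects
val assembler = new VectorAssembler()
  .setInputCols(Array("feature1", "feature2"))   // hypothetical column names
  .setOutputCol("features")

// Chain feature assembly and a classifier into one pipeline and fit it
val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")
val model = new Pipeline().setStages(Array(assembler, lr)).fit(trainingData)

// Score held-out data with the fitted pipeline
val predictions = model.transform(testData)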
S P A R K S T R U C T U R E D S T R E A M I N G O V E R V I E W
▪ Unifies streaming, interactive and batch queries—a single API for both
static bounded data and streaming unbounded data.
▪ Runs on Spark SQL. Uses the Spark SQL Dataset/DataFrame API used
for batch processing of static data.
▪ Runs incrementally and continuously and updates the results as data
streams in.
▪ Supports app development in Scala, Java, Python and R.
▪ Supports streaming aggregations, event-time windows, windowed
grouped aggregation, stream-to-batch joins.
▪ Features streaming deduplication, multiple output modes and APIs for
managing/monitoring streaming queries.
▪ Built-in sources: Kafka, File source (json, csv, text, parquet)
A unified system for end-to-end fault-tolerant, exactly-once stateful stream processing
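A minimal sketch of a Structured Streaming query over a file source; the input path, schema, and query name are hypothetical:

import org.apache.spark.sql.types._

// Hypothetical schema for incoming JSON events
val eventSchema = new StructType()
  .add("deviceId", StringType)
  .add("temperature", DoubleType)

// Treat new JSON files landing in the directory as an unbounded stream
val events = spark.readStream.schema(eventSchema).json("/mnt/landing/events/")

// Incrementally maintain an aggregate as data streams in
val avgTemp = events.groupBy("deviceId").avg("temperature")
val query = avgTemp.writeStream
  .outputMode("complete")
  .format("memory")          // in-memory sink for interactive inspection
  .queryName("avg_temp")
  .start()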
D A T A B R I C K S R E S T A P I
Cluster API Create/edit/delete clusters
DBFS API Interact with the Databricks File System
Groups API Manage groups of users
Instance Profile API
Allows admins to add, list, and remove instance
profiles that users can launch clusters with
Job API Create/edit/delete jobs
Library API Create/edit/delete libraries
Workspace API List/import/export/delete notebooks/folders
Databricks
REST API
D A T A B R I C K S A P I - A U T H E N T I C A T I O N
Personal access tokens or passwords can be used to authenticate and access Databricks REST APIs
-H "Authorization: Bearer TOKEN_VALUE"
A side-by-side comparison of capabilities and features
Model deployment options

SQL Server or SQL Database
• Scoring interface provided: T-SQL stored procedure
• Deployment environments: SQL Server 2017 database instance on-premises or in Azure VM
• Scalability of scoring interface: Limited to capacity of single server
• Scoring requirements: Need to author Python or R code within a T-SQL stored procedure that loads the trained model from a table where it is stored and applies it in scoring
• Model packaging: Serialized to table

Azure Databricks
• Scoring interface provided: Notebook or Job
• Deployment environments: Azure Databricks cluster, model export
• Scalability of scoring interface: Can scale across cluster resources
• Scoring requirements: Load the trained model from storage and apply it in scoring in a notebook in Python, Scala, R, or SQL
• Model packaging: Serialized to storage

Azure Machine Learning
• Scoring interface provided: Web service
• Deployment environments: AKS, ACI, IoT, IoT edge via AML; SQL Server, Hadoop, Spark and Batch AI
• Scalability of scoring interface: Scales by deploying more instances in Azure Container Services
• Scoring requirements: Create a Docker image that contains scoring service, model, and dependencies
• Model packaging: Docker image
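As a minimal sketch of the Azure Databricks option above: load a previously saved Spark ML pipeline model from storage and apply it for scoring in a notebook (all paths are hypothetical placeholders):

import org.apache.spark.ml.PipelineModel

// Load the trained model that was serialized to storage
val model = PipelineModel.load("wasbs://{container}@{account}.blob.core.windows.net/models/churn")

// Score a batch of new records and write the predictions back to storage
val newData = spark.read.parquet("wasbs://{container}@{account}.blob.core.windows.net/to-score/")
val scored = model.transform(newData)
scored.write.mode("overwrite").parquet("wasbs://{container}@{account}.blob.core.windows.net/scored/")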
https://portal.azure.com