Mark Hamilton, Microsoft, marhamil@microsoft.com
Anand Raman, Microsoft, aram@microsoft.com
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services
#UnifiedAnalytics #SparkAISummit
Overview
• The Cognitive Services on Spark
– Basic Usage
– Fluent Design
• HTTP on Spark
– Architecture and Principles
• Clusters with Embedded Services
– Kubernetes, Databricks
• Examples
– GANs + the Metropolitan Museum of Art
Motivation
• Azure Cognitive Services provide high-quality pre-built intelligent services
• No need for time-intensive model training or deployment
• Can quickly create intelligent applications
• Leverage Microsoft Research and Azure ML
• http://www.seeingai.com
Vision
• Object, scene, and activity detection
• Face recognition and identification
• Celebrity and landmark recognition
• Emotion recognition
• Text and handwriting recognition (OCR)
• Customizable image recognition
• Video metadata, audio, and keyframe extraction and analysis
• Explicit or offensive content moderation
Speech
• Speech transcription (speech-to-text)
• Custom speech models for unique vocabularies or complex environments
• Text-to-speech
• Custom Voice
• Real-time speech translation
• Customizable speech transcription and translation
• Speaker identification and verification
Language
• Language detection
• Named entity recognition
• Key phrase extraction
• Text sentiment analysis
• Multilingual and contextual spell checking
• Explicit or offensive text content moderation
• PII detection for text moderation
• Text translation
• Customizable text translation
• Contextual language understanding
Knowledge
• Q&A extraction from unstructured text
• Knowledge base creation from collections of Q&As
• Semantic matching for knowledge bases
• Customizable content personalization learning
Search
• Ad-free web, news, image, and video search results
• Trends for video, news
• Image identification, classification and knowledge extraction
• Identification of similar images and products
• Named entity recognition and classification
• Knowledge acquisition for named entities
• Search query autosuggest
• Ad-free custom search engine creation
Azure Cognitive Services on Spark
• Easy-to-use integration between Spark and the Azure Cognitive Services
• Composable and pipelinable with all other SparkML models!
• Python, Scala, R (Beta)
val df = new TextSentiment()
  .setTextCol("text")
  .setOutputCol("sentiment")
  .transform(inputs)
http://www.seeingai.com
Fluent API for Advanced Orchestration
• Any parameter can be set with a dataframe
column or with a single value
Get results for multiple search terms:
new BingImageSearch()
  .setQueryCol("queries")
queries
Cat
Dog
Antelope
Car
Bob Ross
Get the first N pages of Bing results for a specific term:
new BingImageSearch()
  .setQuery("cats")
  .setOffsetCol("offsets")
offsets
0
100
200
300
400
Get the first 200 results for many terms using several different accounts:
new BingImageSearch()
  .setQueryCol("queries")
  .setOffsetCol("offsets")
  .setKeyCol("keys")
offsets   queries   keys
0         Cat       17…
100       Cat       17…
0         Tree      3e…
100       Tree      4q…
0         Car       G1…
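The parameter tables on these slides are ordinary DataFrame columns, so they can be generated rather than typed by hand. The following is a minimal Python sketch under stated assumptions: a page size of 100 results per request (inferred from the offsets shown), a hypothetical helper name `build_search_params`, and placeholder keys. In practice the rows would be converted to a Spark DataFrame before being handed to `BingImageSearch`.

```python
from itertools import product

def build_search_params(queries, pages, page_size=100, keys=None):
    """Return one dict per (query, page) pair, each with its own offset.

    If `keys` is given, keys are assigned round-robin so that requests
    are spread across several accounts. (Hypothetical helper.)
    """
    rows = []
    for i, (query, page) in enumerate(product(queries, range(pages))):
        row = {"queries": query, "offsets": page * page_size}
        if keys:
            row["keys"] = keys[i % len(keys)]
        rows.append(row)
    return rows

# Placeholder keys for illustration only; real subscription keys go here.
rows = build_search_params(["Cat", "Tree"], pages=2, keys=["key-A", "key-B"])
for row in rows:
    print(row)
```

Each row then supplies the query, offset, and key for exactly one request, which is the pattern the three-column table illustrates.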
High Performance Capabilities OOTB
• Asynchronous Parallelism (P)
• Automatic Batching (B)
• Automatic Retries
– Exponential Back-offs (EBO)
– Backpressure (BP)
Features      Time (s)   Errors (#)
None          30.8       18993
EBO+BP        1163.0     0
EBO+BP+B      57.1       0
EBO+BP+B+P    49.7       0
Benchmark: 10 nodes, 20k requests, service limited to 1k requests/min
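To make the retry behaviour concrete, here is a minimal Python sketch of retrying with exponential back-off. It is an illustration of the general technique, not the library's implementation: the `RateLimited` exception, the `call_with_backoff` helper, and the fake service are all hypothetical, and the `sleep` function is injectable so the example runs without real waiting.

```python
import time

class RateLimited(Exception):
    """Raised by the (hypothetical) client when the service rejects a call."""

def call_with_backoff(send, request, max_retries=5, base_delay=0.1,
                      sleep=time.sleep):
    """Call send(request), doubling the wait after every rejected attempt."""
    for attempt in range(max_retries + 1):
        try:
            return send(request)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the final retry
            sleep(base_delay * (2 ** attempt))

# Fake service that rejects the first two attempts, for demonstration only.
attempts = {"n": 0}

def flaky_send(request):
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimited()
    return {"echo": request}

delays = []  # record the requested waits instead of actually sleeping
result = call_with_backoff(flaky_send, "hello", sleep=delays.append)
print(result, delays)
```

The benchmark table suggests why retries alone are not enough: back-off removes the errors but is slow under the rate limit, while batching and parallelism bring the time back down.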
HTTP on Spark
• Full integration between the HTTP protocol and Spark SQL
• Spark as a microservice orchestrator
• Spark + X
df = SimpleHTTPTransformer()
  .setInputParser(JSONInputParser())
  .setOutputParser(JSONOutputParser()
    .setDataType(schema))
  .setOutputCol("results")
  .setUrl(…)
[Diagram: HTTP on Spark architecture. Each Spark worker holds several partitions, and each partition has its own HTTP client. Clients exchange HTTP requests and responses either with a shared web service or with local services.]
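The partition-and-client layout in this diagram can be sketched in a few lines of plain Python. This is a conceptual sketch only, not how HTTP on Spark is implemented: `process_partition`, the fake client, and the URL are hypothetical, and the client upper-cases text instead of making a network call.

```python
def process_partition(rows, make_client, url):
    """Send every row of one partition through a single reusable client."""
    client = make_client()  # one client per partition, not one per row
    for row in rows:
        response = client(url, row)
        # Keep the input columns and add the response as a new column.
        yield {**row, "results": response}

def make_fake_client():
    """Stand-in for an HTTP client; no network traffic is generated."""
    def client(url, row):
        return {"url": url, "value": row["text"].upper()}
    return client

# Two partitions, as they might sit on one worker.
partitions = [[{"text": "a"}, {"text": "b"}], [{"text": "c"}]]
output = [
    list(process_partition(p, make_fake_client, "http://localhost:5000/score"))
    for p in partitions
]
print(output)
```

Pointing the URL at a service on the same machine instead of a remote endpoint is the only change needed to move from the web-service case to the local-service case.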
Cognitive Service Containers
Now in public preview
• No app changes; compatible with the full Cognitive Services feature set
• Support for 6 key AI capabilities:
– Key Phrase Extraction
– Language Detection
– Sentiment Analysis
– Face & Emotion Detection
– OCR / Text Recognition
– Language Understanding
• Run and manage locally; try for free
• Connect to the billing service for report-back; unified billing for on-cloud and off-cloud transactions
• Additional capabilities coming soon (e.g. Speech)
Clusters with Embedded Services
• Deploy cognitive services directly onto cluster worker nodes
• Bring the compute to the data
• Use low-latency in-machine networking
[Diagram: a Spark worker running the Spark Scala process and PySpark next to a local Cognitive Service. Labeled links: PySpark protocol, HTTP.]
Azure Kubernetes Service + Helm
• Works on any k8s cluster
• Helm: package manager for Kubernetes
[Diagram: Kubernetes (AKS, ACS, GKE, on-prem, etc.). Each K8s worker hosts a Spark worker and a Cognitive Service container, connected by HTTP on Spark. Users and apps reach the cluster through two load balancers: a Spark Serving load balancer (the "Spark Serving hotpath", REST requests to deployed models) and a load balancer for Jupyter, Zeppelin, Livy, or Spark Submit (submit jobs, run notebooks, manage the cluster, etc.). The cluster also connects to storage or other databases through Spark readers and to cloud Cognitive Services through HTTP on Spark.]
helm repo add mmlspark https://dbanda.github.io/charts
helm install mmlspark/spark --set localTextApi=true
Dalitso Banda, dbanda@microsoft.com
Microsoft AI Development Acceleration Program
Creating a Visual Search Engine for
the Metropolitan Museum of Art
https://gen.studio
Intelligent Image Annotation
• The MET released 400k images under Open Access
• Pipe images through the Computer Vision API to annotate them for searching
[Figure: example images with Describe Image output: "A picture containing a person", "A picture containing a glass, cup", "A fish swimming underwater".]
[Figure: rows labeled Query Image, Describe Image Output, and Deep Feature Nearest Neighbors.]
Reverse Image Search Architecture
[Diagram: Query Image → ResNet Featurizer → Deep Features → Fast Nearest Neighbor Lookup → Closest Match. Tools labeled: MMLSpark; SparkML LSH or Annoy. Filter visualizations from Zeiler + Fergus 2013.]
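The lookup step at the end of this pipeline can be illustrated with a brute-force search over feature vectors. Note the swap: the slide names LSH or Annoy for fast approximate lookup, whereas this Python sketch uses exact cosine similarity over every entry, which is only practical for small collections. The three-dimensional vectors and image names are made up for illustration; real deep features have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, index, k=2):
    """Return the k ids in `index` whose vectors are most similar to `query`."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [image_id for image_id, _ in scored[:k]]

# Toy "deep features"; names and values are illustrative assumptions.
index = {
    "vase": [1.0, 0.0, 0.2],
    "portrait": [0.0, 1.0, 0.1],
    "cup": [0.9, 0.1, 0.3],
}
matches = nearest_neighbors([1.0, 0.0, 0.25], index, k=2)
print(matches)
```

An approximate index trades a little accuracy for speed by avoiding the full scan that `sorted` performs here.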
Example Nearest Neighbors
[Figure: query images shown alongside their nearest neighbors.]
Spark x Azure Search
• Azure Search sink for Spark
• Allows pushing thousands of documents per second into Azure Search instances
• Built on HTTP on Spark
• Use it to create search APIs on top of Spark DataFrames
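High write throughput to a search index usually depends on grouping documents into batches so that one request carries many documents. The following Python sketch shows only that grouping step; the `batched` helper, the batch size, and the document shape are illustrative assumptions, not the sink's actual settings.

```python
def batched(documents, batch_size):
    """Yield lists of at most `batch_size` documents, preserving order."""
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

docs = [{"id": str(i)} for i in range(7)]
batches = list(batched(docs, batch_size=3))
print([len(b) for b in batches])
```

Each yielded batch would then be serialized into a single upload request.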
Microsoft Machine Learning for
Apache Spark v0.16
Microsoft’s Open Source
Contributions to Apache Spark
www.aka.ms/spark
GitHub: Azure/mmlspark
• Cognitive Services
• Spark Serving
• Model Interpretability
• LightGBM Gradient Boosting
• Deep Networks with CNTK
• HTTP on Spark
Conclusions
• Can now embed Cognitive Services into Spark workflows
• Can harness a Spark cluster for microservices
• Get started now with interactive examples!
www.aka.ms/spark
Contact: marhamil@microsoft.com, mmlspark-support@microsoft.com
Help us advance Spark: Azure/mmlspark
Thanks To
• Sudarshan Raghunathan
• Ilya Matiach
• Microsoft NERD Garage Team + MIT Externship Program
• Microsoft Development Acceleration Team:
– Dalitso Banda, Casey Hong, Karthik Rajendran, Manon Knoertzer, Tayo Amuneke, Alejandro Buendia
• Pablo Castro, Chris Hoder, Ryan Gaspar, Henrik Neilsen, Joseph Sirosh, Andrew Schonhoffer, Daniel Ciborowski, Markus Cosowicz
• Azure CAT, AzureML, and Azure Search Teams
Backup Slides
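[Diagram: GAN training. A noise vector feeds the generator, which produces a generated image. The discriminator receives generated images and training data and decides "real or generated?".]
[Diagram: GAN inversion. A learned noise vector feeds the generator, and the generated image is compared with a target image using a pretrained ResNet 50. Loss: $\mathrm{Loss}_{\mathrm{pixel}} + \lambda \times \mathrm{Loss}_{\mathrm{semantic}}$.]
[Diagram: code space interpolation. Two images are mapped by $G^{-1}$ to inverted noise vectors 1 and 2, and $G$ is applied to points along the path between them.]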
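Code space interpolation can be sketched as walking in a straight line between two noise vectors and decoding each point with the generator. The Python sketch below covers only the interpolation step and assumes plain linear interpolation, which the slide does not specify; the two-dimensional vectors are toy values and `interpolate` is a hypothetical helper.

```python
def interpolate(z1, z2, steps):
    """Return `steps` vectors evenly spaced from z1 to z2, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z1, z2)])
    return path

# Toy inverted noise vectors; real ones have many more dimensions.
path = interpolate([0.0, 2.0], [4.0, 0.0], steps=5)
for z in path:
    print(z)
```

Passing each vector in `path` through the generator would give the sequence of intermediate images between the two originals.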

Más contenido relacionado

La actualidad más candente

Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Hyperloglog Project
Hyperloglog ProjectHyperloglog Project
Hyperloglog ProjectKendrick Lo
 
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...Spark Summit
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Julien Le Dem
 
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...InfluxData
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...Databricks
 
Delta Lake with Azure Databricks
Delta Lake with Azure DatabricksDelta Lake with Azure Databricks
Delta Lake with Azure DatabricksDustin Vannoy
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache CassandraPatrick McFadin
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeDremio Corporation
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Rick Branson
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 

La actualidad más candente (20)

Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Hyperloglog Project
Hyperloglog ProjectHyperloglog Project
Hyperloglog Project
 
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
 
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
 
Delta Lake with Azure Databricks
Delta Lake with Azure DatabricksDelta Lake with Azure Databricks
Delta Lake with Azure Databricks
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Kibana overview
Kibana overviewKibana overview
Kibana overview
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 

Similar a The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services

아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)
아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)
아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)Amazon Web Services Korea
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Databricks
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Iulian Pintoiu
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreMoritz Meister
 
AI for Good at Microsoft
AI for Good at MicrosoftAI for Good at Microsoft
AI for Good at MicrosoftMark Hamilton
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Paulo Gutierrez
 
Data analytics master class: predict hotel revenue
Data analytics master class: predict hotel revenueData analytics master class: predict hotel revenue
Data analytics master class: predict hotel revenueKris Peeters
 
An Insider’s Guide to Maximizing Spark SQL Performance
 An Insider’s Guide to Maximizing Spark SQL Performance An Insider’s Guide to Maximizing Spark SQL Performance
An Insider’s Guide to Maximizing Spark SQL PerformanceTakuya UESHIN
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_VeriticalsPeyman Mohajerian
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkChester Chen
 
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementAn AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementDatabricks
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesDatabricks
 
What's new in spark 2.0?
What's new in spark 2.0?What's new in spark 2.0?
What's new in spark 2.0?Örjan Lundberg
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesDatabricks
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...ScyllaDB
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsMiklos Christine
 
Scaling Up Machine Learning Experimentation at Tubi 5x and Beyond
Scaling Up Machine Learning Experimentation at Tubi 5x and BeyondScaling Up Machine Learning Experimentation at Tubi 5x and Beyond
Scaling Up Machine Learning Experimentation at Tubi 5x and BeyondScyllaDB
 
Bhadale group of companies - Our project works
Bhadale group of companies - Our project worksBhadale group of companies - Our project works
Bhadale group of companies - Our project worksVijayananda Mohire
 
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Spark Summit
 

Similar a The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services (20)

아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)
아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)
아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature Store
 
AI for Good at Microsoft
AI for Good at MicrosoftAI for Good at Microsoft
AI for Good at Microsoft
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
Data analytics master class: predict hotel revenue
Data analytics master class: predict hotel revenueData analytics master class: predict hotel revenue
Data analytics master class: predict hotel revenue
 
An Insider’s Guide to Maximizing Spark SQL Performance
 An Insider’s Guide to Maximizing Spark SQL Performance An Insider’s Guide to Maximizing Spark SQL Performance
An Insider’s Guide to Maximizing Spark SQL Performance
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_Veriticals
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
 
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementAn AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on Kubernetes
 
What's new in spark 2.0?
What's new in spark 2.0?What's new in spark 2.0?
What's new in spark 2.0?
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
 
Scaling Up Machine Learning Experimentation at Tubi 5x and Beyond
Scaling Up Machine Learning Experimentation at Tubi 5x and BeyondScaling Up Machine Learning Experimentation at Tubi 5x and Beyond
Scaling Up Machine Learning Experimentation at Tubi 5x and Beyond
 
Bhadale group of companies - Our project works
Bhadale group of companies - Our project worksBhadale group of companies - Our project works
Bhadale group of companies - Our project works
 
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services

  • 1. Mark Hamilton, Microsoft, marhamil@microsoft.com Anand Raman, Microsoft, aram@microsoft.com The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services #UnifiedAnalytics #SparkAISummit
  • 2. Overview • The Cognitive Services on Spark – Basic Usage – Fluent Design • HTTP on Spark – Architecture and Principles • Clusters with Embedded Services – Kubernetes, Databricks • Examples – GANs + the Metropolitan Museum of Art
  • 3. Motivation • Azure Cognitive Services provide high-quality pre-built intelligent services • No need for time-intensive model training or deployment • Can quickly create intelligent applications • Leverage Microsoft Research and Azure ML • http://www.seeingai.com
  • 4. Vision: object, scene, and activity detection; face recognition and identification; celebrity and landmark recognition; emotion recognition; text and handwriting recognition (OCR); customizable image recognition; video metadata, audio, and keyframe extraction and analysis; explicit or offensive content moderation. Speech: speech transcription (speech-to-text); custom speech models for unique vocabularies or complex environments; text-to-speech; custom voice; real-time speech translation; customizable speech transcription and translation; speaker identification and verification. Language: language detection; named entity recognition; key phrase extraction; text sentiment analysis; multilingual and contextual spell checking; explicit or offensive text content moderation; PII detection for text moderation; text translation; customizable text translation; contextual language understanding. Knowledge: Q&A extraction from unstructured text; knowledge base creation from collections of Q&As; semantic matching for knowledge bases; customizable content personalization learning. Search: ad-free web, news, image, and video search results; trends for video, news; image identification, classification and knowledge extraction; identification of similar images and products; named entity recognition and classification; knowledge acquisition for named entities; search query autosuggest; ad-free custom search engine creation.
  • 5. Azure Cognitive Services on Spark • Easy-to-use integration between Spark and the Azure Cognitive Services • Composable and pipelinable with all other SparkML models! • Python, Scala, R (Beta) • Example: val df = new TextSentiment().setTextCol("text").setOutputCol("sentiment").transform(inputs)
  • 7. Fluent API for Advanced Orchestration • Any parameter can be set with a dataframe column or with a single value • Get results for multiple search terms: new BingImageSearch().setQueryCol("queries"), where the queries column holds Cat, Dog, Antelope, Car, Bob Ross
  • 8. Fluent API for Advanced Orchestration • Any parameter can be set with a dataframe column or with a single value • Get the first N pages of Bing for a specific term: new BingImageSearch().setQuery("cats").setOffsetCol("offsets"), where the offsets column holds 0, 100, 200, 300, 400
  • 9. Fluent API for Advanced Orchestration • Any parameter can be set with a dataframe column or with a single value • Get the first 200 results for many terms using several different accounts: new BingImageSearch().setQueryCol("queries").setOffsetCol("offsets").setKeyCol("keys"), with input columns:
       offsets   queries   keys
       0         Cat       17…
       100       Cat       17…
       0         Tree      3e…
       100       Tree      4q…
       0         Car       G1…
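The input table on slide 9 can be generated rather than typed by hand. The following is a minimal Python sketch, not code from the talk: the helper name `build_request_rows`, the page size of 100, the round-robin assignment of keys, and the placeholder key strings are all assumptions made for illustration. The resulting rows would then be loaded into a DataFrame with `offsets`, `queries`, and `keys` columns.

```python
from itertools import cycle, product


def build_request_rows(queries, keys, total_results=200, page_size=100):
    """Return one (offset, query, key) row per page of results per query.

    Hypothetical helper: keys are assigned round-robin so that requests
    are spread across the available accounts.
    """
    offsets = range(0, total_results, page_size)
    key_cycle = cycle(keys)
    return [
        (offset, query, next(key_cycle))
        for query, offset in product(queries, offsets)
    ]


# Placeholder keys; real subscription keys would come from configuration.
rows = build_request_rows(["Cat", "Tree", "Car"], ["key-a", "key-b"])
for row in rows:
    print(row)
```

Each query yields two rows (offsets 0 and 100), which together cover its first 200 results.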
  • 10. High Performance Capabilities OOTB • Asynchronous Parallelism (P) • Automatic Batching (B) • Automatic Retries – Exponential Back-offs (EBO) – Backpressure (BP)
       Benchmark: 10 nodes, 20k requests, service limited to 1k req/min
       Features      Time (s)   Errors #
       None              30.8      18993
       EBO+BP          1163.0          0
       EBO+BP+B          57.1          0
       EBO+BP+B+P        49.7          0
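Two of the features named on slide 10, automatic retries with exponential back-off and automatic batching, can be illustrated in a few lines of plain Python. This is a sketch of the general techniques only, not MMLSpark's implementation; `RateLimited`, `call_with_backoff`, `batches`, and `flaky_send` are hypothetical names, and backpressure and asynchronous parallelism are not modeled here.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) service when a request is throttled."""


def call_with_backoff(send, payload, max_retries=5, base_delay=0.1,
                      sleep=time.sleep):
    """Call send(payload), retrying throttled calls with exponential back-off."""
    for attempt in range(max_retries + 1):
        try:
            return send(payload)
        except RateLimited:
            if attempt == max_retries:
                raise
            # The delay doubles on every attempt; random jitter keeps many
            # clients from retrying at exactly the same moment.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Demo: a stand-in service that throttles the first two calls.
calls = {"n": 0}


def flaky_send(batch):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimited()
    return [text.upper() for text in batch]


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_send, ["a", "b"], sleep=delays.append)
```

Batching reduces the number of requests that count against the rate limit, which is consistent with the large drop in time between the EBO+BP and EBO+BP+B rows of the benchmark.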
  • 11. HTTP on Spark • Full Integration between HTTP Protocol and Spark SQL • Spark as a Microservice Orchestrator • Spark + X • Example: df = SimpleHTTPTransformer().setInputParser(JSONInputParser()).setOutputParser(JSONOutputParser().setDataType(schema)).setOutputCol("results").setUrl(…)
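The per-row flow implied by slide 11 (parse a row into a request, send it, parse the response into an output column) can be sketched without Spark. The code below is a conceptual illustration only and does not use or reproduce the MMLSpark API; `http_transform`, `json_input_parser`, `json_output_parser`, and `fake_service` are hypothetical names, and the service is an in-process stand-in rather than a real HTTP endpoint.

```python
import json


def json_input_parser(row):
    """Serialize one row (a dict) into a JSON request body."""
    return json.dumps(row)


def json_output_parser(body):
    """Deserialize a JSON response body back into a Python value."""
    return json.loads(body)


def fake_service(request_body):
    """Hypothetical stand-in for a web service: returns the text length."""
    text = json.loads(request_body)["text"]
    return json.dumps({"length": len(text)})


def http_transform(rows, send, output_col="results"):
    """Add `output_col` to each row by round-tripping the row through `send`."""
    return [
        {**row, output_col: json_output_parser(send(json_input_parser(row)))}
        for row in rows
    ]


out = http_transform([{"text": "hello"}, {"text": "spark"}], fake_service)
print(out)
```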
  • 12. [Diagram: HTTP on Spark. Each Spark worker holds several partitions, each with its own client; the clients exchange HTTP requests and responses with a web service or with local services.]
  • 13. Cognitive Service Containers (now in public preview) • No app changes; compatible with the full Cognitive Services feature set • Support for 6 key AI capabilities: Key Phrase Extraction, Language Detection, Sentiment Analysis, Face & Emotion Detection, OCR / Text Recognition, Language Understanding • Run and manage locally; try for free • Connect to the billing service for report-back; unified billing for on-cloud and off-cloud transactions • Additional capabilities coming soon (e.g. Speech)
  • 14. Clusters with Embedded Services • Deploy cognitive services directly onto cluster worker nodes • Bring the compute to the data • Use low-latency in-machine networking [Diagram: a Spark worker running a Spark Scala process, PySpark, and a local Cognitive Service, connected by the PySpark protocol and HTTP.]
  • 15. Azure Kubernetes Service + Helm • Works on any k8s cluster • Helm: Package Manager for Kubernetes
       helm repo add mmlspark https://dbanda.github.io/charts
       helm install mmlspark/spark --set localTextApi=true
       [Diagram: a Kubernetes cluster (AKS, ACS, GKE, on-prem, etc.) whose K8s workers each run a Spark worker, HTTP on Spark, and a Cognitive Service container. A load balancer fronts Spark Serving and Jupyter, Zeppelin, Livy, or Spark Submit. Spark readers connect to storage or other databases, and HTTP on Spark connects to cloud Cognitive Services. Users and apps send REST requests to deployed models over the Spark Serving hot path, and submit jobs, run notebooks, and manage the cluster.]
       Credit: Dalitso Banda, dbanda@microsoft.com, Microsoft AI Development Acceleration Program
  • 16. Creating a Visual Search Engine for the Metropolitan Museum of Art • https://gen.studio
  • 17. Intelligent Image Annotation • The MET released 400k images under Open Access • Pipe images through the Computer Vision API to annotate each image for searching [Figure: query images, their Describe Image output ("A picture containing a person", "A picture containing a glass, cup", "A fish swimming underwater"), and their deep-feature nearest neighbors.]
  • 18. Reverse Image Search Architecture [Diagram: Query Image → ResNet Featurizer → Deep Features → Fast Nearest Neighbor Lookup → Closest Match. Labeled components: MMLSpark; SparkML LSH or Annoy. Filter visualizations from Zeiler + Fergus 2013.]
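The nearest-neighbor lookup step on slide 18 can be illustrated with a small random-hyperplane LSH index in plain Python. This is a sketch of the general technique, not the SparkML LSH or Annoy implementation named on the slide; the class name `HyperplaneLSH`, the three-dimensional stand-in "features", and the labels are assumptions, and a real system would index ResNet features with many more dimensions and usually several hash tables.

```python
import math
import random
from collections import defaultdict


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class HyperplaneLSH:
    """Single-table LSH: vectors on the same side of every random
    hyperplane share a bucket, and only that bucket is searched."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_planes)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0
                     for plane in self.planes)

    def add(self, label, vec):
        self.buckets[self._signature(vec)].append((label, vec))

    def query(self, vec):
        """Return the label of the closest vector in the query's bucket,
        or None if the bucket is empty (LSH is approximate)."""
        candidates = self.buckets.get(self._signature(vec), [])
        if not candidates:
            return None
        return min(candidates,
                   key=lambda item: cosine_distance(vec, item[1]))[0]


index = HyperplaneLSH(dim=3)
index.add("vase", [1.0, 0.0, 0.2])
index.add("armor", [0.0, 1.0, 0.1])
index.add("fish", [-1.0, 0.3, 0.9])
match = index.query([2.0, 0.0, 0.4])  # same direction as "vase"
```

Because only one bucket is scanned, a query costs far less than comparing against every indexed image, at the price of occasionally missing the true nearest neighbor.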
  • 19. Example Nearest Neighbors [Figure: query images alongside their nearest neighbors.]
  • 20. Spark x Azure Search • Azure Search Sink for Spark • Allows for pushing thousands of documents per second into Azure Search instances • Built on HTTP on Spark • Use it to create search APIs on top of Spark DataFrames
  • 21. Microsoft Machine Learning for Apache Spark v0.16: Microsoft’s Open Source Contributions to Apache Spark • www.aka.ms/spark • Azure/mmlspark • Components: Cognitive Services, Spark Serving, Model Interpretability, LightGBM Gradient Boosting, Deep Networks with CNTK, HTTP on Spark
  • 22. Conclusions • Can now embed Cognitive Services into Spark workflows • Can harness a Spark cluster for microservices • Get started now with interactive examples! • Help us advance Spark: www.aka.ms/spark, Azure/mmlspark • Contact: marhamil@microsoft.com, mmlspark-support@microsoft.com
  • 23. Thanks To • Sudarshan Raghunathan • Ilya Matiach • Microsoft NERD Garage Team + MIT Externship Program • Microsoft Development Acceleration Team: – Dalitso Banda, Casey Hong, Karthik Rajendran, Manon Knoertzer, Tayo Amuneke, Alejandro Buendia • Pablo Castro, Chris Hoder, Ryan Gaspar, Henrik Neilsen, Joseph Sirosh, Andrew Schonhoffer, Daniel Ciborowski, Markus Cosowicz • Azure CAT, AzureML, and Azure Search Teams
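  • 26. [Diagram: Learned Noise Vector → Generator → Generated Image, compared against the Target Image, with a Pretrained ResNet 50.] Loss: $\mathrm{Loss}_{\text{pixel}} + \mathrm{Loss}_{\text{semantic}} \times \lambda$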
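The combined objective on slide 26 can be written out as a small function. The slide gives only the form of the sum (a pixel loss plus a semantic loss weighted by λ); using mean squared error for both terms, and treating the semantic term as a distance between feature vectors from the pretrained ResNet 50, are assumptions of this sketch, as are the function names.

```python
def mean_squared_error(a, b):
    """Mean of squared element-wise differences between two sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def inversion_loss(generated_pixels, target_pixels,
                   generated_features, target_features, lam=1.0):
    """Loss_pixel + lam * Loss_semantic.

    Assumption: both terms are mean squared errors; the slide does not
    say which distance is used.
    """
    pixel_loss = mean_squared_error(generated_pixels, target_pixels)
    semantic_loss = mean_squared_error(generated_features, target_features)
    return pixel_loss + lam * semantic_loss


# Toy values: pixel MSE is 1.0, feature MSE is 4.0, so the loss is 1 + 0.5 * 4.
loss = inversion_loss([0.0, 0.0], [1.0, 1.0], [0.0], [2.0], lam=0.5)
```

Minimizing this loss with respect to the noise vector is what makes the vector "learned": the generator stays fixed while its input is adjusted until the generated image matches the target.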
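  • 27. Code Space Interpolation [Diagram: two images are mapped by $G^{-1}$ to Inverted Noise Vector 1 and Inverted Noise Vector 2; points between the two vectors are passed through the generator $G$ to produce intermediate images.]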
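The interpolation on slide 27 can be sketched as follows. The slide does not state the interpolation scheme, so straight-line (linear) interpolation between the two inverted noise vectors is an assumption here, and `interpolate` is a hypothetical helper; each returned vector would be fed to the generator G to render one intermediate image.

```python
def interpolate(z1, z2, steps):
    """Return `steps` vectors evenly spaced on the line from z1 to z2,
    including both endpoints."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [
        [(1 - t) * a + t * b for a, b in zip(z1, z2)]
        for t in (i / (steps - 1) for i in range(steps))
    ]


path = interpolate([0.0, 0.0], [1.0, 2.0], steps=3)
for z in path:
    print(z)
```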