Confidential - do not distribute
Hotels.com’s journey to becoming
an Algorithmic Business
Matthew Fryer
VP, Chief Data Science Officer
mfryer@hotels.com
Part of the Expedia, Inc. family
>385,000 properties
89 countries
39 languages
>30m Hotels.com Rewards Members
Home of Captain Obvious
Billions of recommendations per day, based on real-time data
Hotels.com
Data Science / Engineering / Front End Development
“Artificial Intelligence Will Be
Travel’s Next Big Thing”
Barry Diller
Chairman & Senior Executive,
Expedia, Inc.
The 3 Ms are disruptive technologies:
Mobile
Messaging / NLP
Machine Learning
Core Elements of our Data Science Cloud Platform
Databricks Unified Platform
Maestro – our internally developed platform on AWS (EMR, Spark, RStudio, IntelliJ, SBT, Jupyter, Zeppelin, Unit / QA, Metastore, Apache Airflow, Keras, TensorFlow)
Proof of Concept on Google Cloud, Beam, Spark & TensorFlow
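The speaker notes describe the Maestro Framework as an in-house Scala extension to Spark ML that standardizes model building and provides tools for training, testing and validating models. The sketch below is not Maestro's API; it is a hypothetical, dependency-free Python illustration of that idea, in which every model goes through the same split, train, predict and validate path. All function names and the toy data are invented for illustration.

```python
import random
from collections import Counter

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle deterministically, then hold out a fraction of the rows for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[: len(rows) - n_test], rows[len(rows) - n_test :]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def run_experiment(rows, fit, predict, metric=accuracy, test_fraction=0.2, seed=0):
    """One standard path for every model: split -> train -> predict -> validate."""
    train, test = train_test_split(rows, test_fraction, seed)
    model = fit(train)
    predictions = [predict(model, features) for features, _ in test]
    return metric([label for _, label in test], predictions)

# Hypothetical usage: a majority-class baseline on (features, label) rows.
def fit_majority(train):
    return Counter(label for _, label in train).most_common(1)[0][0]

def predict_majority(model, features):
    return model

rows = [(i, "booked" if i % 4 == 0 else "not_booked") for i in range(100)]
print(f"baseline accuracy: {run_experiment(rows, fit_majority, predict_majority):.2f}")
```

Because `fit`, `predict` and `metric` are plain callables, any new model plugs into the same validated path, which is the kind of standardization the notes attribute to the framework.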
Databricks Unified Platform
[Chart: 1-hour blocks; y-axis = number of 32-core instances]
• Key asset to the success of data science at Hotels.com
• Key in driving up data scientist productivity / efficiency / flexibility
• Makes our data science lifecycle much easier and faster to operate, driving speed to market
• Reliable / secure, and facilitates ‘highly elastic’ workflows exploiting cost-effective spot instances on AWS
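The slide's chart counts 32-core instances per 1-hour block, and the speaker notes put spot instances at roughly 10-20% of the on-demand cost. A minimal sketch of why elasticity and spot pricing compound, assuming a hypothetical day of hourly usage and a hypothetical on-demand price per core-hour (neither figure comes from the deck):

```python
CORES_PER_INSTANCE = 32

def core_hours(instances_per_hour):
    """Core-hours consumed when the cluster scales hour by hour with demand."""
    return sum(n * CORES_PER_INSTANCE for n in instances_per_hour)

def peak_sized_core_hours(instances_per_hour):
    """Core-hours if a static cluster were sized for the busiest hour, all day."""
    return max(instances_per_hour) * CORES_PER_INSTANCE * len(instances_per_hour)

def cost(core_hours_used, on_demand_price, spot_fraction=1.0):
    """Cost at an on-demand price per core-hour; spot_fraction < 1 models spot pricing."""
    return core_hours_used * on_demand_price * spot_fraction

usage = [2, 2, 10, 40, 40, 12, 2, 0]  # hypothetical 32-core instances per 1-hour block
price = 0.05                          # hypothetical on-demand USD per core-hour
elastic = core_hours(usage)
static = peak_sized_core_hours(usage)
print(f"static, on-demand:  ${cost(static, price):,.2f}")
print(f"elastic, on-demand: ${cost(elastic, price):,.2f}")
print(f"elastic, spot:      ${cost(elastic, price, 0.10):,.2f} to ${cost(elastic, price, 0.20):,.2f}")
```

Paying only for the hours actually used, and then at spot rates, multiplies the two savings. Spot capacity can be reclaimed by AWS, so workloads run this way need to tolerate interruption.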
The hidden secret of data science and AI
Data scientists typically invest large amounts of time in feature and data engineering, areas that are ripe for a technology solution
ALPs – Algorithm Lifecycle Pipeline Service
The end-to-end ML platform
[Diagram: ALPs end-to-end ML platform]
• Site data: Hotels.com clickstream, ingestion, data pipelines (data set generation, feature extraction)
• Training: train & deploy model
• Scoring & serving: calculate scores; store & serve scores (cache service); real-time scoring / bandit; experiment (assign variant); reporting (update feedback loop with CTR, GP etc.)
• Data pipelines: develop and maintain ML / AI pipelines
• Frameworks & platforms: methods to research & exploit ML & AI innovation
• Lifecycle / deploy: implement ML / AI in production
• Data capture; accessible data
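The ALPs diagram pairs real-time scoring with a bandit, variant assignment for experiments, and a feedback loop driven by CTR. The deck does not say which bandit algorithm ALPs uses; the sketch below substitutes a plain epsilon-greedy bandit and hash-based variant bucketing to show how those pieces can fit together. `assign_variant`, `EpsilonGreedy`, the arm names and the simulated CTRs are all hypothetical.

```python
import hashlib
import random

def assign_variant(user_id, experiment, variants):
    """Deterministic bucketing: the same user and experiment always map to the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

class EpsilonGreedy:
    """Epsilon-greedy bandit whose reward is a click (CTR feedback)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {arm: 0 for arm in self.arms}
        self.clicks = {arm: 0 for arm in self.arms}

    def ctr(self, arm):
        """Observed click-through rate; 0.0 before the arm has been shown."""
        return self.clicks[arm] / self.shows[arm] if self.shows[arm] else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore a random arm
        return max(self.arms, key=self.ctr)    # exploit the best observed CTR

    def update(self, arm, clicked):
        """Feedback loop: record one impression and whether it was clicked."""
        self.shows[arm] += 1
        self.clicks[arm] += int(clicked)

# Simulated traffic: arm "B" has the higher (hypothetical) true CTR.
true_ctr = {"A": 0.02, "B": 0.05}
bandit = EpsilonGreedy(true_ctr, epsilon=0.1, seed=7)
traffic = random.Random(42)
for _ in range(20_000):
    arm = bandit.choose()
    bandit.update(arm, traffic.random() < true_ctr[arm])

print(assign_variant("user-42", "photo-order", ["control", "treatment"]))
print({arm: round(bandit.ctr(arm), 3) for arm in bandit.arms})
```

Hashing keeps a user in one experiment variant across visits, while the bandit shifts traffic toward whichever arm the click feedback favours.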
Reference: The Influence of Visuals in Online Hotel Research and Booking Behaviour
Images are an important factor when choosing a hotel
[Chart: Factors other than price/location, x-axis 0-80%; legend: Very Important/Important, Important, Very Important; factors: Loyalty Program, Reviews, Hotel Brand, Star Rating, Destination Info, Images, Hotel Info]
Computer Vision problems we try to tackle
Near Duplicate Detection
Scene Classification
Image Ranking
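The speaker notes say the near-duplicate model was trained on a synthetic dataset of about 6 million images produced by applying transformations to hotel photos. A toy, dependency-free sketch of that kind of pair generation, where images are small 2D lists of grayscale values and the three transformations (flip, crop, brighten) are assumptions, not the team's actual augmentation set:

```python
import random

# Toy images: 2D lists of grayscale pixel values (0-255), to keep the sketch dependency-free.
def flip_horizontal(img):
    return [row[::-1] for row in img]

def crop(img, margin=1):
    """Trim `margin` pixels from every edge (image must be larger than 2 * margin)."""
    return [row[margin:-margin] for row in img[margin:-margin]]

def brighten(img, delta=20):
    return [[min(255, p + delta) for p in row] for row in img]

TRANSFORMS = [flip_horizontal, crop, brighten]

def make_pairs(images, n_pairs, seed=0):
    """Return (img_a, img_b, label) triples: 1 = same photo after a transformation, 0 = two different photos."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        if rng.random() < 0.5:
            img = rng.choice(images)
            pairs.append((img, rng.choice(TRANSFORMS)(img), 1))
        else:
            i, j = rng.sample(range(len(images)), 2)
            pairs.append((images[i], images[j], 0))
    return pairs

rng = random.Random(1)
photos = [[[rng.randrange(256) for _ in range(4)] for _ in range(4)] for _ in range(3)]
pairs = make_pairs(photos, n_pairs=50)
print(sum(label for *_, label in pairs), "near-duplicate pairs out of", len(pairs))
```

Labels come for free: any transformed copy is a known duplicate, which is what makes a synthetic dataset of this size practical to build.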
Tagged as Bathroom
GPUs quickly became key; optimizing with Keras + TensorFlow (Inception v3 + ResNet) took a large effort
[Chart: training time in days (log scale) by hardware configuration; series: CIFAR2, Expedia Small]
12-CPU: 493; 1-GPU: 67; 1-GPU + limited cache: 20; 16-GPU + limited cache: 7; 16-GPU + full cache: 4
[Chart: training time in days]
16-GPU + full cache: 15; Optimized: 2.5
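The two charts on this slide translate into speed-ups with simple division. A minimal sketch, assuming the chart data labels (493, 67, 20, 7 and 4 days) map in order to the five configurations from 12-CPU through 16-GPU + full cache, and that the second chart compares 15 days on 16-GPU + full cache with 2.5 days after optimization:

```python
# Training times in days, as labelled on the charts (assumed to map to the configurations in order).
days = {
    "12-CPU": 493,
    "1-GPU": 67,
    "1-GPU + limited cache": 20,
    "16-GPU + limited cache": 7,
    "16-GPU + full cache": 4,
}

def speedups(days_by_config, baseline):
    """Speed-up of each configuration relative to the baseline (baseline days / config days)."""
    base = days_by_config[baseline]
    return {config: base / d for config, d in days_by_config.items()}

for config, s in speedups(days, "12-CPU").items():
    print(f"{config:<24} {days[config]:>4} days  {s:6.1f}x faster than 12-CPU")

# Second chart: 16-GPU + full cache (15 days) vs. the optimized pipeline (2.5 days).
optimized = speedups({"16-GPU + full cache": 15, "Optimized": 2.5}, "16-GPU + full cache")
print(f"optimization alone: {optimized['Optimized']:.1f}x")
```

Under that reading, moving from CPUs to a fully cached 16-GPU setup is a speed-up of over 100x, and the later optimization adds roughly another 6x.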
Near Duplicate Detection: Real-world examples
Non-Duplicates – probability 100%
Non-Duplicates – probability 95.91%
Duplicates – probability 97.98%
Duplicates – probability 98.43%
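The speaker notes describe the near-duplicate detector as a custom Siamese network on top of the scene classifier: both images pass through the same network, and the pair is scored by how close their outputs are. The sketch below replaces the learned comparison head with a fixed cosine similarity fed through a logistic function; the embedding values, `scale` and `bias` are hypothetical stand-ins for what a trained model would produce.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def duplicate_probability(emb_a, emb_b, scale=20.0, bias=-17.0):
    """Map the similarity of two image embeddings to P(near-duplicate) with a logistic function.
    scale and bias are hypothetical; a trained Siamese head would learn this mapping."""
    s = cosine_similarity(emb_a, emb_b)
    return 1.0 / (1.0 + math.exp(-(scale * s + bias)))

# Hypothetical embeddings from a shared (Siamese) encoder.
photo = [0.9, 0.1, 0.4]
recropped = [0.88, 0.12, 0.41]  # same photo, lightly transformed
pool = [0.1, 0.95, 0.0]         # a different scene
for name, other in [("recropped", recropped), ("pool", pool)]:
    p = duplicate_probability(photo, other)
    label, conf = ("Duplicates", p) if p >= 0.5 else ("Non-Duplicates", 1 - p)
    print(f"{name}: {label} - probability {conf:.2%}")
```

As on the slide, the reported probability is that of the predicted class, so a Non-Duplicates verdict shows 1 - P(duplicate).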
Using the model: Real-world examples
ROOM/BATHROOM
EXTERIOR/HOTEL
INTERIOR/SEATING_LOBBY
ROOM/LIVING_ROOM
ROOM/GUESTROOM
FACILITIES/DINING
INTERIOR/SEATING_LOBBY
FACILITIES/POOL
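Scene classification outputs a score per label, and the labels on this slide follow a CATEGORY/SCENE pattern. A minimal sketch of turning a classifier's raw scores (logits) into a tag, assuming a hypothetical label list built from the examples shown and made-up logits:

```python
import math

# Hypothetical label set, built from the example tags on this slide.
LABELS = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "ROOM/LIVING_ROOM", "EXTERIOR/HOTEL",
          "INTERIOR/SEATING_LOBBY", "FACILITIES/DINING", "FACILITIES/POOL"]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tag_image(logits, labels=LABELS):
    """Return (category, scene, probability) for the highest-probability label."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    category, scene = labels[best].split("/", 1)
    return category, scene, probs[best]

category, scene, p = tag_image([4.1, 0.3, -1.0, 0.2, 0.0, -0.5, 1.2])
print(f"Tagged as {scene.title()} ({category}, p={p:.2f})")
```

Splitting on the first "/" keeps the category available for grouping photos, while the scene gives the user-facing tag.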
Accuracy & Confusion Matrix
• Took many manual, long-winded iterations of regularization and hyperparameter tuning
• Achieved good accuracy and little confusion between classes
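A confusion matrix counts, for each true class, how often the model predicted each class; little confusion means most counts sit on the diagonal. A minimal sketch with made-up labels and predictions:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Share of predictions on the diagonal (predicted class == true class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

labels = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "FACILITIES/POOL"]
y_true = ["ROOM/BATHROOM", "ROOM/BATHROOM", "ROOM/GUESTROOM",
          "ROOM/GUESTROOM", "FACILITIES/POOL", "FACILITIES/POOL"]
y_pred = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "ROOM/GUESTROOM",
          "ROOM/GUESTROOM", "FACILITIES/POOL", "FACILITIES/POOL"]
matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)                     # [[1, 1, 0], [0, 2, 0], [0, 0, 2]]
print(f"{accuracy(matrix):.3f}")  # 0.833
```

The single off-diagonal count shows exactly which classes get mixed up (a bathroom predicted as a guestroom), which is more actionable than the accuracy figure alone.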
Optimizing the photo order for improved customer experiences
[Panels: Original vs. Model]
Reference: Radisson Blu Edwardian Berkshire Hotel, London
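The slide contrasts the original photo order with a model-driven order. The deck does not describe the ranking logic; a minimal sketch, assuming each photo already carries a hypothetical model score, is a stable sort by that score:

```python
from dataclasses import dataclass

@dataclass
class Photo:
    photo_id: str
    scene: str
    score: float  # hypothetical model score: higher = show earlier

def reorder(photos):
    """Order the gallery by model score, highest first.
    Python's sort is stable (also with reverse=True), so ties keep the original order."""
    return sorted(photos, key=lambda p: p.score, reverse=True)

original = [
    Photo("p1", "FACILITIES/DINING", 0.41),
    Photo("p2", "ROOM/GUESTROOM", 0.87),
    Photo("p3", "ROOM/BATHROOM", 0.55),
    Photo("p4", "EXTERIOR/HOTEL", 0.87),
]
print([p.photo_id for p in reorder(original)])  # ['p2', 'p4', 'p3', 'p1']
```

Relying on sort stability means the hotel's own ordering still acts as the tie-breaker when the model has no preference.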
Finding the right hotel in our marketplace is core to our customers' needs.
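The speaker notes mention balancing popularity with diversity. One standard way to do that, substituted here for illustration because the deck does not name its method, is maximal marginal relevance (MMR): pick items greedily, trading relevance against similarity to what has already been picked. The hotels, scores, areas and the `lam` weight below are hypothetical.

```python
def mmr_rerank(candidates, relevance, similarity, k, lam=0.7):
    """Maximal marginal relevance: greedily pick items that score well on relevance
    but are not too similar to the items already picked.
    relevance: {item: score}; similarity(a, b) -> value in [0, 1]; lam weights relevance vs. diversity."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def mmr_score(item):
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

popularity = {"h1": 0.95, "h2": 0.93, "h3": 0.90, "h4": 0.80, "h5": 0.70}
area = {"h1": "Westminster", "h2": "Westminster", "h3": "Westminster",
        "h4": "Kensington", "h5": "Canary Wharf"}

def same_area(a, b):
    return 1.0 if area[a] == area[b] else 0.0

print("by popularity:", sorted(popularity, key=popularity.get, reverse=True)[:3])  # ['h1', 'h2', 'h3']
print("with MMR:     ", mmr_rerank(popularity, popularity, same_area, k=3))       # ['h1', 'h4', 'h5']
```

Pure popularity fills the top three with one neighbourhood; MMR keeps the most popular hotel but surfaces options in other areas. Raising `lam` toward 1 moves the result back toward pure popularity.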
[Map of London areas: Kensington, Bloomsbury, Heathrow, Canary Wharf, Paddington, Westminster, London City Airport, Chelsea, Battersea, Wimbledon, Wembley, City of London]
For example, different user segments prefer to stay in different locations.
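A first step toward segment-level location preferences is measuring each segment's share of bookings by area. A sketch with hypothetical segment names and bookings, using area names from this slide:

```python
from collections import Counter, defaultdict

def area_share_by_segment(bookings):
    """bookings: iterable of (segment, area). Returns {segment: {area: share of that segment's bookings}}."""
    counts = defaultdict(Counter)
    for segment, area in bookings:
        counts[segment][area] += 1
    return {
        segment: {area: n / sum(c.values()) for area, n in c.most_common()}
        for segment, c in counts.items()
    }

bookings = [
    ("business", "City of London"), ("business", "Canary Wharf"), ("business", "City of London"),
    ("family", "Kensington"), ("family", "Kensington"), ("family", "Wembley"),
    ("transit", "Heathrow"),
]
for segment, shares in area_share_by_segment(bookings).items():
    print(segment, {area: round(s, 2) for area, s in shares.items()})
```

Shares normalise away differences in segment size, so a small segment with a strong preference (here, transit travellers and Heathrow) stands out as clearly as a large one.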
[Diagram: utility vs. intent (click), from "just browsing!" to "BOOK!"]
Thank you
mfryer@hotels.com
https://uk.linkedin.com/in/matthewfryer
@mattfryer

Hotels.com's Journey to Becoming an Algorithmic Business: Exponential Growth in Data Science Whilst Migrating to Apache Spark + Cloud All at the Same Time, with Matt Fryer


Speaker notes

  1. Comments: I checked the IR website for the latest data we have made public (including the Annual Report). I plan to linger only briefly on this slide and will call out a few data points, especially recommendation volume, loyalty members, etc.
  2. Comments: (This slide has a build on it; you can see it in slideshow view.) General thank-you to Spark Summit and Databricks for inviting me. Share the goal of the presentation: highlight the focus on transforming customer experiences with algorithms and Hotels.com's move to Spark / cloud in the last year, and share some of the interesting things we are doing. Link to the slide: highlight that it used to feel like there was data everywhere; the size of the torch is growing every day.
  3. Comments: In the last year we created and built a new data science function, moved from on-prem to public cloud (mainly AWS plus some Azure / GCP), and moved from SAS / core Hadoop to Spark, all at the same time. As per the title, comment that we are entering a golden age of data science, where we can now use data to find patterns and build algorithms to help customer experiences. Imagine the world when we enter adulthood, aka maturity: given the potential, I think we are all toddlers with so much more to learn and figure out. Better to be fast first (for example, testing and the freedom to innovate); being correct, ideally often, is a bonus!
  4. Comments: It has taken complete teamwork from across the business to deliver success and well-aligned pipelines. i) We built a data science function in the last 2 years; it is a team effort, and data science / algorithms sit on the back of the workhorse of engineers. ii) This allows algorithms to make choices and understand patterns to optimize customer experiences, rather than limited optimization. iii) Part of the secret has been matching data scientists with dedicated data, network, DevOps and software engineers on the platform. iv) Create a community (big group hug) to share approaches and work together for success. Overall we have >20 amazing data scientists and >15 dedicated data science engineers (and growing fast), plus hundreds of analysts and engineers.
  5. Comments: Machine learning and artificial intelligence will combine to manage companies' big data troves, and there will be layers of innovation "tacked onto distribution systems." Key has been support from the very top of the company: call out that it has been vital to moving forward at pace with wider organizational alignment. Dara's comment from the last earnings call on the 3 Ms and organic intelligence: "I think AI is a good deal down the road. I think right now, we are more dependent on OI, organic intelligence, here; of folks here at the company. I think as far as disruptive technology, I do like to talk about the 3 Ms, and it's not disruptive. It's just happening. One is mobile for us. And right now with most brands, over 1/3 of our transactions are mobile. Over half of our traffic is mobile. And the cool thing about mobile is it's always on and it gives you location context. The second M for us that's emerging especially in the APAC markets, are messaging. And what messaging does for us is it allows two-way communication at any time, but it also combines identity with that communication. And once you have identity, you can start communicating with someone on a one-to-one basis. Most of our systems right now are built to serve the average. This is a consumer where you come to Expedia, most of our systems are built to serve the average consumer. Now more and more we can optimize to the specific customer, and you combine that with a third M, which is machine learning, it is only possible to optimize to the individual based on very significant amounts of data, very significant amounts of interaction so that you can start treating every single customer in a different way. You can go back to the olden days when your travel agent knew exactly what you wanted. This is going to be disruptive, but it's going to be a slow disruption as we learn more."
  6. Comments: i) In the first travel revolution, Hotels.com / Expedia empowered consumers globally with transparency of price, variety, choice and content. ii) Machine learning / algorithms are creating the next travel revolution, transforming consumer experiences and effectively powering the turnaround of the travel agent. iii) The future is having the conversation with the travel agent, with a modern twist: messaging. (Over 20 years we re-invented travel and democratised travel and information (green screen around); now, with data science and Spark, we can power personalised experiences and give customers access to the best of both.) iv) Machine learning is now at the strategic core of the growth and future of Hotels.com.
  7. Databricks: optimised for data science; easy-to-use UI (notebooks); advanced job scheduling; spot pool capability; great for algorithm development & feature engineering (aka ETL); awesome support from Spark engineers. Maestro (AWS platform): integrated platform for large model development and deployment; advanced cluster support, including Maven / Artifactory; Maestro Framework (internal extension to Spark ML); individual environment per data scientist; fast R&D, fully ephemeral. The Maestro Framework is the code base responsible for building machine learning models for HCOM. It is developed in-house using Scala 2.11 & Spark 2.0 (migrating to 2.1). It is an ML framework that standardizes and speeds up the way we build models and provides all the necessary tools for training, testing & validating models. Google PoC: launched a PoC on Google Cloud to evaluate Google's approach to AI / machine learning, including TensorFlow, GPUs, NLP / Vision APIs, ML Engine, Datalab notebooks and Apache Beam / Dataflow.
  8. Facilitated the use of 'extreme elasticity', including spot instances: saving on cost whilst using huge compute power, and speed to market increases dramatically. Spot instances cost ~10-20% of the on-demand cost. Databricks is making things easy, including image classification across the Expedia portfolio. Highlight the value proposition you get with Databricks over open-source Spark: it works out of the box (elasticity, ease of use, notebooks, etc.).
  9. Across millions of hotel and user-submitted images. Critical use case on mobile.
  10. Comment: Highlight that images have not historically been optimized by algorithms, and we now have hundreds of thousands of user photos to categorise and sort. Built in Spark and TensorFlow with a convolutional neural net approach, with some surprisingly good accuracy; I would recommend everyone try their hand at deep learning.
  11. Target: detect near-duplicate images on the PDP. Dataset: a synthetic dataset produced by applying transformations to hotel photos, ~6 million images. Network: a custom Siamese network on top of the scene detection classifier. Results: 99.97% accuracy on the synthetic dataset, validated on real-world images. It is important to use your own data: we obtained 82% from off-the-shelf deep learning APIs (such as the Google Vision API).
  12. Linking customer feedback loops with the neural nets to begin optimizing the most relevant sort order of images for different customers.
  13. Comments: Personalisation, especially micro-segmentation, is crucial, taking the maximum signals; Spark has enabled us to cope with the scale of data. Popularity is balanced with diversity / quality and niche customer needs, all in the context of linking the searches of users who do 4-9 different searches.
  14. A 20x increase in data size, covering attribution of all customer clicks (typically 4-9 searches per user), and 10x the data columns. Facilitates personalization / micro-segments.
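Notes 13 and 14 describe linking each user's 4-9 searches and attributing all customer clicks. A minimal sketch of that grouping step over a hypothetical clickstream of `(user_id, search_id, timestamp, clicked_hotel)` events, where `clicked_hotel` is `None` for the search itself; at Hotels.com scale this would run in Spark, but the logic is the same:

```python
from collections import defaultdict

def searches_per_user(events):
    """Group clickstream events into each user's searches, in time order, with the hotels clicked per search.
    events: iterable of (user_id, search_id, timestamp, clicked_hotel_or_None)."""
    sessions = defaultdict(dict)  # user_id -> {search_id: [clicked hotels]}; dicts keep insertion order
    for user_id, search_id, _ts, hotel in sorted(events, key=lambda e: e[2]):
        clicks = sessions[user_id].setdefault(search_id, [])
        if hotel is not None:
            clicks.append(hotel)
    return {user_id: list(searches.items()) for user_id, searches in sessions.items()}

events = [
    ("u1", "s1", 1, None), ("u1", "s1", 2, "h9"),
    ("u1", "s2", 5, None), ("u2", "s7", 3, None),
    ("u1", "s2", 6, "h4"), ("u1", "s2", 7, "h9"),
]
for user_id, searches in searches_per_user(events).items():
    print(user_id, searches)
```

Keeping every search per user, not just the last one, is what lets a later click (here `h9` in search `s2`) be linked back to the earlier search where the same hotel was first clicked.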