L9. Real World Machine Learning - Cooking Predictions
Cooking Predictions
A real case in the hotel sector
Andrés González
Big Data Prediction Manager
andresg@clevertask.com
Twitter: @data_lytics
CleverTask Solutions SL - Big Data Business Unit 3
Agenda
1. Business Need
2. “Cooking” Predictions
3. Gathering ingredients
4. Cleaning and Transforming
5. The recipe (the model)
6. Tasting the dish
Hotel Sector
• Room occupation (%)
• Cancellation risk
• Income
Business Need
Predict the client’s NATIONALITY BEFORE the client checks in.
Staff Arrangement
• Languages
Prepare Activities
Kitchen Arrangement
Customize Stay
In short, because… details make the difference.
Machine Learning basics
Can you find patterns in this data?
• Historical Data → Training
• New Data → Prediction
• New Data → Re-Training
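The train / predict / re-train loop can be illustrated with a deliberately tiny model. This is a minimal sketch, not the model used in the talk: `FrequencyModel` is a hypothetical toy that predicts the most frequent nationality seen for each key (here an assumed booking channel), and the data is invented for illustration.

```python
from collections import Counter, defaultdict


class FrequencyModel:
    """Hypothetical toy model: predicts the most frequent label seen for each key."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # key -> label counts
        self.overall = Counter()            # label counts over all keys

    def train(self, rows):
        # rows: iterable of (key, label) pairs
        for key, label in rows:
            self.counts[key][label] += 1
            self.overall[label] += 1
        return self

    def predict(self, key):
        # fall back to the globally most frequent label for unseen keys
        seen = self.counts.get(key)
        source = seen if seen else self.overall
        return source.most_common(1)[0][0]


historical = [("web", "ES"), ("web", "ES"), ("partner", "DE"),
              ("partner", "DE"), ("partner", "UK")]
model = FrequencyModel().train(historical)
print(model.predict("partner"))  # DE

# For a counting model, adding newly labelled rows is equivalent to
# re-training on historical + new data.
new_data = [("partner", "UK"), ("partner", "UK")]
model.train(new_data)
print(model.predict("partner"))  # UK
```

The point of the example is the cycle, not the model: predictions change once new labelled data is folded back in.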
Agenda: 2. “Cooking” Predictions
“Cooking” Predictions
• Go to the market to buy ingredients → Gathering RAW data
• Cleaning → Cleaning Data
• Transforming → Transforming and Feature Engineering
• Cooking → Training the Model
• Tasting the Dish → Evaluating Prediction Quality
Agenda: 3. Gathering ingredients
Where does the data come from?
• Own website
• Partners’ websites
→ RAW Data
RAW Data
One year of historical reservation data (.xlsx file)
Characteristics
• 260,000 reservations
• 80 fields
  • 57 categorical
  • 9 numeric
  • 10 date
  • 3 text
  • 1 incorrect field
• Size: 150 MB
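A breakdown like the one above (categorical / numeric / date / text) can be approximated automatically once the spreadsheet is exported. The sketch below is a heuristic under stated assumptions: the date formats in `DATE_FORMATS`, the 30-character threshold that separates free text from categories, and the sample columns are all hypothetical.

```python
from collections import Counter
from datetime import datetime

DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")  # assumed formats; adjust to the real export


def _is_number(value):
    try:
        float(value.replace(",", "."))  # accept a decimal comma as well
        return True
    except ValueError:
        return False


def _is_date(value):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


def infer_type(values, text_min_len=30):
    """Heuristic guess of a column's type from its non-empty string values."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return "empty"
    if all(_is_date(v) for v in present):
        return "date"
    if all(_is_number(v) for v in present):
        return "numeric"
    avg_len = sum(len(v) for v in present) / len(present)
    return "text" if avg_len >= text_min_len else "categorical"


def profile(columns):
    """columns: dict of column name -> list of raw string values."""
    return Counter(infer_type(vals) for vals in columns.values())


cols = {
    "country": ["ES", "DE", "FR"],
    "pax": ["2", "3", "1"],
    "checkin": ["01/07/2015", "15/08/2015", "20/08/2015"],
    "comments": ["Late arrival requested, quiet room if possible", "",
                 "Celebrating anniversary, please add flowers to room"],
}
print(profile(cols))
```

A heuristic profile like this only flags candidates; a field such as the "incorrect field" above still needs a human look.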
Agenda: 4. Cleaning and Transforming
The Process
1. Gathering Data → “Dirty” RAW Data
2. Cleaning → “Clean” Data
3. Transformation and Feature Engineering → New Fields, Calculated Fields
4. Model
Data Cleaning
Row Deletion
• Reservations without check-in
• Cancelled reservations
• Rows with errors
Column Deletion
• ID columns that duplicate a name column (IDs vs names)
• Columns with little data
Other Actions
• Format dates consistently
• Remove accents
• Convert .xlsx to .csv
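The cleaning actions listed above can be sketched with the Python standard library, assuming the spreadsheet has already been read into a list of dicts. Everything named here is an assumption for illustration: the `checkin_date` and `status` columns, the value `"cancelled"`, the `dd/mm/yyyy` input date format and the 5% minimum fill rate. An unparseable date is used as one example of a "row with errors".

```python
import csv
import io
import unicodedata
from datetime import datetime


def strip_accents(text):
    # decompose characters (é -> e + combining accent) and drop the combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean(rows, date_fields, drop_columns=(), min_fill=0.05, in_fmt="%d/%m/%Y"):
    """rows: list of dicts. Column names used here are hypothetical."""
    kept = []
    for row in rows:
        # row deletion: no check-in, or cancelled reservation
        if not row.get("checkin_date") or row.get("status") == "cancelled":
            continue
        try:
            # one consistent (ISO) date format; an unparseable date marks a row with errors
            for f in date_fields:
                if row.get(f):
                    row = {**row, f: datetime.strptime(row[f], in_fmt).date().isoformat()}
        except ValueError:
            continue
        kept.append({k: strip_accents(v) if isinstance(v, str) else v for k, v in row.items()})
    if not kept:
        return [], []
    # column deletion: explicit drops (e.g. IDs that duplicate names), then sparse columns
    columns = [c for c in kept[0] if c not in drop_columns]
    columns = [c for c in columns if sum(1 for r in kept if r.get(c)) / len(kept) >= min_fill]
    return [{c: r.get(c, "") for c in columns} for r in kept], columns


def to_csv(rows, columns):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


raw = [
    {"id_country": "34", "country": "España", "checkin_date": "01/07/2015", "status": "ok", "notes": ""},
    {"id_country": "49", "country": "Deutschland", "checkin_date": "", "status": "ok", "notes": ""},
    {"id_country": "33", "country": "France", "checkin_date": "05/07/2015", "status": "cancelled", "notes": ""},
    {"id_country": "44", "country": "United Kingdom", "checkin_date": "31/02/2015", "status": "ok", "notes": ""},
    {"id_country": "39", "country": "Italia", "checkin_date": "10/07/2015", "status": "ok", "notes": ""},
]
rows, cols = clean(raw, date_fields=["checkin_date"], drop_columns=["id_country"], min_fill=0.5)
print(to_csv(rows, cols))
```

In the example, the missing check-in, the cancellation and the impossible date (31 February) each remove one row, and the empty `notes` column falls below the fill threshold.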
Clean Dataset
Dirty
• 260,000 reservations
• 80 fields (57 categorical, 9 numeric, 10 date, 3 text, 1 incorrect)
• Size: 150 MB
Clean
• 150,000 reservations
• 46 fields (26 categorical, 9 numeric, 10 date, 1 text)
• Size: 75 MB
Transformations
Country Grouping
• Many countries to predict (210)
• Some countries have very few instances
• Grouping objective: each group holds at least 1% of total instances
• Grouping does not affect the business objective
• Total number of groups: 20
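A minimal sketch of a 1% grouping rule: countries at or above the threshold keep their own label and the rest are merged. The single catch-all `OTHER` group is an assumption made for brevity; the deck reports 20 groups but does not say how small countries were merged (for example by region), and a lone catch-all group is not guaranteed to reach 1% itself.

```python
from collections import Counter


def group_countries(countries, min_share=0.01, other_label="OTHER"):
    """Keep countries holding at least min_share of all instances; merge the rest."""
    counts = Counter(countries)
    total = sum(counts.values())
    keep = {c for c, n in counts.items() if n / total >= min_share}
    return [c if c in keep else other_label for c in countries]


# hypothetical sample: 200 reservations, two countries with a single booking each
countries = ["ES"] * 120 + ["DE"] * 60 + ["UK"] * 18 + ["MT", "IS"]
grouped = group_countries(countries)
print(Counter(grouped))
```

Fewer, larger classes give the model enough examples per class to learn from, which is the motivation stated on the slide.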
New Fields
• RESERV_ANTICIPATION (calculated): check-in date - reservation date (booking lead time)
• COUNTRY_HOTEL (name of the hotel’s country)
• HOTEL_STARS (1-5)
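The three new fields can be sketched as a per-row enrichment step. Assumptions: RESERV_ANTICIPATION is computed here as check-in minus reservation date, so bookings made further ahead get larger positive values; `hotel_id` and the `hotels` lookup table are hypothetical names standing in for wherever the hotel's country and star rating are stored.

```python
from datetime import date


def reserv_anticipation(reservation_date, checkin_date):
    """Booking lead time in days (check-in date minus reservation date)."""
    return (checkin_date - reservation_date).days


def add_new_fields(row, hotels):
    # hotels: hypothetical lookup hotel_id -> (country name, stars 1-5)
    country, stars = hotels[row["hotel_id"]]
    return {
        **row,
        "RESERV_ANTICIPATION": reserv_anticipation(row["reservation_date"], row["checkin_date"]),
        "COUNTRY_HOTEL": country,
        "HOTEL_STARS": stars,
    }


hotels = {"H1": ("Spain", 4)}
row = {"hotel_id": "H1", "reservation_date": date(2015, 5, 1), "checkin_date": date(2015, 7, 1)}
print(add_new_fields(row, hotels))
```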
Clean Dataset
Dirty
• 260,000 reservations
• 80 fields
• Size: 150 MB
Clean
• 150,000 reservations
• 46 fields
• Size: 75 MB
Transformed
• 150,000 records
• 49 fields (46 + 3 new fields)
• Size: 80 MB
What is Feature Engineering?
Extracting signal from noise
Feature Engineering
Techniques
• Detect the fields (features) that are predictors (signal) and discard those that are not (noise):
  • Dependent fields (pax, days, pax*days)
  • Needless fields (reservation number)
  • Fields with very little data
  • Random fields (minute and second of reservation)
• Domain knowledge
• Experience
• Recursive cycle
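Some of the noise patterns above can be flagged mechanically before domain knowledge takes over. This is a heuristic sketch under assumptions: the table layout (column name -> list of values), the 5% fill threshold and the sample columns are hypothetical, and random fields such as the minute of reservation are not detected here; they still need a human judgement.

```python
from itertools import combinations


def fill_rate(values):
    return sum(1 for v in values if v not in (None, "")) / len(values)


def looks_like_id(values):
    # every row has a distinct value, like a reservation number
    return len(set(values)) == len(values)


def product_dependencies(table, numeric_cols):
    """Find columns that equal the product of two other numeric columns (e.g. pax*days)."""
    found = []
    for target in numeric_cols:
        others = [c for c in numeric_cols if c != target]
        for a, b in combinations(others, 2):
            if all(t == x * y for t, x, y in zip(table[target], table[a], table[b])):
                found.append((target, a, b))
    return found


def noise_candidates(table, numeric_cols, min_fill=0.05):
    """Return column -> reason. Flags are suggestions for review, not automatic drops."""
    flags = {}
    for col, values in table.items():
        if fill_rate(values) < min_fill:
            flags[col] = "too little data"
        elif col not in numeric_cols and looks_like_id(values):
            flags[col] = "identifier-like"
    for target, a, b in product_dependencies(table, numeric_cols):
        flags.setdefault(target, f"dependent: {a}*{b}")
    return flags


table = {
    "reservation_number": ["R1", "R2", "R3", "R4"],
    "pax": [2, 1, 3, 2],
    "days": [3, 7, 2, 5],
    "pax_days": [6, 7, 6, 10],
    "country": ["ES", "DE", "ES", "UK"],
    "fax": ["", "", "", ""],
}
print(noise_candidates(table, ["pax", "days", "pax_days"]))
```

The identifier check is limited to non-numeric columns because continuous numeric fields (prices, for instance) are often unique per row without being identifiers.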
Recursive Feature Engineering
Field Selection → Algorithm Adjustment → Prediction Quality Evaluation → back to Field Selection
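The recursive cycle can be sketched as greedy backward elimination: drop the field whose removal hurts held-out accuracy least, and stop once every removal would lower it. Assumptions: the "algorithm" is a hypothetical lookup-table classifier (most frequent label per combination of field values) rather than the model used in the talk, the algorithm-adjustment step is omitted, and selection runs on a validation split so the final test set stays untouched.

```python
from collections import Counter, defaultdict


def fit_lookup(rows, labels, features):
    """Toy classifier: most frequent label for each combination of selected field values."""
    table = defaultdict(Counter)
    for row, y in zip(rows, labels):
        table[tuple(row[f] for f in features)][y] += 1
    default = Counter(labels).most_common(1)[0][0]

    def predict(row):
        key = tuple(row[f] for f in features)
        return table[key].most_common(1)[0][0] if key in table else default

    return predict


def holdout_accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def backward_selection(train, validation, features):
    """Field selection -> fit -> quality evaluation, repeated while accuracy does not drop."""
    (train_x, train_y), (val_x, val_y) = train, validation
    selected = list(features)
    best = holdout_accuracy(fit_lookup(train_x, train_y, selected), val_x, val_y)
    while len(selected) > 1:
        scores = {
            f: holdout_accuracy(fit_lookup(train_x, train_y, [c for c in selected if c != f]), val_x, val_y)
            for f in selected
        }
        drop = max(scores, key=scores.get)
        if scores[drop] < best:
            break
        selected.remove(drop)
        best = scores[drop]
    return selected, best


# hypothetical data: "channel" carries signal, "minute" is a random field
train_x = [{"channel": c, "minute": m} for c, m in
           [("web", 1), ("web", 2), ("partner", 1), ("partner", 2), ("web", 3), ("partner", 3)]]
train_y = ["ES", "ES", "DE", "DE", "ES", "DE"]
val_x = [{"channel": "web", "minute": 4}, {"channel": "partner", "minute": 5}, {"channel": "web", "minute": 1}]
val_y = ["ES", "DE", "ES"]
print(backward_selection((train_x, train_y), (val_x, val_y), ["channel", "minute"]))
```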
Clean Dataset
Dirty
• 260,000 reservations
• 80 fields
• Size: 150 MB
Clean
• 150,000 reservations
• 46 fields
• Size: 75 MB
Transformed
• 150,000 records
• 49 fields
• Size: 80 MB
Final Dataset
• 150,000 records
• 10 fields
• Size: 55 MB
Agenda: 5. The recipe (the model)
Modeling
• Training
• Learning
Agenda: 6. Tasting the dish
Quality Evaluation
Dataset (100%)
• Training (80%) → Model
• Test (20%) → Evaluation
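An 80/20 hold-out split can be sketched in a few lines. The slide does not say how rows were assigned to each part, so a seeded random shuffle is assumed here (the `seed` parameter is hypothetical); for reservation data, splitting by date is a common alternative worth considering.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the dataset and cut it into training (80%) and test (20%) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(150_000))
print(len(train), len(test))
```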
Quality Evaluation
• Accuracy
• Confusion Matrix
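Both metrics follow directly from the pairs of real and predicted labels on the test set. A minimal sketch with invented labels: accuracy is the share of matching pairs, and the confusion matrix counts every (real, predicted) combination, so off-diagonal cells show which nationalities get confused with which.

```python
from collections import Counter


def label_accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, predicted):
    """Rows = real label, columns = predicted label, both in sorted label order."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return labels, [[counts[(a, p)] for p in labels] for a in labels]


actual = ["ES", "ES", "DE", "UK", "DE"]
predicted = ["ES", "DE", "DE", "DE", "DE"]
labels, matrix = confusion_matrix(actual, predicted)
print(label_accuracy(actual, predicted))  # 0.6
for label, row in zip(labels, matrix):
    print(label, row)
```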
Quality Evaluation
54% 75%
Quality Evaluation
Predicted vs Real Distribution
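Comparing class shares is a quick check that complements accuracy: a model can score well overall while over-predicting the largest groups. The sketch below uses invented labels; in it, DE makes up 40% of the real labels but 80% of the predictions, and UK is never predicted.

```python
from collections import Counter


def distribution(labels):
    total = len(labels)
    return {k: v / total for k, v in Counter(labels).items()}


def compare_distributions(actual, predicted):
    """Map each class to (share in real labels, share in predictions)."""
    real, pred = distribution(actual), distribution(predicted)
    return {k: (real.get(k, 0.0), pred.get(k, 0.0)) for k in sorted(set(real) | set(pred))}


actual = ["ES", "ES", "DE", "UK", "DE"]
predicted = ["ES", "DE", "DE", "DE", "DE"]
print(compare_distributions(actual, predicted))
```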
Cooking Predictions
80% / 20%
• Go to the market to buy ingredients → Gathering RAW data
• Cleaning → Cleaning Data
• Transforming → Transforming and Feature Engineering
• Cooking → Training the Model
• Tasting the Dish → Evaluating Prediction Quality
Other Techniques
• Ensembles
• Clusters
• Weight Analysis
• Anomaly Detection
END
email: andresg@clevertask.com
Twitter: @data_lytics
www.clevertask.com
