ETA Prediction Challenge
@DataHack
A taxi goes from Chinatown to Times Square. How long will it take to arrive?
Taxi challenge by @Final
Data :
In this challenge, you are given data on taxi rides in New York, containing information on each ride such as the start and end points, date, time of day, distance, etc.
Goal :
Our purpose is to predict the travel time (in logarithmic scale) of a ride. The data is split into train and test sets, and we can combine general data on the ride with local data on similar rides from the train set.
Ride Information - Given Dataset
● From / To coordinates (lon, lat)
● Departure timestamp
● Trip distance (road distance)
● Vendor - Taxi company (found to be unimportant)
● Passenger count (found to be unimportant)
The Team
A group of talented and creative world class professionals in ML and Traffic Analytics
Nir Malbin, Gad Benram, Seffi Cohen, Daniel Marcous
Team member profiles :
● Data Wizard, expert in big data processing and production-ready ML. Googler, Wazer and traffic analytics expert.
● Data Ninja, a pure professional in every data aspect, from gathering and exploring to modeling. Kaggle Master.
● An innovative ML expert and programmer, expert in feature engineering and selection techniques. Kaggle Master.
● CDS (Chief Data Scientist) for the Israeli Defense Forces, and pioneer in ML ensemble techniques. Kaggle Master.
How it works
Dataset
● Train (train model on)
● Test (make prediction on)
● Public score (30%)
● Private score (70%)
Reminders
The Metric
Mean Squared Error (on log values) / Variance (constant)
sum((y - y')^2) / sum((y - avg(y))^2)
Notes :
● Interesting part :
𝚺((real - pred)^2) ~ Least Squares
● Log values
○ The weight of a given absolute mistake goes down for longer rides
○ A mistake is measured by its relative (percentage) error -
a 10-minute mistake on a 20-minute ride matters the same as a 20-minute mistake on a 40-minute ride,
since log(a) - log(b) = log(a/b)
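The metric above can be sketched in a few lines of Python. This is an illustration under assumptions, not the organisers' scoring code: the function name `normalized_log_mse` and the optional `total_variation` argument (standing in for the constant variance term) are hypothetical, and travel times are assumed to be positive numbers in seconds.

```python
import math

def normalized_log_mse(actual, predicted, total_variation=None):
    """Sum of squared log errors divided by the variation of the log targets.

    total_variation: optional precomputed sum((y - avg(y))^2) on log values.
    If omitted, it is computed from `actual` (an assumption of this sketch).
    """
    log_actual = [math.log(t) for t in actual]
    log_pred = [math.log(t) for t in predicted]
    squared_error = sum((a - p) ** 2 for a, p in zip(log_actual, log_pred))
    if total_variation is None:
        mean_log = sum(log_actual) / len(log_actual)
        total_variation = sum((a - mean_log) ** 2 for a in log_actual)
    return squared_error / total_variation

# Relative error is what counts: both mistakes below have the same log error.
error_short_ride = abs(math.log(30 / 20))  # 10 minutes off on a 20-minute ride
error_long_ride = abs(math.log(60 / 40))   # 20 minutes off on a 40-minute ride
```

A perfect prediction scores 0, and always predicting the geometric mean of the targets scores 1, so lower is better.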
Predicting ETA Using :
1. Ride Information
2. Environment
3. Geography
4. Inferred States
Machine
Learning
Data Shortage
● We Don’t Have
○ Historical speeds
○ Real Time speeds
Data Cleaning
● Box coordinates to NYC (remove 0.0 etc.)
● Remove very long / far rides (>2h / 65km)
● Remove unreasonable speed / time-distance ratio
○ Remove 5% anomalies (Top & Bottom)
New feature - Abnormal Ride
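A minimal sketch of these cleaning steps, assuming each ride is a plain dict. The field names (`from_lon`, `duration_s`, `distance_km`, ...), the bounding box values and the reading of "5%" as a 5% tail on each side are assumptions made for illustration; the slides do not specify them. Here the speed outliers are flagged rather than dropped, which is one way to read the "Abnormal Ride" feature.

```python
# Approximate NYC bounding box (assumed values, not taken from the slides).
NYC_BOX = (-74.3, -73.7, 40.5, 40.95)  # lon_min, lon_max, lat_min, lat_max

def in_box(lon, lat, box=NYC_BOX):
    lon_min, lon_max, lat_min, lat_max = box
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max

def clean_rides(rides, max_seconds=2 * 3600, max_km=65.0):
    """Keep rides inside the box that are at most 2 hours and 65 km long."""
    kept = []
    for r in rides:
        if not (in_box(r["from_lon"], r["from_lat"]) and in_box(r["to_lon"], r["to_lat"])):
            continue  # also drops 0.0 / 0.0 placeholder coordinates
        if not (0 < r["duration_s"] <= max_seconds and 0 < r["distance_km"] <= max_km):
            continue
        kept.append(r)
    return kept

def flag_abnormal(rides, tail_percent=5):
    """Return one boolean per ride: True if its speed falls in either tail."""
    if not rides:
        return []
    speeds = [r["distance_km"] / (r["duration_s"] / 3600.0) for r in rides]
    ordered = sorted(speeds)
    k = len(ordered) * tail_percent // 100
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    return [s < low or s > high for s in speeds]
```

Durations exist only in the train set, so the speed-based flag can be computed there and learned or approximated for test rides.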
Feature Engineering
Datetime based features
● Month start / end
● Day / Day of week / Hour / 15 Minute interval
● Is weekend / business day
● Is work hour (09:00-17:00)
● Is rush hour (morning / afternoon)
● Is holiday
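The datetime features listed above can be derived from the departure timestamp alone. The sketch below uses only the Python standard library; the rush-hour windows (07:00-10:00 and 16:00-19:00) and the `holidays` set are assumptions, since the slides do not define them.

```python
from datetime import date, datetime, timedelta

def datetime_features(ts, holidays=frozenset()):
    """Build calendar features from a departure timestamp.

    holidays: set of datetime.date values supplied by the caller (hypothetical input).
    """
    is_weekend = ts.weekday() >= 5  # Saturday = 5, Sunday = 6
    is_holiday = ts.date() in holidays
    return {
        "is_month_start": ts.day == 1,
        "is_month_end": (ts + timedelta(days=1)).month != ts.month,
        "day": ts.day,
        "day_of_week": ts.weekday(),
        "hour": ts.hour,
        "quarter_hour": ts.hour * 4 + ts.minute // 15,  # 15-minute interval, 0..95
        "is_weekend": is_weekend,
        "is_business_day": not is_weekend and not is_holiday,
        "is_work_hour": 9 <= ts.hour < 17,
        "is_morning_rush": 7 <= ts.hour < 10,     # assumed window
        "is_afternoon_rush": 16 <= ts.hour < 19,  # assumed window
        "is_holiday": is_holiday,
    }

features = datetime_features(datetime(2016, 7, 4, 12, 0), holidays={date(2016, 7, 4)})
```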
Location based features
● Coordinate Transformations (Coding directionality)
○ PCA 2 2
○ RBF
● Spherical (geo) distance
● Distance percentile
● Spherical-to-trip distance ratio
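As an illustration of the distance features, the spherical distance can be computed with the haversine formula and compared with the reported road distance. The function names and the choice of ratio direction (spherical divided by trip) are assumptions of this sketch; the mean Earth radius of 6371 km is the usual approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, standard approximation

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_ratio(spherical_km, trip_km):
    """Spherical distance over road distance; near 1 means an almost straight route."""
    if trip_km <= 0:
        return 0.0  # assumed fallback for missing or zero trip distance
    return spherical_km / trip_km
```

A low ratio suggests a detour-heavy route (bridges, one-way streets), which a model can pick up as a signal that is independent of the raw distance.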
City based features
● NYC Neighbourhood (pair crossing)
● Distance to points of interest (100X2)
○ Schools / Hospitals / Parks etc.
PCA 2
Weather based features
● Temperature
● Events - Rain / Snow etc.
● Humidity
● Wind
● Visibility
● Min / Max / Avg / std etc.
PCA 2
Inferred Traffic based features
PCA 1
● Assumption :
our data is a representative sample of NYC's "driving population"
● Crowdedness
○ #rides in X radius
■ 100 / 500 / 1500 / 5000
■ Euclidean / Manhattan
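The crowdedness idea can be sketched as a neighbour count around each ride's start point. This brute-force version assumes the points are already projected to planar coordinates in metres and that the radii are in metres; both are assumptions, and the time window that would normally limit which rides count as "nearby" is left out for brevity.

```python
import math

def crowdedness(points, radius, metric="euclidean"):
    """For each (x, y) point, count the other points within `radius`.

    metric: "euclidean" or "manhattan". O(n^2), so suitable for a sketch only;
    a spatial index would be needed at dataset scale.
    """
    counts = []
    for i, (xi, yi) in enumerate(points):
        nearby = 0
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = abs(xi - xj), abs(yi - yj)
            dist = dx + dy if metric == "manhattan" else math.hypot(dx, dy)
            if dist <= radius:
                nearby += 1
        counts.append(nearby)
    return counts

points = [(0.0, 0.0), (30.0, 40.0), (1000.0, 1000.0)]
# One feature column per radius and metric, mirroring the list above.
features = {
    (radius, metric): crowdedness(points, radius, metric)
    for radius in (100, 500, 1500, 5000)
    for metric in ("euclidean", "manhattan")
}
```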
News based features
1. Crawling NYTimes
2. Topic Modeling
3. Finding topics correlated with ETA
4. Using top 10 correlated topics as features
a. Number of articles on a day for every topic
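Steps 3 and 4 can be sketched as follows, assuming the crawl and topic model have already produced a per-day article count for each topic. The use of Pearson correlation, the function names and the topic labels are assumptions for illustration; the slides do not say how correlation was measured.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def top_correlated_topics(daily_topic_counts, daily_mean_log_eta, k=10):
    """Rank topics by absolute correlation between daily article counts and daily ETA."""
    scores = {
        topic: pearson(counts, daily_mean_log_eta)
        for topic, counts in daily_topic_counts.items()
    }
    return sorted(scores, key=lambda topic: abs(scores[topic]), reverse=True)[:k]

# Hypothetical topic labels and counts over four days.
daily_topic_counts = {
    "transit_strike": [0, 1, 2, 3],
    "sports": [1, 1, 1, 1],
    "weather": [3, 0, 2, 1],
}
daily_mean_log_eta = [1.0, 2.0, 3.0, 4.0]
selected = top_correlated_topics(daily_topic_counts, daily_mean_log_eta, k=2)
```

The selected topics' daily counts then become feature columns joined to each ride by date.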
Results
Caveats
● Time series future mixing
● The metric is not necessarily the one that matters most
○ Same weight for positive / negative error
● Crowdedness - assumes that the data is a representative sample of the total car population
○ E.g. 2 times the samples equals 2 times the traffic
● Variance - taken from original validation dataset (constant)
Public Leader Board
Team                     Score
Team DCountdown          0.150041
Squanchers               0.160468
Noa's stars              0.165869
Aperture Science         0.167958
R-North                  0.175308
TAU Deep Learning Lab    0.182602
Mr Terminal              0.193593
MTG                      0.282009
SuperFish                0.302637
Summary
Machine Learning Approach
Advantages :
● No manual tuning
● No complex heuristics
● Optimise different metrics
● Personalisation
● Taking different ideas and their interactions into account as one
Disadvantages :
● Black Box (~ish)
● Harder to deploy
● Retrain when system changes

Prediction of taxi rides ETA
