Augmenting Decisions of Taxi Drivers through
Reinforcement Learning for Improving Revenues
AAAI Association for the Advancement of Artificial Intelligence, 2017
Tanvi Verma, Pradeep Varakantham, Sarit Kraus, Hoong Chuin Lau
November 3, 2021
Presenter: Kyunghwan Mun
Contents
• Introduction
• Related Work
• Methodology
• Experiment
• Conclusion and Discussion
Introduction
• Taxis roam around without a customer (cruising)
▪ It is important to reduce cruising time and increase revenue
• Right “location” at the right “time”
• Reinforcement Learning (RL)
▪ Maximizing the long term revenue
• Requirement of making a sequence of decisions
• Wait for 5 minutes
• The reinforcement learning problem is well defined:
• Revenue earned from a customer
• Cost from travelling between locations
• Uncertain customer demand
• Reinforcement Learning captures uncertainty well
• Learning focus of RL can adapt to demand patterns
Introduction
• Contributions
▪ Annotation procedure for the trajectory data
▪ Monte Carlo Reinforcement Learning
▪ Iterative abstraction
▪ Evaluation method:
• The average revenue earned by the learned policy >> the top 10 percentile revenue
• The agent performance >> the top 1 percentile revenue (in some time intervals)
• Increase in taxi utilization when employing the revenue maximization objective
Related Work
• Taxi Guidance
▪ Pick-up probability to recommend a driving route for profit maximization
▪ Cruising route to vacant taxis such that vacancy time is minimized
▪ Driver’s experience to find parking spots for a cruising taxi
▪ Taxi trajectories to learn traffic patterns and estimate travel time
▪ Locations for taxi drivers by constructing a spatio-temporal profitability map
• Surrounding regions of the driver
• Computing potential profit using historical data
▪ Considers long term revenue
▪ Any preferences with respect to areas are inherently captured
▪ Relies on past experiences
▪ Taxi trajectory data
Related Work
• Reinforcement Learning (RL)
▪ Model-based learning
• Transition probabilities
• Reward function to compute values of states
▪ Model-free learning
• When obtaining samples of experience from the dataset
• Temporal Difference method
• Monte Carlo method
• Estimate state-action values
Related Work
• Deep Reinforcement Learning (DeepRL)
▪ Ideal for environments where tens of millions of learning episodes are available
▪ Inappropriate to apply in the taxi case:
• The number of features within the state space is too small
Methodology
• Taxi Dataset
▪ A major company in Singapore
▪ Each log entry of the data
• Latitude (GPS)
• Longitude (GPS)
• Taxi ID
• Driver ID
• Taxi Status:
• Free (meter off, actively looking for next passenger)
• Busy (not accepting bookings)
• POB (Passenger On Board)
• Off-line
Methodology
• Driver Activity Graphs - 1
▪ Cruising trajectory
• “Free” state → “Non-free” state (passenger on board, busy, break, off-line, on call, etc.)
• Cruising trajectories of drivers from the dataset
• Annotating the trajectories with the decisions made
▪ Figure 1.
• Starts at A
• Terminates at E
• B, C, and D are intermediate decision coordinates
▪ Desired path
• The shortest path between A and E
• Evaluate whether the driver could have decided at A to go to D
→ If not, include C in the trajectory and repeat to obtain the final trajectory
Methodology
• Driver Activity Graphs – 2
▪ Convert each cruising trajectory into an activity graph
• A directed graph with decision coordinates as nodes
• Distance travelled between two coordinates = weight of the edge between them
• Terminating node of the activity graph contains information about the revenue earned
• Earned revenue = the fare of the trip - the cost of travel for the trip
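The activity graph described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `ActivityGraph`, the helper `build_activity_graph`, and all numeric values are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ActivityGraph:
    """Directed graph over the decision coordinates of one cruising trajectory."""
    edges: dict = field(default_factory=dict)  # (src, dst) -> distance travelled in km
    terminal: str = ""                          # terminating node of the trajectory
    fare: float = 0.0                           # fare of the trip that ended cruising
    trip_cost: float = 0.0                      # cost of travel for that trip

    def earned_revenue(self) -> float:
        # Earned revenue = fare of the trip - cost of travel for the trip.
        return self.fare - self.trip_cost


def build_activity_graph(coords, distances_km, fare, trip_cost):
    """coords: ordered decision coordinates; distances_km[i] is the distance
    travelled between coords[i] and coords[i + 1]."""
    if len(distances_km) != len(coords) - 1:
        raise ValueError("need one distance per consecutive pair of coordinates")
    graph = ActivityGraph(terminal=coords[-1], fare=fare, trip_cost=trip_cost)
    for src, dst, km in zip(coords, coords[1:], distances_km):
        graph.edges[(src, dst)] = km
    return graph


# Hypothetical trajectory A -> B -> C -> D -> E with made-up distances and fare.
g = build_activity_graph(["A", "B", "C", "D", "E"], [1.2, 0.8, 2.0, 1.5],
                         fare=18.0, trip_cost=4.5)
print(g.terminal, len(g.edges), g.earned_revenue())
```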
Methodology
• Reinforcement Learning (RL) for Taxi Driver
▪ State is given as follows:
• <day of week, zone, time interval>
• Divide the entire map of Singapore into several zones
• Time interval (0-6 hours, 6-9 hours, 9-12 hours, 12-17 hours, 17-20 hours, 20-24 hours)
• For n zones, n available actions
(Stay in the current zone / Move to remaining n-1 zones)
▪ Episodes
• “Non-free” state → “Free” state → “Non-free” state (termination)
• The cost of travel between nodes → fixed cost per km applied to the weight of the edge
• Positive reward: the fare of the trip - the cost to travel the trip
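A minimal sketch of the state encoding and the episode return described above. It assumes half-open time intervals (for example, 6:00 falls into "6-9 hours"), an undiscounted return, and that the trip's travel cost uses the same per-km rate as cruising. The function names and the sample numbers are hypothetical.

```python
from datetime import datetime

# Exclusive upper bounds (in hours) of the six time intervals on the slide.
INTERVAL_BOUNDS = [6, 9, 12, 17, 20, 24]


def time_interval(hour: int) -> int:
    """Index of the time interval containing `hour` (0 for 0-6 hours, ..., 5 for 20-24 hours)."""
    for idx, upper in enumerate(INTERVAL_BOUNDS):
        if 0 <= hour < upper:
            return idx
    raise ValueError(f"hour out of range: {hour}")


def make_state(ts: datetime, zone: int) -> tuple:
    """State tuple <day of week, zone, time interval>; Monday is day 0."""
    return (ts.weekday(), zone, time_interval(ts.hour))


def episode_return(cruise_km, fare, trip_km, cost_per_km):
    """Undiscounted return of one episode: cruising moves cost a fixed amount
    per km, and the terminating trip yields fare minus its travel cost."""
    cruising_cost = cost_per_km * sum(cruise_km)
    trip_reward = fare - cost_per_km * trip_km
    return trip_reward - cruising_cost


state = make_state(datetime(2017, 2, 6, 8, 30), zone=42)
ret = episode_return([1.0, 2.0], fare=20.0, trip_km=10.0, cost_per_km=0.5)
print(state, ret)
```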
Methodology
• Reinforcement Learning (RL) for Taxi Driver
▪ (Algorithm 1) Monte Carlo Estimation of Q Values
• Return: the cumulative reward accumulated until the end of the episode
• $Q(s, a)$: the value of the $(s, a)$ pair
• Variable “min-count” to avoid inaccurate estimated values
• $Count(s, a)$: the total number of training episodes in which $(s, a)$ was visited
• Policy $\pi(s)$: mapping from state $s$ to its optimal action
• $S$: the set of states
• $A$: the set of actions
• $S_{learned}$: the set of states for which we could learn an optimal policy
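The quantities listed above can be put together in a short Monte Carlo sketch. This is not the paper's Algorithm 1 verbatim: it assumes first-visit, undiscounted returns and drops any $(s, a)$ pair seen in fewer than `min_count` episodes. The function name `monte_carlo_q` and the toy episodes are hypothetical.

```python
from collections import defaultdict


def monte_carlo_q(episodes, min_count):
    """episodes: iterable of lists of (state, action, reward) steps.
    Returns (Q, Count, policy) restricted to pairs seen in >= min_count episodes."""
    total = defaultdict(float)
    count = defaultdict(int)
    for episode in episodes:
        ret = 0.0
        returns = {}
        # Walk backwards so `ret` is the reward accumulated from a step to the
        # end of the episode; overwriting leaves the first-visit return.
        for state, action, reward in reversed(episode):
            ret += reward
            returns[(state, action)] = ret
        for sa, g in returns.items():
            total[sa] += g
            count[sa] += 1  # episodes in which (s, a) was visited
    q = {sa: total[sa] / count[sa] for sa in total if count[sa] >= min_count}
    policy = {}
    for (s, a), value in q.items():
        if s not in policy or value > q[(s, policy[s])]:
            policy[s] = a
    return q, count, policy


# Toy data: "move" looks better from s0 but was observed only once.
episodes = [
    [("s0", "stay", -1.0), ("s0", "stay", 9.0)],
    [("s0", "stay", -1.0), ("s0", "stay", 5.0)],
    [("s0", "move", -2.0), ("s1", "stay", 12.0)],
]
q, count, policy = monte_carlo_q(episodes, min_count=2)
s_learned = set(policy)  # states for which a policy could be learned
print(q, policy, s_learned)
```

With `min_count=2`, the single high-return sample for `("s0", "move")` is ignored, which is the kind of inaccurate estimate the min-count variable is meant to guard against.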
Methodology
• Reinforcement Learning (RL) for Taxi Driver
▪ Zone Structure
• Zones that are too big
→ Increase uncertainty in the outcome of actions
• Zones that are too small
→ Do not have sufficient training data to learn something meaningful
It is important to balance uncertainty and granularity
▪ Method 1.) Static Zones
• Start with a large number of uniformly distributed zones
• Check how many relevant episodes are present in each zone
• If the number < min-count, merge the zone into a zone that has sufficient data
(500 zones → 111 zones)
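The static zoning steps above could look roughly like the sketch below. The slide does not say which zone a sparse zone is merged into; this example assumes the nearest zone with sufficient data, measured by Euclidean distance between zone centroids. That choice, the function name `merge_sparse_zones`, and the sample data are all assumptions made for illustration.

```python
import math


def merge_sparse_zones(centroids, episode_counts, min_count):
    """Map every zone id to the zone it ends up in after merging.
    Zones with fewer than `min_count` episodes are merged into the nearest
    zone that has enough data (nearest-centroid rule is an assumption)."""
    dense = [z for z, n in episode_counts.items() if n >= min_count]
    if not dense:
        raise ValueError("no zone has sufficient data")
    assignment = {}
    for zone, n in episode_counts.items():
        if n >= min_count:
            assignment[zone] = zone
        else:
            assignment[zone] = min(
                dense, key=lambda d: math.dist(centroids[zone], centroids[d])
            )
    return assignment


# Four hypothetical zones; zones 1 and 3 have too few episodes.
centroids = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0), 3: (6.0, 5.0)}
episode_counts = {0: 120, 1: 3, 2: 80, 3: 1}
merged = merge_sparse_zones(centroids, episode_counts, min_count=10)
print(merged, len(set(merged.values())))
```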
Methodology
• Reinforcement Learning (RL) for Taxi Driver
▪ Method 2.) Dynamic Zones
• Fix time-interval and day-of-the-week
→ Each zone maps to a unique state and a unique action
• Decide whether certain low-valued zones need to be split into smaller zones
• For split zones, learn Q-values for the new set of zones
• Check if certain zones can be split
• Decrease the uncertainty in the outcome of the optimal action
• If the smaller zones have adequate data and increase the overall value of the bigger zone
→ Split the larger zone into smaller zones
Methodology
• (Algorithm 2) Dynamic zoning
▪ Start with four large uniform zones
▪ Split the zones repeatedly until no further split is possible
• (Algorithm 3) WorthSplitting(z)
▪ Split the zones using K-Means Clustering
• Size of child zones > min-size
• $\max_a Q(s_1, a) + \max_a Q(s_2, a) > \max_a Q(s, a)$
• $\arg\max_a Q(s_1, a) \neq \arg\max_a Q(s, a)$ OR $\arg\max_a Q(s_2, a) \neq \arg\max_a Q(s, a)$
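The splitting test above can be written as a small predicate. This sketch assumes that all three listed conditions must hold together, that $s$ is the state of the parent zone and $s_1$, $s_2$ are the states of its two child zones, and that "size" is whatever measure is compared against min-size. The K-Means step that produces the child zones is not shown, and all names and numbers are hypothetical.

```python
def best_action(q_values):
    """argmax_a Q(., a) for a dict mapping action -> Q-value."""
    return max(q_values, key=q_values.get)


def worth_splitting(q_parent, q_child1, q_child2, size1, size2, min_size):
    """q_*: dict action -> Q-value for the parent state s and child states s1, s2."""
    # Condition 1: both child zones are larger than min-size.
    if size1 <= min_size or size2 <= min_size:
        return False
    # Condition 2: the children together are worth more than the parent.
    value_gain = (
        max(q_child1.values()) + max(q_child2.values()) > max(q_parent.values())
    )
    # Condition 3: at least one child prefers a different action than the parent.
    action_changed = (
        best_action(q_child1) != best_action(q_parent)
        or best_action(q_child2) != best_action(q_parent)
    )
    return value_gain and action_changed


q_parent = {"z1": 5.0, "z2": 4.0}
q_child1 = {"z1": 3.0, "z2": 2.0}
q_child2 = {"z1": 1.0, "z2": 3.5}
print(worth_splitting(q_parent, q_child1, q_child2, size1=40, size2=60, min_size=30))
print(worth_splitting(q_parent, q_child1, q_child2, size1=40, size2=60, min_size=50))
```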
Experiments
• Evaluation Method
▪ Compare (a), (b) and (c)
• Average revenue earned by our learning agent … (a)
• The top percentile revenue of drivers … (b)
• Revenue earned by greedy heuristics typically employed by drivers during cruising … (c)
▪ Simulation of Agent Movements
• Assigning the available trips to the agent while considering competition from active drivers
• Trip data and trajectories of all active drivers during a given date and time-interval
• Finding the relevant available trips (non pre-booked trips) that originated from each state
• Revenue earned, duration and distance for each trip
• Assignment probability: $p_{assign}^{s_t} = \dfrac{\text{number of trips available}}{\text{number of cruising drivers present in the state at that time}}$
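A hedged sketch of how the assignment probability above might drive the simulation. Capping the ratio at 1 and the handling of a state with no cruising drivers are assumptions, as are the function names.

```python
import random


def assignment_probability(trips_available, cruising_drivers):
    """Ratio of available trips to cruising drivers in the state, capped at 1."""
    if cruising_drivers <= 0:
        # Assumption: with no competing drivers, any available trip goes to the agent.
        return 1.0 if trips_available > 0 else 0.0
    return min(1.0, trips_available / cruising_drivers)


def try_assign(trips_available, cruising_drivers, rng):
    """Sample whether the agent gets a trip in this state."""
    return rng.random() < assignment_probability(trips_available, cruising_drivers)


rng = random.Random(0)
print(assignment_probability(3, 12))
print(try_assign(20, 5, rng))  # probability is capped at 1, so always assigned
```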
Experiments
• Evaluation Method
▪ Driver revenue
• It is difficult to estimate the exact cruising distance of our agent:
• Apply the cost of travel per cruising minute
• Compute the time duration for which the driver was not hired in the time interval
• Cruising-cost per minute is applied for this duration
• Driver’s revenue in a time interval
= Revenue from all the trips of the driver in the time interval - cost of travelling the total trip distance - cost of all cruising
▪ Heuristic strategy
• The remaining probability ($p_{stay} = 0.5$)
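The driver-revenue formula on this slide can be sketched as follows. It assumes the not-hired duration is the interval length minus the time spent on trips (time spent busy or off-line is ignored here), and the cost rates and trip records are made-up values.

```python
def driver_revenue(trips, interval_minutes, cost_per_km, cruising_cost_per_min):
    """trips: list of (fare, distance_km, duration_min) within the time interval."""
    fares = sum(fare for fare, _, _ in trips)
    trip_cost = cost_per_km * sum(km for _, km, _ in trips)
    hired_minutes = sum(minutes for _, _, minutes in trips)
    # Time in the interval during which the driver was not hired.
    cruising_minutes = max(0.0, interval_minutes - hired_minutes)
    return fares - trip_cost - cruising_cost_per_min * cruising_minutes


# Hypothetical driver in the 6-9 hours interval (180 minutes).
trips = [(30.0, 20.0, 60.0), (25.0, 16.0, 50.0), (15.0, 8.0, 30.0)]
revenue = driver_revenue(trips, interval_minutes=180,
                         cost_per_km=0.5, cruising_cost_per_min=0.25)
print(revenue)
```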
Experiments
• Evaluation Method
▪ Agent revenue
• Compute agent’s revenue for each time-interval
• Initialize time with a start time of the interval
Experiments
• Experimental Results
▪ Evaluation dataset period: 1 month
▪ Average agent revenue vs. average of top percentile revenues earned by drivers
• Compare with top 10 percentile revenues
▪ Starting states of agent: top 500 drivers in each time interval
• For a given time interval and day, the agent revenue is averaged over 500,000 executions
(500 different initial states × 1,000 executions)
Conclusion and Discussion
• Limitations & Requirements
▪ One single learning agent
→ Multiple learning agents
▪ Starting states of agent: top 500 drivers in each time interval
→ Dynamic starting states with multiple learning agents
▪ Simple taxi states → construct diverse taxi states (K trips, etc.)
▪ Construction of the time intervals
→ Based on historical data
▪ Condition under which the episode ends
→ End an episode once it exceeds a specific threshold
(e.g., K trips, cruising distance, waiting time, …) to reduce executions
Thank you
Any questions?

Partners Life - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Improving Revenues

  • 1. Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Improving Revenues. AAAI (Association for the Advancement of Artificial Intelligence), 2017. Tanvi Verma, Pradeep Varakantham, Sarit Kraus, Hoong Chuin Lau. November 3, 2021. Presenter: Kyunghwan Mun
  • 2. Contents • Introduction • Related Work • Methodology • Experiment • Conclusion and Discussion
  • 3. Introduction • Taxis roam around without a customer (cruising) ▪ It is important to reduce cruising time and increase revenue • Right “location” at the right “time” • Reinforcement Learning (RL) ▪ Maximizing the long-term revenue • Requires making a sequence of decisions • Wait for 5 minutes • The reinforcement learning problem is well defined: • Revenue earned from a customer • Cost of travelling between locations • Uncertain customer demand • Reinforcement learning captures uncertainty well • The learning focus of RL can adapt to demand patterns
  • 4. Introduction • Contributions ▪ Annotation procedure for the trajectory data ▪ Monte Carlo reinforcement learning ▪ Iterative abstraction ▪ Evaluation method: • The average revenue earned by the learned policy >> the top 10 percentile revenue • The agent performance >> the top 1 percentile revenue (in some time intervals) • Taxi utilization increases when the revenue maximization objective is employed
  • 5. Related Work • Taxi Guidance ▪ Pick-up probability to recommend a driving route for profit maximization ▪ Cruising route for vacant taxis such that vacancy time is minimized ▪ Driver’s experience to find parking spots for a cruising taxi ▪ Taxi trajectories to learn traffic patterns and estimate travel time ▪ Locations for taxi drivers by constructing a spatio-temporal profitability map • Surrounding regions of the driver • Computing potential profit using historical data ▪ Considers long-term revenue ▪ Any preferences with respect to areas are inherently captured ▪ Relies on past experiences ▪ Taxi trajectory data
  • 6. Related Work • Reinforcement Learning (RL) ▪ Model-based learning • Transition probabilities • Reward function to compute values of states ▪ Model-free learning • When obtaining samples of experience from the dataset • Temporal Difference method • Monte Carlo method • Estimate state-action values
  • 7. Related Work • Deep Reinforcement Learning (DeepRL) ▪ Ideal for environments with tens of millions of learning episodes ▪ Not appropriate for the taxi case: • The number of features in the state space is too small
  • 8. Methodology • Taxi Dataset ▪ A major company in Singapore ▪ Each log entry of the data • Latitude (GPS) • Longitude (GPS) • Taxi ID • Driver ID • Taxi status: • Free (meter off, actively looking for next passenger) • Busy (not accepting bookings) • POB (Passenger On Board) • Off-line
  • 9. Methodology • Driver Activity Graphs – 1 ▪ Cruising trajectory • “Free” state → “Non-free” state (passenger on board, busy, break, off-line, on call, etc.) • Cruising trajectories of drivers from the dataset • Annotating the trajectories with the decisions made ▪ Figure 1. • Starts at A • Terminates at E • B, C, and D are intermediate decision coordinates ▪ Desired path • The shortest path between A and E • Evaluate whether the driver could have decided at A to go to D; if not, include C in the trajectory and repeat to obtain the final trajectory
  • 10. Methodology • Driver Activity Graphs – 2 ▪ Convert each cruising trajectory into an activity graph • A directed graph with decision coordinates as nodes • Distance travelled between the coordinates • Weight of the edge between them • Terminating node of the activity graph • Contains information about revenue earned • Earned revenue = fare of the trip - cost of travel for the trip
  • 11. Methodology • Reinforcement Learning (RL) for Taxi Driver ▪ State is given as follows: • <day of week, zone, time interval> • Divide the entire map of Singapore into several zones • Time interval (0-6 hours, 6-9 hours, 9-12 hours, 12-17 hours, 17-20 hours, 20-24 hours) • For n zones, n available actions (stay in the current zone / move to one of the remaining n-1 zones) ▪ Episodes • “Non-free” state → “Free” state → “Non-free” state (termination) • Cost of travel between nodes → fixed cost per km applied to the weight of the edge • Positive reward: fare of the trip - cost of travel for the trip
  • 12. Methodology • Reinforcement Learning (RL) for Taxi Driver ▪ (Algorithm 1) Monte Carlo Estimation of Q Values • Return: the cumulative reward accumulated until the end of the episode • Q(s, a): the value of the (s, a) pair • Variable “min-count” to avoid inaccurate value estimates • Count(s, a): the total number of training episodes in which (s, a) was visited • Policy π(s): mapping from state s to its optimal action • S: the set of states • A: the set of actions • S_learned: the set of states for which an optimal policy could be learned
  • 13. Methodology • Reinforcement Learning (RL) for Taxi Driver ▪ Zone Structure • Zones that are too big → increased uncertainty in the outcome of actions • Zones that are too small → insufficient training data to learn something meaningful • It is important to balance uncertainty and granularity ▪ Method 1.) Static Zones • Start with a large number of uniformly distributed zones • Check how many relevant episodes are present in each zone • If the number < min-count, merge the zone with others until it has sufficient data (500 zones → 111 zones)
  • 14. Methodology • Reinforcement Learning (RL) for Taxi Driver ▪ Method 2.) Dynamic Zones • Fix time-interval and day-of-the-week → each zone maps to a unique state and a unique action • Decide whether certain low-valued zones need to be split into smaller zones • For split zones, learn Q-values for the new set of zones • Check if certain zones can be split • Decrease the uncertainty in the outcome of the optimal action • If the smaller zones have adequate data and increase the overall value of the bigger zone → split the larger zone into smaller zones
  • 15. Methodology • (Algorithm 2) Dynamic zoning ▪ Start with four large uniform zones ▪ Split the zones repeatedly until no further split is possible • (Algorithm 3) WorthSplitting(z) ▪ Split the zones using K-Means clustering • Size of child zones > min-size • max_a Q(s1, a) + max_a Q(s2, a) > max_a Q(s, a) • argmax_a Q(s1, a) != argmax_a Q(s, a) OR argmax_a Q(s2, a) != argmax_a Q(s, a)
  • 16. Experiments • Evaluation Method ▪ Compare (a), (b) and (c) • Average revenue earned by our learning agent … (a) • The top percentile revenue of drivers … (b) • Revenue earned by greedy heuristics typically employed by drivers during cruising … (c) ▪ Simulation of Agent Movements • Assigning the available trips to the agent while considering competition from active drivers • Trip data and trajectories of all active drivers during a given date and time interval • Finding the relevant available trips (non-pre-booked trips) that originated from each state • Revenue earned, duration and distance for each trip • Assignment probability p_assign^{st} = (number of trips available) / (number of cruising drivers present in the state at that time)
  • 17. Experiments • Evaluation Method ▪ Driver revenue • It is difficult to estimate the exact cruising distance of our agent: • Apply a cost of travel per cruising minute • Compute the time duration for which the driver was not hired in the time interval • Cruising-cost per minute is applied for this duration • Driver’s revenue in a time interval = revenue from all trips of the driver in the time interval - cost of travelling all trip distance - cost of all cruising ▪ Heuristic strategy • The remaining probability (p_stay = 0.5)
  • 18. Experiments • Evaluation Method ▪ Agent revenue • Compute the agent’s revenue for each time interval • Initialize time with the start time of the interval
  • 19. Experiments • Experimental Results ▪ Evaluation dataset period: 1 month ▪ Average agent revenue vs. average of top percentile revenues earned by drivers • Compare with top 10 percentile revenues ▪ Starting states of agent: top 500 drivers in each time interval • For a given time interval and day, the agent revenue is averaged over 500,000 executions (500 different initial states × 1000 executions)
  • 20. Conclusion and Discussion • Limitations & Requirements ▪ One single learning agent → multiple learning agents ▪ Starting states of agent: top 500 drivers in each time interval → dynamic starting states with multiple learning agents ▪ Simple taxi states → construct diverse taxi states (K trips, etc.) ▪ Construction of the time-interval division → based on historical data ▪ Condition under which the episode ends → end episodes once a specific threshold is exceeded (e.g., K trips, cruising distance, waiting time, …) to reduce executions
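
Slide 12 lists the ingredients of the Monte Carlo estimation of Q values (return, Count(s, a), min-count, policy, S_learned). The following is a minimal Python sketch of how they could fit together, not the authors' Algorithm 1. It assumes undiscounted returns, first-visit averaging, and episodes given as lists of (state, action, reward) steps; the function name `estimate_q` and the data layout are hypothetical.

```python
from collections import defaultdict


def estimate_q(episodes, min_count):
    """Average Monte Carlo returns per (state, action) pair.

    episodes: list of episodes, each a list of (state, action, reward) steps.
    min_count: pairs seen in fewer episodes are left out of Q (assumed rule).
    """
    return_sum = defaultdict(float)
    count = defaultdict(int)  # number of episodes in which (s, a) was visited

    for episode in episodes:
        g = 0.0
        first_visit_return = {}
        # Walk backwards so g is the reward accumulated until the episode end.
        for state, action, reward in reversed(episode):
            g += reward
            # Earlier visits overwrite later ones, leaving the first-visit return.
            first_visit_return[(state, action)] = g
        for pair, ret in first_visit_return.items():
            return_sum[pair] += ret
            count[pair] += 1

    q = {pair: return_sum[pair] / n for pair, n in count.items() if n >= min_count}

    # Greedy policy over the pairs that have enough data.
    policy = {}
    for (state, action), value in q.items():
        if state not in policy or value > q[(state, policy[state])]:
            policy[state] = action

    s_learned = set(policy)
    return q, policy, s_learned


# State = <day of week, zone, time interval>; action = target zone.
s0 = ("Mon", 0, "6-9")
s1 = ("Mon", 1, "6-9")
episodes = [
    [(s0, 1, -2.0), (s1, 1, 10.0)],  # move to zone 1 (cost), then get a trip
    [(s0, 0, 5.0)],                  # stay in zone 0 and get a trip
    [(s0, 1, -2.0), (s1, 1, 6.0)],
    [(s0, 0, 3.0)],
]
q, policy, s_learned = estimate_q(episodes, min_count=2)
```

Raising `min_count` above the number of episodes that visit a pair removes that pair from `q`, so the corresponding state may drop out of `s_learned`.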
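
The WorthSplitting(z) conditions on slide 15 can be read as a simple predicate over the Q-values of a parent zone and its two child zones. The sketch below is one possible reading: it assumes all three listed conditions must hold together, and that Q-values are passed in as dictionaries from action to value. The K-Means step that produces the child zones is left out, and the function and parameter names are hypothetical.

```python
def worth_splitting(q_parent, q_child1, q_child2, size1, size2, min_size):
    """Return True if splitting a zone into two child zones looks worthwhile.

    q_parent, q_child1, q_child2: dicts mapping action -> Q-value for the
    parent state s and the child states s1, s2 (assumed non-empty).
    size1, size2: sizes of the child zones, compared against min_size.
    """
    # Condition 1: both child zones must be larger than min-size.
    if size1 <= min_size or size2 <= min_size:
        return False

    def best_action(q):
        return max(q, key=q.get)

    # Condition 2: max_a Q(s1, a) + max_a Q(s2, a) > max_a Q(s, a).
    value_gain = (
        max(q_child1.values()) + max(q_child2.values()) > max(q_parent.values())
    )

    # Condition 3: at least one child prefers a different action than the parent.
    action_changes = (
        best_action(q_child1) != best_action(q_parent)
        or best_action(q_child2) != best_action(q_parent)
    )

    # Assumption: the conditions are combined with AND.
    return value_gain and action_changes


parent = {"A": 5.0, "B": 4.0}
child1 = {"A": 3.0, "B": 1.0}
child2 = {"A": 2.0, "B": 4.0}
split = worth_splitting(parent, child1, child2, size1=10, size2=12, min_size=5)
```

Here `split` is True because the children together are worth more than the parent and the second child prefers action "B" rather than "A".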
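
The evaluation quantities on slides 16 and 17 are plain arithmetic, so a short worked example may help. The sketch below computes the assignment probability and a driver's revenue in a time interval. Capping the probability at 1, the handling of zero cruising drivers, the tuple layout of a trip, and all cost values are assumptions made for illustration, not figures from the slides.

```python
def assignment_probability(trips_available, cruising_drivers):
    """Trips available divided by cruising drivers present in the state."""
    if cruising_drivers == 0:
        # Assumption: with no competing driver, any available trip is assigned.
        return 1.0 if trips_available > 0 else 0.0
    # Assumption: cap at 1 so the ratio can be used as a probability.
    return min(1.0, trips_available / cruising_drivers)


def driver_revenue(trips, interval_minutes, cost_per_km, cruising_cost_per_min):
    """Revenue in an interval = fares - trip travel cost - cruising cost.

    trips: list of (fare, distance_km, duration_min) tuples.
    """
    fares = sum(fare for fare, _, _ in trips)
    trip_cost = cost_per_km * sum(dist for _, dist, _ in trips)
    hired_minutes = sum(duration for _, _, duration in trips)
    # Cruising cost is charged per minute for the time the driver was not hired.
    cruising_minutes = max(0.0, interval_minutes - hired_minutes)
    return fares - trip_cost - cruising_cost_per_min * cruising_minutes


p = assignment_probability(trips_available=3, cruising_drivers=12)

# Two trips in a 3-hour interval; cost values are made up for the example.
trips = [(20.0, 10.0, 30.0), (10.0, 5.0, 15.0)]
revenue = driver_revenue(
    trips, interval_minutes=180, cost_per_km=0.5, cruising_cost_per_min=0.1
)
```

With these made-up numbers the driver is hired for 45 of the 180 minutes, so the remaining 135 minutes are charged at the cruising cost.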