2015 D-STOP Symposium session by UT Austin's Doug Fearing. Watch the presentation at http://youtu.be/y2kYLM8GdbI?t=44m44s
Get symposium details: http://ctr.utexas.edu/research/d-stop/education/annual-symposium/
Using Microsimulation to Recover Heterogeneity from Aggregated Data
1. D-STOP Symposium
March 2, 2015
Douglas Fearing, UT Austin McCombs
(doug.fearing@mccombs.utexas.edu)
Joint work with Cynthia Barnhart, MIT, and Vikrant Vaze, Dartmouth
2. Passenger Delays
• Depend on flight delays, flight cancelations, missed connections, and re-accommodation
– Flight delays alone are not enough (Bratu & Barnhart, 2005)
• Cost U.S. passengers billions of dollars per year
– Exact amount unknown because data is proprietary
• Multiple methodologies, cost estimates for 2007:
– Air Transport Association ($5 billion), U.S. Senate Joint Economic Committee ($7.4 billion) (ignoring flight cancelations & passenger connections)
– Wang, Sherry, and Donahue ($8.5 billion) (simulating passenger connections)
3. Data Sources
• Planned flight schedules
– Flight on-time performance data
• Flight seating capacities
– Airline inventories, aircraft codes, monthly seat counts
• Aggregate passenger demand data
– Monthly segment demands, quarterly 10% coupon samples (one-way itineraries)
• Heterogeneous booking data
– One quarter for a major U.S. carrier
4. Passenger Travel Estimation
• Developed multinomial logit (discrete choice) model of itinerary shares
– Regression function includes time-of-day, day-of-week, connection time, cancelations, and seats
– Trained on one quarter of booking data from a large carrier
• Generate potential non-stop and one-stop itineraries from flight schedule data
• Use microsimulation to allocate passengers to itineraries based on estimated logit proportions
– Using aggregated passenger demand data to determine total number of passengers and one-stop route proportions
5. Passenger Delay Calculation
• Extension of Passenger Delay Calculator developed by Bratu & Barnhart (2005)
• Disrupted passengers are determined by analyzing historical flight schedule data
• Passengers are re-accommodated on alternative itineraries in the order they are disrupted
– Attempt re-accommodation on ticketed carrier and partner carriers first, and then consider all carriers
• Maximum delay of 8 hours for daytime disruptions (5:00am - 5:00pm) / 16 hours for evening disruptions
6. Annualized Costs of Delays
• Estimated 245 million hours of U.S. domestic passenger delays in 2007
• Total cost of $9.2 billion
– Assuming $37.60 per hour value of passenger time (same value as used in other reports)
• Out of all passenger delay,
– (only) 52% due to flight delays
– 30% due to canceled flights
– 18% due to missed connections
• Average passenger delay of 30.2 minutes
– Compared to average flight delay of 15.3 minutes
7. Crew-related Flight Delays
• FAA Office of Performance Analysis responsible for forecasting delays six months in advance
– Existing simulations model delay propagation through aircraft, not passengers or crew
• Crew schedules create another source of flight delays and cancelations
– For example, pilots hitting work limits due to earlier delays
• Data on crew schedules and constraints not publicly available
8. Proposed Estimation Approach
• Collect proprietary crew scheduling and recovery data to match with flight delay information
– Across major carrier types (network, regional, point-to-point)
• Build parameterized crew scheduling and recovery optimization models that incorporate insights from interviews and industry surveys
– Allow for trade-offs between operating costs and delays
• Fit optimization models against proprietary data
• Apply optimization models against full data set
– To understand the role that crew heterogeneity plays in overall flight delays, inform FAA simulation models
Editor's notes
The key finding from the Bratu & Barnhart research was that passenger itinerary disruptions, such as flight cancelations and missed connections, account for approximately 50% of passenger delays, due largely to the significant time taken to re-accommodate disrupted passengers.
Ideally, we would like to use actual passenger bookings to estimate passenger delays and their corresponding costs, but this data is proprietary for each carrier. So, instead, we have to resort to some sort of approximate approach.
The Air Transport Association and the U.S. Senate Joint Economic Committee have both come up with estimates by multiplying flight delays by an estimated number of passengers on each flight (thus ignoring any itinerary disruptions), with each using a slightly different definition of flight delays. Wang, Sherry, and Donahue perform an analysis using average load factors to estimate non-stop passenger counts, and separately simulate missed connection rates. Our approach differs from these in that we first focus on the problem of allocating passengers to individual itineraries.
The Bureau of Transportation Statistics provides an extensive data set related to individual flights and groups of passengers. The primary challenge we address is the fact that the passenger data provided is significantly aggregated, either monthly for origin-destination segment demands or quarterly for sampled one-way itineraries. Our approach is to use proprietary booking data to analyze patterns in passenger demand and subsequently use this information to disaggregate the publicly available passenger demand data.
It’s worth noting that many of these data sets use slightly different keys for flights and aircraft, so creating the cleaned and joined data set was a significant challenge in and of itself. All of the data has been combined in a large Oracle database.
The parameter estimates in the multinomial logit model represent the utility associated with various features of the itinerary. For instance, passengers prefer connection times close to an hour, with decreased utility for both long and short connection times. We fit these parameters using the training data, the one quarter of proprietary booking data. Note that we include cancelations in the model, a feature that is known only ex-post, so this is not a passenger choice model. Based on the proprietary data, we can see that airlines tend to cancel flights with fewer passengers, which implies a negative correlation between flight cancelations and passenger bookings. This is the property we are exploiting to ensure that we are conservative in our estimates of the number of passengers affected by cancelations and of the corresponding delays.
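As a rough illustration of how logit proportions follow from utilities, here is a minimal sketch. The quadratic connection-time term, its coefficient, and the itinerary names are hypothetical stand-ins chosen only to reproduce the "prefer about an hour" shape described above; they are not the fitted parameters from the talk.

```python
import math

def logit_shares(utilities):
    """Multinomial logit shares: exp(u_i) / sum_j exp(u_j)."""
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {name: math.exp(u - top) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def connection_utility(minutes, ideal=60.0, penalty=0.0005):
    """Hypothetical utility term that peaks at `ideal` minutes of connection time."""
    return -penalty * (minutes - ideal) ** 2

# Three made-up one-stop itineraries that differ only in connection time.
utilities = {
    "connect_35min": connection_utility(35),
    "connect_60min": connection_utility(60),
    "connect_240min": connection_utility(240),
}
shares = logit_shares(utilities)
for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

With these assumed numbers, the 60-minute connection receives the largest share and the 4-hour connection receives almost none. A real model would add the other features listed on the slide (time-of-day, day-of-week, cancelations, seats) to each utility.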
To generate possible itineraries, we use the flight on-time performance data, filtering out any combination of carrier and route that does not exist in the Bureau of Transportation Statistics’ DB1B data. When generating these itineraries, we allow planned connection times between 30 minutes and 5 hours.
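The itinerary-generation step can be sketched as follows. This is an assumption-laden illustration: the `Flight` record, the carrier code "XX", the sample schedule, and the `valid_routes` set (standing in for the carrier-route combinations found in DB1B) are all invented, and the sketch only pairs flights on the same carrier, which is a simplification of mine.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

class Flight(NamedTuple):
    carrier: str
    origin: str
    dest: str
    dep: datetime
    arr: datetime

MIN_CONNECT = timedelta(minutes=30)  # planned connection window from the notes
MAX_CONNECT = timedelta(hours=5)

def one_stop_itineraries(flights, valid_routes):
    """Pair flights into one-stop itineraries with an allowed connection time."""
    by_origin = {}
    for f in flights:
        by_origin.setdefault(f.origin, []).append(f)
    found = []
    for first in flights:
        for second in by_origin.get(first.dest, []):
            # Assumption: same carrier on both legs, and no out-and-back trips.
            if second.carrier != first.carrier or second.dest == first.origin:
                continue
            layover = second.dep - first.arr
            route = (first.carrier, first.origin, first.dest, second.dest)
            if MIN_CONNECT <= layover <= MAX_CONNECT and route in valid_routes:
                found.append((first, second))
    return found

day = datetime(2007, 3, 1)

def at(hour, minute=0):
    return day + timedelta(hours=hour, minutes=minute)

flights = [
    Flight("XX", "AUS", "DFW", at(8), at(9)),
    Flight("XX", "DFW", "BOS", at(9, 20), at(13, 30)),  # 20-minute layover: too short
    Flight("XX", "DFW", "BOS", at(10), at(14, 10)),     # 60-minute layover: kept
    Flight("XX", "DFW", "BOS", at(15), at(19, 10)),     # 6-hour layover: too long
]
valid_routes = {("XX", "AUS", "DFW", "BOS")}
itineraries = one_stop_itineraries(flights, valid_routes)
print(len(itineraries))
```

Only the pairing with the 60-minute layover survives both the connection-time window and the route filter.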
As we allocate passengers to the generated itineraries, we ensure that no flights become overbooked, and we are careful not to bias our allocation towards any group of itineraries based on the order in which we allocate passengers. We avoid these biases by randomly ordering the passengers prior to allocation.
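A minimal sketch of this allocation step, assuming per-market logit shares are already available: passengers from all markets are shuffled, then each one is assigned to an itinerary sampled in proportion to its share among the itineraries that still have a free seat on every leg. The market names, flight ids, shares, and seat counts are made up, and counting passengers with no open itinerary as `unassigned` is an assumption of this sketch, not something the talk specifies.

```python
import random

def allocate(demand, candidates, seats, seed=0):
    """Allocate passengers to itineraries without overbooking any flight.

    demand:     market -> number of passengers
    candidates: market -> {itinerary (tuple of flight ids): logit share}
    seats:      flight id -> seating capacity
    """
    rng = random.Random(seed)
    passengers = [market for market, n in demand.items() for _ in range(n)]
    rng.shuffle(passengers)  # random order avoids favoring any one market
    remaining = dict(seats)
    booked = {}
    unassigned = 0
    for market in passengers:
        open_itins = [(itin, share) for itin, share in candidates[market].items()
                      if all(remaining[f] > 0 for f in itin)]
        if not open_itins:
            unassigned += 1
            continue
        itins, weights = zip(*open_itins)
        choice = rng.choices(itins, weights=weights)[0]
        for f in choice:
            remaining[f] -= 1
        booked[choice] = booked.get(choice, 0) + 1
    return booked, remaining, unassigned

demand = {"AUS-DFW": 60, "AUS-BOS": 50}
candidates = {
    "AUS-DFW": {("F1",): 0.5, ("F2",): 0.5},
    "AUS-BOS": {("F1", "F3"): 0.7, ("F2", "F4"): 0.3},
}
seats = {"F1": 50, "F2": 50, "F3": 40, "F4": 40}
booked, remaining, unassigned = allocate(demand, candidates, seats)
print(booked, remaining, unassigned)
```

Both markets compete for seats on F1 and F2. Without the shuffle, whichever market was processed first would claim those seats and skew the shares of the other.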
The passenger delay calculator proceeds by first determining the set of disrupted passengers and then re-accommodating these passengers on future flights.
The Bureau of Transportation Statistics flight on-time performance data contains historical information on flight cancelations and delays, which we can use to determine which passenger itineraries have been disrupted. Once we have this list, we greedily re-accommodate passengers on future flights, making sure not to overbook any of these flights. Because of this constraint, higher load factors will generally lead to longer re-accommodation times, which is a result we would expect to see.
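The greedy re-accommodation can be sketched roughly as below, for a single origin-destination pair with times in minutes after midnight. It applies the 8-hour and 16-hour caps from the slides; how the 5:00am and 5:00pm boundaries are treated is my assumption. The carrier preference order (ticketed carrier and partners first) is omitted for brevity, and the passenger and flight records are invented.

```python
DAY_START, DAY_END = 5 * 60, 17 * 60  # 5:00am and 5:00pm

def delay_cap(disrupted_at):
    """Maximum delay in minutes: 8 hours for daytime disruptions, else 16 hours."""
    minute_of_day = disrupted_at % (24 * 60)
    return 8 * 60 if DAY_START <= minute_of_day < DAY_END else 16 * 60

def reaccommodate(disrupted, flights, spare_seats):
    """Greedily rebook passengers in the order they were disrupted.

    disrupted:   list of (passenger id, disruption time, planned arrival time)
    flights:     list of (flight id, departure time, arrival time)
    spare_seats: flight id -> seats still available
    Returns passenger id -> delay in minutes.
    """
    remaining = dict(spare_seats)
    by_arrival = sorted(flights, key=lambda f: f[2])  # earliest arrival first
    delays = {}
    for pid, disrupted_at, planned_arr in sorted(disrupted, key=lambda p: p[1]):
        cap = delay_cap(disrupted_at)
        delays[pid] = cap  # default when no alternative fits within the cap
        for fid, dep, arr in by_arrival:
            delay = arr - planned_arr
            if dep >= disrupted_at and remaining[fid] > 0 and delay <= cap:
                remaining[fid] -= 1  # never overbook the alternative flight
                delays[pid] = max(delay, 0)
                break
    return delays

# Three passengers from a canceled flight (disrupted 8:00am, planned arrival 10:00am).
disrupted = [("p1", 480, 600), ("p2", 480, 600), ("p3", 480, 600)]
flights = [("F2", 540, 660), ("F3", 900, 1020)]
spare_seats = {"F2": 1, "F3": 1}
delays = reaccommodate(disrupted, flights, spare_seats)
print(delays)
```

With one spare seat per flight, the first passenger is delayed 60 minutes, the second 420 minutes, and the third finds no seat and is assigned the 480-minute daytime cap. Fewer spare seats (higher load factors) push more passengers toward later flights or the cap.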
As part of our analysis, we have performed significant validation and sensitivity analysis relating to the passenger travel estimation and delay calculation. The overall estimates are particularly sensitive to the maximum delay value of 8 hours for daytime disruptions and 16 hours for evening disruptions. We include these delay caps in part because some passengers might have been re-accommodated using flights not included in our data set. A shorter maximum delay decreases passenger delays in two ways. First, passengers who have no alternative are estimated to have smaller delays. Second, because more passengers are assigned the maximum delay, fewer passengers need to be re-accommodated within the system, saving seats for other passenger re-accommodations.
Consistent with the analysis performed by Bratu and Barnhart, we estimate that about half of all passenger delays are due to cancelations and missed connections.
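The headline figures on the cost slide can be checked with a few lines of arithmetic. All inputs below are taken from the slides; only the variable names are mine.

```python
delay_hours = 245e6     # estimated U.S. domestic passenger delay hours in 2007
value_per_hour = 37.60  # assumed dollar value of one hour of passenger time

total_cost = delay_hours * value_per_hour
disruption_share = 0.30 + 0.18  # canceled flights plus missed connections
delay_ratio = 30.2 / 15.3       # average passenger delay vs. average flight delay

print(f"Total cost: ${total_cost / 1e9:.2f} billion")
print(f"Share of delay from disruptions: {disruption_share:.0%}")
print(f"Passenger delay / flight delay: {delay_ratio:.2f}")
```

The product comes to about $9.2 billion, disruptions account for 48% of passenger delay (the "about half" in the note above), and the average passenger delay is nearly twice the average flight delay.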