2. INTRODUCTION
Primary Research Question
Will a specific U.S. flight arrive at the destination on-time or with delay?
Secondary Research Questions
When is the best day of week/time of year to fly to minimize delays?
Which carrier suffers more delays?
How well does departure delay predict arrival delays?
11. CONCLUSION
Fewer flights are delayed in April, May, June, September, October and
November.
Flights are less likely to arrive with delay when their departure time is
between 20:00 and 05:00.
Carriers with the most delayed minutes are AA (American Airlines), WN
(Southwest Airlines) and MQ (Envoy Air).
Airports with the most on-time flights are Atlanta, Phoenix and
Kentucky.
Orlando, Atlanta, Dallas and Newark are the most congested airports.
12. CONCLUSION
In almost 77% of the cases, departure and arrival status agree: if there is a
departure delay, there is an arrival delay, and if the departure is on time,
there is no arrival delay. Our phi coefficient is 0.53, so we can say departure
delay is one of the positive influencing factors on arrival delay.
After modeling with 3 methods, SVM is the winning method with 80.20%
training accuracy and 76.45% test accuracy; that is, in 76.45% of the test
cases it correctly predicts whether a flight arrives on time or with delay.
Our prediction accuracy could potentially improve if we include other strong
influencing factors such as “weather”.
slide 1: problem (in simple English, what the problem is and probably why it is important), and the expected results. So, a high-level overview of the problem.
The inconvenience caused by flight delays has been a long-standing challenge for passengers, airports and airlines. A 2010 study conducted by the U.S. Federal Aviation Administration (FAA) analyzed data from 2007 to quantify the economic impact of flight delays and found that 32.9 billion USD was borne by American passengers and airlines. The purpose of this paper is to use the dataset to train and test a machine learning model that predicts arrival delays based on the features with the highest relevance to the topic. These features are selected based on descriptive statistical analysis of the data. The aim is to predict whether a flight will arrive at its destination with a delay or not, given the circumstances.
slide 2: snapshot of dataset (with names of important features)
The dataset includes all commercial flight arrival and departure details in the USA between October 1987 and April 2008.
After cleaning the data and removing the features that are not impactful and records with “NA” values, we end up with 20 features and 14,130,317 records with the following data types:
slide 3: overview of your approach (similar to the waterfall figure that you all have in your reports)
slide 4: more details of your approach
slide 5: more details of your approach
slide 7: results
Now we can answer the first secondary research question, “When is the best day of week/time of year to fly to minimize delays?” Passengers who travel on Saturdays, Tuesdays and Wednesdays are less likely to experience delays than those who travel on other days of the week. The same applies to the months with fewer flight delays: April, May, June, September, October and November.
To answer the second research question, “Which carrier suffers more delays?”, we first look at the top 10 airlines with the most delayed minutes in arrival and in departure separately, and then at the overall picture of the top 10 carriers with the most delayed minutes. AA (American Airlines), WN (Southwest Airlines), MQ (Envoy Air), UA (United Airlines) and OO (SkyWest Airlines) are at the top.
Looking at arrival and departure times shows that it is better to avoid flights that leave the origin at 6:00, because those flights are more likely to arrive with delays. 6:00 also carries the most departure traffic. The best hours to fly, with the lowest probability of running into arrival delays, are between 20:00 and 05:00.
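The comparison by departure hour (and likewise by day of week or month) comes down to a grouped delay rate. A minimal sketch of that calculation is given here; it is not the code used in the project. The sample pairs are made up, and the 15-minute cutoff for calling a flight "delayed" is an assumption, since these notes do not state the threshold that was used.

```python
from collections import defaultdict

# Hypothetical (departure_hour, arrival_delay_minutes) pairs, invented for illustration.
flights = [(6, 25), (6, 40), (6, -3), (21, -5), (21, 2), (23, -10), (14, 18), (14, -1)]

DELAY_THRESHOLD = 15  # assumed cutoff in minutes; not stated in the source


def delay_rate_by_key(pairs, threshold):
    """Return the share of delayed flights for each grouping key (hour, weekday, month...)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [delayed, total]
    for key, arr_delay in pairs:
        counts[key][1] += 1
        if arr_delay >= threshold:
            counts[key][0] += 1
    return {key: delayed / total for key, (delayed, total) in counts.items()}


rates = delay_rate_by_key(flights, DELAY_THRESHOLD)

# List hours from the lowest to the highest delay rate.
for hour in sorted(rates, key=rates.get):
    print(hour, round(rates[hour], 2))
```

The same function works for day of week or month by changing what is used as the first element of each pair.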
slide 7: results
If we look at 2007 arrival and departure delays per airport, this is the picture we get: Orlando, Atlanta, Dallas and Newark are the most congested airports.
If we look at the airlines, airports and arrival delay at the same time, we can clearly see that EV (Atlantic Southeast Airlines) experiences its longest delays at Atlanta, AA (American Airlines) and MQ (Envoy Air) at Orlando and Dallas, and XE (ExpressJet) at Newark.
To answer the last research question, “How well does departure delay predict arrival delays?”, we plot the different combinations of “Arrival delays” and “Departure delays” together. The pie chart below shows that in almost 77% of the cases, if there is a departure delay, then there is an arrival delay, or if the departure is on time, there is no arrival delay. In this case our phi coefficient is 0.53, which shows a moderate positive association between the two variables. So we can say departure delay is one of the positive influencing factors on arrival delay.
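For reference, both numbers above come from a 2x2 table of departure status against arrival status. The agreement rate is the share of flights on the diagonal, and the phi coefficient is (n11*n00 - n10*n01) / sqrt(row1*row0*col1*col0). The sketch below shows the arithmetic only. The four counts are invented so that the agreement comes out at 77%; they are not the counts from our dataset.

```python
import math


def agreement_rate(n11, n10, n01, n00):
    """Share of flights where departure and arrival status match."""
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table.

    n11: departure delayed, arrival delayed
    n10: departure delayed, arrival on time
    n01: departure on time, arrival delayed
    n00: departure on time, arrival on time
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = math.sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical counts per 100 flights (assumption, for illustration only).
table = dict(n11=35, n10=11, n01=12, n00=42)

agreement = agreement_rate(**table)
phi = phi_coefficient(**table)
print(round(agreement, 2), round(phi, 2))
```

Phi ranges from -1 to 1, so it can be read like a correlation between the two yes/no variables.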
Given labeled training data, the SVM algorithm outputs an optimal hyperplane which categorizes new examples.
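To make the hyperplane idea concrete, here is a small sketch of a linear SVM trained with sub-gradient descent on the hinge loss. This is an illustration of the technique, not the model from our experiments: these notes do not record which SVM implementation or kernel was used, the two features are hypothetical standardized values, and the learning rate, regularization strength and epoch count are arbitrary choices.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=50):
    """Fit w and b for the hyperplane w.x + b = 0 by minimizing the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: step toward it.
                w = [wi - lr * (lam * wi - label * xi) for wi, xi in zip(w, x)]
                b += lr * label
            else:
                # Point is safely classified: apply only the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (arrives with delay) or -1 (arrives on time) for a new example."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical standardized features per flight; labels: +1 = delayed, -1 = on time.
X = [(2.0, 1.0), (3.0, 2.0), (2.5, 3.0), (-2.0, -1.0), (-3.0, -2.0), (-2.5, -1.5)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print(predict(w, b, (4.0, 3.0)), predict(w, b, (-4.0, -3.0)))
```

The sign of w.x + b tells on which side of the hyperplane a new flight falls, which is the on-time/delayed prediction.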
Characterization and prediction of air traffic delays (Rebollo & Balakrishnan, 2014)
Predicting airline delays (Bandyopadhyay & Guerrero, 2012)
Flight delay prediction (Martinez, 2012)
Estimating flight departure delay distributions (Tu, Ball, & Jank)
Multi-Factor model for predicting delays at U.S. airports (Xu, Sherry, & Laskey)