Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.

Traffic Data Analysis and Prediction using Big Data

500 visualizaciones

Publicado el

- Denser traffic on Freeways 101, 405, 10
- Rush hours from 7 am to 9 am produce a lot of traffic, the heaviest traffic time start from 3pm and gets better after 6pm.
- Major areas of traffic in DTLA, Santa Monica, Hollywood
- More insights can be found with bigger dataset using this framework for analysis of traffic
- Using such data and platform can also give an opportunity to predict traffic congestions. Prediction can be performed using machine learning algorithm – Decision Forest with the accuracy of 83% for predicting the heaviest traffic jam.

Publicado en: Datos y análisis
  • Inicia sesión para ver los comentarios

  • Sé el primero en recomendar esto

Traffic Data Analysis and Prediction using Big Data

  1. 1. Jongwook Woo HiPIC CalStateLA KSII The 14th Asia Pacific International Conference on Information Science and Technology(APIC-IST), Beijing June 24 2019 Dalya (Dalyapraz) Dauletbak, dmanato@calstatela.edu Jongwook Woo, PhD Big Data AI Center (BigDAI) California State University Los Angeles Traffic Data Analysis and Prediction using Big Data
  2. 2. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Introduction  H/W Specification  Architecture Chart  Implementation steps  Data structure  Analysis  Prediction  Summary
  3. 3. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Introduction About me:  Graduate Computer Information Systems Student at California State University, Los Angeles – BS (2015): Mathematics at Nazarbayev University – Previously: Senior Consultant/Data Analyst @ Management consulting at KPMG Central Asia – Current: Community Manager @ International Data Engineering and Science Association (IDEAS) Data source:  A GPS navigation mobile application  Provide real-time directions and up-to-date information  Traffic  Accidents  Road closure  Weather hazards  Lurking police vehicles and etc.
  4. 4. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Introduction Data source:  Navigation app traffic data set from LA City Department*  Information reported by users - Alerts  information captured by user’s device - Jams  We are going to find out:  Areas with high volume of traffic (geography)  Peak-hours  Density of Alerts and Incidents  Traffic volume by road types  Prediction of traffic jam *Limited authorization to access the full datasets 100 GB + original; we used limited dataset to 9 days (Dec 31– Jan 8, 2018) ~2GB
  5. 5. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Introduction  H/W Specification  Architecture Chart  Implementation steps  Data structure  Analysis  Prediction  Summary
  6. 6. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA H/W Specification Number of nodes 6 OCPUs 12 CPU speed 2195.196MHz Memory 180 GB Storage 682 GB
  7. 7. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Architecture Chart Source: Hadoop Masterclass Part 4 of 4: Analyzing Big Data Lars George | EMEA Chief Architect Cloudera
  8. 8. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Implementation steps Local Computer Raw data files (JSON) Geo-Spatial Visualization (3D map) Dashboard for Analytics Hadoop/Hive Upload dataset to HDFS Parse JSON files using Pandas Create tables’ schema Clean data Create sample/summary dataset for prediction and visualization Microsoft Azure ML Studio Upload sample dataset Apply data transformation Split dataset for training and scoring Train model(s) Evaluate model(s)
  9. 9. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data structure
  10. 10. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Introduction  H/W Specification  Architecture Chart  Implementation steps  Data structure  Analysis  Prediction  Summary
  11. 11. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Analysis  Information we are using:  Location/Time  Level of traffic intensity  X and Y coordinates (Longitude & Latitude)  Counts of jams/alerts  Tools we are using:  Excel - 3D map  Power BI - Flow map, pie charts, bar charts  What we are predicting:  Level of traffic (1 to 3 – light, medium, heavy)  Based on date, time, location
  12. 12. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Traffic in LA (captured from users' devices)
  13. 13. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Traffic in LA (reported by app users)
  14. 14. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Video-Simulation of Traffic in LA (captured from users' devices)
  15. 15. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Video-Simulation of Traffic in LA (reported by app users)
  16. 16. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Traffic Analysis Dashboard Peak Peak
  17. 17. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Traffic Analysis Dashboard Major areas of traffic are: Downtown Los Angeles, Santa Monica, Hollywood, and highways.
  18. 18. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Introduction  H/W Specification  Architecture Chart  Implementation steps  Data structure  Analysis  Prediction  Summary
  19. 19. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Prediction of traffic congestion with Machine Learning Data preparation Group label values Join additional dataset Apply data transformation Normalize data Model building Model(s) selection Cross Validation Train model(s) Model evaluation Score model Evaluate model (Accuracy, Recall)
  20. 20. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Features/columns in a dataset location x, location y X and Y -coordinate of location date_pst Pacific Time of the publication of traffic report level jam level, where 1 – almost no jam and 5 – standstill jam speed driver’s captured speed in mph length length of the traffic ahead in the route of user in meters *date_pst *date splits into month, day, hour, min, sec, weekday
  21. 21. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data transformation  Randomly selected data ~ 100MB  Select relevant features  Group level into 2 classes (label: 0 & 1)  Join holidays dataset  Add attribute is_holiday (0 or 1)  Change cyclical attributes from Polar coordinates to Cartesian  Add is_rush, is_weekend (0 or 1)  Normalize features  Make categorical: is_rush, is_holiday, is_weekend, label
  22. 22. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA SELECT location_x, location_y, SIN((weekday)*(2*PI()/7)) as sin_weekday, COS((weekday)*(2*PI()/7)) as cos_weekday, SIN((month-1)*(2*PI()/12)) as sin_month, COS((month-1)*(2*PI()/12)) as cos_month, SIN((day-1)*(2*PI()/31)) as sin_day, COS((day- 1)*(2*PI()/31)) as cos_day, SIN(hour*(2*PI()/24)) as sin_hour, COS(hour*(2*PI()/24)) as cos_hour, SIN(min*(2*PI()/60)) as sin_min, COS(min*(2*PI()/60)) as cos_min , SIN(sec*(2*PI()/60)) as sin_sec, COS(sec*(2*PI()/60)) as cos_sec, …
  23. 23. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA MODEL Evaluation Model Accuracy Precision Recall AUC ROC LR 0.662 0.662 1.0 0.571 BDT 0.805 0.832 0.884 0.868 DF 0.832 0.868 0.880 0.885
  24. 24. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Summary of Traffic Prediction with Machine Learning  Model is based on sampled dataset ~ 1M rows (100 MB)  Best model - Decision Forest  Accuracy – 0.832  Precision - 0.868  Recall - 0.880  Area under the Curve – 0.885 Confusion Matrix
  25. 25. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Introduction  H/W Specification  Architecture Chart  Implementation steps  Data structure  Analysis  Prediction  Summary
  26. 26. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Summary Denser traffic on Freeways 101, 405, 10 Rush hours from 7 am to 9 am produce a lot of traffic, the heaviest traffic time start from 3pm and gets better after 6pm. Major areas of traffic in DTLA, Santa Monica, Hollywood More insights can be found with bigger dataset using this framework for analysis of traffic Using such data and platform can also give an opportunity to predict traffic congestions. Prediction can be performed using machine learning algorithm – Decision Forest with the accuracy of 83% for predicting the heaviest traffic jam.
  27. 27. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Questions?
  28. 28. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA References 1. J. Barbaresso, G. Cordahi, D. Garcia et al., “USDOT’s Intelligent Transportation Systems (ITS) ITS Strategic Plan 2015- 2019,” 2014. 2. “Integrated Corridor Management,” Intelligent Transportation Systems - Integrated Corridor Management, www.its.dot.gov/research_archives/icms/. Accessed April 14, 2019. 3. J. Kestelyn, “Real-Time Data Visualization and Machine Learning for London Traffic Analysis,” Google Cloud, 2016, cloud.google.com/blog/products/gcp/real-time-data-visualization-and-machine-learning-for-london- traffic-analysis. Accessed April 14, 2019. 4. “Connected Citizens by Waze,” Waze, www.waze.com/ccp. Accessed April 14, 2019. 5. M. Schnuerle, “Louisville and Waze: Applying Mobility Data in Cities,” Harvard Civic Analytics Network Summit on Data-Smart Government, 2017. 6. Louisville Metro. “Thunder Jams, 2017 Traffic Delays.” CARTO, louisvillemetro- ms.carto.com/builder/d98732d0-1f6a-4db2-9f8a-e58026bf0d39/embed. Accessed April 14, 2019. 7. Louisville Metro. “Pothole Animation.” CARTO, cdolabs-admin.carto.com/builder/a80f62bf-98e1-4591-8354- acfa8e51a8de/embed. Accessed April 14, 2019. 8. E. Necula, “Analyzing Traffic Patterns on Street Segments Based on GPS Data Using R,” Transportation Research Procedia, Vol. 10, pp. 276–285, 2015.
  29. 29. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA References 9. J. Woo and Y. Xu, “Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing,” in Proc. of International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas. 2011. 10. “Pandas.io.json.json_normalize.” Pandas.io.json.json_normalize - Pandas 0.24.2 Documentation, pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.json_normalize.html. Accessed April 14, 2019. 11. United States, Chief Executive Office County of Los Angeles. “Cities within the County of Los Angeles.” lacounty.gov. Accessed April 14, 2019. 12. Garyericson. “What Is - Azure Machine Learning Studio.” Microsoft Docs, docs.microsoft.com/en- us/azure/machine-learning/studio/what-is-ml-studio. Accessed April 14, 2019. 13. A. Tharwat, “Classification Assessment Methods.” Applied Computing and Informatics, 2018. 14. M. Sokolova and L. Guy, “A Systematic Analysis of Performance Measures for Classification Tasks,” Information Processing & Management, Vol. 45. No. 4, pp. 427–437, 2009.

×