In September 2017, dockless bikeshare joined the transportation options in the District of Columbia, and scooter share followed in March 2018. During the pilot of these technologies, Python helped the District Department of Transportation (DDOT) answer several critical questions. This talk discusses how Python was used to answer the research questions and how it supported the evaluation of the demonstration.
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brodie, Kiana Roshan Zamir
1. Stefanie Brodie, Research Program Specialist @ DDOT
Kiana Roshan Zamir, Amanda Alvarez, The Lab @ DC
PyData DC 2018
Date: 11/17/2018
2. Content
01 Overview
02 Comparative Analysis
03 Idle Time Analysis
04 Labeling Dockless Trips
05 Hotspot Analysis
06 Visualizations
*Pictures are taken from various online resources, including: https://mobilitylab.org
4. 1.1 Dockless [bike]share in DC
*Pictures are taken from various online resources, including: https://www.bikeshare.cc
5. 1.2 Dockless [bike]share in DC
✓ Address active transportation goals
✓ Ensure a safe transportation system
✓ Receive public input
✓ Manage bikesharing for the District
7. 1.4 Our Analyses
✓ How does dockless compare to Capital Bikeshare?
✓ What is the impact on public space?
✓ Are travelers’ behavior patterns different?
✓ Can we identify locations for more parking infrastructure?
✓ Where are trips coming from and going to?
9. 2.1 Number of Trips
Monthly trips on Capital Bikeshare and dockless vehicles
10. 2.2 Distribution of Trips
Heat map matrix showing the percentage of trips between wards (panels: Capital Bikeshare; Dockless Vehicles)
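A percentage matrix like the one on this slide can be built with a pandas cross-tabulation. The sketch below is a minimal illustration on toy data, assuming each trip record carries hypothetical `start_ward` and `end_ward` columns; it is not the code used to produce the slide.

```python
import pandas as pd

# Toy trip records; the start_ward/end_ward column names are hypothetical.
trips = pd.DataFrame({
    "start_ward": [1, 1, 2, 2, 6, 6, 6, 2],
    "end_ward":   [1, 2, 2, 6, 6, 6, 1, 2],
})

# Origin ward (rows) against destination ward (columns).
# normalize="all" divides every cell by the total number of trips.
od_pct = pd.crosstab(trips["start_ward"], trips["end_ward"], normalize="all") * 100
print(od_pct.round(1))
```

Each cell is the share of all trips that go from a given origin ward to a given destination ward, so the matrix sums to 100 and can be passed directly to a heat-map plotting function.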
11. 2.3 Temporal Usage
Usage trends over the course of the day for different vehicle types (panels: Weekdays; Weekend)
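The hourly profiles can be sketched as the share of each vehicle type's trips that start in each hour, computed separately for weekdays and the weekend. This is a minimal sketch on toy data; the `vehicle_type` and `start_time` column names are hypothetical.

```python
import pandas as pd

# Toy trips; column names are hypothetical.
trips = pd.DataFrame({
    "vehicle_type": ["scooter", "scooter", "bike", "bike", "scooter", "bike"],
    "start_time": pd.to_datetime([
        "2018-05-07 08:15", "2018-05-07 17:40", "2018-05-08 08:05",
        "2018-05-12 14:30", "2018-05-13 14:10", "2018-05-13 09:00",
    ]),
})

trips["hour"] = trips["start_time"].dt.hour
# dayofweek: Monday = 0 ... Sunday = 6, so 5 and 6 are the weekend.
trips["day_type"] = trips["start_time"].dt.dayofweek.map(
    lambda d: "Weekend" if d >= 5 else "Weekday"
)

# Trips per (day type, vehicle type, hour), divided by the total for that
# day type and vehicle type, so each profile sums to 1.
counts = trips.groupby(["day_type", "vehicle_type", "hour"]).size()
hourly_share = counts / counts.groupby(level=["day_type", "vehicle_type"]).transform("sum")
print(hourly_share)
```

Normalizing within each vehicle type keeps the curves comparable even though the fleets carry very different trip volumes.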
12. 2.4 Temporal and Spatial Usage for Capital Bikeshare
Usage trends during the morning peak (7AM-9AM)
(panels: Capital Bikeshare Casual Riders; Capital Bikeshare Member Riders)
13. 2.4 Temporal and Spatial Usage for Dockless Vehicles
Usage trends during the morning peak (7AM-9AM)
(panels: Dockless Scooters; Dockless Bikes)
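The morning-peak maps can be approximated by filtering trip starts to the peak window and counting starts per location and rider group. This sketch assumes hypothetical `rider_group`, `start_time`, and `start_smd` columns, weekday-only trips, and a 07:00 to 08:59 window; the slides do not say whether weekends were excluded or whether 9 AM is inclusive.

```python
import pandas as pd

# Toy trips; column names and SMD codes are hypothetical.
trips = pd.DataFrame({
    "rider_group": ["member", "member", "casual", "dockless_scooter", "member", "casual"],
    "start_time": pd.to_datetime([
        "2018-05-07 07:30", "2018-05-07 08:45", "2018-05-07 08:10",
        "2018-05-07 07:05", "2018-05-07 09:20", "2018-05-12 08:00",
    ]),
    "start_smd": ["2A01", "2A01", "6D02", "2A01", "1B03", "6D02"],
})

# Assumed peak definition: weekday trips starting at hours 7 or 8.
hour = trips["start_time"].dt.hour
is_weekday = trips["start_time"].dt.dayofweek < 5
peak = trips[is_weekday & hour.between(7, 8)]

# Trip starts per location for each rider group, ready to join to a map layer.
peak_counts = (
    peak.groupby(["rider_group", "start_smd"]).size().rename("trips").reset_index()
)
print(peak_counts)
```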
14. 2.5 Day of Week Distribution
Usage trends over the course of the week by vehicle type
15. 2.6 Duration of Trips
Boxplot of trip duration by vehicle type
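Each box in such a plot is drawn from the quartiles of trip duration for one vehicle type. A minimal sketch on toy data, assuming hypothetical `vehicle_type`, `start_time`, and `end_time` columns (the durations are invented for illustration):

```python
import pandas as pd

# Toy trips, two per vehicle type; all values are invented.
trips = pd.DataFrame({
    "vehicle_type": ["Capital Bikeshare", "Capital Bikeshare", "dockless bike",
                     "dockless bike", "dockless scooter", "dockless scooter"],
    "start_time": pd.to_datetime(["2018-05-07 08:00"] * 6),
    "end_time": pd.to_datetime([
        "2018-05-07 08:10", "2018-05-07 08:14", "2018-05-07 08:12",
        "2018-05-07 08:20", "2018-05-07 08:06", "2018-05-07 08:09",
    ]),
})

# Trip duration in minutes.
trips["duration_min"] = (trips["end_time"] - trips["start_time"]).dt.total_seconds() / 60

# Lower quartile, median, and upper quartile: the three lines that define each box.
box_stats = (
    trips.groupby("vehicle_type")["duration_min"].quantile([0.25, 0.5, 0.75]).unstack()
)
print(box_stats)

# Drawing the plot itself requires matplotlib, e.g.:
# trips.boxplot(column="duration_min", by="vehicle_type")
```

Note that a default matplotlib boxplot draws its whiskers at 1.5 times the interquartile range rather than at the minimum and maximum, so long rides appear as individual outlier points.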
17. 3.1 Introduction
Idle time (the time a vehicle is parked between trips) is of interest to both DDOT and the operators:
✓ Identify locations with high idle time (public complaints)
✓ Assess operators’ performance
✓ Help locate parking facilities
✓ Rebalance dockless vehicles
Tools:
✓ matplotlib
✓ geopandas
✓ mpl_toolkits
✓ shiny
✓ leaflet
✓ leaflet.extras
18. 3.2 Methodology
✓ To find consecutive trips, trips were sorted by operator, vehicle ID, start time, and end time.
✓ To make sure this procedure does not assign an unrelated trip as the next trip, a cleaning process is performed.
✓ The idle time analysis is based on the remaining trips (67 and 71 percent of the initial data for dockless scooters and bikes, respectively).
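The consecutive-trip pairing described above can be sketched in pandas as follows. This is a minimal sketch under stated assumptions, not DDOT's pipeline: the column names (`operator`, `vehicle_id`, `start_time`, `end_time`, `start_lat`/`start_lon`, `end_lat`/`end_lon`) are hypothetical, and because the slides do not specify the cleaning rules, the ones used here are assumptions. A pair is dropped if the trips overlap in time, or if the next trip starts more than a hypothetical `MAX_GAP_M` = 100 m from where the previous trip ended.

```python
import numpy as np
import pandas as pd

MAX_GAP_M = 100  # hypothetical cleaning threshold, not from the slides


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (inputs in degrees, array-like)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * np.arcsin(np.sqrt(a))


def idle_times(trips):
    """Idle time (minutes) between each trip and the same vehicle's next trip."""
    keys = ["operator", "vehicle_id"]
    t = trips.sort_values(keys + ["start_time", "end_time"]).copy()
    # Bring the next trip of the same vehicle onto the current row.
    nxt = t.groupby(keys)[["start_time", "start_lat", "start_lon"]].shift(-1)
    t["idle_min"] = (nxt["start_time"] - t["end_time"]).dt.total_seconds() / 60
    t["gap_m"] = haversine_m(t["end_lat"], t["end_lon"], nxt["start_lat"], nxt["start_lon"])
    # Assumed cleaning rules: a next trip must exist, must not overlap the
    # previous one, and must start close to where the previous trip ended.
    valid = t["idle_min"].notna() & (t["idle_min"] >= 0) & (t["gap_m"] <= MAX_GAP_M)
    return t.loc[valid, keys + ["end_time", "end_lat", "end_lon", "idle_min"]]


# Toy data: vehicle A has three trips, vehicle B has one.
trips = pd.DataFrame({
    "operator":   ["X", "X", "X", "X"],
    "vehicle_id": ["A", "A", "A", "B"],
    "start_time": pd.to_datetime(["2018-05-07 09:10", "2018-05-07 08:00",
                                  "2018-05-07 10:00", "2018-05-07 08:00"]),
    "end_time":   pd.to_datetime(["2018-05-07 09:20", "2018-05-07 08:10",
                                  "2018-05-07 10:15", "2018-05-07 08:30"]),
    "start_lat":  [38.9001, 38.8990, 38.9500, 38.9000],
    "start_lon":  [-77.0300, -77.0300, -77.0300, -77.0000],
    "end_lat":    [38.9100, 38.9000, 38.9600, 38.9100],
    "end_lon":    [-77.0300, -77.0300, -77.0300, -77.0000],
})
result = idle_times(trips)
print(result)  # one valid 60-minute idle period for vehicle A; its 10:00 trip starts ~4 km away
```

The location check is one way to catch vehicles that were moved without a recorded trip (for example, by operator rebalancing), which would otherwise inflate idle time at the original spot.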
24. 4.1 Introduction
✓ Is the behavior of dockless bikeshare and scootershare users in DC closer to that of Capital Bikeshare casual users or members?
✓ Developed a classification model, trained on Capital Bikeshare data, to categorize the trips taken by dockless users.
Tools:
✓ sklearn.linear_model
✓ sklearn.ensemble
25. 4.2 Methodology
Random Forest and Logistic Regression
A random forest classifier consists of a collection of tree-structured classifiers, each built from a randomly selected sample of the training data using randomly selected features.
Logistic regression is a statistical model that relates a binary dependent variable to independent variables by modeling the log-odds (logit) of its probability as a linear function of those variables.
*Pictures are taken from various online resources
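For reference, the standard textbook forms of the two models are written out below. Here $\mathbf{x}$ is a trip's feature vector, $\beta_0$ and $\boldsymbol{\beta}$ are the fitted coefficients, $T$ is the number of trees, and $P_t$ is tree $t$'s class-probability estimate. Coding casual trips as $y = 1$ is an assumption for illustration. The forest rule shown is the probability-averaging form used by scikit-learn's `RandomForestClassifier`; Breiman's original formulation takes a majority vote of the trees instead.

```latex
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x})}},
\qquad
\log \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}

\hat{y}(\mathbf{x}) = \operatorname*{arg\,max}_{c} \; \frac{1}{T} \sum_{t=1}^{T} P_t(y = c \mid \mathbf{x})
```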
26. 4.3 Data and Feature Engineering
Dataset
✓ Capital Bikeshare (March-May 2017-2018): 1,407,633 trips; after downsampling: 772,344
✓ Dockless bikes (March-May 2018): 71,590 trips
✓ Dockless scooters (March-May 2018): 187,911 trips
Features
✓ Start/end location (Single Member District (SMD) level)
✓ Trip duration
✓ Trip start time
✓ Day of the week
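The feature set listed above can be assembled as in the sketch below. Two points are assumptions, not taken from the slides. First, the downsampling step is assumed to balance member and casual trips by randomly sampling every class down to the size of the smallest one. Second, the `member_type`, `start_smd`, `end_smd`, `start_time`, and `end_time` column names and the SMD codes are hypothetical.

```python
import pandas as pd


def downsample_classes(df, label_col, seed=0):
    """Randomly sample every class down to the size of the smallest class (assumed step)."""
    n_min = int(df[label_col].value_counts().min())
    return df.groupby(label_col).sample(n=n_min, random_state=seed)


def build_features(trips):
    """Features named on the slide: start/end SMD, duration, start time, day of week."""
    X = pd.DataFrame({
        "duration_min": (trips["end_time"] - trips["start_time"]).dt.total_seconds() / 60,
        "start_hour": trips["start_time"].dt.hour,
        "day_of_week": trips["start_time"].dt.dayofweek,  # Monday = 0
    }, index=trips.index)
    # One-hot encode SMD codes so the models do not read them as ordered numbers.
    smd = pd.get_dummies(trips[["start_smd", "end_smd"]], prefix=["start", "end"], dtype=int)
    return pd.concat([X, smd], axis=1)


# Toy Capital Bikeshare-style records (hypothetical columns and codes).
trips = pd.DataFrame({
    "member_type": ["Member", "Member", "Member", "Casual"],
    "start_time": pd.to_datetime(["2018-05-07 08:00", "2018-05-07 17:30",
                                  "2018-05-08 07:45", "2018-05-12 13:00"]),
    "end_time": pd.to_datetime(["2018-05-07 08:12", "2018-05-07 17:50",
                                "2018-05-08 08:00", "2018-05-12 13:40"]),
    "start_smd": ["2A01", "6D02", "2A01", "2B05"],
    "end_smd": ["6D02", "2A01", "6D02", "2A01"],
})

balanced = downsample_classes(trips, "member_type")
X = build_features(trips)
print(balanced["member_type"].value_counts())
print(X)
```

When the fitted models are later applied to dockless trips, the dockless feature table needs the same one-hot columns as the training table. One way to achieve this is `X_dockless.reindex(columns=X.columns, fill_value=0)`.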
28. 4.5 Results
Confusion matrix for logistic regression and random forest (Test data set)
Model prediction for dockless bikes and dockless scooters
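The evaluation and labeling steps can be sketched with scikit-learn's `LogisticRegression`, `RandomForestClassifier`, and `confusion_matrix`. The data here are synthetic: the duration, hour, and day-of-week distributions are invented, and the labels are coded 0 = member and 1 = casual, so that the example runs on its own. Nothing it prints reflects the DC results shown on the slide.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)


def synthetic_trips(n, casual):
    """Toy features [duration_min, start_hour, day_of_week]; NOT real DC data."""
    if casual:
        duration = rng.normal(35, 10, n)
        hour = rng.integers(10, 18, n)
        dow = rng.integers(0, 7, n)
    else:
        duration = rng.normal(12, 4, n)
        hour = rng.choice([7, 8, 17, 18], n)
        dow = rng.integers(0, 5, n)
    return np.column_stack([duration, hour, dow])


X = np.vstack([synthetic_trips(500, casual=False), synthetic_trips(500, casual=True)])
y = np.array([0] * 500 + [1] * 500)  # 0 = member, 1 = casual (assumed coding)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Rows = true class, columns = predicted class (member, casual).
    print(name)
    print(confusion_matrix(y_test, model.predict(X_test)))

# Label unlabeled "dockless" trips with the fitted forest and report the casual share.
dockless = synthetic_trips(200, casual=True)
casual_share = models["random forest"].predict(dockless).mean()
print(f"share of dockless trips labeled casual: {casual_share:.2f}")
```

The confusion matrix on held-out Capital Bikeshare trips is the only direct check of the classifier. The dockless labels cannot be verified, because dockless data carry no member/casual field.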
30. 5.1 Introduction
✓ Improper parking in the public right-of-way is a major concern for dockless vehicles. Vehicles may be parked in pedestrian and frontage zones, obstructing pedestrians and making passage impossible for people with disabilities.
✓ To locate bike parking hotspots, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm was used to cluster the trip end points of dockless bikes.
Tools:
✓ sklearn.cluster
✓ Plotly
✓ shapely
31. 5.2 Methodology
DBSCAN is a density-based clustering algorithm that groups points lying close to each other, based on a distance threshold (Epsilon, ε) and a minimum number of points (Minimum points). It does not require every point to be assigned to a cluster: points in low-density regions are labeled as noise.
*Pictures are taken from: http://quipu-strands.blogspot.com
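With scikit-learn, DBSCAN can work directly on latitude/longitude using the haversine metric. That metric expects `[lat, lon]` in radians, so a distance in meters is converted by dividing by the Earth's radius. The sketch below is a minimal illustration on toy trip ends. The function name `cluster_trip_ends` is hypothetical, and the default of 50 m / 50 points simply reuses one combination from the sensitivity analysis. Note that scikit-learn's `min_samples` counts the point itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, used to turn meters into radians


def cluster_trip_ends(lat, lon, eps_m=50, min_points=50):
    """Return a cluster label per trip end; -1 marks noise (not part of any hotspot)."""
    # The haversine metric expects [latitude, longitude] in radians.
    coords = np.radians(np.column_stack([lat, lon]))
    db = DBSCAN(
        eps=eps_m / EARTH_RADIUS_M,  # arc length in meters -> angle in radians
        min_samples=min_points,      # includes the point itself
        metric="haversine",
        algorithm="ball_tree",
    )
    return db.fit_predict(coords)


# Toy trip ends: two tight groups (about 10 m spread) plus scattered points.
rng = np.random.default_rng(1)
hotspot_a = np.array([38.8977, -77.0365]) + rng.normal(0, 0.0001, (60, 2))
hotspot_b = np.array([38.9072, -77.0369]) + rng.normal(0, 0.0001, (60, 2))
scattered = np.array([38.90, -77.03]) + rng.uniform(-0.02, 0.02, (30, 2))
ends = np.vstack([hotspot_a, hotspot_b, scattered])

labels = cluster_trip_ends(ends[:, 0], ends[:, 1], eps_m=50, min_points=50)
print("clusters:", len(set(labels) - {-1}), "noise points:", int((labels == -1).sum()))
```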
32. 5.3 Parameters
Epsilon
✓ The maximum distance between two points for them to be considered neighbors.
Minimum Points
✓ The minimum number of points within Epsilon distance (neighbor points) required to form a dense region (cluster).
[Figure: ε-neighborhoods illustrated with Min points = 6]
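The two parameters determine each point's role in DBSCAN. A core point has at least Minimum points neighbors within Epsilon. A border point is not core but lies within Epsilon of a core point. Every other point is noise. The small NumPy sketch below applies these definitions on planar coordinates in meters. It uses toy points and a Minimum points value of 4 (not the slide's 6), counts the point itself as in scikit-learn, and builds a full distance matrix, so it is only suitable for small illustrations.

```python
import numpy as np


def point_roles(xy, eps, min_points):
    """Classify points as 'core', 'border' or 'noise' under DBSCAN's definitions."""
    # Pairwise Euclidean distances (n x n); fine for small toy inputs only.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    within = d <= eps
    # Core: enough neighbors within eps (the point counts as its own neighbor).
    core = within.sum(axis=1) >= min_points
    # Border: not core, but within eps of at least one core point.
    border = ~core & (within & core[None, :]).any(axis=1)
    return np.where(core, "core", np.where(border, "border", "noise"))


# Toy points on a line, in meters.
xy = np.array([[0, 0], [10, 0], [20, 0], [30, 0], [75, 0], [300, 0]], dtype=float)
roles = point_roles(xy, eps=50, min_points=4)
print(roles.tolist())  # ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point at 75 m has only one other point within 50 m, so it is not core. It still joins the cluster as a border point because its neighbor at 30 m is core. The point at 300 m is isolated and is left unassigned.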
33. 5.4 Sensitivity Analysis on Parameters
Tested values:
Epsilon: 10, 25, and 50 meters
Minimum points: 25, 50, 75, 100, 125, 150, 175, and 200 end points
[Figure panels: Epsilon = 50, Min points = 50; Epsilon = 50, Min points = 100]
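A sensitivity sweep over this parameter grid can be sketched by re-running DBSCAN for every combination and recording the number of clusters and the share of points left as noise. The grid below matches the tested values listed on the slide. The function name and the toy data (one dense hotspot plus scattered ends) are assumptions for illustration.

```python
import itertools

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000
EPS_M = [10, 25, 50]
MIN_POINTS = list(range(25, 201, 25))  # 25, 50, ..., 200


def sensitivity_table(lat, lon):
    """Clusters found and noise share for every (Epsilon, Minimum points) pair."""
    coords = np.radians(np.column_stack([lat, lon]))
    rows = []
    for eps_m, min_pts in itertools.product(EPS_M, MIN_POINTS):
        labels = DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=min_pts,
                        metric="haversine", algorithm="ball_tree").fit_predict(coords)
        rows.append({
            "eps_m": eps_m,
            "min_points": min_pts,
            "n_clusters": len(set(labels) - {-1}),
            "noise_share": float(np.mean(labels == -1)),
        })
    return pd.DataFrame(rows)


# Toy trip ends: one dense hotspot (120 points) plus 80 scattered points.
rng = np.random.default_rng(2)
hotspot = np.array([38.8977, -77.0365]) + rng.normal(0, 0.0001, (120, 2))
scattered = np.array([38.90, -77.03]) + rng.uniform(-0.02, 0.02, (80, 2))
ends = np.vstack([hotspot, scattered])

table = sensitivity_table(ends[:, 0], ends[:, 1])
print(table.pivot(index="min_points", columns="eps_m", values="n_clusters"))
```

Reading the table row by row shows how quickly hotspots disappear as Minimum points grows, which helps choose a combination that keeps genuine parking clusters without splitting them into fragments.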
35. 5.6 Illustration of Results
✓ Locations with a high number of parking events:
https://plot.ly/~kianarz/155/trip-ends-and-density-based-spatial-clustering-of-parking-events/
36. 5.7 Hexagonal Binning
✓ Visualizing the distribution of a dataset with a large number of points is hard, because many of the points overlap.
✓ Hexagonal binning is a data aggregation technique:
• Hexagons are used to tile the plane with a grid.
• The number of points falling inside each hexagon is counted and then used to create a heat map.
*Picture is taken from: https://www.meccanismocomplesso.org
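The counting step can be sketched in plain NumPy. Each point is mapped to axial hexagon coordinates on a pointy-top grid, rounded to the nearest hexagon with cube-coordinate rounding, and then the occupied hexagons are counted. This is an illustrative implementation with hypothetical function names. In practice, matplotlib's `Axes.hexbin` performs the binning and plotting in one call. Geographic coordinates should be projected to a planar system in meters first (for example with geopandas' `to_crs`), so that hexagons have equal area.

```python
import numpy as np


def hex_bin(x, y, size):
    """Count points per hexagon on a pointy-top hexagonal grid.

    size is the hexagon radius (center to corner) in the same units as x and y.
    Returns the axial (q, r) index of every non-empty hexagon and its point count.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Fractional axial coordinates of each point.
    q = (np.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r  # third cube coordinate; q + r + s == 0
    # Cube rounding: round each coordinate, then recompute the one that moved most
    # so the rounded triple still satisfies q + r + s == 0.
    rq, rr, rs = np.round(q), np.round(r), np.round(s)
    dq, dr, ds = np.abs(rq - q), np.abs(rr - r), np.abs(rs - s)
    fix_q = (dq > dr) & (dq > ds)
    fix_r = ~fix_q & (dr > ds)
    hex_q = np.where(fix_q, -rr - rs, rq)
    hex_r = np.where(fix_r, -rq - rs, rr)
    cells, counts = np.unique(
        np.column_stack([hex_q, hex_r]).astype(int), axis=0, return_counts=True
    )
    return cells, counts


def hex_centers(cells, size):
    """Planar (x, y) center of each axial hexagon index, for plotting."""
    q, r = cells[:, 0], cells[:, 1]
    return np.column_stack([size * np.sqrt(3) * (q + r / 2), size * 1.5 * r])


cells, counts = hex_bin([0.0, 0.1, 10.0], [0.0, 0.1, 0.0], size=1.0)
print(cells.tolist(), counts.tolist())  # [[0, 0], [6, 0]] [2, 1]
```

The counts, placed at `hex_centers(cells, size)`, are the values that color the heat map.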
39. 6.1 Interactive Visualization
✓ Heat map of trip start locations versus time of day
https://kianarz.shinyapps.io/docklessdata/
✓ Chord plot of wards
https://kianarz.shinyapps.io/Wards/