Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brodie, Kiana Roshan Zamir

Stefanie Brodie, Research Program Specialist @ DDOT
Kiana Roshan Zamir, Amanda Alvarez, The Lab @ DC
PyData DC 2018
Date: 11/17/2018

CONTENT
01 Overview
02 Comparative Analysis
03 Idle Time Analysis
04 Labeling Dockless Trips
05 Hotspot Analysis
06 Visualizations
*Pictures are taken from various online resources, including https://mobilitylab.org

1.1 Dockless [bike]share in DC
*Pictures are taken from various online resources, including https://www.bikeshare.cc

1.2 Dockless [bike]share in DC
✓ Address active transportation goals
✓ Ensure a safe transportation system
✓ Receive public input
✓ Manage bikesharing for the District

1.3 Data Science in Public Service

1.4 Our Analyses
✓ How does dockless compare to Capital Bikeshare?
✓ What is the impact on public space?
✓ Are travelers’ behavior patterns different?
✓ Can we identify locations for more parking infrastructure?
✓ Where are trips coming from and going to?

2.1 Number of Trips
Monthly trips on Capital Bikeshare and dockless vehicles

2.2 Distribution of Trips
Heat map matrix showing percentage of trips across wards
Panels: Capital Bikeshare | Dockless Vehicles
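A minimal sketch of how a ward-to-ward percentage matrix like the one described above could be computed with pandas. The column names (`start_ward`, `end_ward`) and the sample trips are hypothetical, and normalizing over all trips is an assumption; the slide does not say whether percentages are taken over all trips or per start ward.

```python
import pandas as pd

# Hypothetical trip records: one row per trip with its start and end ward.
trips = pd.DataFrame({
    "start_ward": [1, 1, 2, 2, 2, 6, 6, 2],
    "end_ward":   [1, 2, 2, 2, 6, 6, 2, 2],
})


def ward_share_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of all trips for each (start ward, end ward) pair."""
    # normalize="all" divides every cell by the total number of trips.
    share = pd.crosstab(df["start_ward"], df["end_ward"], normalize="all")
    return share * 100


matrix = ward_share_matrix(trips)
print(matrix.round(1))
```

The resulting table can be passed to a plotting function such as matplotlib's `imshow` to draw the heat map.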

2.3 Temporal Usage
Usage trends over the course of the day for different vehicle types
Panels: Weekdays | Weekend

2.4 Temporal and Spatial Usage for Capital Bikeshare
Usage trends over morning peak (7AM-9AM)
Panels: Capital Bikeshare Casual Riders | Capital Bikeshare Member Riders

2.4 Temporal and Spatial Usage for Dockless Vehicles
Usage trends over morning peak (7AM-9AM)
Panels: Dockless Scooters | Dockless Bikes

2.5 Day of Week Distribution
Usage trends over the course of the week by vehicle type

2.6 Duration of Trips
Boxplot of trip duration by vehicle type

3.1 Introduction
Idle time (the time a vehicle is parked between trips) interests both
DDOT and operators
✓ Identify locations that have high idle time (public complaints)
✓ Assess operators’ performance
✓ Help to locate parking facilities
✓ Rebalance dockless vehicles
Libraries used:
✓ matplotlib
✓ geopandas
✓ mpl_toolkits
✓ shiny
✓ leaflet
✓ leaflet.extras

3.2 Methodology
✓ To find consecutive trips, trips were sorted by operator,
vehicle ID, start time, and end time.
✓ To ensure this procedure does not pair a trip with an unrelated
next trip, a cleaning step is performed.
✓ The idle time analysis uses the remaining trips (67 percent of
the initial data for dockless scooters and 71 percent for dockless
bikes).
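The sorting step above can be sketched in pandas as follows. This is an illustration under stated assumptions, not the presenters' code: the column names and sample trips are hypothetical, and the cleaning rule shown (dropping trips with no successor or with a negative gap) is only one plausible rule, since the slides do not describe the actual cleaning process.

```python
import pandas as pd

# Hypothetical trip table; column names are illustrative only.
trips = pd.DataFrame({
    "operator":   ["A", "A", "A", "B", "B"],
    "vehicle_id": ["v1", "v1", "v1", "v9", "v9"],
    "start_time": pd.to_datetime([
        "2018-04-01 08:00", "2018-04-01 09:30", "2018-04-01 18:00",
        "2018-04-01 07:00", "2018-04-01 07:10",
    ]),
    "end_time": pd.to_datetime([
        "2018-04-01 08:20", "2018-04-01 09:45", "2018-04-01 18:30",
        "2018-04-01 07:30", "2018-04-01 07:40",
    ]),
})


def idle_times(df: pd.DataFrame) -> pd.DataFrame:
    """Pair each trip with the vehicle's next trip and compute the gap."""
    keys = ["operator", "vehicle_id"]
    ordered = df.sort_values(keys + ["start_time", "end_time"]).copy()
    # Start time of the same vehicle's next trip (NaT for its last trip).
    ordered["next_start"] = ordered.groupby(keys)["start_time"].shift(-1)
    ordered["idle"] = ordered["next_start"] - ordered["end_time"]
    # Assumed cleaning rule: drop trips without a successor and
    # overlapping records that would give a negative idle time.
    cleaned = ordered.dropna(subset=["next_start"])
    return cleaned[cleaned["idle"] >= pd.Timedelta(0)]


result = idle_times(trips)
# Share of remaining trips followed by more than 6 hours of idle time.
long_idle_share = (result["idle"] > pd.Timedelta(hours=6)).mean() * 100
print(result[["operator", "vehicle_id", "end_time", "next_start", "idle"]])
print(f"Idle > 6 hours: {long_idle_share:.0f}% of remaining trips")
```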

3.3 Effect of Time of Day on Idle Duration

3.4 Percent of Trips with Idle Time >6 hours – Bikes

3.5 Percent of Trips with Idle Time >6 hours – Scooters

3.6 Interactive Visualization
✓ Locations with idle time of more than 6 hours
https://kianarz.shinyapps.io/idle_time/

4.1 Introduction
✓ Is the behavior of dockless bikeshare and scootershare users in
DC closer to that of Capital Bikeshare casual users or members?
✓ Developed a classification model trained on Capital Bikeshare data
to categorize the trips taken by dockless users.
Libraries used:
✓ sklearn.linear_model
✓ sklearn.ensemble

4.2 Methodology
Random Forest and Logistic Regression
A random forest classifier is a collection of tree-structured
classifiers, each built from randomly selected features and a
randomly selected subset of the training data.
Logistic regression is a statistical model that relates a dependent
variable to independent variables by estimating probabilities
through the logit function.
*Pictures are taken from various online resources
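A minimal sketch of fitting both model types with scikit-learn and then labeling new trips. The data here is synthetic and the two features, class means, and hyperparameters are assumptions chosen for illustration; they are not the values used in the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for Capital Bikeshare trips: 1 = member, 0 = casual.
is_member = rng.integers(0, 2, size=n)
duration_min = np.where(is_member == 1, rng.normal(12, 4, n), rng.normal(35, 10, n))
start_hour = np.where(is_member == 1, rng.normal(8.5, 1.5, n), rng.normal(14, 3, n))
X = np.column_stack([duration_min, start_hour])
y = is_member

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out trips
    print(f"{name}: test accuracy = {scores[name]:.3f}")

# Label hypothetical dockless trips (duration in minutes, start hour).
dockless_trips = np.array([[10.0, 8.0], [40.0, 15.0]])
predicted = models["random_forest"].predict(dockless_trips)
print("Predicted labels (1 = member-like, 0 = casual-like):", predicted)
```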

4.3 Data and Feature Engineering
Dataset
✓ Capital Bikeshare (March-May of 2017 and 2018): 1,407,633 trips
After downsampling: 772,344
✓ Dockless bikes (March-May 2018): 71,590 trips
✓ Dockless scooters (March-May 2018): 187,911 trips
Features
✓ Start/end location (Single Member District (SMD) level)
✓ Duration of trips
✓ Start time of the trips
✓ Day of the week
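The features listed above could be derived from raw trip records roughly as follows. This is a sketch under assumptions: the column names and SMD codes are hypothetical, and one-hot encoding the start and end SMD is one possible encoding that the slides do not confirm.

```python
import pandas as pd

# Hypothetical raw trips; SMD codes are placeholders for illustration.
raw = pd.DataFrame({
    "start_time": pd.to_datetime(
        ["2018-03-05 08:15", "2018-03-10 14:40", "2018-05-20 18:05"]
    ),
    "end_time": pd.to_datetime(
        ["2018-03-05 08:27", "2018-03-10 15:25", "2018-05-20 18:20"]
    ),
    "start_smd": ["2B05", "2A01", "6D03"],
    "end_smd": ["2F02", "2A01", "6B01"],
})


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build duration, start hour, day of week, and one-hot SMD columns."""
    out = pd.DataFrame(index=df.index)
    out["duration_min"] = (df["end_time"] - df["start_time"]).dt.total_seconds() / 60
    out["start_hour"] = df["start_time"].dt.hour
    out["day_of_week"] = df["start_time"].dt.dayofweek  # Monday = 0
    # Assumed encoding: one indicator column per start/end SMD.
    smd = pd.get_dummies(df[["start_smd", "end_smd"]])
    return pd.concat([out, smd], axis=1)


features = build_features(raw)
print(features)
```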

4.4 Evaluation
Precision, recall, and F1-scores for logistic regression and random forest (test set)
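For reference, a small sketch of how precision, recall, and F1 are computed from true and predicted labels using their standard definitions. The labels below are made up for illustration and are not results from the talk.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Standard precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = member, 0 = casual.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```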

4.5 Results
Confusion matrix for logistic regression and random forest (Test data set)
Model prediction for dockless bikes and dockless scooters

5.1 Introduction
✓ Improper parking in the public right-of-way is a major concern
for dockless vehicles. Vehicles may be parked in pedestrian and
frontage zones, obstructing pedestrians and making travel
impossible for people with disabilities.
✓ To locate bike parking hotspots, the Density-Based Spatial
Clustering of Applications with Noise (DBSCAN) algorithm was used
to cluster the trip end points of dockless bikes.
Libraries used:
✓ sklearn.cluster
✓ plotly
✓ shapely

5.2 Methodology
DBSCAN is a density-based clustering algorithm that groups points
in proximity to each other based on a distance threshold
(epsilon, ε) and a minimum number of points (minimum points).
It does not require every point to be assigned to a cluster.
*Pictures are taken from: http://quipu-strands.blogspot.com
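A minimal sketch of DBSCAN on trip end points using scikit-learn. The points are synthetic and are assumed to be in a projected coordinate system measured in meters, so that `eps` can be read as a distance in meters; the parameter values are illustrative only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)

# Synthetic trip ends in meters: two dense parking spots plus scattered trips.
hotspot_a = rng.normal(loc=(0, 0), scale=5, size=(60, 2))
hotspot_b = rng.normal(loc=(500, 500), scale=5, size=(60, 2))
scattered = rng.uniform(low=1000, high=3000, size=(30, 2))
trip_ends = np.vstack([hotspot_a, hotspot_b, scattered])

# eps: neighborhood radius; min_samples: points needed for a dense region.
labels = DBSCAN(eps=25, min_samples=10).fit_predict(trip_ends)

# DBSCAN marks points that belong to no cluster with the label -1.
n_clusters = len(set(labels) - {-1})
n_noise = int((labels == -1).sum())
print(f"clusters found: {n_clusters}, unclustered points: {n_noise}")
```

Real latitude/longitude data would first need to be projected to meters, or clustered with a haversine metric and `eps` expressed in radians.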

5.3 Parameters
Epsilon (ε)
✓ The maximum distance between two points for them to be
considered neighbors.
Minimum Points
✓ The minimum number of points within epsilon distance (neighbor
points) needed to form a dense region (cluster).
Figure: example neighborhood of radius ε with min points = 6

5.4 Sensitivity Analysis on Parameters
Example maps: Epsilon = 50, Min points = 50 | Epsilon = 50, Min points = 100
Epsilon values: 10 meters, 25 meters, 50 meters
Minimum points values: 25, 50, 75, 100, 125, 150, 175, and 200 end points
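A sensitivity analysis over these parameter values could be run as a simple grid loop. This sketch uses synthetic trip ends in meters; only the epsilon and minimum-points grids come from the slide, and everything else (data, helper name) is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(7)

# Synthetic trip ends in meters: a large hotspot, a small one, and stray trips.
trip_ends = np.vstack([
    rng.normal(loc=(0, 0), scale=5, size=(120, 2)),
    rng.normal(loc=(800, 800), scale=5, size=(40, 2)),
    rng.uniform(low=1500, high=3500, size=(20, 2)),
])

eps_values = [10, 25, 50]  # meters
min_points_values = [25, 50, 75, 100, 125, 150, 175, 200]


def count_clusters(points, eps, min_points):
    """Number of DBSCAN clusters, ignoring the noise label -1."""
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points)
    return len(set(labels) - {-1})


cluster_counts = {
    (eps, m): count_clusters(trip_ends, eps, m)
    for eps in eps_values
    for m in min_points_values
}

for (eps, m), n in cluster_counts.items():
    print(f"epsilon={eps:>2} m, min points={m:>3}: {n} cluster(s)")
```

Grouping the resulting clusters by ward would produce a table with one row per parameter pair.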

5.5 Results
Epsilon (m)  Min points  Ward 1  Ward 2  Ward 3  Ward 4  Ward 5  Ward 6  Ward 7  Ward 8
10                   25      33     300       2       2      11      74       2       0
10                   50       3      73       0       1       2      13       1       0
10                   75       0      22       0       0       0       7       0       0
10                  100       0      14       0       0       0       2       0       0
10                  125       0       4       0       0       0       0       0       0
10                  150       0       2       0       0       0       0       0       0
10                  175       0       1       0       0       0       0       0       0
10                  200       0       1       0       0       0       0       0       0
25                   10     179     234      38      66     174     411      16      20
25                   15     125     225      19      28     105     321       7       5
25                   25      64     221       9      10      44     218       3       1
25                   50      31     215       1       1      12      74       1       0
25                   75      15     161       1       1       1      44       1       0
25                  100       7     104       1       1       0      19       1       0
25                  125       3      63       0       0       0      11       0       0
25                  150       3      44       0       0       0       7       0       0
25                  175       0      30       0       0       0       6       0       0
25                  200       0      17       0       0       0       4       0       0
50                   25      49      52      14      20      60     119       5       3
50                   50      27      56       3       4      24      89       1       1
50                   75      10      63       1       3      11      59       1       0
50                  100       9      67       1       1       4      50       1       0
50                  125      12      75       1       1       1      30       1       0
50                  150      12      72       1       1       1      27       0       0
50                  175       9      77       0       0       0      21       0       0
50                  200       5      58       0       0       0      18       0       0

5.6 Illustration of Results
✓ Locations with a high number of parking events:
https://plot.ly/~kianarz/155/trip-ends-and-density-based-spatial-clustering-of-parking-events/

5.7 Hexagonal Binning
✓ Visualizing the distribution of a dataset with a large number
of points is hard because many of the points overlap.
✓ Hexagonal binning is a data aggregation technique.
•  Hexagons are used to form a grid over the plane.
•  The number of points falling inside each hexagon is counted
and then used to create a heat map.
*Picture is taken from: https://www.meccanismocomplesso.org
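A minimal sketch of hexagonal binning with matplotlib's `hexbin`. The coordinates are synthetic, and the grid size and color map are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the example runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Synthetic trip end coordinates (meters from an arbitrary origin).
x = rng.normal(0, 300, 5000)
y = rng.normal(0, 300, 5000)

fig, ax = plt.subplots(figsize=(6, 5))
# gridsize sets the number of hexagons across the x direction;
# mincnt=1 leaves hexagons that contain no points undrawn.
hb = ax.hexbin(x, y, gridsize=30, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax, label="trip ends per hexagon")
ax.set_xlabel("x (m)")
ax.set_ylabel("y (m)")

# Per-hexagon counts behind the colors of the heat map.
counts = hb.get_array()
print(f"non-empty hexagons: {len(counts)}, busiest hexagon: {int(counts.max())} points")
```

Calling `fig.savefig(...)` afterwards writes the heat map to an image file.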

6.1 Interactive Visualization
✓ Heat map of start location of trips versus time of the day
https://kianarz.shinyapps.io/docklessdata/
✓ Chord plot of wards
https://kianarz.shinyapps.io/Wards/
