In this presentation, I describe a system that uses crowdsourcing, computer vision, machine learning, and Google Street View to collect sidewalk accessibility data.
7. Characterizing Sidewalk
Accessibility at Scale
using Google Street View, Crowdsourcing, and
Automated Methods
Kotaro Hara | Project Sidewalk (PI: Prof. Jon Froehlich)
makeability lab
16. The lack of street-level
accessibility information can
have a significant impact on
the independence and
mobility of citizens
cf. Nuernberger, 2008; Thapar et al., 2004
27. Amazon Mechanical Turk is an online labor market
where you can hire workers to complete small tasks
28. [Screenshot: browsing available tasks on Amazon Mechanical Turk]
29. Task: Find the company name from an email domain
$0.02 per task
Task interface
30. [Screenshot of the Mechanical Turk task interface]
Task: University of Maryland: Help make our sidewalks more accessible for wheelchair users with Google Maps
Requester: Kotaro Hara | Time allotted: 3 hours | Timer: 00:07:00 of 3 hours
Crowdsourcing Data Collection
Hara K., Le V., and Froehlich J.E. [ASSETS 2012, CHI 2013]
Crowdsourcing | Image Labeling
42. Computer vision + verification is
cheaper but less accurate
Manual labeling is accurate,
but labor intensive
Design Principles
43. Computer vision + verification is
cheaper but less accurate
(not true for easy tasks)
Manual labeling is accurate,
but labor intensive
Design Principles
63. Washington D.C. | Baltimore | Los Angeles | Saskatoon
Total Area: 11.3 km²
Intersections: 1,086
Curb Ramps: 2,877
Missing Curb Ramps: 647
Avg. GSV Data Age: 2.2 yr*
* At the time of downloading data in summer 2013
Scraper
64. How well does GSV data reflect
the current state of the physical
world?
75. Deformable Part Models
Felzenszwalb et al. 2008
Automatic Curb Ramp Detection
http://www.cs.berkeley.edu/~rbg/latent/
Root filter | Part filters | Displacement cost
76. Automatic Curb Ramp Detection
Multiple redundant
detection boxes
Detected Labels
Stage 1: Deformable Part Model
Correct: 1 | False Positives: 12 | Misses: 0
77. Automatic Curb Ramp Detection
Curb ramps shouldn’t be
in the sky or on roofs
Correct: 1 | False Positives: 12 | Misses: 0
Detected Labels
Stage 1: Deformable Part Model
79. Automatic Curb Ramp Detection
Detected Labels
Stage 3: SVM-based Refinement
Filter out labels based on
their size, color, and position.
Correct: 1 | False Positives: 5 | Misses: 0
89. Occlusion | Illumination | Scale | Viewpoint Variation | Structures Similar to Curb Ramps | Curb Ramp Design Variation
Automatic Curb Ramp Detection
CURB RAMP DETECTION IS A HARD PROBLEM
92. Automatic Task Allocation | Features to Assess Scene Difficulty for CV
The number of streets connected at an intersection
Depth information to estimate road width and variance in distance
Top-down images to assess the complexity of an intersection
The number of detections and their confidence values
93. Automatic Task Allocation | Features to Assess Scene Difficulty for CV
The number of streets from metadata
Depth information to estimate road width and variance in distance
Top-down images to assess the complexity of an intersection
The number of detections and their confidence values
94. Depth information to estimate road width and variance in distance
Automatic Task Allocation | Features to Assess Scene Difficulty for CV
95. Automatic Task Allocation | Features to Assess Scene Difficulty for CV
The number of streets from metadata
Depth information to estimate road width and variance in distance
Top-down images to assess the complexity of an intersection
The number of detections and their confidence values
96. Google Maps | Styled Maps
Top-down images to assess the complexity of an intersection
Automatic Task Allocation | Features to Assess Scene Difficulty for CV
97. Automatic Task Allocation | Features to Assess Scene Difficulty for CV
The number of streets from metadata
Depth information to estimate road width and variance in distance
Top-down images to assess the complexity of an intersection
CV output: the number of detections and their confidence values
107. Recruited workers from Amazon Mechanical Turk
Used 1,046 GSV images (40 used for gold-standard insertion)
Evaluation
STUDY METHOD: APPROACH
108. RESULTS
Evaluation
                         Labeling Tasks    Verification Tasks
# of distinct turkers:   242               161
# of HITs completed:     1,270             582
# of tasks completed:    6,350             4,820
# of tasks allocated:    769               277
We used Monte Carlo simulations for evaluation
109. Evaluation | Labeling Accuracy and Time Cost
ACCURACY and COST (TIME)
[Charts: accuracy measures (%) and task completion time per scene (s) for Manual Labeling, CV and Manual Verification, and Tohme (遠目, "remote eye"). Error bars are standard deviations.]
Precision: Manual Labeling 84% | CV and Manual Verification 68% | Tohme 83%
Recall: Manual Labeling 88% | CV and Manual Verification 58% | Tohme 86%
F-measure: Manual Labeling 86% | CV and Manual Verification 63% | Tohme 84%
Task completion time per scene: Manual Labeling 94 s | CV and Manual Verification 42 s | Tohme 81 s
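For reference, the three accuracy measures charted above are the standard detection metrics; a minimal sketch of how they are computed from true positive (tp), false positive (fp), and false negative (fn) counts (the example counts are hypothetical):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection accuracy measures shown in the charts above."""
    precision = tp / (tp + fp)   # fraction of reported curb ramps that are real
    recall = tp / (tp + fn)      # fraction of real curb ramps that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts giving ~83% precision and 86% recall (F ≈ 84%, like Tohme):
# print(precision_recall_f1(tp=86, fp=18, fn=14))
```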
110. Evaluation | Labeling Accuracy and Time Cost
[Same charts as the previous slide, highlighting cost: Manual Labeling 94 s, CV and Manual Verification 42 s, Tohme (遠目, "remote eye") 81 s per scene. Error bars are standard deviations.]
Tohme: 13% reduction in cost relative to manual labeling, at comparable accuracy.
ACCURACY and COST (TIME)
111. Evaluation | Smart Task Allocator
[Diagram: svControl (Automatic Task Allocation) routes each scene to svVerify (Manual Label Verification) or svLabel (Manual Labeling).]
~80% of svVerify tasks were correctly routed
~50% of svLabel tasks were correctly routed
112. Evaluation | Smart Task Allocator
[Diagram: svControl (Automatic Task Allocation) routes each scene to svVerify (Manual Label Verification) or svLabel (Manual Labeling).]
If svControl worked perfectly, Tohme's cost would drop to 28% of a manual labeling approach alone.
128. 8,209 intersections in DC
BACK OF THE ENVELOPE CALCULATIONS
Manually labeling GSV with our custom interfaces
would take 214 hours
With Tohme, this drops to 184 hours
We think we can do better
129. makeability lab
Smart task management can improve the efficiency of a
semi-automatic crowd-powered system
Takeaway
We can combine crowdsourcing and automated
methods to collect accessibility data from Street View
130. FUTURE WORK: COMPUTER VISION
Context integration & scene understanding
3D-data integration
Improve training & sample size
Mensuration
This work is supported by a Faculty Research Award
makeability lab
133. THE CROWD-POWERED STREET VIEW ACCESSIBILITY TEAM!
Kotaro Hara Jin Sun Victoria Le Robert Moore Sean Pannella
Jonah Chazan David Jacobs Jon Froehlich
Zachary Lawrence
Graduate Student
Undergraduate
High School
Professor
Thanks!
@kotarohara_en | kotaro@cs.umd.edu
Editor's notes
My name is Kotaro Hara. Today, I will talk about how we can use automated methods and crowdsourcing to collect accessibility information about cities
I want to tell you a story…
Imagine that you and a friend are on a walk. You’re both somewhat unfamiliar with the area.
Suddenly, in the middle of the sidewalk, you encounter a fire hydrant
-- Image Reference
http://www.iconsdb.com/black-icons/fire-hydrant-icon.html
In this case, you manage to go around because there is a driveway, but they are temporarily forced onto the street, which is dangerous.
Now, you get to the end of the block and discover that there is no curb cut. You are forced to turn around and find another way.
The problem is not only the sidewalks remain inaccessible, but there are currently few mechanisms to find out about the accessibility of a route in advance
-- Quote from paper
The problem is not just that sidewalk accessibility fundamentally affects where and how people travel in cities but also that there are few, if any, mechanisms to determine accessible areas of a city a priori
-- What Jon wrote
The problem is not just that there are inaccessible areas of cities but that there are currently few methods for us to determine them a priori
According to the most recent US Census (2010), roughly 30.6 million adults have physical disabilities that affect their ambulatory activities [128].
-----
Flickr: 3627562740_c74f7bfb82_o.jpg
Of these, nearly half report using an assistive aid such as a wheelchair (3.6 million) or a cane, crutches, or walker (11.6 million)
According to Cabinet Office data, the total in Japan is 3.663 million people.
----
Flickr: 14816521847_5c3c7af348_o.jpg
Despite comprehensive civil rights legislation for Americans with disabilities (e.g., [9,75]), many city streets, sidewalks, and businesses in the US remain inaccessible [90,96,120].
The lack of street-level accessibility information can have a significant negative impact on the independence and mobility of citizens [99,120].
99: Nuernberger, A. (2008). Presenting accessibility to mobility-impaired travelers. Doctoral dissertation, University of California, Berkeley.
120: Thapar, N., Warner, G., Drainoni, M., Williams, S., Ditchfield, H., Wierbicky, J., & Nesathurai, S. (2004). A pilot study of functional access to public buildings and facilities for persons with impairments. Disability and Rehabilitation, 26(5), 280-289.
So we would like to develop technologies such as an accessibility-aware navigation system. It would show an accessible path instead of the shortest path, based on your mobility level.
We also want to build an application that lets you visualize the accessibility of a city, so you can quickly compare which areas are more accessible. We need geo-data to build these.
To do this, we need a lot of data about accessibility. Our group’s goal is to collect and deliver street-level accessibility data for every city in the world.
-- Image
http://www.flickr.com/photos/rgb12/6225459696/lightbox/
Traditionally, information about a neighborhood has been gathered by volunteers or government organizations through physical audits.
However, this is time-consuming and expensive.
Mobile crowdsourcing such as SeeClickFix.com
And NYC 311 allows citizens to report neighborhood sidewalk accessibility issues.
But this requires people to be on-site
Our approach is different though complementary: use Google Street View as a massive data source…
Today, I am going to talk about how we can use crowdsourcing and automated methods to collect accessibility data from Google Street View.
Amazon Mechanical Turk is an online labor market where you can hire workers to complete small tasks.
For example, if you are a worker, you can go to Amazon’s website to browse through available tasks
Choose one of the tasks. For example, this task is about finding the company name from an email domain. You can get 2 cents for completing a task through this web interface.
We recruit crowd workers from Amazon Mechanical Turk. For those of you who don't know Mechanical Turk, it is an online labor market where you can work or recruit workers to perform small tasks over the Internet.
Using this platform, we recruit workers to work on our task. We developed this interface where you can see Google Street View imagery and label, in this case, an obstacle in the path.
We showed that this is an effective method, but it is labor intensive.
To more efficiently find accessibility attributes, we turned to computer vision, which is used for applications like face detection.
Different attributes affect sidewalk accessibility for people with mobility impairment. For example, presence of curb ramps, surface conditions, obstacles, steep gradients, and more.
And removed even more errors
Computer vision is not perfect. There are false positives, which can be fixed by verification, and it misses curb ramps, which humans need to label.
Here you see detected curb ramps as green boxes on top of the Street View image (advance to the next slide to play).
The question is: can we achieve the same or better accuracy at a lower time cost compared to manual labeling?
5 min
To do this, we developed a system called Tohme. It combines the two approaches.
This is the overview of the system. A custom web scraper collects a dataset including Street View images. A computer vision-based detector finds curb ramps.
So we designed a smart task allocator.
It routes detection results to a cheap manual verification workflow to remove false positive errors. However, since our verification task does not allow workers to fix false negatives, curb ramps that are missed never get detected.
So if the allocator predicts false negatives, it passes tasks to the manual labeling workflow.
We get a Street View image.
We run a detector
Then extract features.
Our task allocator predicts the presence of false negatives. If it predicts no false negatives, it allocates the task to the verification workflow.
Another example.
Run a detector
Extract features.
If the allocator predicts false negative, then it passes the task to the labeling workflow.
Let’s first talk about our web scraper
We scraped GSV panoramas and metadata from the intersections, along with their accompanying 3D point cloud data and top-down Google Maps imagery. These datasets are used to train the automatic task allocator.
_AUz5cV_ofocoDbesxY3Kw
-dlUzxwCI_-k5RbGw6IlEg
0C6PG3Zpuwz11kZKfG_vUg
D-2VNbhqOqYAKTU0hFneIw
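The custom scraper itself isn't detailed in these slides; as a rough illustration under modern assumptions, the Street View Static API can fetch crops of a panorama by ID (the pano ID and API key below are placeholders):

```python
import requests

# Hedged sketch: fetch crops of a Street View panorama via the modern
# Street View Static API, a stand-in for the custom 2013 scraper.
STATIC_API = "https://maps.googleapis.com/maps/api/streetview"

def fetch_panorama_crop(pano_id, heading, api_key, size="640x640"):
    """Download one JPEG crop of a panorama at a given camera heading."""
    params = {
        "pano": pano_id,     # panorama ID, e.g. from scraped metadata
        "heading": heading,  # camera heading in degrees [0, 360)
        "pitch": -10,        # look slightly downward toward the sidewalk
        "fov": 90,           # horizontal field of view in degrees
        "size": size,
        "key": api_key,      # placeholder API key
    }
    resp = requests.get(STATIC_API, params=params, timeout=30)
    resp.raise_for_status()
    return resp.content      # raw JPEG bytes

# Four crops cover a full 360-degree panorama:
# for h in (0, 90, 180, 270):
#     jpeg = fetch_panorama_crop("EXAMPLE_PANO_ID", h, "YOUR_API_KEY")
```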
Because sidewalk infrastructure can vary in design and appearance across cities and countries, we included four regions: Washington DC, Baltimore, Los Angeles, and Saskatoon.
We also looked at different types of city areas.
Blue regions represent dense urban areas, and red regions represent residential areas.
In all, we had 11.3 square kilometers. There were 1,086 intersections. We found 2,877 curb ramps and 647 missing curb ramps based on the ground truth data. Average Street View image age was 2.2 years old.
(pause) But how well does Street View data reflect the current state of curb ramp infrastructure?
To answer this question, we compared Street View intersections with physical intersections
First, we physically visited intersections and took multiple pictures.
The areas included four subset regions and consisted of 273 intersections.
We then counted the numbers of curb ramps and missing curb ramps in both datasets and evaluated their concordance.
As a result, we observed over 97% agreement between Google Street View and the real world. The small disagreement was due to construction.
Moving on to our dataset
To train and evaluate our computer vision program, 2 members of our research team manually labeled curb ramps in Street View images. In total, we collected 2,877 curb ramp labels.
Our computer vision component has three parts.
We experimented with various object detection methods. We chose to build on top of a framework called DPM (Deformable Part Models), one of the most successful approaches in object detection.
DPM models a target object and its parts with histogram of oriented gradients (HOG) features. It also models the spatial relationship between the parts.
DPM sweeps through an entire image and detects areas that look like a curb ramp. Detections are shown as red boxes. The numbers of correct detections and errors are shown in this table. There are some redundant labels, such as overlapping boxes.
h7ZW0_VasRt3vhevz1mjeg
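Full DPM adds part filters and deformation costs on top of a root filter; as a flavor of the sliding-window machinery underneath, here is a hedged sketch that scores HOG features of each window against a weight vector `w` (a placeholder for trained weights), using scikit-image:

```python
import numpy as np
from skimage.feature import hog

# Sketch of a DPM-style "root filter" pass: slide a window over the image
# and score its HOG features against learned weights w. Real DPM also
# scores part filters with displacement costs; omitted here for brevity.

def sliding_window_scores(gray, w, win=(64, 64), step=16):
    """Return (row, col, score) for each window position.
    gray: 2D grayscale image; w: weight vector matching the HOG size
    (1764 features for a 64x64 window with these settings)."""
    H, W = gray.shape
    scores = []
    for r in range(0, H - win[0] + 1, step):
        for c in range(0, W - win[1] + 1, step):
            patch = gray[r:r + win[0], c:c + win[1]]
            feat = hog(patch, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(2, 2))
            scores.append((r, c, float(np.dot(w, feat))))
    return scores  # threshold these scores to get candidate detections
```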
And there shouldn’t be curb ramps in the sky.
We use non-maxima suppression to remove overlapping labels, and 3D point cloud data to remove curb ramps that are not on ground level. Note that this 3D data is coarse, so we cannot identify the detailed structure of curb ramps.
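Non-maxima suppression is a standard step; a minimal sketch over (x, y, w, h, score) boxes, assuming a 0.5 intersection-over-union threshold:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, iou_thresh=0.5):
    """Keep only the highest-scoring box among overlapping detections.
    boxes: list of (x, y, w, h, score) tuples."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```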
We get a cleaner result, but we still have some errors. We try to remove them by utilizing other information, such as the size of a bounding box and RGB color information.
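The exact refinement features aren't spelled out here; a hedged sketch of the idea with scikit-learn, using illustrative per-detection cues (relative box size, position in the image, mean color), not Tohme's exact ones:

```python
import numpy as np
from sklearn.svm import SVC

# Sketch of SVM-based refinement: classify each remaining detection as a
# true curb ramp or a false positive from simple cues. The features and
# training details below are illustrative.

def detection_features(img, box):
    """img: HxWx3 RGB array; box: (x, y, w, h). Returns a feature vector."""
    x, y, w, h = box
    H, W = img.shape[:2]
    patch = img[y:y + h, x:x + w]
    mean_rgb = patch.reshape(-1, 3).mean(axis=0)        # average color cue
    return np.array([w / W, h / H,                      # relative size
                     (x + w / 2) / W, (y + h / 2) / H,  # relative position
                     *(mean_rgb / 255.0)])              # normalized RGB

# Training data would come from detections matched against ground truth:
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# kept = [b for b in boxes if clf.predict([detection_features(img, b)])[0] == 1]
```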
This is the final result with computer vision alone.
I will talk about how we can combine crowdsourcing and automated methods to collect curb ramp data from Google Street View efficiently.
Today, I will focus on how algorithmic work management plays a role in this process.
And removed even more errors
Our precision-recall curve is less than ideal.
For our system, we set the confidence threshold to emphasize recall over precision, because false positives are easier to correct.
We observed various image properties that could cause computer vision to make errors, including occlusion, illumination, scale, viewpoint variation, structures similar to curb ramps, and variation in the design of curb ramps.
That’s what we do with the task allocator.
We used the following features.
To assess the complexity of intersections, we used street cardinality from the metadata.
Depth data
It allows us to estimate the size of a street, which is useful because the farther away a curb ramp is, the harder it is to detect.
We also assessed the complexity of each intersection with top-down imagery.
Because the appearance of curb ramps varies more at irregular intersections, computer vision tends to miss curb ramps there. For example, the intersection on the right is arguably more complex than the one on the left.
We also used the number of detection boxes, their positions, and their confidence values to see how confused the computer vision program was.
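Putting these together, a hedged sketch of the allocator: build a scene-difficulty feature vector and predict whether CV likely missed a curb ramp, routing to svLabel if so and svVerify otherwise (the classifier choice here is illustrative, not necessarily the paper's model):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch of the smart task allocator (svControl): predict whether the CV
# stage likely produced false negatives in a scene, then route the task.
# (Top-down imagery complexity features are omitted for brevity.)

def scene_features(n_streets, depths, detections):
    """n_streets: street cardinality from metadata.
    depths: array of ground distances (proxy for road width/variance).
    detections: list of (box, confidence) pairs from the CV stage."""
    confs = [c for _, c in detections] or [0.0]
    return np.array([
        n_streets,               # intersection complexity from metadata
        float(np.mean(depths)),  # proxy for road width
        float(np.var(depths)),   # variance in distance
        len(detections),         # number of CV detections
        float(np.mean(confs)),   # mean CV confidence
    ])

# clf = RandomForestClassifier().fit(X_train, y_has_false_negative)
# route = "svLabel" if clf.predict([scene_features(...)])[0] else "svVerify"
```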
Our manual labeling tool allows people to control the viewing angle. You select the curb ramp button at the top and label the target. We collect outline labels of curb ramps to gather rich data for training computer vision.
Let’s talk about the verification task
Here you see detected curb ramps as green boxes on top of the Street View image (advance to the next slide to play).
The question is: can we achieve the same or better accuracy at a lower time cost compared to manual labeling?
We compare the performance of manual labeling without smart task allocation, computer vision plus verification without smart task allocation, and finally Tohme.
We measured accuracy and average task completion time of each workflow.
Turkers completed over 6,300 labeling tasks and 4,800 verification tasks, and we used Monte Carlo simulations for evaluation.
On the left, I show accuracy. On the right, I show cost. We want accuracy to be high, and cost to be low.
For the manual labeling approach alone, our accuracy measures are 84–86%, at 94 seconds per intersection.
For CV + manual verification, accuracy dropped substantially, but so did the time cost, by more than half.
Now, for Tohme, we saw accuracies similar to the manual baseline approach.
217 of 277 tasks were correctly routed to svVerify.
We recruited multiple workers to work on labeling tasks and verification tasks. We evaluated the results with Monte Carlo simulations.
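A hedged sketch of what such a Monte Carlo evaluation can look like (the paper's exact protocol may differ): repeatedly sample one worker's result per scene at random and average the resulting accuracy:

```python
import random
import statistics

def monte_carlo_f1(results_by_scene, trials=1000):
    """results_by_scene: {scene_id: [(tp, fp, fn), ...]}, one tuple per
    worker who completed that scene. Returns mean and stdev of F-measure."""
    f1s = []
    for _ in range(trials):
        tp = fp = fn = 0
        for worker_results in results_by_scene.values():
            t, f, n = random.choice(worker_results)  # sample one worker
            tp, fp, fn = tp + t, fp + f, fn + n
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return statistics.mean(f1s), statistics.stdev(f1s)
```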
Let’s see how turkers labeled.
In general, their labels were high quality
Even with a difficult scene with shadows, they labeled correctly most of the time.
But sometimes there were errors.
For example, this person labeled a driveway as a curb ramp.
And some were a little lazy.
They labeled two curb ramps with a single label.
Here are some examples.
With only computer vision, there are false positive detections.
With human verification, errors get corrected.
Based on the shapefile downloaded from data.dc.gov, there are 8,209 intersections in DC
Manual labeling: 94 s per intersection × 8,209 intersections ≈ 214 hours
Tohme: 81 s per intersection × 8,209 intersections ≈ 184 hours
----
Source:
http://data.dc.gov/Metadata.aspx?id=2106
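The arithmetic can be checked directly (94 s and 81 s per intersection come from the evaluation above):

```python
# Reproduce the back-of-the-envelope numbers from the slide.
intersections = 8209
manual_hours = 94 * intersections / 3600  # ≈ 214.3 h, reported as 214 hours
tohme_hours = 81 * intersections / 3600   # ≈ 184.7 h, reported as 184 hours
print(f"{manual_hours:.1f} h manual vs. {tohme_hours:.1f} h with Tohme")
```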
(i) Context integration. While we use some context information in Tohme (e.g., 3D-depth data, intersection complexity inference), we are exploring methods to include broader contextual cues about buildings, traffic signal poles, crosswalks, and pedestrians as well as the precise location of corners from top-down map imagery.
(ii) 3D-data integration. Due to low resolution and noise, we currently use 3D point cloud data as a ground plane mask rather than as a feature for our CV algorithms. We plan to explore approaches that combine the 3D and 2D imagery to increase scene structure understanding (e.g., [28]). If higher-resolution depth data becomes available, it may allow us to directly detect the presence of a curb or corner, which would likely improve our results.
(iii) Training. Our CV algorithms are currently trained using GSV scenes from all eight city regions in our dataset. Given the variation in curb ramp appearance across geographic areas, we expect that performance could be improved if we trained and tested per city.