Recognition of road markings from street-level panoramic images for automated map generation.

Disclaimer
The Department of Electrical Engineering of the Eindhoven University of Technology
accepts no responsibility for the contents of M.Sc. theses or practical training reports
Department of Electrical Engineering
Den Dolech 2, 5612 AZ Eindhoven
P.O. Box 513, 5600 MB Eindhoven
The Netherlands
http://w3.ele.tue.nl/nl/
Series title: Master graduation paper, Electrical Engineering
Commissioned by Professor: Prof. dr. ir. P.H.N. de With
Group / Chair: SPS
Date of final presentation: May 22, 2015
Report number:
Author: T. Woudsma
Internal supervisors: Prof. dr. ir. P.H.N. de With, Ir. L. Hazelhoff

Recognition of road markings from street-level
panoramic images for automated map generation
Thomas Woudsma
Department of Electrical Engineering
Eindhoven University of Technology, The Netherlands
Cyclomedia Technology B.V., The Netherlands
Email: t.woudsma@student.tue.nl
Abstract—Road-marking maps created from the automated
recognition of road markings from images can be used for the
automated inspection of markings, used by autonomous vehicles
or applied in navigation systems. This paper presents a road-
marking recognition pipeline operating on street-level panoramic
images. First, all images in a geographical region of interest are
processed individually. The single-image marking recognition stage
consists of Inverse Perspective Mapping, segmentation, contour
classification, context inference and marking model evaluation.
Second, single-image detections are merged into the multi-view
positioning stage, which uses connectivity-based clustering. The
single-image stage detects 88%-97% of the pedestrian crossings,
block, give-way and stripe markings in a city environment with
ground-truth deviations below 0.5 m. Context inference signifi-
cantly improves both the detection performance and positioning
accuracy. On a large dataset of 84,387 images, the full processing
pipeline achieves detection rates of 85%, 92% and 80% for
crosswalks, block- and give-way markings, respectively, with
a positioning error smaller than 0.6 m. This shows that the
presented system is performing sufficiently well for generating
road-marking maps. Closer analysis of missed detections reveals
that the common causes are marking damage and high capture
range.
I. INTRODUCTION
Road markings provide extensive information on traffic situ-
ations and are therefore vital for traffic safety. Amongst others,
these include lane markings, give-way triangles, pedestrian
crossings, stop lines and arrows. Databases with the position
and type of the road markings can be used for various appli-
cations. For instance, this data can be supplied to a navigation
system to alert the driver of upcoming traffic hazards or allow
for more detailed route generation, e.g. truck drivers can
set up a route to avoid pedestrian crossings. The databases
can also be used for automatic quality monitoring, thereby
strongly reducing the need for manual quality supervision.
Additionally, marking situations can be checked for safety
analysis, such as markings at priority situations.
Currently, marking inspection is performed by manually
inspecting the markings on roads. Typically, such inspections
are performed reactively, e.g. after complaints of road users
or accidents. Road-marking recognition systems can help to
automate this inspection. These systems often use images
(street-level or aerial) to detect and position road markings.
Specific image analysis algorithms can be used to recognize
markings in these images. In this paper, the focus is on
marking recognition from street-level images, specifically the
street-level panoramic images created by Cyclomedia B.V. in
the Netherlands, which are annually captured on all public
roads.
(a) Abrasion (b) Occlusion (c) Shadows
Fig. 1. Three examples of road markings that are difficult to detect.
Detection and recognition of road markings from images
involves several challenges. Numerous factors make the
recognition difficult, such as occlusions (e.g. by
other vehicles), shadows cast by surrounding objects, varying
weather conditions (affecting lighting) and marking deterio-
ration due to abrasion from vehicles. Figure 1 shows exam-
ples of three common difficult detection situations (abrasions,
occlusions and shadows on markings). Even if the detection
succeeds in overcoming these challenges, recognition of the
specific marking types can still be very complicated because
most recognition algorithms heavily rely on accurate shape
extraction.
Further analysis of the most common road markings shows
that these markings occur in specific periodic patterns (e.g.
dashed lane markings occur at regular intervals) and occur
in groups (e.g. block markings and arrows in an exit lane).
Where individual recognition of (damaged) markings may be
complicated, modeling high-level context information/patterns
can help to improve the detection rate and recognize road
markings that are for instance partially damaged. In addition,
images are often taken at regular intervals giving redundant
marking information. In this case, the same markings are cap-
tured from different viewpoints which can improve detection
rates, if a specific marking is occluded in some images (but
not in all).
For roads in the Netherlands, the only roads considered
here, the marking design standards are managed by the
CROW [1]. They divide road markings into three categories:
(1) parallel markings (parallel with the driving direction), (2)
perpendicular markings (perpendicular to the driving direction)
and (3) symbol markings. Markings in the first category, which
are the most common, include e.g. lane markings (continuous
and dashed) and block markings. The second category includes
e.g. markings of pedestrian and bike crossings, give-way
triangles (“shark teeth”) and stop lines. The last category
consists of a wide variety of symbols such as arrows, speed
numbers, words and bike symbols, which do not occur in
periodic patterns.
This research focuses on the automated generation of
road-marking databases (type, position and orientation) in a
geographical region of interest. This implies the automated
detection, recognition and positioning of road markings. Our
research work builds upon previous work by Li et al. [2]
that has mainly concentrated on recognition of road markings
on highways. This work extends this prior research for road
markings in city and rural environments in several ways. Next
to adding support for different road types, this work also
contains significant algorithmic alterations and improvements,
of which the most important contributions at the algorithmic
level are (1) context inference using probabilistic modeling, (2)
evaluation of marking placement models to identify marking
clusters, and (3) multi-view positioning of markings to
find real-world positions and generate marking maps.
This work resulted in a generic road-marking recognition
pipeline, which can be applied to the recognition of a wide
variety of markings (e.g. crosswalks, give-way triangles and
block markings). Furthermore, this system complements an
existing traffic-sign recognition system [3], by providing both
redundant (i.e. several situations consist of both road markings
and traffic signs) and complementary information (i.e. some
situations are indicated by only signs or markings). This
overall results in a more complete overview of high-quality
driver signaling and traffic situations. Before we present our
approach, some related work is discussed.
II. RELATED WORK
Commonly, road marking recognition is developed for Ad-
vanced Driver Assistance Systems (ADAS) or for autonomous
driving vehicles [4], using car-mounted cameras, which are
sometimes combined with LIDAR systems. As the main goal
of these systems is to help drivers stay in their lane, a
significant portion of the related work focuses on lane detection
[5][6]. As described by a survey paper [7], these systems
commonly follow three principal steps: (1) pre-processing to
remove noise and other unwanted image data, (2) feature
extraction to find relevant parts such as edges, (3) model fitting
to verify the detected markings and to remove false positives.
Recognition of several marking types is performed, for example, by
Foucher et al. [8], who present a system for the recognition
of pedestrian crossings and arrows. After segmentation, the
authors identify crossings based on the mutual relations be-
tween the connected components from the segmentation mask,
where crosswalks are identified if the segments meet certain
conditions. To recognize arrows, the connected components
are compared to 63 models of arrows. Experimental results
on a dataset containing 165 crosswalks and 151 arrows show
a true positive rate of 90% and 78%, respectively.
Li et al. [9] follow a similar approach for the recognition
of crosswalks, stop lines and lane markings. After filtering
the segmentation result with directed morphological opera-
tions, all connected components are analyzed, and the target
markings are identified based on angular orientation and blob
dimensions. Although no quantitative results are available,
qualitative recognition results on urban images are presented.
Qin et al. [10] present a general framework for road-
marking detection and analysis. After Inverse Perspective
Mapping (IPM), segmentation and contour extraction, the
framework is split into different modules for specific markings
(lanes, arrows, crosswalks and words). Every module includes
a Support Vector Machine (SVM), which is trained for the
classification of each marking, using geometric features such
as Hu moments. Experiments show precision rates above 90%,
though problems occur with the recognition of worn and
shadow-covered markings.
Previous work [2] describes a recognition pipeline that
can accurately detect and recognize lane, stripe, block and
arrow markings. From the IPM image, a marking segmentation
is obtained with a local threshold. Then each connected
component is translated to its centroid, scaled to be within
a unity interval and rotated to align its primary axis, making
it invariant to these three transformations. Next, the four
different marking types are classified by SVMs, trained on the
specific types, using shape features. These are the distance
from the shape centroid to its contour at regular angular
intervals. Lanes are then modeled using RANSAC for straight
lanes and the Catmull-Rom spline for curved lanes. On the
dataset of 910 highway panoramic street-level images, the
algorithm achieved precision and recall metrics of over 90%.
This work is used as the starting point for our contributions;
our approach is discussed in the next section.
III. APPROACH
Most related research is focused on in-car use where images
are captured with regular (non-panoramic) video cameras.
These systems aim at the recognition of specific markings that
are important for driver assistance and autonomous vehicles,
such as lane markings and stop lines and do not generate
databases of the recognized markings with type and global
position and orientation.
In this research, we use the same basic processing steps
(IPM, segmentation, contour classification, model evaluation),
as commonly applied in literature. This pipeline is extended
with two novel major processing stages, compared to related
work. First, contextual inference is added to incorporate in-
formation about the neighboring marking elements, thereby
clearly improving the detection performance. Second, an
accurate positioning stage is added, which uses recognized
markings from several images to determine the real-world
coordinates. The proposed system should satisfy the following
requirements:

1) follow a semi-supervised generic learning-based approach
to recognize a variety of markings,
2) apply context inference to exploit contextual relations
between neighboring elements,
3) apply road-marking models in a generic framework to
retrieve marking clusters,
4) extract global marking positions of the identified
clusters, allowing for the generation of road-marking
maps.
(a) Street-level panoramic images from Cyclomedia (Cycloramas) used in the experiments. The horizontal axis
corresponds to the horizontal angle (azimuth) φhorizontal around the camera, ranging from −π to π. The vertical axis
corresponds to the vertical angle (altitude) φvertical, ranging from −½π to ½π, where φvertical = 0 indicates the horizon.
(b) IPM of street-level panoramic image.
Fig. 2. Example street-level panoramic image with its IPM image.
These requirements result in a system capable of recognizing
multiple marking types: pedestrian crossings, give-way lines,
block lines and stripes. We have selected these types as they
are most common at intersections and convey very important
information for road safety. It should be noted that recognition
of lane markings is covered in [2].
This system will be evaluated at two different levels: (1)
single-image marking recognition, and (2) multi-view marking
positioning. The first experiment assesses the performance at
different stages in the system to investigate their performance
aspects. In a second experiment we evaluate the quality of
the generated marking maps. Additionally, this experiment in-
volves a combination with a traffic-sign recognition system [3],
which has been published in [11].
The proposed road-marking recognition system following
this approach is explained in Section V. However, we will first
elaborate on the characteristics of the source data (street-level
panoramic images) used as input by the recognition pipeline.
IV. SOURCE DATA
The presented system for the recognition of road markings
operates on street-level panoramic images, which provide a
recent and accurate overview of the road infrastructure. These
images are acquired at a large scale and are recorded at all
public roads within the target area, using a capturing interval
of 5 m. The recording vehicles drive along with regular
traffic at normal speeds. The cars are utilized in an efficient
way by capturing during daytime in all kinds of weather
conditions, including sunny, cloudy and foggy weather, and
directly after (but not during) rain or snow.
The panoramic images have a resolution of 2,400 × 4,800
pixels and are stored as equi-rectangular images. The capturing
location is also accurately known for each image, based on
a high-quality positioning system featuring both GPS and
IMU devices. Figure 2a displays an example equi-rectangular
panoramic image.
The employed capturing systems are calibrated precisely,
resulting in panoramic images that are mapped to a sphere,
on which angular distances can be measured. The resulting
images are stored as equi-rectangular images, which have a
linear relationship between the pixel coordinates within the
image and the viewing directions in horizontal and vertical
dimensions. This allows for the precise calculation of the
real-world 3D positions based on triangulation. The position
of an object can be retrieved in case multiple points (≥ 2)
corresponding with the considered object are found in multiple
images, using straightforward geometrical computations.
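The linear pixel-to-angle relation and the triangulation described above can be illustrated with a minimal sketch. The function names, the sign and origin conventions of the angles, and the restriction to the horizontal plane (two bearings instead of full 3D rays) are assumptions made for illustration and are not taken from the capturing system.

```python
import math

def pixel_to_angles(x, y, width, height):
    """Convert equi-rectangular pixel coordinates to (azimuth, altitude) in radians.

    Assumption: the image width spans 2*pi of azimuth and the height spans pi
    of altitude, with azimuth 0 in the image center and the horizon at mid-height.
    """
    azimuth = (x / width - 0.5) * 2.0 * math.pi
    altitude = (0.5 - y / height) * math.pi
    return azimuth, altitude

def triangulate_2d(cam_a, bearing_a, cam_b, bearing_b):
    """Intersect two horizontal viewing rays given camera positions and bearings.

    Bearings are angles in radians measured counterclockwise from the x-axis
    (a hypothetical convention chosen for this sketch).
    """
    ax, ay = cam_a
    bx, by = cam_b
    dax, day = math.cos(bearing_a), math.sin(bearing_a)
    dbx, dby = math.cos(bearing_b), math.sin(bearing_b)
    denom = dax * dby - day * dbx  # 2D cross product of the two ray directions
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; no unique intersection")
    # Solve cam_a + t * dir_a = cam_b + s * dir_b for t.
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    return ax + t * dax, ay + t * day
```

For example, two capture positions 10 m apart that observe the same point under bearings of 45 and 135 degrees yield an intersection 5 m along and 5 m beside the driving line.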
V. ROAD-MARKING RECOGNITION SYSTEM
This section presents a learning-based system for road-
marking recognition, using street-level panoramic images
captured from a vehicle. This system is split up into two
major processing blocks: single-image marking recognition
and multi-view positioning. The first block independently
processes all images in a specific geographical region of
interest. To recognize markings in images, this block consists
of five consecutive processing steps, which are described in
Section V-A. This results in a pixel location, an orientation
and a marking type for each recognition.
The detection results from the single-image marking recog-
nition (marking types, positions and orientations in images) are
passed to the multi-view positioning block which merges these
results to get global marking positions. Section V-B elaborates
further on this block. The complete recognition and positioning
pipeline is shown in Figure 3.

[Fig. 3 block diagram. Single-image Marking Recognition: Image Pre-Processing, Segmentation, Contour Classification (Marking SVMs), Context Inference (Context Model), Model Evaluation (Marking Models); followed by Multi-view Positioning.]
Fig. 3. Road-marking recognition pipeline. First, markings are recognized in each image separately by performing IPM, segmentation, contour classification,
context inference and model evaluation. Then the results are combined with multi-view positioning. The five images below the diagram give the intermediate
results for the single-image marking recognition. Note that the contour classification, which generates probability maps, is performed for each marking type
(here shown for give-way markings). These are used by the context inference and model evaluation together with context and marking models, respectively.
A. Single-Image Marking Recognition
The recognition pipeline first processes all panoramic
images individually and recognizes markings in each im-
age in five sequential processing steps. Because street-level
panoramic images (Cycloramas) are used, the first step is
to (1) perform the aforementioned IPM. This results in a
top-down view of the scene, centered around the vehicle.
Then, (2) image segmentation is used to find the relevant
regions in the top-down image (i.e. road markings). This is
followed by (3) the classification of each relevant region (or
connected component) in the segmentation result. Features
from these connected components are extracted and classified
by trained SVMs. This results in a probability map for each
marking type. To enhance the performance of the SVMs,
context information (i.e. neighboring elements in the spatial
placement patterns) is exploited in the next step by modeling
the probability maps as Markov Random Fields (MRF). Subsequently,
(4) context inference, which selects the most likely
marking type with respect to the context, is performed
using Loopy Belief Propagation (LBP) on the MRFs. Finally,
(5) the classified contours are evaluated by marking models
which merge single elements into multi-element markings (e.g.
pedestrian crossings or give-way lines). The next five subsections
elaborate on these steps.
1) Image Pre-Processing: Direct recognition of road mark-
ings from the spherical, equi-rectangular panoramic images
is challenging, due to the inherent perspective deformations.
This can be observed from e.g. Fig. 2a, which illustrates
that parallel lines on the ground plane are not parallel in the
image plane, but instead converge to a single vanishing point.
Therefore, such perspective-distorted images are commonly
transformed to a top-down view, using an Inverse Perspective
Mapping (IPM) (similar to [12] [13]). This transformation
remaps the image such that the image plane equals a pre-
defined (ground) plane (e.g. the road). It should be noted that
since most roads are not perfectly flat (but slightly curved for
drainage), small deformations may be visible. Nevertheless,
the resulting images allow for easier detection, recognition and
positioning of road markings, as e.g. illustrated by Fig. 2b.
These top-down images are calculated by:
$$x = x_{car} + \frac{\arctan(y_{IPM} / x_{IPM})}{2\pi} \times n \bmod n, \quad (1)$$
$$y = m - \frac{\arctan(d / h)}{2\pi} \times n \bmod m. \quad (2)$$
In these equations, $(x, y)$ and $(x_{IPM}, y_{IPM})$ denote the horizontal
and vertical image coordinates in the equi-rectangular
panoramic image and the computed IPM image, respectively.
The parameter $x_{car}$ represents the horizontal coordinate of the
front of the car within the panoramic image, $h$ and $d$ denote
the camera height from the ground plane and the distance
from pixel coordinate $(x_{IPM}, y_{IPM})$ to the center of the IPM
image. Finally, $m$ and $n$ denote the resolution of the panoramic
image.
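As an illustration of Equations (1) and (2), the following sketch maps a single ground-plane point to panorama coordinates. Several choices are assumptions: the IPM coordinates are taken relative to the center of the IPM image and expressed in the same unit as the camera height, n is read as the panorama width and m as its height, the two-argument arctangent is used to cover the full circle, and the modulo is applied to the complete expression.

```python
import math

def ipm_to_panorama(x_ipm, y_ipm, x_car, h, n, m):
    """Map an IPM point (relative to the IPM center) to panorama pixel coordinates.

    x_car: horizontal panorama coordinate of the front of the car
    h:     camera height above the ground plane (same unit as x_ipm, y_ipm)
    n, m:  panorama width and height in pixels (assumed reading of the resolution)
    """
    d = math.hypot(x_ipm, y_ipm)        # distance to the center of the IPM image
    azimuth = math.atan2(y_ipm, x_ipm)  # assumption: atan2 instead of arctan(y/x)
    x = (x_car + azimuth / (2.0 * math.pi) * n) % n
    # The angle between the viewing ray and the downward direction is arctan(d / h).
    y = (m - math.atan2(d, h) / (2.0 * math.pi) * n) % m
    return x, y
```

With a 4,800 × 2,400 pixel panorama and a camera height of 2 m, a point 2 m in front of the camera is seen 45 degrees below the horizon, which corresponds to row 1,800 under these assumptions.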
2) Road Marking Segmentation: The retrieved IPM image
is segmented into two categories: road-marking and
non-road-marking pixels. Road markings are typically brighter than the
road and have a low saturation, as they are typically close to
white. Therefore, image regions that have a high
local intensity and a low saturation are extracted in a two-
step process. Using this metric for the segmentation of road
markings gives good results [14].
The first step involves the calculation of the intensity
difference between the grayscale pixel values and the average
grayscale intensity value in a rectangular window around each
considered pixel. With $g_p$ the grayscale pixel value of pixel $p$
and $v$, $w$ the size of the local neighborhood around pixel $p$,
this calculation can be expressed as:
$$\tilde{g}_p = g_p - \frac{1}{vw} \sum_{i=-v/2}^{v/2} \sum_{j=-w/2}^{w/2} g_{ij}. \quad (3)$$
Fig. 4. Illustration of the segmentation steps. (a) input top-down image, (b) segmentation result based on local intensity measure, (c) segmentation based on
both local intensity and saturation measures.
Fig. 5. Example of shape features in two different shapes (block and triangle).
Clearly, the vectors have different magnitudes for equal angles.
The size of the window is determined by the marking types of
interest. A binary segmentation is then obtained by applying
Otsu’s threshold method on the found differences.
The second step involves filtering based on the saturation
value, where a thresholding operation removes the highly-
saturated pixels from the previously obtained mask. After mor-
phological closing and hole filling, all connected components
(groups of neighboring pixels) are extracted from the retrieved
segmentation mask. Figure 4 illustrates this procedure.
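The two segmentation steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: images are nested lists, the window is clipped at the image borders, Otsu's threshold is computed on a 64-bin histogram, the saturation limit `max_saturation` is a hypothetical parameter, and the morphological closing and hole filling are omitted.

```python
def local_contrast(gray, v, w):
    """Subtract the mean of a v-by-w window (clipped at the borders) from each pixel."""
    rows, cols = len(gray), len(gray[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - v // 2), min(rows, r + v // 2 + 1)
            c0, c1 = max(0, c - w // 2), min(cols, c + w // 2 + 1)
            window = [gray[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = gray[r][c] - sum(window) / len(window)
    return out

def otsu_threshold(values, bins=64):
    """Return the threshold that maximizes the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for val in values:
        hist[min(bins - 1, int((val - lo) / width))] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centers))
    best_k, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += hist[k] * centers[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_k = var, k
    return lo + (best_k + 1) * width

def segment_markings(gray, saturation, v, w, max_saturation):
    """Keep pixels that are locally bright (Otsu on differences) and weakly saturated."""
    diff = local_contrast(gray, v, w)
    t = otsu_threshold([d for row in diff for d in row])
    return [[diff[r][c] > t and saturation[r][c] <= max_saturation
             for c in range(len(gray[0]))] for r in range(len(gray))]
```

On a small test image with one bright vertical stripe on a dark road, only the stripe pixels pass the intensity step, and a strongly saturated stripe pixel is removed again by the saturation step.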
3) Contour Classification: The next step is to classify each
of the connected components in the segmentation result. First
the contour is extracted from each connected component,
representing its outline in pixel positions. Then all contours are
translated to the origin and rotated to align their primary axis
to be translation- and rotation-invariant. Often scale invariance
is used as well, but in this case the scale is relevant to
the marking type, e.g. small stripe markings and crosswalks
may have the same shape but different scale, such that scale-
invariance is omitted. Next, the distance between the contour
centroid and the contour edge at set angular intervals is
determined, as shown in Figure 5. As road markings have
highly regular (and mostly convex) shapes, these values can be
used as a shape descriptor and concatenated to form a feature
vector.
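A possible sketch of this shape descriptor is given below. It approximates the centroid-to-contour distance per direction by the largest distance among the contour points that fall into the corresponding angular bin, which is an assumption made to keep the example short; the centroid is taken as the mean of the contour points and the primary axis is derived from their second-order moments.

```python
import math

def shape_descriptor(contour, num_angles=8):
    """Centroid-to-contour distances at regular angular intervals.

    contour: list of (x, y) contour points. The result is translation- and
    rotation-normalized, but deliberately not scale-normalized.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    pts = [(x - cx, y - cy) for x, y in contour]  # translate centroid to origin
    sxx = sum(x * x for x, _ in pts)
    syy = sum(y * y for _, y in pts)
    sxy = sum(x * y for x, y in pts)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # orientation of the primary axis
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    rotated = [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in pts]
    step = 2.0 * math.pi / num_angles
    features = [0.0] * num_angles
    for x, y in rotated:
        angle = math.atan2(y, x) % (2.0 * math.pi)
        k = int(round(angle / step)) % num_angles  # nearest sampling direction
        features[k] = max(features[k], math.hypot(x, y))
    return features
```

Because the scale is kept, a stripe and a crosswalk element with the same outline but different dimensions yield proportionally different feature vectors, in line with the reasoning above.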
To classify between the different marking types, the feature
vectors are transformed to zero mean and unity standard
deviation, by subtracting the mean feature vector and dividing
by the standard deviation from vectors from training sets. This
results in a set of N feature vectors for each marking, where
each vector corresponds to a specific marking type. These
vectors are then classified by SVM classifiers, each operating
on the feature vector extracted for the marking type that it
should recognize. Each SVM outputs the distance towards
its decision bound, which is then converted to a probability
measure using Platt scaling [15] [16]. After evaluation of the
SVMs, each segmented object has N probability measures
in the same unity interval, which allows for direct
comparison between marking types.
For each contour, a vector of N probability measures is
created. Figures 6a and 6c show the obtained probability maps
for two different marking categories.
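The normalization and the conversion of SVM outputs to probabilities can be sketched as follows. The sigmoid form is the one commonly used for Platt scaling; the parameters `a` and `b` would be fitted on training data, and the values used in the usage note are hypothetical.

```python
import math

def standardize(vector, mean, std):
    """Transform a feature vector to zero mean and unity standard deviation."""
    return [(v - m) / s for v, m, s in zip(vector, mean, std)]

def platt_probability(decision_value, a, b):
    """Map an SVM decision value f to a probability 1 / (1 + exp(a * f + b))."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

def probability_vector(decision_values, platt_params):
    """One probability per marking type, given per-type decision values and (a, b) pairs."""
    return [platt_probability(f, a, b)
            for f, (a, b) in zip(decision_values, platt_params)]
```

With a negative `a`, a contour lying exactly on the decision boundary (decision value 0) and `b = 0` receives a probability of 0.5, and the probability approaches 1 as the distance to the boundary grows on the positive side.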
4) Context Inference: Individual markings can be occluded
(e.g. by other vehicles), poorly visible or damaged. This results
in non-ideal shapes in the segmentation mask, which are
recognized with a lowered probability or receive high
probabilities for other marking types.
Therefore, we exploit the periodic spatial placement patterns
at which road markings are typically placed, to improve the
recognition performance. We use this contextual information
to update the recognition scores for each detected marking,
based on the scores of markings located at the expected
locations for the respective marking types.
For this type of novel contextual information, we employ a
Markov Random Field (MRF), which allows for updating of
the recognition probabilities based on contextual information,
i.e. based on the probabilities of their neighbors. Within the
MRF, all detected road markings are modeled as nodes, where
the initial probabilities of the nodes are set to the probabilities
found by the SVMs at the classification step. All neighboring
nodes, having an inter-node distance smaller than a pre-defined
threshold, are connected with edges.
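The construction of the graph structure can be sketched in a few lines. The representation of nodes as 2D positions and the function name are assumptions for illustration; the threshold value in the usage note is hypothetical.

```python
import math

def build_mrf_edges(positions, max_distance):
    """Connect all node pairs whose inter-node distance is below the threshold.

    positions: list of (x, y) marking positions, one per MRF node.
    Returns undirected edges as index pairs (i, j) with i < j.
    """
    edges = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < max_distance:
                edges.append((i, j))
    return edges
```

For four markings at 0 m, 1 m, 2 m and 10 m along a line and a threshold of 1.5 m, only the first three nodes become connected to their direct neighbors, and the distant marking remains isolated.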

(a) (b)
(c) (d)
Fig. 6. Illustration of the MRF processing. Left column: input probabilities for (a) give-way and (c) block markings, red-green denote low-high values. Right
column: output probabilities for (b) shark teeth and (d) block marking. Note that the overlap between triangle and block markings disappears.
Fig. 7. Construction of context weighting functions using marking distance
and orientation difference from a training set for perpendicular markings. The
heat map at the top shows the weights in the x-y plane for neighboring nodes
in the MRF, where red indicates a high and blue a low weight. For instance,
for perpendicular markings the weight to another node in the MRF is high if
it is at its periodical distance (e.g. 1 meter on the minor axis and < 0.5 meter
on the major axis). The second figure shows the weighting function in terms
of the orientation difference (between markings/nodes of the MRF).
In contrast to the conventional MRF, where all edges are
unweighted, we assign weights to the edges. These weights
represent the contextual influence of nodes on each other,
based on how well their inter-node distance and relative
orientations fit to the expected marking placement pattern. This
relationship is modeled by fitting a Gaussian Mixture Model
(GMM) on the inter-node distances and orientations of a train-
ing set. Figure 7 shows an example context weight function
for perpendicular markings. Because different marking types
have varying contextual relations, the weights are determined
for each marking type. Figure 8 provides an illustration of an
example MRF for road markings. Details of this figure will
be further explained below.
[Fig. 8 content: a chain of four nodes 1, 2, 3, 4 with class probabilities p1 = [0, 0.01, 0.99]^T, p2 = [0, 0.15, 0.85]^T, p3 = [0, 0.6, 0.4]^T, p4 = [0, 0.1, 0.9]^T and directed edge weight vectors w1→2 = [w^1_{1→2}, w^2_{1→2}, ...]^T, w2→1, w2→3, w3→2, w3→4, w4→3.]
Fig. 8. Example MRF network including edge weights, where each node has
a probability to belong to one out of 3 classes. Note that the weights are also
specific for each marking type and are thus vectors.
Since finding the exact probabilities within an MRF is
computationally expensive, the solution is typically approx-
imated with Loopy Belief Propagation (LBP) [17] [18], which
updates the probabilities by passing messages on the edges.
After all messages have been sent, new probability values are
calculated from the probabilities of the previous iteration and
the weighted messages. Below, we will explain this process in
detail.
With $p_i$ being the vector of probabilities that node $i$ belongs
to each of the marking types, the message that will be sent
from node $i$ to node $j$ in the next iteration, $\hat{m}_{i \to j}$, equals:
$$\hat{m}_{i \to j} = \frac{m_{i \to j}}{\lVert m_{i \to j} \rVert}, \quad \text{where} \quad (4)$$
$$m_{i \to j} = p_i + \sum_{k \in N \wedge k \neq j} w_{k \to i} \hat{m}_{k \to i}. \quad (5)$$
In these equations, $m$ denotes the vector containing the messages
for each marking type, $N$ is the set of nodes connected
to node $i$ and $w_{k \to i}$ denotes the vector of edge weights for
each marking type on the edge from node $k$ to node $i$.
Next, the probabilities are updated. First a belief value $b$ is
calculated by incrementing the current probabilities with the
weighted sum of all incoming messages. The new probabilities
that node $j$ belongs to each of the marking types, $p_j$, are then
found by normalizing the obtained belief values:
$$b_j = p_j + \sum_{i \to j} w_{i \to j} \hat{m}_{i \to j} \quad \text{and} \quad p_j = \frac{b_j}{\lVert b_j \rVert}. \quad (6)$$
This process is repeated until all probabilities converge, or the
maximum number of iterations is reached. Based on the newly
acquired probabilities, all segmented markings are assigned to
a single marking class, by selecting the class with the highest
probability. Figure 6 displays the input and output of this pro-
cessing stage. Because the orientation of (partially damaged)
markings is subject to noise, each marking element is assigned
an orientation co-determined by its context. In particular, we fit
a line using least-squares to neighboring elements of the same
class and determine the dominant orientation of this line.
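The message passing and belief update of Equations (4) to (6) can be sketched as follows. This is a minimal illustration under assumptions: the norm is taken as the sum of the vector entries, the products of weights and messages are computed per marking type, messages are initialized with the node probabilities, and the data layout (dictionaries keyed by node and by directed edge) is hypothetical.

```python
def normalize(vec):
    """Scale a vector so that its entries sum to one (L1 norm, an assumption)."""
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else list(vec)

def add_weighted(base, weight, message):
    """Add the per-class product of edge weight and message to a base vector."""
    return [b + w * m for b, w, m in zip(base, weight, message)]

def loopy_belief_propagation(probs, weights, max_iter=50, tol=1e-6):
    """probs: {node: class probabilities}; weights: {(k, i): weights on edge k -> i}."""
    senders = {node: [] for node in probs}
    for (k, i) in weights:
        senders[i].append(k)
    messages = {(i, j): normalize(probs[i]) for (i, j) in weights}
    for _ in range(max_iter):
        # Equations (4) and (5): message from i to j, excluding what j sent to i.
        new_messages = {}
        for (i, j) in weights:
            m = list(probs[i])
            for k in senders[i]:
                if k != j:
                    m = add_weighted(m, weights[(k, i)], messages[(k, i)])
            new_messages[(i, j)] = normalize(m)
        # Equation (6): belief from previous probabilities and weighted messages.
        new_probs = {}
        for j in probs:
            b = list(probs[j])
            for i in senders[j]:
                b = add_weighted(b, weights[(i, j)], new_messages[(i, j)])
            new_probs[j] = normalize(b)
        change = max(abs(a - c) for j in probs
                     for a, c in zip(probs[j], new_probs[j]))
        probs, messages = new_probs, new_messages
        if change < tol:
            break
    return probs
```

Applied to the four-node chain of Fig. 8 with unit weights on all edges (a hypothetical choice), the third node, which initially favors the second class with probability 0.6, is pulled towards the third class by its neighbors.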
5) Model Evaluation: To recognize high-level marking
elements (e.g. lines of stripes, crosswalks, give-way situations)
and to remove falsely detected markings, we apply a marking
model that exploits the periodic placement patterns in which
most road markings occur. For example, pedestrian crossings
are constructed by a number of equally-sized rectangles at reg-
ular intervals with equal orientations, and give-way situations
are constructed by multiple shark teeth, located along a line,
where each triangle is oriented parallel to the driving direction.
This placement model is evaluated as follows. First, we
calculate the distances between all markings and their close-
by neighbors on their major and minor axes, where the major
axis is aligned with the orientation of the contour as extracted
in the previous steps. Second, we connect all markings that
adhere to the adjacency and orientation rules for the specific
marking type, where we ignore marking pairs deviating from
those rules. In this stage, elements are connected when they
are located at the expected locations within the pattern. These
positions consist of the locations of neighbors at once or
twice the period of the specific marking pattern. For example,
two lane markings are connected, if they are positioned at
a predefined distance interval on their major and minor axes
within a set deviation. Last, markings belonging to the same
high-level element (such as a pedestrian crossing) are
grouped together using connectivity-based clustering, which
creates clusters from markings which are pair-wise connected.
Groups of recognized markings that contain too few elements
can be removed to reduce the number of false positives.
This model evaluation results in a single position, orienta-
tion and width for each found marking element, thereby allow-
ing for the recognition of high-level elements (i.e. crosswalks,
give-way situations and lane divisions). The evaluation of the
placement model is illustrated by Fig. 9.
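The placement-model evaluation can be sketched with a union-find structure for the connectivity-based clustering. The example assumes a pattern that repeats along the minor axis (as for crosswalk stripes), a single tolerance for both axes, and hypothetical parameter names; the actual adjacency and orientation rules per marking type are not specified here.

```python
import math

def cluster_markings(markings, period, tol, max_angle, min_size=2):
    """Group markings that lie at once or twice the pattern period from each other.

    markings: list of (x, y, orientation) with the orientation in radians
    along the major axis. Returns clusters as sorted lists of indices.
    """
    parent = list(range(len(markings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, (xi, yi, ti) in enumerate(markings):
        for j in range(i + 1, len(markings)):
            xj, yj, tj = markings[j]
            dx, dy = xj - xi, yj - yi
            major = dx * math.cos(ti) + dy * math.sin(ti)
            minor = -dx * math.sin(ti) + dy * math.cos(ti)
            # Orientation difference modulo pi (a marking has no direction).
            angle_diff = abs((ti - tj + math.pi / 2) % math.pi - math.pi / 2)
            on_pattern = any(abs(abs(minor) - k * period) <= tol for k in (1, 2))
            if abs(major) <= tol and on_pattern and angle_diff <= max_angle:
                parent[find(i)] = find(j)  # connect the pair

    clusters = {}
    for i in range(len(markings)):
        clusters.setdefault(find(i), []).append(i)
    # Clusters with too few elements are treated as false positives.
    return sorted(c for c in clusters.values() if len(c) >= min_size)
```

In this sketch a missing element is bridged by the connection at twice the period, so a partially damaged pattern still ends up in one cluster, while an isolated detection is removed.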
B. Multi-View Positioning
When all images in a certain region of interest are processed
(e.g. in a city), the detected markings are processed by the
multi-view positioning block. Because the images have a
global position (in GPS coordinates, logged at capture time),
the position of each detected marking can be calculated from
Fig. 9. Illustration and evaluation of the placement model. Left: crop
from input IPM image. Middle: found markings within the cropped region
(pedestrian crossing element = red, block marking element = green, shark
teeth = black, other = blue). Right: output. The pedestrian marking segments
are coupled together, and also the block markings that are located on the same
line. The two erroneously found block markings are ignored as they do not
fit the model.
0 5 10 15 20
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Distance from car [m]
DistanceWeight
Distance Weighting Function
Fig. 10. Distance weighting function constructed by fitting a Gaussian
function to the AUC. The weight function is maximal around 6 m, i.e. this is
the range where the least markings are missed.
the relative position in the image due to the IPM transfor-
mation, which maps the image plane to the horizontal plane
at ground level. This results in a marking map on which
multiple detections of the same marking are present, where
all detections originate from different images.
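The conversion from a position in the IPM image to a global position can be sketched as below. This is an illustrative sketch only: it assumes a top-down IPM image with the capture location at pixel `(cx, cy)` and the driving direction pointing up, a known ground resolution `metres_per_pixel`, a heading measured clockwise from north, and a capture position already expressed in a metric (east, north) coordinate system. None of these conventions are specified in the text.

```python
import math

def ipm_to_global(car_east, car_north, heading_rad,
                  px, py, cx, cy, metres_per_pixel):
    """Map an IPM pixel (px, py) to metric east/north coordinates."""
    # Offset in the car frame; image rows grow downward, so "up" is forward.
    right = (px - cx) * metres_per_pixel
    forward = (cy - py) * metres_per_pixel
    # Rotate the car-frame offset by the heading (clockwise from north).
    east = car_east + forward * math.sin(heading_rad) + right * math.cos(heading_rad)
    north = car_north + forward * math.cos(heading_rad) - right * math.sin(heading_rad)
    return east, north

# Car heading north: a pixel 10 px up and 10 px right lies 1 m ahead, 1 m right.
east, north = ipm_to_global(100.0, 200.0, 0.0, 60, 40, 50, 50, 0.1)
```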
To obtain a result where each marking is detected only
once, connectivity-based clustering is used again to merge
detections. The connectivity criterion is based on the size
and orientation of the marking. Specifically, Marking A is
connected to Marking B if the centroid of A lies within the shape
of B. Each shape originates from the single-image detection
processing stage, and is determined by its cluster size (number
of elements times period) and its orientation. For each
cluster size, the dimensions of the shape follow from [1].
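The centroid-in-shape criterion can be illustrated with a small sketch. It assumes, purely for illustration, that each shape is an oriented rectangle described by a center, a length along its orientation, a width across it, and an angle `theta`; the dictionary keys below are hypothetical names.

```python
import math

def centroid_in_shape(centroid, shape):
    """Return True if centroid (x, y) lies inside the oriented rectangle.

    shape: dict with hypothetical keys x, y (center), length, width, theta.
    """
    dx = centroid[0] - shape["x"]
    dy = centroid[1] - shape["y"]
    c, s = math.cos(shape["theta"]), math.sin(shape["theta"])
    along = dx * c + dy * s       # coordinate along the shape's orientation
    across = -dx * s + dy * c     # coordinate perpendicular to it
    return (abs(along) <= shape["length"] / 2
            and abs(across) <= shape["width"] / 2)

# A 4 m x 2 m shape oriented along the y-axis.
shape_b = {"x": 0.0, "y": 0.0, "length": 4.0, "width": 2.0, "theta": math.pi / 2}
inside = centroid_in_shape((0.5, 1.5), shape_b)
outside = centroid_in_shape((1.5, 0.0), shape_b)
```

Detections that are connected under this test would then be merged with the same connectivity-based clustering as in the single-image stage.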
After this procedure, clusters with only one element (a marking
detected in only one image) are discarded, assuming that
each actual marking is detected in at least two images. For each
cluster, the final marking position is calculated as the
weighted mean of all detections in the cluster. The weight
is obtained by fitting a function to a performance metric as
a function of the detection distance (i.e. pixel distance from
image center). In this case, we have determined the AUC at
different detection ranges and fitted a Gaussian function to
the resulting values. From the fitted curve in Figure 10, it
can be observed that the performance is optimal at 6 m. At
distances smaller than the optimum, markings are occluded by
nearby objects or the car itself. At distances larger than the
optimum, markings can be occluded as well, but also suffer
from perspective distortions.
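The weighted merging of detections can be sketched as follows. The peak at 6 m is taken from Figure 10; the width `sigma` of the Gaussian is not given in the text and is an assumed value here, as is the `(x, y, distance)` layout of a detection.

```python
import math

def distance_weight(d, mu=6.0, sigma=8.0):
    """Gaussian weight peaking at mu metres (sigma is an assumed value)."""
    return math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2))

def merged_position(detections):
    """Weighted mean of detections given as (x, y, distance_from_car)."""
    weights = [distance_weight(d) for _, _, d in detections]
    total = sum(weights)
    x = sum(w * px for w, (px, _, _) in zip(weights, detections)) / total
    y = sum(w * py for w, (_, py, _) in zip(weights, detections)) / total
    return x, y

# Two detections of the same marking; the one captured near 6 m dominates.
cluster = [(10.0, 5.0, 6.0), (11.0, 5.4, 18.0)]
x, y = merged_position(cluster)
```

With these assumed parameters the far detection receives a weight of roughly one third of the near one, so the merged position stays close to the detection made at the optimal range.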
VI. EXPERIMENTAL SETUP
The system consists of two major stages, the single-image
marking recognition and the multi-view positioning. We eval-
uate the system at both stages, using different datasets and
configurations. The datasets consist of the aforementioned
street-level panoramic images with corresponding metadata
(i.e. car orientation, global position, time stamp).
As general performance metrics we use the true positives
(TPs), representing the correctly detected markings, false
positives (FPs), indicating the falsely found markings, and false
negatives (FNs), referring to the missed markings. Additionally,
we apply recall-precision curves to show the performance
for the specified marking types. The recall denotes the fraction
of all annotated markings that are found, TP / (TP + FN), and the
precision the fraction of all found markings that are correct,
TP / (TP + FP). In an ideal system, both the recall and precision
are unity (all markings found and no false positives).
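As a small worked example of these metrics, the sketch below computes recall and precision from TP, FP and FN counts. The function names are illustrative; the numbers in the usage line are the pedestrian-crossing counts without context inference (22 of 25 found, 1 false detection) reported later for detections within 10 m.

```python
def recall(tp, fn):
    """Fraction of annotated markings that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Fraction of found markings that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# 25 annotated crosswalks, 22 found, 1 false detection.
r = recall(tp=22, fn=3)
p = precision(tp=22, fp=1)
```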
The first dataset contains 263 images of a large city in the
Netherlands, in which the ground-truth positions of pedestrian
crossings, block-shaped, give-way and stripe marking elements
have been annotated. The set consists of 834, 1573, 805 and
771 single-marking elements for these four marking types,
respectively. This set is used to test
the single-image marking recognition performance, where we
specifically (1) assess the impact of using context information
to enhance recognition rates, (2) evaluate the influence of
detection distance from the car/capture position, and (3)
analyze the contribution of the marking model evaluation.
Besides testing the recognition performance of single-marking
elements, we also specifically evaluate the detection
rate and positioning accuracy of clusters of markings, such
as crosswalks or give-way lines. We calculate the average
positioning error for each marking type and also create a curve
of the detected percentage as a function of the distance from
the ground truth. The single-image marking recognition is
executed on the first dataset both with and without the context
inference step. The dataset contains 25 pedestrian crossings,
60 block lines, 39 give-way lines and 34 dashed lines.
To test the performance of the complete marking recognition
pipeline, we have created a large dataset in the municipality of
Lingewaard with 84,387 images corresponding to 400 km of
road. To test both the recognition and positioning performance
of marking clusters, global GPS positions along with the size
and orientation of the markings are used as ground truth
for this set. We focus on the recognition and positioning
accuracy of pedestrian crossings, give-way markings and block
markings, which occur 105, 729 and 141 times in this set,
respectively.
After the evaluation of the complete road-marking recogni-
tion pipeline, we perform an additional experiment in order to
relate our detected markings to traffic signs. Road markings
and traffic signs often coexist, such that combined databases
can be used for the evaluation of redundant and complemen-
tary information. In this case, we create such a database for
the consistency checking of road markings and traffic signs,
concentrating on crosswalks and give-way markings. Markings
and signs are consistent if they are within a set distance and
have matching orientations. For the traffic-sign recognition, we
use the system described in [3]. The consistency evaluation is
performed on the full dataset of 84,387 images.
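The consistency criterion (within a set distance and with matching orientations) can be sketched as below. The thresholds `max_dist` and `max_angle`, the dictionary keys, and the interpretation of "matching orientations" as a bounded heading difference are assumptions for illustration; the text does not state the values used.

```python
import math

def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def is_consistent(marking, sign, max_dist=15.0, max_angle=45.0):
    """marking, sign: dicts with hypothetical keys x, y (metres), heading (deg)."""
    dist = math.hypot(marking["x"] - sign["x"], marking["y"] - sign["y"])
    return (dist <= max_dist
            and angle_difference(marking["heading"], sign["heading"]) <= max_angle)

crosswalk = {"x": 0.0, "y": 0.0, "heading": 350.0}
nearby_sign = {"x": 3.0, "y": 4.0, "heading": 10.0}
far_sign = {"x": 30.0, "y": 40.0, "heading": 10.0}
ok = is_consistent(crosswalk, nearby_sign)
not_ok = is_consistent(crosswalk, far_sign)
```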
VII. DETECTION AND POSITIONING RESULTS
This section presents the results of (1) single road-marking
recognition performance, (2) marking cluster detection perfor-
mance, (3) 3D positioning (full pipeline) and (4) consistency
checking.
A. Individual Road-Marking Recognition Performance
Figure 11 shows the recall-precision curves for pedes-
trian crossings, block-shaped markings, give-way triangles
and stripe markings. The performance of each marking type
has been evaluated for three pipeline stages (SVM, context
inference and model evaluation) and for two detection ranges
(distance within 10 m and 20 m of the car). We first analyze
the recognition performance of individual markings for the
SVM classification and then evaluate the performance impact
of the added context inference and model evaluation.
SVM Performance: Within a range of 10 m from the car,
over 90% of all markings are detected, with the exception of
give-way markings (> 80%). However, when a distance of
20 m is considered, the performance drops significantly for
crosswalks, give-way and stripe markings. First, markings
farther away are more often occluded by other
objects, such as vehicles. Second, perspective distortions occur
at larger distances from the capturing location, as the IPM
assumes a flat ground plane, whereas most roads are curved.
Influence of context inference: When context inference
is used, the performance is equal to or, in most cases, better
than the SVM performance. For pedestrian crossings, block
markings and give-way triangles, the recognition performance
is only increased slightly. Looking at stripe markings, the
impact of context inference on the recognition performance
is considerable. This is due to the fact that stripe contours are
easily distorted, resulting in a low probability output of the
SVM. By exploiting the contextual placement patterns, stripe
markings with low probabilities can be boosted. Within 10 m,
the same recall of above 90% can be achieved at a much higher
precision (>90%). For a distance of 20 m, the precision can
be improved as well, albeit at a lower recall.
Result of model evaluation: As mentioned before, the model
evaluation is mainly used for clustering and removal of false
positives. The output probabilities after context inference are
set to zero if markings do not adhere to the model. This
improves the final precision in most cases, but also lowers the

[Plots for Fig. 11: Recall-precision performance for (a) Crosswalks, (b) Blocks, (c) Give-way and (d) Stripes; x-axis: Recall (0 to 1); y-axis: Precision (0 to 1); curves: SVM, +Context and +Model, each for <10 m and <20 m]
Fig. 11. Recall-precision curves for single-image marking recognition for (a) pedestrian crossings, (b) block markings, (c) give-way markings and (d) stripe
markings. For each marking, the performance is shown after SVM classification (blue), context inference (red) and model evaluation (green) and both detection
ranges: within 10 m (solid line) and 20 m (dashed line).
recall slightly. Since marking clusters should at least have 2
marking elements, single isolated markings are discarded, even
though they might be correct. Furthermore, due to perspective
distortions, distances between markings are altered. The IPM
assumes a flat ground plane, but roads are often curved (e.g.
for drainage). Markings that are far away from the car are
particularly affected by these distortions.
B. Road-Marking Cluster Performance
The model evaluation produces road-marking clusters with
a position, orientation and size. Table I shows the recognition
results of road-marking clusters within 10 m of the car.
Considering the detection results, we observe that without context
inference, between 79% and 90% of the clusters are found,
except for block markings, of which only 40% of the clusters
are found. With context inference, the detection performance
is significantly improved: 88% of the crosswalks and
over 90% of the other markings are found. The positioning
accuracy, expressed as the mean distance to
the ground-truth cluster positions, is considerably improved by
using context inference, except for give-way markings, where
it is marginally worse (2 cm). However, this difference is too
small to be significant for this measurement.
Table II shows the same detection and accuracy metrics
TABLE I
MARKING GROUP PERFORMANCE FOR DETECTIONS WITHIN 10 M
Situation             Found #   Found %   False det.   Pos. error [m]
Pedestrian Crossing   22        88%       1            0.59
  with context        22        88%       3            0.53
Block Markings        24        40%       3            0.77
Give-way Markings     31        79%       1            0.20
Stripe Markings       30        88%       4            0.73
TABLE II
MARKING GROUP PERFORMANCE FOR DETECTIONS WITHIN 20 M
Situation             Found #   Found %   False det.   Pos. error [m]
Stripe Markings       54        49%       7            0.99
for a range of 20 meters from the car. Overall, context
inference still improves both the number of found markings
and the positioning accuracy, but we observe that the results
are not improved for pedestrian crossings and stripe markings

Fig. 12. Two situations where marking clusters are missed. In (a) the pedestrian
crossing is occluded, in (b) the stripe markings are distorted by the IPM.
[Plot for Fig. 13: Percentage of detected markings as a function of distance from ground truth; x-axis: Detection distance [m] (0 to 6); y-axis: Detected % (0 to 1); curves: SVM, +Context]
Fig. 13. Detected percentage as a function of distance from ground truth.
(56% and 53% found). For these two marking types, we now
investigate the causes for the lower detection performance.
Figure 12 shows two cases where crosswalks and stripe
markings are not detected. Because crosswalk markings are
relatively large, they tend to be occluded by other objects
if they are farther away from the car. Stripe markings are
relatively small and thus are more susceptible to the distortions
of the IPM. This implies that even without context inference,
all undistorted markings have been detected, such that added
context information does not improve detection rates. Because
most markings are detected within 10 m and the results from
individual images are merged in the next stage, almost all
markings can still be detected, as the same marking occurs in
multiple images.
Regarding positioning accuracy, Figure 13 gives the percent-
age of detected markings that are within a certain range of the
ground truth. From these curves, it is clear that the context in-
ference improves the positioning accuracy of marking clusters
in an image. When the pipeline is used only with the SVM
classification, roughly 60% is detected within 1 meter and
above 80% within 2 meters. Using context information, this
can be improved to above 80% within 1 meter and above 90%
within 2 meters. Positioning accuracy is particularly important
for the multi-view positioning, which is the next step in the
pipeline and is evaluated in the next section.
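A curve such as the one in Figure 13 can be derived from a list of per-cluster positioning errors, as in the sketch below. The error values are made up for illustration and are not the measurements of this work; the denominator is simply the number of errors passed in.

```python
def detected_fraction(errors_m, max_distance_m):
    """Fraction of clusters whose position error is at most max_distance_m."""
    if not errors_m:
        return 0.0
    return sum(1 for e in errors_m if e <= max_distance_m) / len(errors_m)

# Hypothetical positioning errors in metres for five detected clusters.
errors = [0.2, 0.5, 0.9, 1.5, 3.0]
curve = [(d, detected_fraction(errors, d)) for d in (1.0, 2.0, 4.0)]
```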
TABLE III
MULTI-VIEW POSITIONING RESULTS FOR CROSSWALKS, BLOCK
MARKINGS AND GIVE-WAY TRIANGLES.
Situation Found # Found % False det. Accuracy m
C. 3D Positioning Results
Table III shows the detection and positioning accuracy
results for the multi-view positioning of crosswalks, block
markings and give-way triangles. Overall, the recognition rate
is equal to or above 80%. For each marking type, we evaluate
the most common causes of false negatives (misses) and false
positives (erroneously found markings), provided that they
are significant. It should be noted that the impact of missed
markings (FNs) is larger than that of falsely found markings (FPs),
because with little manual effort all detections of the pipeline
can be inspected and accepted or discarded accordingly.
For all classes, marking abrasion is the most common
cause for missed detections. Furthermore, markings are missed
if there are only a few images which capture them, often
occurring when they are far away. Below, we specifically
address each marking type.
The main cause of missed detections of crosswalks is the
merging of clusters. In most cases, a smaller crosswalk of
only two elements is located close to a larger crosswalk.
Due to positioning errors, caused by perspective distortions
and GPS position errors, these clusters are 'connected' and
merged, and one of them is counted as missed because both
crosswalks are annotated. Figure 14a shows an example of two
smaller two-stripe crosswalks that are merged into the larger
crosswalk in the middle.
The detection rate of block markings is already high (92%).
However, there are many false positive detections. Closer
inspection of the false positives reveals that the pipeline
recognized tile patterns in gardens and on driveways and
sidewalks, as shown in Figure 14b.
Give-way triangles have the lowest detection rate of all
marking types and also have the highest number of
elements in the dataset. As there are around 150 missed
markings, we give a breakdown of the causes of this high
number. Over 50% of all missed give-way markings were
heavily damaged, which makes them undetectable by
the presented system. The second cause, at 23%, is
give-way markings located on bicycle roads/lanes. These lanes
are often located far from the road/capture location, resulting
in distorted and occluded markings. At 13%, cluster merging
is the third most frequent cause of FNs and is comparable to
the merging aspects described for crosswalks. Other minor
causes include occlusions and IPM distortions (e.g. when
markings are on slopes). Figure 14c and Figure 14d show
cases of damaged markings and far-away capture locations,
respectively.

(a) FNs (red) due to the merging of close-by pedestrian crossings (blue).
(b) FP due to similar shapes and patterns.
(c) FN due to severely damaged markings.
(d) FN due to far away capture locations (blue dots).
Fig. 14. False and missed detections of 3D-positioned markings.
D. Road-Marking and Traffic-Sign Co-Occurrence Validation
The presented road-marking recognition system is applied
in conjunction with the existing traffic-sign recognition
system [3] to check the correct co-occurrence of signs and
markings, in particular for pedestrian crossings and give-way
situations. The goal of this experiment is to (1) evaluate
the recognition of traffic situations where both markings and
signs occur, and (2) explore consistency checking on databases
containing the positions of signs and markings. The last
aspect aims at identification of situations where expected signs
TABLE IV
OVERVIEW OF THE COMPLETE COMBINED RESULTS AND FOR INDIVIDUAL
SIGN AND MARKING RECOGNITION ONLY.
Situation              Approach                   Correctly det.    False det.
Pedestrian crossings   Combined recognition       53    100%        8
                       Marking recognition only   51    96.2%       4
                       Sign recognition only      49    92.5%       4
Give-way situations    Combined recognition       694   96.5%       28
                       Marking recognition only   500   69.5%       23
                       Sign recognition only      598   83.1%       5
TABLE V
OVERVIEW OF THE CONSISTENCY CHECKING RESULTS.
Situation              Consistent
Pedestrian crossings   34 / 53     64.2%
Give-way situations    349 / 719   48.5%
or markings are missing, potentially leading to dangerous
traffic behavior. For instance, a pedestrian crossing should be
indicated by both signs visible from all driving directions and
a sufficiently large crosswalk marking.
Combined Recognition: Table IV shows the number of
recognized traffic situations for marking-, sign- and combined-
recognition. It should be noted that this table considers traffic
situations and not single-marking clusters or signs, such that
multiple signs and markings denoting the same situation are
merged. The results show that marking- and sign-recognition
complement each other when considering the identification of
situations, especially taking into account that a significant part
of the situations is denoted exclusively by only a road marking
or a sign.
Consistency Checking: Considering Table V, about two-thirds of
the pedestrian crossings and about half of the give-way
situations are marked as consistent, i.e. expected markings and
signs are both detected. Compared to manual safety inspection,
this approach reduces the number of situations that have to be
verified by about a factor of two (or more).
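The reduction factor can be checked with a short calculation on the counts of Table V. The sketch assumes that only situations not marked as consistent still require manual verification; this interpretation is ours and is not stated explicitly in the text.

```python
def verification_reduction(total, consistent):
    """Ratio of all situations to those still needing manual verification."""
    remaining = total - consistent
    if remaining == 0:
        return float("inf")
    return total / remaining

# Counts from Table V.
crosswalk_factor = verification_reduction(53, 34)    # 53 / 19
give_way_factor = verification_reduction(719, 349)   # 719 / 370
```

Under this assumption the factor is close to 2.8 for pedestrian crossings and slightly below 2 for give-way situations.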
VIII. CONCLUSIONS AND FUTURE WORK
In this paper we have presented a road-marking recognition
system to create road-marking maps from street-level
panoramic images. Besides the general components of a
marking recognition system, such as segmentation and contour
classification, the proposed system contributes in several
respects. It is able to (1) recognize
a variety of markings, (2) exploit context relations between
individual marking elements, (3) retrieve marking clusters,
and (4) find the global positions of the recognized markings.
These contributions have resulted in a system design
with two consecutive processing stages. First, each image is
processed individually to identify the markings present. This
stage successively applies Inverse Perspective Mapping (IPM),
segmentation, learning-based contour classification with SVMs,
context inference and model evaluation to each individual image.
Context inference is realized by modeling the SVM
results in a Markov Random Field with weighted edges and
performing inference with Loopy Belief Propagation. From the
context inference results, marking clusters are constructed by
applying marking placement models, which exploit marking
design rules provided by traffic legislation. In the second
processing stage, recognition results from the separate images
are combined with connectivity-based clustering to find the
3D positions of the markings.
First, the single-image processing stage has been evaluated
for crosswalks, block-, stripe- and give-way markings. The
base performance (SVM classification) within 10 m of the car
has been found sufficient for crosswalks, give-way and stripe
markings (≥79%), but not for block markings (40%). The
use of context inference strongly improves both the detection
performance and positioning accuracy of all marking types to
above 88% (above 90% for most marking types) and finds
them within a few decimeters of the ground-truth locations.
The performance within 20 m of the car is lower than close-
by detections, even with context inference for some types.
However, because actual markings are captured in multiple
images, this is not an issue, as is discussed below in the results
of the multi-view positioning.
Applying the full processing pipeline to a dataset of a
complete municipality in the Netherlands, comprising
84,387 images (corresponding to more than 400 km of road),
yields promising results. In this set, pedestrian crossings,
block- and give-way markings have been recognized at 85%,
92% and 80%, respectively. Closer inspection of missed
detections reveals that most undetected markings are severely
damaged (give-way markings in particular), or are located far
from the capture location, which results in more occlusions
and perspective distortions. Provided that all detections
are manually verified to remove false positives, the system
performs sufficiently well for creating road-marking maps for
use in traffic safety inspection or navigation applications.
By exploiting high-level context information from other
sources such as traffic signs, a significant number of the missed
detections can be identified with traffic situation analysis.
Combined databases of markings and signs (1) supply a larger
number of traffic situations than a single source, and
(2) allow for consistency checking of sign and marking
co-occurrences, which directs manual traffic safety inspection
to aberrant cases. On our dataset, we have found 64.2% of the
crosswalk situations and 48.5% of the give-way situations to be
consistent, thereby reducing the manual verification effort by
about a factor of two or more.
Given these results, future work should be geared towards
two objectives: (1) specific processing of damaged markings
and (2) support for other marking types, such as lines, arrows
and speed numbers. Improving segmentation performance and
exploiting high-level context information of other markings
and signs can help to increase recognition performance and
give an additional indication of damaged markings. Shapes of
alternative marking types can be learned by the system, and
context relations and marking models can be set up for them.
Besides development and validation of a road-marking
recognition system in an industrial environment, this work has
initiated and contributed to several international publications:
• L. Hazelhoff, I. Creusen, T. Woudsma, and P.H.N. de With,
Combined generation of road marking and road sign databases
applied to consistency checking of pedestrian crossings, Ac-
cepted for: IAPR Int. Conf. on Machine Vision and Applica-
tions, 2015.
• T. Woudsma, L. Hazelhoff, I. Creusen, and P.H.N. de With,
Automated generation of road marking maps from street-level
panoramic images, Submitted to Int. Conf. on Intelligent Trans-
portation Systems, 2015.
• L. Hazelhoff, I. Creusen, T. Woudsma, and P.H.N. de With,
Exploiting automatically generated databases of traffic signs and
road markings for contextual co-occurrence analysis, Submitted
to Int. Journal on Electronic Imaging, 2015.
REFERENCES
[1] CROW, “Richtlijnen voor de bebakening en markering van wegen,”
2005.
[2] C. Li, I. Creusen, L. Hazelhoff, and P.H.N. de With, “Detection and
recognition of road markings in panoramic images,” in ACCV Workshop
on My Car Has Eyes - Intelligent Vehicles with Vision Technology, 2014.
[3] L. Hazelhoff, I. Creusen, and P. H. N. de With, “Exploiting street-level
panoramic images for large-scale automated surveying of traffic signs,”
Machine Vision and Applications, vol. 25, no. 7, pp. 1893–1911, 2014.
[4] S. Vacek, C. Schimmel, and R. Dillmann, “Road-marking analysis for
autonomous vehicle guidance.” in EMCR.
[5] M. Fu, X. Wang, H. Ma, Y. Yang, and M. Wang, “Multi-lanes detection
based on panoramic camera,” in Control Automation (ICCA), 11th IEEE
International Conference on, June 2014, pp. 655–660.
[6] J. Huang, H. Liang, Z. Wang, T. Mei, and Y. Song, “Robust lane marking
detection under different road conditions,” in Robotics and Biomimetics,
2013 IEEE International Conference on, Dec 2013, pp. 1753–1758.
[7] S. Yenikaya, G. Yenikaya, and E. Düven, “Keeping the vehicle on the
road: A survey on on-road lane detection systems,” ACM Comput. Surv.,
vol. 46, no. 1, pp. 2:1–2:43, Jul. 2013.
[8] P. Foucher, Y. Sebsadji, and J.-P. Tarel et al., “Detection and recognition
of urban road markings using images,” in Intelligent Transportation
Systems, International IEEE Conference on, 2011, pp. 1747–1752.
[9] H. Li, M. Feng, and X. Wang, “Inverse perspective mapping based urban
road markings detection,” in Cloud Computing and Intelligent Systems
(CCIS), International Conference on, vol. 03, 2012, pp. 1178–1182.
[10] B. Qin, W. Liu, and X. Shen et al., “A general framework for road
marking detection and analysis,” in Intelligent Transportation Systems,
2013 16th International IEEE Conference on, 2013, pp. 619–625.
[11] L. Hazelhoff, I. Creusen, T. Woudsma, and P. H. N. de With, “Combined
generation of road marking and road sign databases applied to con-
sistency checking of pedestrian crossings,” in 14th IAPR International
Conference on Machine Vision Applications (submitted to), 2015.
[12] J. Rebut, A. Bensrhair, and G. Toulminet, “Image segmentation and
pattern recognition for road marking analysis,” in Industrial Electronics,
IEEE Int. Symp. on, vol. 1, May 2004, pp. 727–732.
[13] T. Wu and A. Ranganathan, “A practical system for road marking
detection and recognition,” in Intelligent Vehicles Symp. (IV), IEEE, June
2012, pp. 25–30.
[14] T. Veit, J.-P. Tarel, P. Nicolle, and P. Charbonnier, “Evaluation of road
marking feature extraction,” in Intelligent Transportation Systems, 2008.
11th International IEEE Conference on, Oct 2008, pp. 174–181.
[15] J. C. Platt, “Probabilistic outputs for support vector machines and
comparisons to regularized likelihood methods,” in Advances in
Large Margin Classifiers. MIT Press, 1999, pp. 61–74.
[16] H.-T. Lin, C.-J. Lin, and R. Weng, “A note on Platt’s probabilistic
outputs for support vector machines,” Machine Learning, vol. 68, no. 3,
pp. 267–276, 2007.
[17] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. San Francisco, CA, USA: Morgan Kaufmann
Publishers Inc., 1988.
[18] K. P. Murphy, Y. Weiss, and M. I. Jordan, “Loopy belief propagation
for approximate inference: An empirical study,” in Proceedings of the
15th Conf. on Uncertainty in Artificial Intelligence, 1999, pp. 467–475.
[Online]. Available: http://dl.acm.org/citation.cfm?id=2073796.2073849
