A Method for Determining and Improving the Horizontal Accuracy of Geospatial Features
Juan Tobar, Shakir Ahmed, Linda McCafferty, and Carlos Piccirillo
South Florida Water Management District, West Palm Beach, FL, USA
Abstract
Many data sets stewarded by geospatial professionals are spatially correlated derivatives of
higher accuracy data sets such as parcels and road networks. This article documents the use of
the Buffer-Overlay method of Goodchild and Hunter (1997) to determine and improve the
horizontal accuracy of geospatial features. The method relies on a comparison with a
representation of higher accuracy, and estimates the percentage of the total length of the higher
accuracy representation that is within a specified distance of the lower accuracy representation.
The method is then extended using topological operators to extract and replace lower accuracy
representations with those of higher accuracy.
Introduction
The South Florida Water Management District (SFWMD) regulates water supply, water quality, groundwater withdrawals, and surface water runoff by issuing permits for these activities on specific land parcels. The District’s Regulatory GIS contains approximately 85,000 permits spread over a 16-county jurisdictional area from Orlando to the Keys. The permits are maintained in an SDE database in 18 feature classes based on permit type. About half of these permits (Environmental Resource Permits) never expire; the other half (Water Use Permits) are valid for 20 years before they must be renewed. These feature classes are used by engineers, environmental scientists, hydrologists, and compliance staff to make informed decisions during application review and post-permit compliance. For these reasons, it is important that even the oldest permits are depicted as accurately as possible in the GIS.
The Data - Permits
From 1972 to 1987 (15 years), permits were drawn directly on USGS 1:24,000 topographic quadrangle maps and mylar overlays. From 1987 to 1995 (8 years), the maps had been migrated to CAD and permits were heads-up digitized using SPOT 10-meter panchromatic and 20-meter multispectral imagery. From 1995 to 1999 (4 years), 1-meter Digital Orthophoto Quarter Quads were used, and by 1999 some permits were being digitized using county parcel data. Today all permits are digitized to parcels, but we have 23 years of poorly digitized data based on far-from-optimal base maps.
The Data - Parcels
The District uses a contiguous parcel base composed of features from the 16 counties within the District’s jurisdiction. The State of Florida’s Cadastral Mapping Guidelines recommend that horizontal accuracy meet or exceed the U.S. National Map Accuracy Standards (NMAS). These standards state that at “scales larger than 1:20,000, not more than 10 percent of the points tested shall be in error by more than 1/30 inch, measured on the publication scale.” Common scales for cadastral maps range from 1:500 to 1:10,000; assuming these maps follow NMAS, their horizontal positional accuracy at the 90% confidence level ranges from ±1.39 to ±27.78 feet (Table 1).
Map Scale | NMAS CMAS, 90% confidence (ft) | NSSDA RMSE(r) (ft) | NSSDA Accuracy(r), 95% confidence (ft)
1:1,200 (1” = 100’) | 3.33 | 2.20 | 3.80
1:2,400 (1” = 200’) | 6.67 | 4.39 | 7.60
1:4,800 (1” = 400’) | 13.33 | 8.79 | 15.21
1:6,000 (1” = 500’) | 16.67 | 10.98 | 19.01
1:12,000 (1” = 1000’) | 33.33 | 21.97 | 38.02
Table 1: Comparison of NMAS and NSSDA Horizontal Accuracy for Parcels
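The NSSDA columns in Table 1 follow from the NMAS CMAS values through the FGDC NSSDA conversion factors, which assume that RMSE is equal in x and y. The following minimal Python sketch shows that arithmetic; the function names are hypothetical and chosen only for illustration.

```python
import math

# NMAS: at publication scales larger than 1:20,000, no more than 10% of tested
# points may be in error by more than 1/30 inch at map scale (the 90% CMAS).
NMAS_TOLERANCE_INCHES = 1.0 / 30.0
INCHES_PER_FOOT = 12.0

# FGDC NSSDA conversion factors, valid only when RMSE_x == RMSE_y.
CMAS_PER_RMSE_X = 2.1460   # CMAS = 2.1460 * RMSE_x
ACC_R_PER_RMSE_R = 1.7308  # Accuracy_r (95%) = 1.7308 * RMSE_r


def nmas_cmas_ft(scale_denominator: float) -> float:
    """Ground distance in feet of 1/30 inch on a 1:scale_denominator map."""
    return scale_denominator * NMAS_TOLERANCE_INCHES / INCHES_PER_FOOT


def nssda_from_cmas(cmas_ft: float) -> tuple[float, float]:
    """Return (RMSE_r, Accuracy_r at 95%) implied by an NMAS CMAS value."""
    rmse_x = cmas_ft / CMAS_PER_RMSE_X
    rmse_r = math.sqrt(2.0) * rmse_x  # sqrt(RMSE_x**2 + RMSE_y**2) with equal x/y
    return rmse_r, ACC_R_PER_RMSE_R * rmse_r


if __name__ == "__main__":
    for denom in (1200, 2400, 4800, 6000, 12000):
        cmas = nmas_cmas_ft(denom)
        rmse_r, acc_r = nssda_from_cmas(cmas)
        print(f"1:{denom:,}  CMAS {cmas:.2f} ft  RMSE(r) {rmse_r:.2f} ft  "
              f"Accuracy(r) {acc_r:.2f} ft")
```

Rounded to two decimals, these values should match the rows of Table 1. The same call gives about 1.39 ft at 1:500 and 27.78 ft at 1:10,000, the range quoted in the text. The factors do not hold when RMSE_x and RMSE_y differ substantially.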
These two data sets are spatially correlated, because permits are based on the same legal boundaries used for parcels. We can therefore use parcels as a control to test the accuracy of our permits. In general, the horizontal accuracy of the parcels can be considered an order of magnitude better than that of the permits.
Literature Review
Positional (or spatial) accuracy refers to the accuracy of a test feature when compared to a control feature. Methods for determining the positional accuracy of points are well established and usually rely on the Euclidean distance between a test point and its control point. The error can be reported separately in x, y, and z, and descriptive statistics can be generated from these values.
Determining the positional accuracy of a line is more complex, since a line is composed of multiple points, each of which may or may not have a matching control point. Additional problems include choosing an appropriate search radius and identifying equivalent features to compare. Atkinson-Gordo and Ariza-Lopez (2002) provide an excellent review of methods for measuring the positional accuracy of linear features.
Methods for measuring the positional accuracy of polygons are extensions of the methods used for lines. The five primary methods reviewed by Atkinson-Gordo and Ariza-Lopez are, in brief, as follows:
Epsilon Band Error methods are based on defining an uncertainty band around a polygon feature. The band width is known as epsilon, and the wider the band, the greater the uncertainty in the position of the line. The band can be derived by error propagation or by comparing test line segments to a control. The method determines an error band rather than determining or quantifying the accuracy of the line.
Figure 1: Epsilon Bands
The Buffer-Overlay method of Goodchild and Hunter (1997) is based on defining a buffer around a control line of higher accuracy and computing the percentage of the length of the less accurate line that falls within the buffer zone. The width of the buffer is then increased and the percentage computed again. Repeating this for a series of widths produces a cumulative probability distribution.
Figure 2: Buffer-Overlay
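The Buffer-Overlay measure can be sketched without a GIS by sampling. The test line is split into short pieces, and a piece counts as inside the buffer when its midpoint lies within the buffer width of the control line. This is a minimal pure-Python approximation, not Goodchild and Hunter's implementation (which uses true buffer and overlay operations). The function names and the `step` sampling parameter are hypothetical.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, a)
    # Projection parameter of p onto ab, clamped to the segment.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_line(p: Point, line: list[Point]) -> float:
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def fraction_within(test: list[Point], control: list[Point], width: float,
                    step: float = 0.5) -> float:
    """Share of the test line's length whose sample midpoints lie within `width`."""
    total = inside = 0.0
    for a, b in zip(test, test[1:]):
        seg_len = math.dist(a, b)
        n = max(1, math.ceil(seg_len / step))  # pieces no longer than `step`
        for k in range(n):
            t = (k + 0.5) / n                   # midpoint of piece k
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            piece = seg_len / n
            total += piece
            if distance_to_line(mid, control) <= width:
                inside += piece
    return inside / total


def buffer_overlay_curve(test, control, widths, step=0.5):
    """(width, fraction) pairs: the cumulative Buffer-Overlay distribution."""
    return [(w, fraction_within(test, control, w, step)) for w in widths]
```

For example, `buffer_overlay_curve(test, control, [0.5 * i for i in range(1, 121)])` evaluates widths from 0.5 to 60 in 0.5 steps. A smaller `step` gives a closer approximation at a higher computational cost.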
The Buffer Overlay Statistics method of Tveite and Langaas (1999) involves buffering, overlay, and the generation of statistics. First, both the test line (X) and the control line (Q) are buffered to produce buffers XB and QB. An overlay operation is then performed, resulting in four types of areas (Figure 3):
Type 1: Area outside XB and outside QB
Type 2: Area outside XB and inside QB
Type 3: Area inside XB and outside QB
Type 4: Area inside XB and inside QB
A number of different statistics can be generated from these areas, but for our purposes the most interesting is Type 4, which dominates when the test and control polygons are very similar. When the lines are similar in form but differ in position (displacement is present), an estimate of the positional accuracy can be made when Type 4 approaches 50%.
Figure 3: Buffer Overlay Statistics
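The four area types can be approximated on a regular grid by classifying each cell center according to whether it lies within the buffer width of X, of Q, or of both. This is a minimal raster sketch, not Tveite and Langaas's vector implementation. The Type 4 share of the buffer union (Types 2 + 3 + 4) is one simple summary chosen here for illustration; their own statistics are not reproduced. All names and the `cell` size are hypothetical.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_line(p: Point, line: list[Point]) -> float:
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def overlay_areas(x_line: list[Point], q_line: list[Point], width: float,
                  cell: float = 0.5) -> dict[int, float]:
    """Grid estimate of the four overlay area types for buffers XB and QB.

    Type 1 is only counted inside the grid's bounding box.
    """
    pts = x_line + q_line
    xmin = min(p[0] for p in pts) - width
    ymin = min(p[1] for p in pts) - width
    nx = math.ceil((max(p[0] for p in pts) + width - xmin) / cell)
    ny = math.ceil((max(p[1] for p in pts) + width - ymin) / cell)
    areas = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
    for i in range(nx):
        for j in range(ny):
            center = (xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell)
            in_xb = distance_to_line(center, x_line) <= width
            in_qb = distance_to_line(center, q_line) <= width
            kind = 4 if in_xb and in_qb else 3 if in_xb else 2 if in_qb else 1
            areas[kind] += cell * cell
    return areas


def type4_share(areas: dict[int, float]) -> float:
    """Type 4 area as a fraction of the buffer union (Types 2 + 3 + 4)."""
    union = areas[2] + areas[3] + areas[4]
    return areas[4] / union if union else 0.0
```

Identical lines give a share of 1, and lines displaced by more than twice the buffer width give 0.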
The Hausdorff Distance method of Abbas, Grussenmeyer and Hunter (1995) is based on calculating the Hausdorff distance between a pair of equivalent lines that have been generalized and normalized using the RMSE and a generalization factor. Two values are computed to evaluate a line: the percentage of agreement (the ratio between the normalized lines and the original lines) and the RMSE for planimetric features (computed from all the normalized lines).
Figure 4: Hausdorff Distance
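The Hausdorff distance itself is the largest distance from any point of one line to the nearest point of the other, taken in both directions. The Python sketch below computes it on vertex sets and adds an optional densification step, because using only the original vertices can overstate the distance. It omits the generalization and normalization steps of Abbas, Grussenmeyer and Hunter, and the function names are hypothetical.

```python
import math

Point = tuple[float, float]


def densify(line: list[Point], step: float) -> list[Point]:
    """Insert vertices so that no segment is longer than `step`."""
    out = [line[0]]
    for a, b in zip(line, line[1:]):
        n = max(1, math.ceil(math.dist(a, b) / step))
        for k in range(1, n + 1):
            out.append((a[0] + (b[0] - a[0]) * k / n, a[1] + (b[1] - a[1]) * k / n))
    return out


def directed_hausdorff(a: list[Point], b: list[Point]) -> float:
    """Largest distance from a vertex of `a` to its nearest vertex of `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: list[Point], b: list[Point]) -> float:
    """Symmetric (discrete) Hausdorff distance between two vertex sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

For a straight line and a tent-shaped line peaking 4 units above it, the vertex-only value is sqrt(41) (about 6.40). After densifying both lines at 1-unit spacing, it becomes 4, which is the true maximum separation.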
The Maximum Proportion Standard (MPS) and Maximum Distortion Standard (MDS) method of Veregin (2000) is based on the computation of the uniform distortion (UDD). The UDD is computed from the areas between the two lines and the length of the line on the map. A diagram of cumulative frequencies is then built for a given band width at a given level of confidence.
Figure 5: MPS and MDS
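The area-over-length idea behind the UDD can be sketched as the area enclosed between two lines divided by the length of the reference line, which gives an average displacement. This Python sketch assumes that reading of the UDD and also assumes the two lines do not cross. It does not build Veregin's cumulative-frequency diagram or the MPS/MDS standards, and the function names are hypothetical.

```python
import math

Point = tuple[float, float]


def polygon_area(ring: list[Point]) -> float:
    """Shoelace area; the ring is closed implicitly (last vertex joins the first)."""
    twice = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice += x1 * y2 - x2 * y1
    return abs(twice) / 2.0


def line_length(line: list[Point]) -> float:
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def mean_displacement(reference: list[Point], test: list[Point]) -> float:
    """Area enclosed between two non-crossing lines divided by the reference length."""
    enclosed = polygon_area(reference + test[::-1])
    return enclosed / line_length(reference)
```

A test line uniformly offset by 3 units from a reference line yields exactly 3. Crossing lines would need to be split at their intersections first, because signed shoelace areas on either side of a crossing cancel.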
The advantages of the Buffer-Overlay method over the other methods discussed are that (1) it performs effectively without the need to extract both the test and the control polygon, (2) it does not require matching points between the two representations, (3) it is relatively insensitive to outlying values, and (4) it is statistically based. Additionally, the algorithm uses common buffering and clipping functions available in all major GIS packages.
The Test Area
To thoroughly test the limits of our procedures for determining and improving horizontal accuracy, we ran our test on a subset of the data. Specifically, we extracted the Environmental Resource Permits for Township 44S, Range 25E in Lee County, Florida. Lee County was selected because it was known to have permits that were highly displaced from their parcel counterparts. All permits that intersected this township and range were extracted into a file geodatabase containing 259 features.
Figure 6: Test Area
Methods
A straightforward method for determining the horizontal accuracy of a polygon feature class is to measure the offset between polygon vertices and the corresponding parcel vertices and then calculate the Root Mean Square Error (RMSE). To facilitate this, a C# program was written that allows staff to create a database of coordinate sample points. The RMSE gives the accuracy of the feature class as a whole but does not tell us the accuracy of individual permits; hence the need for Buffer-Overlay.
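The feature-class RMSE can be computed from the sampled coordinate pairs as shown in the sketch below. It is illustrative Python, not the District's C# program. It assumes that each sample is stored as a (test point, control point) pair, and `rmse` is a hypothetical name.

```python
import math

Point = tuple[float, float]


def rmse(pairs: list[tuple[Point, Point]]) -> tuple[float, float, float]:
    """RMSE_x, RMSE_y and radial RMSE_r from (test, control) coordinate pairs."""
    n = len(pairs)
    sum_dx2 = sum((t[0] - c[0]) ** 2 for t, c in pairs)  # squared x offsets
    sum_dy2 = sum((t[1] - c[1]) ** 2 for t, c in pairs)  # squared y offsets
    rmse_x = math.sqrt(sum_dx2 / n)
    rmse_y = math.sqrt(sum_dy2 / n)
    return rmse_x, rmse_y, math.sqrt(rmse_x ** 2 + rmse_y ** 2)
```

Similar RMSE_x and RMSE_y values indicate that errors are not concentrated in one direction, which is how the Results section interprets its x and y values.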
Buffer-Overlay is usually implemented by buffering a control line and quantifying how much of the test line falls within each buffer. This works well with small control data sets, such as a shoreline, but is not practical when using parcels: it would require buffering each parcel line segment and then checking for an overlapping permit line segment that, in the majority of cases, does not exist. Our implementation therefore buffers the permit lines (test) and quantifies how much of the parcel lines (control) falls within each buffer. The output is a cumulative probability (CP) curve for each individual permit, where the CP at a given buffer distance is the length of clipped parcel line divided by the perimeter of the permit.
The pseudo code for calculating the initial horizontal accuracy is as follows:
Convert parcel polygons to parcel lines
For each permit
    For each buffer distance from 0.5 ft to 60 ft at 0.5 ft intervals
        Buffer the permit at that distance
        Clip the parcel lines (control) using the buffer
        Drop dangling nodes (where length = buffer distance)
        Calculate the CP
        If CP ≥ 1
            Horizontal accuracy = buffer distance; stop
        Else if CP < 1 and buffer < 60 ft
            Go to the next buffer distance
    If CP never reaches 1, horizontal accuracy = 999 (beyond 60 ft)
Clipping produces short and long dangling line segments as artifacts, and their lengths are directly related to the buffer distance used to clip. Short dangles are easily removed by eliminating segments whose length equals the buffer distance. Long dangles, however, can push the CP to 1 before a complete ring has been extracted, which results in a failed polygon build.
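The initial-accuracy loop and the dangle rule can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions, not the District's implementation. Buffer-and-clip is approximated by testing the midpoints of short pieces of each parcel line. Parcel boundaries are assumed to be already deduplicated and split at nodes. A clipped run is treated as a dangle when its length is within a hypothetical tolerance `tol` of the buffer distance. All function names are hypothetical, and the value 999 mirrors the code the text uses for features beyond 60 feet.

```python
import math

Point = tuple[float, float]
Line = list[Point]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_line(p: Point, line: Line) -> float:
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def line_length(line: Line) -> float:
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def clipped_runs(control: Line, permit: Line, width: float, step: float) -> list[float]:
    """Lengths of contiguous stretches of `control` within `width` of `permit`."""
    runs, current = [], 0.0
    for a, b in zip(control, control[1:]):
        seg_len = math.dist(a, b)
        n = max(1, math.ceil(seg_len / step))
        for k in range(n):
            t = (k + 0.5) / n
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if distance_to_line(mid, permit) <= width:
                current += seg_len / n        # piece lies inside the buffer
            elif current > 0.0:
                runs.append(current)          # a run just left the buffer
                current = 0.0
    if current > 0.0:
        runs.append(current)
    return runs


def cumulative_probability(permit: Line, parcel_lines: list[Line], width: float,
                           step: float = 0.25, tol: float = 0.25) -> float:
    """Clipped parcel length (dangles dropped) divided by the permit perimeter."""
    kept = 0.0
    for line in parcel_lines:
        for run in clipped_runs(line, permit, width, step):
            if abs(run - width) > tol:        # drop dangles of ~buffer length
                kept += run
    return kept / line_length(permit)


def horizontal_accuracy(permit: Line, parcel_lines: list[Line], max_buffer: float = 60.0,
                        interval: float = 0.5, step: float = 0.25) -> float:
    """First buffer distance at which CP >= 1, or 999 if never within max_buffer."""
    for i in range(1, int(round(max_buffer / interval)) + 1):
        width = i * interval
        if cumulative_probability(permit, parcel_lines, width, step) >= 1.0:
            return width
    return 999.0
```

For a 100 ft square permit and a parcel boundary shifted 3 ft in both x and y, the sketch returns 4.5 ft. The shifted corner lies about 4.24 ft from the permit, so the CP first reaches 1 at the next 0.5 ft step.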
This algorithm was run on all polygons in the test area, resulting in 259 curves, each giving the CP at every buffer distance for one feature. Figure 7 displays the CP curves for a random sample of 21 permits. On this graph the x-axis represents the buffer distance, from 0.5 to 60 feet at 0.5 ft intervals, and the y-axis represents the CP. When a curve reaches 1 or more, the length of clipped parcel line is greater than or equal to the perimeter of the permit line, and the buffer distance used is assigned as the horizontal accuracy of the permit. Curves that never reach 1 belong to permits whose displacement exceeds our maximum buffer distance of 60 feet.
[Chart: Cumulative Probability (y-axis, 0 to 1.2) versus Buffer Distance in ft (x-axis, 0.5 to 60)]
Figure 7: Cumulative Probability for Individual Permits
Phase I Correction
Phase I involved converting the extracted parcel line segments into polygons using standard GIS tools. This functionality is built into many GIS packages and is best known from the creation of parcel polygons from metes and bounds entered using Coordinate Geometry (COGO). The pseudo code for this is as follows:
For each permit
    Buffer at the accuracy level previously determined
    Clip the parcel lines (control) using the test buffer
    Drop dangling nodes (where length = buffer)
    Build the parcel lines as polygons
    Compare the area of the polygon to the original permit
        Accept only if polygon area = permit area ± 0.03 × permit area
    Build succeeds/fails
In some cases, long line-segment artifacts form closed rings, resulting in polygon builds that are much larger or smaller in area than the original permit. These can be excluded through an area comparison.
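The area comparison in the Phase I pseudo code can be sketched with the shoelace formula. This Python sketch assumes that both polygons are simple vertex rings. The names are hypothetical, and the 3% default follows the tolerance in the pseudo code.

```python
Point = tuple[float, float]


def polygon_area(ring: list[Point]) -> float:
    """Shoelace area; a repeated closing vertex contributes nothing."""
    twice = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice += x1 * y2 - x2 * y1
    return abs(twice) / 2.0


def accept_build(candidate: list[Point], permit: list[Point], tolerance: float = 0.03) -> bool:
    """Keep the rebuilt polygon only if its area is within
    ±tolerance × permit area of the original permit's area."""
    permit_area = polygon_area(permit)
    return abs(polygon_area(candidate) - permit_area) <= tolerance * permit_area
```

A rebuilt 102 × 100 ft polygon is accepted for a 100 × 100 ft permit (a 2% difference). A 110 × 100 ft polygon (10%) is rejected, which is the kind of oversized build that long-dangle rings produce.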
Once complete, each permit feature has a CP and an assigned horizontal accuracy. The RMSE is then recalculated to quantify the improvement across the entire feature class.
Phase II Correction
Phase II involved adding arc segments to parcel lines with gaps in order to form a closed ring
that could be built into a permit polygon. The pseudo code for this is as follows:
For each permit that failed to build
    Buffer at the accuracy level previously determined
    Clip the parcel lines (control) using the test buffer
    Drop dangling nodes (where length = buffer)
    For each remaining dangling node
        Identify the closest node
        Connect the two nodes with a line segment
    Build the lines as polygons
    Compare the area of the polygon to the original permit
        Accept only if polygon area = permit area ± 0.03 × permit area
For Phase II features an improved CP cannot be calculated. The CP is measured against parcel lines, and Phase II adds line segments precisely where parcel line segments are missing. The RMSE, however, can still be recalculated to quantify any improvement.
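The gap-closing step of the Phase II pseudo code can be sketched as follows. A dangling node is an end point that occurs only once among the clipped line ends, and each one is joined to its closest other dangling node. This Python sketch assumes that shared end points have identical (snapped) coordinates. The function names are hypothetical.

```python
import math
from collections import Counter

Point = tuple[float, float]


def dangling_nodes(lines: list[list[Point]]) -> list[Point]:
    """End points that occur only once among all line end points."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1
        counts[line[-1]] += 1
    return [node for node, count in counts.items() if count == 1]


def gap_segments(lines: list[list[Point]]) -> list[tuple[Point, Point]]:
    """Connect every dangling node to its closest other dangling node."""
    nodes = dangling_nodes(lines)
    segments = set()
    for node in nodes:
        others = [n for n in nodes if n != node]
        if others:
            nearest = min(others, key=lambda n: math.dist(node, n))
            segments.add(tuple(sorted((node, nearest))))  # dedupe A-B / B-A
    return sorted(segments)
```

For two clipped parcel lines that leave a 10 ft gap on one side of a square, the sketch returns the single segment that bridges the gap. The closed ring is then built and checked with the same area comparison as in Phase I.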
Results
The initial confidence interval on the estimate of RMSE for x and y at 95% probability was calculated using 30 coordinate pairs from the entire test area. The initial values were 20.22 ± 5.74 in x and 22.18 ± 7.29 in y (Table 2). The RMSE is approximately circular: the values are similar in x and y, which indicates no systematic error in the data that would produce larger errors in any particular direction.
Definition | Initial X/Y Dimension Values
Confidence interval on the estimate of RMSE_x at 95% probability: RMSE_x + 1.96 × S_RMSE > e_xi > RMSE_x − 1.96 × S_RMSE | 20.22 ± 5.74 = 14.49 to 25.95
Confidence interval on the estimate of RMSE_y at 95% probability: RMSE_y + 1.96 × S_RMSE > e_yi > RMSE_y − 1.96 × S_RMSE | 22.18 ± 7.29 = 14.89 to 29.46
Table 2: Initial Root Mean Square Error (RMSE)
Figure 8 is a graph of the initial horizontal accuracy distribution from 0 to 60 feet for all 259 permit features. The distribution has a peak at each extreme, representing a large number of high-accuracy features (≤ 0.5 feet) and a large number of low-accuracy features (> 60 feet). In between, the curve is randomly distributed and contains a significant number of features.
[Figure 8 chart: number of permits (#) versus Buffer Distance (ft)]
Figure 9 is a graph of the cumulative horizontal accuracy distribution. In the best-case scenario this would be a horizontal line at 259 on the y-axis, indicating that all features had accuracies of ≤ 0.5 feet. About 10% of the features have accuracies of ≤ 0.5 feet, followed by a steady stream of features of various accuracies up to 60 feet (80%). The remaining 10% of the records were not measured because their accuracy was > 60 feet.
[Figure 9 chart: cumulative number of permits (#) versus Buffer Distance (ft)]
Figure 10 is a classified map of the initial horizontal accuracies. Permits in green have accuracies of ≤ 1 foot, yellow from 2 to 59 feet, and red from 60 feet to 999, where 999 represents features beyond our 60-foot buffer distance.
Phase I Correction was applied once the RMSE for the feature class and the individual feature accuracies had been generated. Phase I correction consisted of buffering features at the previously determined accuracy, using this buffer to clip parcels, and then building higher-accuracy replacement polygons. Buffer-Overlay was then used to recalculate the horizontal accuracy for all permits.
Figure 11 is a graph of the initial (red) and Phase I (green) accuracy distributions for all 259 permits. After correction, the number of features with displacements of ≤ 1 foot increased by 105 records, or about 40% of all permits.
[Figure 11 chart: number of permits (#) versus Buffer Distance (ft); series: Initial, Phase 1 Correction]
Figure 12 is a close-up view of the curve for horizontal accuracies between 0 and 30 feet. It shows that the amplitude of the curve has been reduced. The Phase I curve (green) runs above the initial conditions (red) for accuracies of ≤ 1 foot and below them for the rest of the curve.
[Figure 12 chart: number of permits (#) versus Buffer Distance (ft), 0.5 to 28.5; series: Initial, Phase 1 Correction]
Figure 13 is a graph of the cumulative curve for both the Initial (red) and Phase I (green) conditions. It shows that an additional 105 records now have accuracies of ≤ 1 foot.
[Figure 13 chart: cumulative number of permits (#) versus Buffer Distance (ft); series: Initial, Phase 1 Correction]
Table 3 compares the RMSE before and after Phase I correction and shows a marked reduction in RMSE: from 20.22 to 15.52 in x and from 22.18 to 12.76 in y.
Confidence interval on the estimate at 95% probability | Initial X/Y Dimension Values | Phase I X/Y Dimension Values
RMSE_x + 1.96 × S_RMSE > e_xi > RMSE_x − 1.96 × S_RMSE | 20.22 ± 5.74 = 14.49 to 25.95 | 15.52 ± 5.78 = 9.75 to 21.3
RMSE_y + 1.96 × S_RMSE > e_yi > RMSE_y − 1.96 × S_RMSE | 22.18 ± 7.29 = 14.89 to 29.46 | 12.76 ± 4.6 = 8.16 to 17.35
Table 3: RMSE Initial and Phase 1 Correction
Figure 14 shows two maps depicting horizontal accuracy before (left) and after (right) Phase I correction.
Figure 14: Before and After Accuracy Classification
In the process of building higher-accuracy features in Phase I, some polygons could not be built because the clipped line segments did not form a complete ring. Phase II attempts to correct these features by adding line segments at dangling nodes to form a complete ring. This operation resulted in 15 additional records (6%) being classified as ≤ 1 foot (Figures 15 and 16).
[Chart: number of permits (#) versus Buffer Distance (ft), 0.5 to 29.5; series: Initial, Phase 1 Correction, Phase 2 Correction]
Figure 15: Initial, Phase I, and Phase II Horizontal Accuracy Distribution
[Chart: cumulative number of permits (#) versus Buffer Distance (ft); series: Initial, Phase 1 Correction, Phase 2 Correction]
Figure 16: Initial, Phase I, and Phase II Cumulative Curves
Discussion
Many of the data sets stewarded by geospatial professionals are based on, or directly related to, higher-accuracy data sets that could be used to improve their horizontal spatial accuracy. In this paper we have demonstrated the use of Buffer-Overlay to determine and improve the accuracy of permits whose boundaries are related to higher-accuracy parcel boundaries. The initial accuracy assessment included the RMSE for the feature class, after which each feature was assigned a horizontal spatial accuracy from 0.5 to 60 feet at 0.5 foot intervals. Phase I used these accuracy measures to clip parcel lines and build higher-accuracy polygons, adding 105 records (about 40% of the permits) to the class with accuracies of ≤ 1 foot. Phase II examined the records that failed to build in Phase I. Line segments were added across node gaps to form rings that could be built into polygons, adding a further 15 records (about 6%) with accuracies of ≤ 1 foot.
In general, we find that Buffer-Overlay is an effective method for quantifying and improving the accuracy of features where control data exist. Most data stewards would acknowledge having a data set that should be improved but lack the time and money to make such improvements. The cost of improving data with Buffer-Overlay is confined largely to algorithm development. Once the process is automated, the time requirement reduces to CPU cycles, leaving the steward free to focus on the capture and accuracy of new data.
References
[1] Goodchild, M.F., and G.J. Hunter, 1997. A simple positional accuracy measure for linear features, International Journal of Geographical Information Science, 11(3):299-306.
[2] Atkinson, A.D.J., and F. Ariza, 2002. Nuevo enfoque para el análisis de la calidad posicional en cartografía mediante estudios basados en la geometría lineal [A new approach to the analysis of positional quality in cartography through studies based on linear geometry], Proceedings XIV International Congress of Engineering Graphics, Santander, Spain.
[3] Tveite, H., and S. Langaas, 1999. An accuracy assessment method for geographical line data sets based on buffering, International Journal of Geographical Information Science, 13(1):27-47.