Quality key users

Quality issues, Evaluation, Metadata,
Resolution and Levels of Detail
Antti Jakobsson
Key Users Meeting 15th Oct 2010, Brussels

Key successes of WP8
• Utilization of international and open standards
• Common understanding of what quality means with respect to the target specifications and user requirements
• How to measure it
• Provision of these results in metadata
• Automation of the quality evaluation services

Benefits for data providers and data consumers
• Early data error detection;
• Faster product turnaround;
• Reduced maintenance costs;
• Consistent evaluation procedures;
• Better harmonisation;
• Improved spatial analysis;
• Confident decision making;
• Data that is trusted and usable.

Quality Spreadsheets
Example spreadsheet: GEOGRAPHICAL NAMES
Columns: data quality elements and their sub-elements
• Completeness: commission, omission
• Logical consistency: conceptual consistency, domain consistency, format consistency, topological consistency
• Positional accuracy: absolute accuracy, relative accuracy, gridded data position accuracy
• Temporal accuracy: accuracy of a time measurement, temporal consistency, temporal validity
• Thematic accuracy: classification correctness, non-quantitative attribute correctness, quantitative attribute accuracy
Rows: feature types & attributes, with the DQ basic measure and ISO 19138 measure Id entered in each applicable cell
• NamedPlace: error rate, Id 7 (omission); error count, Id 10 (conceptual consistency)
• inspireId: error count, Id 16 (domain consistency)
• name (Geographical Name)

Sampling/Full Inspection
The cells of DQ basic measures are colour coded. The colours indicate the evaluation procedure:
• Attribute inspection by sampling according to the ISO 2859 series (yellow cell)
• Variable inspection by sampling according to ISO 3951-1 (green cell)
• Full inspection (orange cell)
Evaluation procedures for features and attributes: sampling (ISO 2859), full inspection (automatic), sampling (ISO 3951)
Sampling (ISO 2859): ISO 2859 states the principles of testing a sufficient number of items from the whole population by sampling. When the result for each data subset is expressed as two integers (errors found and items inspected), the subset results can be summed up to a data set error rate by dividing the total number of errors by the total count of inspected items.
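The error-rate aggregation described for ISO 2859 sampling can be sketched in Python as below. This is a minimal sketch, not the ESDIN implementation: the `SubsetResult` type and the `dataset_error_rate` function are hypothetical names, and each subset result is assumed to be stored as the two integers (errors found, items inspected).

```python
from dataclasses import dataclass


@dataclass
class SubsetResult:
    """Result for one inspected data subset, kept as two integers (hypothetical type)."""
    errors: int     # number of nonconforming items found in the subset
    inspected: int  # number of items inspected in the subset


def dataset_error_rate(subsets: list[SubsetResult]) -> float:
    """Aggregate subset results into one data set error rate.

    Numerators and denominators are summed separately, so each subset
    is weighted by the number of items inspected in it.
    """
    total_errors = sum(s.errors for s in subsets)
    total_inspected = sum(s.inspected for s in subsets)
    if total_inspected == 0:
        raise ValueError("no items inspected")
    return total_errors / total_inspected


# Hypothetical results for three subsets.
subsets = [SubsetResult(2, 80), SubsetResult(0, 50), SubsetResult(3, 120)]
print(dataset_error_rate(subsets))  # 0.02
```

Summing errors and inspected items separately weights each subset by its size; averaging the three subset ratios instead would give about 0.017 rather than 0.02.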
Full inspection (automatic): If errors exist (error count > 0), the subset should be rejected and corrective action by the producer is needed. It is assumed that the number of errors found is quite small. The customer may be tempted to make those few corrections themselves. This i…
Sampling (ISO 3951): ISO 3951 variable sampling gives reliable results on small sample sizes. CE95/LE95 is close enough to the upper limit (U) of the standard at the AQL 4 level. ISO 3951 offers clear acceptance criteria based on the sample.
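A minimal sketch of an ISO 3951-1 style "s-method" acceptance check for a single upper specification limit U is shown below. It assumes the quality characteristic is a per-point positional deviation (hypothetical example data), estimates the mean x̄ and sample standard deviation s from the sample, and accepts the lot when the upper quality statistic Q_U = (U - x̄)/s is at least the acceptability constant k. The value of k must be taken from the ISO 3951-1 tables for the sample size and AQL (e.g. AQL 4); the k used here is a placeholder, not a value from the standard, and the function name is hypothetical.

```python
import statistics


def s_method_accept(values: list[float], upper_limit: float, k: float) -> bool:
    """s-method acceptance check against a single upper specification limit U.

    values:      measured characteristic of each sampled item
    upper_limit: upper specification limit U
    k:           acceptability constant, looked up in ISO 3951-1 for the
                 sample size and AQL (not computed here)
    """
    if len(values) < 2:
        raise ValueError("at least two sample values are needed")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        # Assumption for this sketch: with no spread, accept if the mean is within U.
        return mean <= upper_limit
    q_u = (upper_limit - mean) / s  # upper quality statistic Q_U
    return q_u >= k


# Hypothetical positional deviations (m) of eight sampled points.
deviations = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.9]
K_PLACEHOLDER = 1.5  # NOT from ISO 3951-1; replace with the tabulated k
print(s_method_accept(deviations, upper_limit=2.5, k=K_PLACEHOLDER))  # True
```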
Attribute status (according to INSPIRE Data Specifications v3): Mandatory, Voidable, Optional

Relevant data quality measures
Relevant ISO/TS 19138 data quality measures, grouped by data quality element. Each entry gives the sub-element, the measure name (with its alias, where one is given) and the data quality basic measure.
Completeness
• Commission: Rate of excess items / error rate
• Omission: Rate of missing items / error rate
Logical consistency
• Conceptual consistency: Number of items not compliant with the rules of the conceptual schema / error count
• Conceptual consistency: Number of invalid overlaps of surfaces (alias: overlapping surfaces) / error count
• Domain consistency: Number of items not in conformance with their value domain / error count
• Format consistency: Physical structure conflicts / error count
• Topological consistency: Number of faulty point-curve connections (alias: extraneous nodes) / error count
• Topological consistency: Number of missing connections due to undershoots (alias: undershoots) / error count
• Topological consistency: Number of missing connections due to overshoots (alias: overshoots) / error count
• Topological consistency: Number of invalid slivers (alias: slivers) / error count
• Topological consistency: Number of invalid self-intersect errors (alias: loops) / error count
• Topological consistency: Number of invalid self-overlap errors (alias: kickbacks) / error count
Positional accuracy
• Absolute or external accuracy: Mean value of positional uncertainties (1D, 2D and 3D) / not applicable
• Absolute or external accuracy: Linear map accuracy at 95 % significance level (alias: LMAS) / LE95 or LE95I, depending on the evaluation procedure
• Absolute or external accuracy: Circular error at 95 % significance level (alias: 95 % navigation accuracy) / CE95
Thematic accuracy
• Classification correctness: Misclassification rate / error rate
• Non-quantitative attribute correctness: Rate of incorrect attribute values / error rate
• Quantitative attribute accuracy: Attribute value uncertainty at 95 % significance level / LE95 or LE95(r), depending on the evaluation procedure
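The LE95 and CE95 basic measures can be illustrated with the common formulas LE95 = 1.96 σ for one-dimensional errors and CE95 = 2.4477 σc, with σc = √((σx² + σy²)/2), for horizontal errors. The sketch below assumes normally distributed, unbiased errors, estimates σ as the root mean square of residuals against a higher-accuracy reference, and assumes σx and σy are of similar size (the CE95 factor is exact only for a circular distribution). It is an illustration, not the ISO 19138 evaluation procedure itself; the function names and residuals are hypothetical.

```python
import math
from statistics import NormalDist


def rms(residuals: list[float]) -> float:
    """Root mean square of residuals (measured minus reference), used as sigma."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def le95(dz: list[float]) -> float:
    """Linear error at 95 % for 1D residuals, e.g. heights: 1.96 * sigma."""
    z = NormalDist().inv_cdf(0.975)  # 1.95996..., two-sided 95 % normal quantile
    return z * rms(dz)


def ce95(dx: list[float], dy: list[float]) -> float:
    """Circular error at 95 % for 2D residuals: 2.4477 * sigma_c."""
    sigma_c = math.sqrt((rms(dx) ** 2 + rms(dy) ** 2) / 2)
    # Radius containing 95 % of a circular normal distribution with unit sigma.
    factor = math.sqrt(-2 * math.log(0.05))  # 2.4477...
    return factor * sigma_c


# Hypothetical check-point residuals in metres.
dx = [0.4, -0.3, 0.5, -0.2, 0.1]
dy = [-0.2, 0.3, 0.1, -0.4, 0.2]
dz = [0.6, -0.5, 0.2, -0.3, 0.4]
print(f"CE95 = {ce95(dx, dy):.2f} m, LE95 = {le95(dz):.2f} m")
```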

How to utilize the quality model
• The quality model will be transformed into a rule set and conformance levels
• ELF specifications will include these for the NMCAs
• Automated tools will utilize the rules and conformance levels

Quality requirements/Conformance levels
• Use the quality measures to set the requirements
• Consider the nature of reality:
– Feature vagueness
– Change rates
– Themes
• Suggested guidance for positional accuracy
• Suggestion on setting the classification of conformance levels

Setting conformance levels (examples)
• Geometric accuracy is a critical and mostly well-defined characteristic of cadastral parcels, while a geographical name, such as the name of a lake, does not have just one correct location: any location within the area of the lake is acceptable.
• Completeness of the transportation network is important to know, and it can be explicitly evaluated. Wetlands may be important areas in hydrography, but their existence or delineation can be hard to evaluate during a dry season.
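These examples mean that the same measure can need different conformance levels in different themes. A minimal sketch of a conformance-level table and a classifier follows; the theme keys, measure keys, class boundaries (A/B/C) and numbers are all hypothetical, chosen only to contrast a strict positional requirement for cadastral parcels with a loose one for geographical names, and the sketch assumes lower measured values are better.

```python
# Hypothetical conformance levels: (upper bound for class A, upper bound for class B).
# Values above the class B bound fall into class C. Numbers are illustrative only.
CONFORMANCE_LEVELS: dict[tuple[str, str], tuple[float, float]] = {
    ("CadastralParcels", "LE95_m"): (0.5, 1.0),
    ("GeographicalNames", "LE95_m"): (25.0, 100.0),
    ("TransportNetworks", "missing_items_rate"): (0.01, 0.05),
}


def conformance_class(theme: str, measure: str, value: float) -> str:
    """Classify a measured value (lower is better) into conformance class A, B or C."""
    a_max, b_max = CONFORMANCE_LEVELS[(theme, measure)]
    if value <= a_max:
        return "A"
    if value <= b_max:
        return "B"
    return "C"


# The same 0.8 m positional error is class B for parcels but class A for names.
print(conformance_class("CadastralParcels", "LE95_m", 0.8))   # B
print(conformance_class("GeographicalNames", "LE95_m", 0.8))  # A
```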

Quality evaluation process
• Step 1: Apply the data quality measure to the data to be checked. The procedure for this is described in the ISO 19113/19114 standards.
• Step 2: Report the score for each measure in a report form.
• Step 3: Compare the result from step 2 with the defined conformance level.
• In addition, two further steps can be done:
• Step 4: Summarise the conformance results into one result for each data quality element.
• Step 5: Summarise the results from step 4 into one overall dataset result.
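Steps 2 to 5 can be sketched as a small Python pipeline. This is a hypothetical sketch: step 1 (applying the measures to the data) is assumed to have already produced the values, each conformance level is assumed to be a maximum acceptable value, and steps 4 and 5 use the simplest summarising rule, that every measure (and then every element) must pass. `MeasureResult`, `summarise_by_element` and `overall_result` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MeasureResult:
    element: str  # data quality element, e.g. "completeness"
    measure: str  # data quality measure name
    value: float  # step 2: reported score for the measure
    limit: float  # conformance level: maximum acceptable value (assumption)

    @property
    def passed(self) -> bool:
        # Step 3: compare the reported score with the conformance level.
        return self.value <= self.limit


def summarise_by_element(results: list[MeasureResult]) -> dict[str, bool]:
    # Step 4: one result per data quality element; assumed rule: all its measures pass.
    summary: dict[str, bool] = defaultdict(lambda: True)
    for r in results:
        summary[r.element] = summary[r.element] and r.passed
    return dict(summary)


def overall_result(element_summary: dict[str, bool]) -> bool:
    # Step 5: one overall dataset result; assumed rule: all elements pass.
    return all(element_summary.values())


results = [
    MeasureResult("completeness", "Rate of missing items", 0.004, 0.01),
    MeasureResult("completeness", "Rate of excess items", 0.002, 0.01),
    MeasureResult("logical consistency", "Number of invalid slivers", 3, 0),
]
per_element = summarise_by_element(results)
print(per_element)                  # {'completeness': True, 'logical consistency': False}
print(overall_result(per_element))  # False
```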

Aggregation of data quality conformance results
• Aggregation where the measurements are on different scales and have different units: transform all the quantitative data quality results into conformance results using a set of conformance levels/classes (see previous slides).
• Aggregation for inhomogeneous data: this can be done by just reporting the lowest quality, found in the most remote areas (see the nature of reality slide). Another way (the one recommended here) is to use different conformance classifications for the different kinds of area (urban, rural, remote), and then summarise based on a “conformance score”. To make this useful, a metadata description is needed to give the distribution between the kinds of area.
• Reporting details: the simplest way of reporting is just to give one value for the dataset. This can be a simple “passed” or “failed” with a reference to the product specification. But doing a lot of work in quality assessment and then reporting just one value can be considered an oversimplification. Giving quality statements as grades may be useful at steps 4 and 5 (see above).
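The recommended “conformance score” for inhomogeneous data could be computed as in the sketch below. The points per class (A = 3, B = 2, C = 1) are a hypothetical choice, the class for each kind of area is assumed to have been judged against that area's own conformance levels, and the area shares stand in for the metadata description of the distribution between kinds of area.

```python
import math

# Hypothetical points per conformance class.
CLASS_POINTS = {"A": 3, "B": 2, "C": 1}


def conformance_score(class_by_area: dict[str, str], area_share: dict[str, float]) -> float:
    """Weight each area's conformance class by that area's share of the dataset.

    class_by_area: class per kind of area (urban, rural, remote), each judged
                   against that area's own conformance levels
    area_share:    fraction of the dataset in each kind of area (from metadata)
    """
    if not math.isclose(sum(area_share.values()), 1.0):
        raise ValueError("area shares must sum to 1")
    return sum(CLASS_POINTS[class_by_area[area]] * share
               for area, share in area_share.items())


classes = {"urban": "A", "rural": "B", "remote": "C"}
shares = {"urban": 0.2, "rural": 0.5, "remote": 0.3}
print(round(conformance_score(classes, shares), 2))  # 1.9
```

Reporting only the lowest class (here C) would hide that 70 % of this hypothetical dataset is class B or better, which is what the weighted score keeps visible.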

Grading data example
Grade: data quality description
• Excellent: only class A for all quality measures
• Very good: a majority of A’s, but also some B’s
• Good: a majority of B’s, some A’s, no C’s
• Adequate: only a very few C’s, the others B’s or better
• Marginal: a majority of C’s, but also some B’s
• Not good: no measure reached class B (i.e. all measures in class C)
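Because the grade descriptions use words such as “majority” and “very few”, turning them into code needs explicit thresholds. The sketch below is one possible reading, not a definitive rule: “majority” is taken as more than half, “very few” as at most a hypothetical 10 % of the measures, and combinations the table does not cover (for example all B’s with no A’s) fall into the nearest matching branch as commented.

```python
from collections import Counter

VERY_FEW_C = 0.10  # hypothetical threshold for "only a very few C's"


def grade(classes: list[str]) -> str:
    """Map the conformance classes (A, B, C) of all quality measures to a grade."""
    n = len(classes)
    if n == 0:
        raise ValueError("no quality measures given")
    count = Counter(classes)
    a, c = count["A"], count["C"]
    if a == n:
        return "Excellent"
    if c == 0 and a > n / 2:
        return "Very good"
    if c == 0:
        return "Good"        # also covers all B's with no A's (assumption)
    if c == n:
        return "Not good"
    if c / n <= VERY_FEW_C:
        return "Adequate"
    return "Marginal"        # remaining cases, including a majority of C's with some B's


print(grade(["A", "B", "B", "A", "B"]))  # Good
```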

Where can you utilize quality web services?
• If you are a data provider for an SDI
– For quality control during production (automated), here called conformance testing (this includes edge-matching and generalization)
– For quality evaluation after production (semi-automated)
• If you are the SDI co-ordinator or data custodian
– For quality audits for process accreditation or data certification, doing conformance testing and/or quality evaluation
• If you are a customer or data user
– To evaluate usability using metadata information

Diagram: Data Quality Evaluation Service components. The data for evaluation comes from a Geospatial Data File or a Web Feature Service; the Quality Evaluation Service applies the quality measures and is reached through a Web Services interface (SOAP, HTTP).
• Rule Builder (collaborative, web-based rule authoring): intuitive user interface to author, agree and manage DQ measures.
• DQ Client Application: accessible, easy-to-use, automatic Data Quality Evaluation Service.
• DQ Rules Engine (object-oriented geospatial rules engine): W3C Web Services interface using open standards to describe and execute geospatial rule evaluation.
• Rule Repository (rulesets & templates database): data quality rules and business rules, derived and guided by the Quality Model.
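The division of labour between the Rule Repository and the DQ Rules Engine can be pictured with the minimal in-memory sketch below. It is not the ESDIN service, which exposes rule evaluation through a web services interface: features are simplified to Python dictionaries, and the rule identifiers, the two example rules and the `evaluate` function are hypothetical.

```python
from typing import Callable

Feature = dict  # hypothetical simplification: attribute name -> value
Rule = tuple[str, Callable[[Feature], bool]]  # (description, conformance predicate)

# Hypothetical rule repository: rule id -> rule.
RULES: dict[str, Rule] = {
    "inspireId-not-empty": ("inspireId must have a value",
                            lambda f: bool(f.get("inspireId"))),
    "name-present": ("a NamedPlace needs at least one name",
                     lambda f: len(f.get("name") or []) > 0),
}


def evaluate(features: list[Feature], rules: dict[str, Rule] = RULES) -> dict[str, int]:
    """Apply every rule to every feature; return the error count per rule."""
    counts = {rule_id: 0 for rule_id in rules}
    for feature in features:
        for rule_id, (_description, conforms) in rules.items():
            if not conforms(feature):
                counts[rule_id] += 1
    return counts


features = [
    {"inspireId": "GN.0001", "name": ["Helsinki"]},
    {"inspireId": "", "name": []},
    {"inspireId": "GN.0003"},
]
print(evaluate(features))  # {'inspireId-not-empty': 1, 'name-present': 2}
```

Keeping rules as data in a repository, separate from the engine loop, is what lets rules be authored and agreed in a rule builder without changing the evaluation code.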

DQ Rule Builder Environment

DQ Evaluation Service Concept

Metadata approach
• Metadata needed for the discovery of datasets through metadata catalogues and registries
• Metadata needed for the evaluation of those datasets, i.e. whether they are of sufficient quality to meet end users’ needs
• Metadata specific to the requirements of the ELF specifications

Are we INSPIRE compliant?
• Yes, but we suggest that some of the measures be changed in future editions of the INSPIRE data specifications
• There are some mistakes in the current specification that should be corrected
• We also propose additional measures

ESDIN/INSPIRE differences: Administrative Units
Measures suggested by INSPIRE Data Specification v3, Administrative Units, compared with the ESDIN quality model. Each entry gives the section, data quality element / sub-element, ISO 19138 measure Id, measure name / basic quality measure and scope, followed by the ESDIN quality model and any comment.
• 7.1.1 Completeness / Commission, Id 3: Rate of excess items / error rate (dataset-level). ESDIN quality model: the same as ESDIN.
• 7.1.2 Completeness / Omission, Id 7: Rate of missing items / error rate (dataset-level). ESDIN quality model: the same as ESDIN.
• 7.2.1.1 Logical consistency / Topological consistency, Id 21*: Number of faulty point-curve connections / error count (dataset-level). ESDIN quality model: the same as ESDIN.
• 7.2.1.2 Logical consistency / Topological consistency, Id 23: Number of missing connections due to undershoots / error count (dataset-level). ESDIN quality model: the same as ESDIN.
• 7.2.2 Logical consistency / Conceptual consistency, Id 9: Conceptual schema compliance / correctness indicator (dataset-level). ESDIN quality model: Number of items not compliant with the rules of the conceptual schema / error count. Comment: Id 10 used instead; Id 9 is applicable only at the single-instance level.
• 7.3.1 Positional accuracy / Absolute external positional accuracy, Id 28: Mean value of positional uncertainties (1D, 2D and 3D) / not applicable (dataset-level). ESDIN quality model: Linear map accuracy at 95 % significance level / LE95 or LE95I. Comment: not used; Id 36 used instead.
* Id 21 in ISO 19138, but it has the incorrect Id 9 in the INSPIRE Data Specification AU.

Additional quality measures
Additional measures from ESDIN WP8 (all with dataset-level scope):
• Logical consistency / Conceptual consistency, Id 11: Number of invalid overlaps of surfaces / error count (comment: topological consistency)
• Logical consistency / Domain consistency, Id 16: Number of items not in conformance with their value domain / error count
• Logical consistency / Conceptual consistency, Id 10: Number of items not compliant with the rules of the conceptual schema / error count
• Logical consistency / Format consistency, Id 19: Physical structure conflicts / error count
• Logical consistency / Topological consistency, Id 25: Number of invalid slivers / error count
• Logical consistency / Topological consistency, Id 26: Number of invalid self-intersect errors / error count
• Logical consistency / Topological consistency, Id 27: Number of invalid self-overlap errors / error count
• Positional accuracy / Absolute or external positional accuracy, Id 36: Linear map accuracy at 95 % significance level / LE95 or LE95I
• Thematic accuracy / Classification correctness, Id 61: Misclassification rate / error rate
• Thematic accuracy / Non-quantitative attribute correctness, Id 67: Rate of incorrect attribute values / error rate

Resolution and Level of Detail
[Figure: target levels of detail (Global, Regional, Master) against map scale, from 1:2,500,000 to 1:2,500, with the level of detail differing between urban, rural and mountainous areas.]

Conclusions
• It is important that INSPIRE gives a platform for data quality information: minimum data quality conformance levels should be set, with the ability to report other conformance levels related to user communities
• Quality evaluation metadata should be available for automated conformance testing
• Introducing a quality model which uses the same principles for all Annex I themes -> we will suggest this as a guideline for INSPIRE implementation
• Introducing conformance levels that can be evaluated semi-automatically or automatically, based on ISO standards
• Automation of quality evaluation and conformance testing can be done for all transformation-related workflows, including schema transformation, generalization and edge matching
• Significant saving potential in quality reporting and improvement of data
