This document discusses measuring metadata quality for records in Europeana. It proposes establishing a Europeana Data Quality Committee and developing a "Metadata Quality Assurance Framework" tool to measure metadata quality across Europeana's large collection. Key metrics would include completeness, field cardinality, uniqueness, multilinguality and conformance to requirements. The tool would provide customizable quality measurements, reports, and recommendations to help improve metadata quality.
Measuring Metadata Quality in Europeana (ADOCHS 2017)
1. Measuring Metadata Quality of Europeana records
ADOCHS meeting
Royal Library, Bruxelles, 2017-11-21.
Péter Király, peter.kiraly@gwdg.de
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
2. Measuring metadata quality. Glossary
★ Metadata here: cultural heritage metadata (descriptions of books etc.)
★ Europeana: a metadata aggregator collecting from 3500+ cultural heritage institutions, http://europeana.eu
★ Big Data here: 10-100 million metadata records, 100 GB - 1.5 TB
★ EDM: Europeana Data Model, Europeana’s metadata schema
★ MARC: MAchine-Readable Cataloging, a library metadata standard
3. Measuring metadata quality. Generic title and bad thumbnail
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
4. Measuring metadata quality. Multilinguality problem
★ Mona Lisa → 456 results
★ La Gioconda → 365 results
★ La Joconde → 71 results
http://www.europeana.eu/portal/en/record/90402/RP_F_00_351.html
5. Measuring metadata quality. Problems with title
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
Same title and description:
title: "VOETBAL-EREDIVISIE-FEYENOORD - GO AHEAD 3-1",
description: "VOETBAL-EREDIVISIE-FEYENOORD - GO AHEAD 3-1"
Machine-readable ID in title:
title: "NLD-820630-AMSTERDAM: Straatmuzikanten proberen geld te verdienen voor...",
Leftover:
title: "+++EMPTY+++"
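The three title problems on this slide can be detected mechanically. The following is a minimal sketch, not the framework's own code: the function name, the placeholder set, and the regular expression for the ID prefix are assumptions derived only from the examples shown here.

```python
import re

# Hypothetical detectors; the patterns below are assumptions based on the
# three examples on this slide, not rules taken from Europeana.
PLACEHOLDERS = {"+++EMPTY+++"}
ID_PREFIX = re.compile(r"^[A-Z]{3}-\d{6}-")  # matches e.g. "NLD-820630-"


def title_problems(record):
    """Return the names of the title problems found in one record (a dict)."""
    title = (record.get("title") or "").strip()
    description = (record.get("description") or "").strip()
    problems = []
    if title in PLACEHOLDERS:
        problems.append("leftover placeholder")
    if title and title == description:
        problems.append("same title and description")
    if ID_PREFIX.match(title):
        problems.append("machine-readable ID in title")
    return problems
```

A real problem catalog would need a longer placeholder list and provider-specific ID patterns; the point is only that each known problem becomes one small, testable rule.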
6. Measuring metadata quality. Non-informative values
Non-informative dc:title (bad):
“photograph, framed”, “group photograph”, “photograph”
Informative dc:title (good):
“Photograph of Sir Dugald Clerk”, “Photograph of "Puffing Billy"”
7. Measuring metadata quality. Copy & paste cataloging
from a template?
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
8. Measuring metadata quality. The problem
There are “good” and “bad” metadata records, but we don’t have clear metrics like this:
[Figure: a scale rating records against functional requirements as good, acceptable, or bad]
9. Measuring metadata quality. Why is data quality important?
“Fitness for purpose” (QA principle)
purpose: to access content
no metadata → no access to data → no data usage
more explanation:
Data on the Web Best Practices
W3C Working Draft, https://www.w3.org/TR/dwbp/
10. Measuring metadata quality. Hypothesis
By measuring structural elements we can approximate metadata record quality (≃ “metadata smell”).
11. Measuring metadata quality. Purposes
★ improve the metadata
★ services: good data → reliable functions
★ better metadata schema & documentation
★ propagate “good practice”
12. Measuring metadata quality. Proposal I.
Europeana Data Quality Committee
★ Analysing/revising metadata schema
★ Functional requirement analysis
★ Problem catalog
★ Multilinguality
13. Measuring metadata quality. Proposal II.
“Metadata Quality Assurance Framework”
a generic tool for measuring metadata quality
★ adaptable to different metadata schemes
★ scalable (to Big Data)
★ understandable reports for data curators
★ open source
15. Measuring metadata quality. What to measure?
★ Structural and semantic features
Completeness, cardinality, uniqueness, length, dictionary entry, data type conformance, multilinguality (generic metrics)
★ Functional requirement analysis / Discovery scenarios
Requirements of the most important functions
★ Problem catalog
Known metadata problems
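To make the generic metrics concrete, here is a minimal sketch of three of them. It assumes a record is a dict mapping field names to lists of string values; the function names and the exact definitions (especially of uniqueness) are illustrative assumptions, not the formulas used by the framework.

```python
from collections import Counter


def completeness(record, fields):
    """Share of the given fields that have at least one non-empty value."""
    filled = sum(
        1 for f in fields if any(v.strip() for v in record.get(f, []))
    )
    return filled / len(fields)


def cardinality(record, field):
    """Number of values the record has for one field."""
    return len(record.get(field, []))


def uniqueness(records, field):
    """Share of all values of a field that occur exactly once (assumed definition)."""
    counts = Counter(v for r in records for v in r.get(field, []))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / total
```

Under this definition, three records titled "photograph", "photograph" and "Photograph of Sir Dugald Clerk" give a title uniqueness of 1/3, which flags the generic titles as the repeated ones.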
16. Measuring metadata quality. Metadata requirements / User scenario
“As a user I want to be able to filter by whether a person is the subject of a book, or its author, engraver, printer etc.”
Metadata analysis
Description of relevant metadata elements and their rules
Measurement rules
★ the relevant field values should be resolvable URIs
★ each URI should be associated with labels in multiple languages
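The two measurement rules can be sketched as a small check. This is an illustration under stated assumptions: it tests only whether a value is syntactically an HTTP(S) URI (actual resolvability would need a network request), and it assumes the labels of each URI are already available in a hypothetical `labels_by_uri` mapping of URI to `{language: label}`.

```python
from urllib.parse import urlparse


def looks_like_uri(value):
    """Syntactic check only: an http(s) scheme and a host are present."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_rules(values, labels_by_uri, min_languages=2):
    """Evaluate both rules for every value of the relevant field."""
    report = []
    for value in values:
        is_uri = looks_like_uri(value)
        # Count label languages only for values that are URIs.
        languages = len(labels_by_uri.get(value, {})) if is_uri else 0
        report.append(
            {
                "value": value,
                "is_uri": is_uri,
                "multilingual": languages >= min_languages,
            }
        )
    return report
```

The threshold of two languages is an arbitrary choice for the sketch; the slide only says “multiple”.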
17. Measuring metadata quality. Metadata requirements / element—function map
[Figure: element-function maps, labelled “Europeana sub-dimensions” and “MARC Summary of Mapping to User Tasks”]
18. Measuring metadata quality. The data aggregation workflow (in Europeana)
Source formats (Dublin Core, LIDO, EAD, MARC, EDM, custom, ...) → data transformations → Europeana Data Model (EDM)
19. Measuring metadata quality. Measurement
Views: overall view, collection view, record view
Metrics: Completeness, Field cardinality, Uniqueness, Multilinguality, Language specification, Problem catalog, etc.
Diagram labels: links, measurements, aggregated statistics, metrics
20. Measuring metadata quality. Measurement - Field frequency per collections
no record has alternative title
every record has alternative title
filters
21. Measuring metadata quality. Measurement - Details of field cardinality
128 subjects in one record
median is 0, mean is close to 1
link to interesting records
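A median of 0 next to a mean close to 1 is typical of a skewed distribution, where most records lack the field and a few carry many values. The sketch below shows the effect on invented toy counts (the numbers are made up for illustration and are not Europeana measurements); the function name is an assumption.

```python
from statistics import mean, median


def cardinality_summary(counts):
    """Summary statistics for the per-record value counts of one field."""
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": mean(counts),
        "median": median(counts),
    }


# Toy data, invented for illustration: 200 records, most without any
# subject, and a single outlier with 128 subjects.
toy_counts = [0] * 140 + [1] * 50 + [2] * 9 + [128]
summary = cardinality_summary(toy_counts)
# summary["median"] is 0 and summary["mean"] is 0.98
```

This is why the detailed view reports the maximum and links to the outlier records in addition to the averages.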
23. Measuring metadata quality. Measurement - Distinct Languages
Text w/o language annotation (dc.subject: Germany) → 0
Text w language annotation (dc.subject: Germany@en) → 1
Text w several language annotations (dc.subject: Germany@en, Deutschland@de) → 2
Link to (multilingual) vocabulary (http://www.geonames.org/2921044/federal-republic-of-germany) → n
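The scoring on this slide amounts to counting the distinct language tags attached to a field's values. A minimal sketch follows, assuming values are written in the `text@lang` form used in the examples and that tags are plain two- or three-letter codes; regional tags such as `en-GB` and the vocabulary-link case (score n) are deliberately left out. Both function names are hypothetical.

```python
def parse_literal(literal):
    """Split 'Germany@en' into ('Germany', 'en'); untagged text gets None."""
    text, sep, tag = literal.rpartition("@")
    if sep and tag.isalpha() and 2 <= len(tag) <= 3:
        return text, tag
    return literal, None


def distinct_languages(literals):
    """Number of distinct language tags among the values of one field."""
    tags = {parse_literal(value)[1] for value in literals}
    tags.discard(None)  # untagged values contribute no language
    return len(tags)
```

For a value that is a vocabulary link, the same count could be taken over the labels found behind the URI, which is where the score n would come from.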
25. Measuring metadata quality. Measurement - Good example
Descriptive fields
dc:title: "Die Mauer muß weg!"@de, "Die Mauer muß weg! (The Wall must go!)"@en
dc:description: "Kommentiertes Fotorama mit Bildern von 1989-1990 in Berlin"@de, "Annotated images from 1989-1990 in Berlin"@en
Subject headings
Place/skos:prefLabel:
"Brandenburger Tor"@de, "Brandenburg Gate"@en
"Grenzübergang Potsdamer Platz"@de, "Postdamer Platz border crossing"@en
"Reichstag"@de, "Reichstag building"@en
27. Measuring metadata quality. Engineering - Batch API
client → Metadata QA (request → response)
Measurement:
/batch/measuring/start → sessionID
/batch/[recordId] → csv (for each record)
/batch/measuring/stop → “success” | “failure”
Analysis:
/batch/analyzing/start → “success” | “failure”
/batch/analyzing/status → “in progress” | “ready” (polled periodically)
/batch/analyzing/retrieve → compressed package
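The call sequence of the batch API can be sketched from the client side as follows. Only the endpoint paths and response values come from the slide; everything else is an assumption: the `call` transport function (path in, response text out), the function name `run_batch`, the polling parameters, and the omission of how the session ID is sent back to the service, which the slide does not show.

```python
import time


def run_batch(call, record_ids, poll_interval=1.0, max_polls=60):
    """Drive one measuring + analyzing session.

    `call(path)` is a hypothetical transport returning the response text;
    in practice it would wrap an HTTP client and attach the session ID.
    """
    session_id = call("/batch/measuring/start")
    # One measurement request per record; each returns a CSV row.
    rows = [call(f"/batch/{record_id}") for record_id in record_ids]
    if call("/batch/measuring/stop") != "success":
        raise RuntimeError("measuring did not stop cleanly")
    if call("/batch/analyzing/start") != "success":
        raise RuntimeError("analysis could not be started")
    # Poll periodically until the analysis reports "ready".
    for _ in range(max_polls):
        if call("/batch/analyzing/status") == "ready":
            return session_id, rows, call("/batch/analyzing/retrieve")
        time.sleep(poll_interval)
    raise TimeoutError("analysis was not ready in time")
```

Passing the transport in as a function keeps the sequence testable without a running service; a stub that returns canned responses is enough to exercise the whole flow.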
28. Measuring metadata quality. Community bibliography
zotero.org/groups/metadata_assessment
dlfmetadataassessment.github.io
29. Measuring metadata quality. Further steps
★ Translate the results into documentation, recommendations
★ Communication with data providers
★ Human evaluation of metadata quality
★ Cooperation with other projects
★ Incorporating into ingestion process
★ Shapes Constraint Language (SHACL) for defining patterns
★ Process usage statistics
★ Measuring changes of scores
★ Machine learning based classification & clustering
Group labels on the slide: human, analysis, technical