Jian Qin, Syracuse University
Jian Qin, Syracuse University; Alex Ball, UKLON; Jane Greenberg, University of North Carolina at Chapel Hill: “Functional and Architectural Requirements for Metadata: Supporting Discovery and Management of Scientific Data”
Panel: Linked data and metadata (co-sponsored by the ASIS&T Digital Libraries SIG)
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
RDAP13 Jian Qin: Functional and Architectural Requirements for Metadata
1. RDAP2013,http://www.asis.org/rdap/
Functional and Architectural
Requirements for Metadata:
Supporting Discovery and
Management of Scientific Data
Jian Qin Alex Ball Jane Greenberg
School of Information Digital Curation Centre School of Information and
Studies UKOLN Library Science
Syracuse University University of Bath University of North Caroline
USA UK Chapel Hill, USA
Research Data Access & Preservation Summit
Baltimore, MD, 2013
2. RDAP2013,http://www.asis.org/rdap/ 2
Metadata standards for scientific data
Ecological
Metadata
Language (EML)
CSDGM
Access to Biological
Collections Data –
ABCD
Darwin Core
6. RDAP2013,http://www.asis.org/rdap/ 6
Motivation for adoption
• Standardize the format and terminology of
metadata
• Enable fast and effective discovery of datasets
across different data repositories
• Enable data sharing, reuse, and preservation
• Provide information for obtaining datasets
from the data owners
7. RDAP2013,http://www.asis.org/rdap/ 7
Hindering factors
• large numbers of • steep learning curve
elements • difficult to automate
• many layers in structure metadata generation
– “Unwieldy to apply” • unnecessary duplicate
• been created for data entry
manual data • high costs in
entry, rather than for time, resource, and
automatic generation personnel expertise
8. RDAP2013,http://www.asis.org/rdap/ 8
Same entity data repeated
in the same record…
Seamless Daily Precipitation for the Conterminous United States
Metadata:
Identification_Information
Data_Quality_Information
Spatial_Data_Organization_Information
Spatial_Reference_Information
Entity_and_Attribute_Information
Distribution_Information
Metadata_Reference_Information
10. RDAP2013,http://www.asis.org/rdap/ 10
Research questions
• What functions do metadata standards for
scientific data serve?
• How should metadata standards for scientific
data be modeled to support these functions
by meeting the associated requirements?
11. RDAP2013,http://www.asis.org/rdap/ 11
Functions expected
• Resource discovery and use,
• Data interoperability,
• Automatic and semi-automatic metadata
generation,
• Linking of publications and underlying
datasets,
• Data/metadata quality control, and
• Data security.
15. RDAP2013,http://www.asis.org/rdap/ 15
Identity metadata
• Person: researcherID, • Name repositories
URI, FOAF, ORCID • Linked data architecture
• Institution: ORCID, URI • Customizable research
• Data object: DOI, group/community
Handle, URI member name lists
• Associated publication:
DOI
16. RDAP2013,http://www.asis.org/rdap/ 16
Semantic metadata
• Large semantic resources available in linked data format, but
usually not suitable for representing scientific data because
they are designed for publications, especially books and
journals (containers)
• Format is contemporary but the content is far from it
Smaller, specialized semantic
resources are necessary for automatic
semantic metadata generation
17. RDAP2013,http://www.asis.org/rdap/ 17
Contextual
metadata
• Provenance
Provenance data model
(W3C, http://www.w3.org/TR/prov-primer/)
Provenance data represent the origins of
digital objects and describe the entities and
activities involved in producing and delivering
or otherwise influencing a given object.
18. RDAP2013,http://www.asis.org/rdap/
Geospatial metadata
Biological sciences Climate
Ecological
Darwin Metadata
Biological Core NetCDF
Data Profile Language
(DwC) (EML) Climate and
Forecast (CF)
Georeferencing Metadata
Shoreline elements Conventions
Metadata
Profile FGDC Georeferencing
elements Astronomy
CSDGM Profiles
CSDGM
Astronomy
Visualization
Metadata
ISO 19115: 2003 Standard
Geographic information -
Metadata.
18
19. RDAP2013,http://www.asis.org/rdap/ 19
Temporal metadata
• Mean solar time • Different measurement
• Civil time systems result in
• GPS time different units and
format
• Terrestrial time
• Conversion between
• Atomic time systems
• …
• Geologic time
21. RDAP2013,http://www.asis.org/rdap/ 21
TABLE 1. Mapping data user tasks with metadata functions and
architectural building blocks
Data Metadata function Architectural building block
user
tasks
Discover Descriptive metadata Identity and semantic metadata
Identify Descriptive metadata Identity metadata
Select Descriptive, technical
Identity, semantic, scientific
metadata context, geospatial, temporal,
miscellany metadata
Obtain Descriptive metadata Identify metadata
Verify Descriptive metadata Scientific context metadata
Analyze Scientific context, geospatial, and
temporal metadata
Manage Descriptive, Identify, semantic, scientific
administrative, context, geospatial, temporal,
structural, and miscellany metadata
technical metadata
Archive Descriptive, Identify, semantic, scientific
administrative, context, geospatial, temporal,
structural, and miscellany metadata
technical metadata
Publish Descriptive metadata Identity, semantic, scientific
context, geospatial, and temporal
metadata
Cite Descriptive metadata Identify metadata
22. RDAP2013,http://www.asis.org/rdap/ 22
Development potentials
• An infrastructure of metadata services
– Entities as linked data
– Tools for “slicing” members by research
group, community, or institution to customize the
entity set
– Tools for grabbing entity data from existing
resources through interoperability protocols
23. RDAP2013,http://www.asis.org/rdap/ 23
Application scenarios
• Cross-domain discovery and verification
• Automatically populating entity information
from customized slices of entities into
metadata records
• And more…
24. RDAP2013,http://www.asis.org/rdap/ 24
Conclusion
• Scientific data are inherently complex and diverse
• Functional metadata requirements should be
translated into an effective and efficient architecture
– Three principles for modeling metadata for
scientific data
• Metadata for scientific data (or other domains at
large) should adopt an infrastructure service approach
• Much to be explored, experimented, and evaluated
Although there are vast records in bib data and indexing database, identity metadata is still being repeatedly created due to the lack of interoperability and infrastructural service