This document summarizes the history and development of metadata standards on the web over the past 20 years. It discusses how Dublin Core was one of the first metadata standards for the web and focused on simplicity and cross-disciplinary use. However, the lack of a common data model led to inconsistencies and impeded interoperability. Future development of metadata standards will continue through the W3C Library Linked Data Incubator Group, with the aim of improving the integration of library data into the semantic web.
Twenty Years of Metadata:
Lessons from the Dublin Core in the First Two Decades of the Web
Stuart Weibel
University of Tsukuba Visiting Scholar
May 13, 2011
Image: Carved figures (Morikawa Toen), Tokyo National Museum

1. Outline
The Context
Metadata Matrix
What we did right
The major impediments
A few words about models
What about the future?
The Context
When I started working at OCLC in 1985:
I was 4 years away from my first email address
A PC hard drive wasn’t large enough to store a single high resolution digital image (which was ok, because…)
Cameras still used film
Cell phones were suitcase-sized
MARC Cataloging stood alone as the discovery tool for intellectual assets of libraries
but…. No end-user access to the global library catalogs
me… circa 1994

And now?
A cell phone has more computing power than the Space Shuttle
An iPod will hold WorldCat
Bandwidth is more important than computing power
The library is still mostly mired in MARC
There are many metadata standards (mostly struggling for traction)
People (mostly) find things with Google
Metadata is more than just search
Metadata-dependent actions
Describe
Access
Encode/Render
Preserve
Rights Management
Administer
“Bind” digital pages in digital books

50 years of Metadata
Timeline, 1960s 1970s 1980s 1990s 2000s:
MARC standards (library metadata)
OCLC founded (shared library cataloging)
ARPANET Operational - forerunner of the Internet
Networking diffuses throughout academia
The Web begins...
FRBR work begins
First Dublin Core Workshop
DCMI established
Google is founded
First Dublin Core Conference (Tokyo)
my first email address
WorldCat introduced
RDA introduced
2. The confusion:
Jenn Riley’s Metadata Map
“This visual map of the metadata landscape is intended to assist planners with the selection and implementation of metadata standards.”
http://www.dlib.indiana.edu/~jenlrile/metadatamap/

How bad is it?
105 standards
30 most common across the top (3 predate the Web)
some share common models… most do not
much overlap
many work together
Who among us can choose rationally from the array of standards, platforms, technologies?
Will the results have any reasonable expectation of interoperability?
The real world is not standards-centric
Metadata-dependent actions: Standard
Describe: MARC, DC, MODS, RDA, LCSH, MeSH….
Access: HTTP, FTP….
Encode/render: RDF, media-type dependent (many)
Preserve: PREMIS
Rights Management: CC licenses, eCommerce systems
Administer: METS, MARC….
“Bind” digital pages in digital books: METS, eBook standards
Information Entities (ex.):
Agents (persons, corporate entities, devices)
Events
Time intervals or eras
Concepts
Collections
Media-types
Structured data type

The map is much more complicated
“This visual map of the metadata landscape is intended to assist planners with the selection and implementation of metadata standards.”
“selection and implementation of metadata standards requires a clear understanding of the information entities, the standards, and the functional requirements of the system under design”
Image: Kyoto horizon from above the Tenryu-ji Temple
Dublin Core in the metadata matrix
The first metadata standard for the Web
General and cross-disciplinary
Simple starting place, but extensible
International and multilingual
Consensus-driven (bottom-up, rather than top-down)
Image: Jomon Pottery, Tokyo National Museum

Things we did right
We didn’t call it ‘cataloging’ (Web, not libraries)
A hybrid of technical engineering and social engineering
International - Major events on 5 continents, element definitions in 20+ languages (maintained in Tsukuba)
Separated syntax and semantics
Built a community of practice
About the right level of complexity for a core element set
Image: Harajuku train station platform, Tokyo
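
One way to picture the separation of syntax and semantics: the same element/value pairs can be written out in more than one syntax without changing what they mean. The sketch below is illustrative only. The record values and function names are hypothetical, and it assumes the common `DC.`-prefixed HTML meta tag convention and the `http://purl.org/dc/elements/1.1/` element namespace.

```python
from html import escape
from xml.etree import ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: semantics only (element name -> value), no syntax yet.
record = {
    "title": "Twenty Years of Metadata",
    "creator": "Stuart Weibel",
    "date": "2011-05-13",
}


def to_html_meta(rec):
    """Serialize the record as HTML <meta> tags (one syntax)."""
    return [
        f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
        for name, value in rec.items()
    ]


def to_xml(rec):
    """Serialize the same record as namespaced XML elements (another syntax)."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in rec.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Recover the element/value pairs from the XML form."""
    root = ET.fromstring(xml_text)
    # Parsed tags look like "{namespace}title"; keep only the local name.
    return {child.tag.split("}", 1)[1]: child.text for child in root}
```

Round-tripping through `to_xml` and `from_xml` returns the original pairs, which is the sense in which the semantics are independent of the syntax chosen to carry them.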
3. Impediments that tripped us up
Too many syntaxes to support (HTML, XML, RDF-XML)
No common data model, but we tried hard: data model group, architecture group, abstract model, Singapore Framework...
Without a data model, the story we told was not consistent: confusion resulted
Without a data model, details of implementation become arbitrary (and less interoperable)
Image: Netsuke, Tokyo National Museum

Data Modeling: what is it?
Entity-relationship model defines the important concepts or things (entities), and the relationships among them
A model is a model, not reality
Designed to solve a problem, not to emulate the real world
The complexity of the model should be mapped to the problem, not to reality
Identifying the right level of abstraction is an art
Image: Edo Museum
Data Modeling: why is it necessary?
Without a shared understanding of the important entities, and the relationships among them, systems will not interoperate easily
Cross-walks become necessary: clumsy, inaccurate, inefficient
Image: Changing rail car ‘bogeys’ on the China/Mongolia border

An example of modeling mismatch
Citation information
Date
Title
Author
Affiliation
Email address
- Which of the attributes are Dublin Core?
- Is “email address” an attribute of the resource, or the person?
- Should there be a distinction between Title and Subtitle?
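
The email-address question can be made concrete with a small sketch. It contrasts a flat model, where every attribute hangs off the resource, with an entity model, where affiliation and email belong to a person. All class names, field names, and values here are hypothetical and chosen only for illustration; neither model is presented as the Dublin Core model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    # In the entity model, affiliation and email describe the person.
    name: str
    affiliation: Optional[str] = None
    email: Optional[str] = None


@dataclass
class Resource:
    # The resource carries only its own attributes plus links to people.
    title: str
    date: str
    authors: List[Person] = field(default_factory=list)


def entities_to_flat(res):
    """Flatten the entity model so every attribute sits on the resource."""
    return {
        "title": res.title,
        "date": res.date,
        "author": [p.name for p in res.authors],
        "email": [p.email for p in res.authors if p.email],
    }


# Hypothetical citation with three authors, one of whom has no email.
paper = Resource(
    title="An Example Paper",
    date="2011",
    authors=[
        Person("A. Author", "Example University", "a.author@example.org"),
        Person("B. Author", "Example University"),
        Person("C. Author", "Example Institute", "c.author@example.org"),
    ],
)

flat = entities_to_flat(paper)
```

In `flat`, there are three author names but only two email addresses, and nothing records which address belongs to which author. A system built on the flat model and one built on the entity model cannot exchange this record without losing or guessing that relationship.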
Is Dublin Core well-matched to the problem of bibliographic description?
It is too simple to capture the precision of detailed bibliographic description
BUT… It is good enough for many purposes, including the description of most simple internet resources
The trade-off between perfect matching of model and problem, and simplicity of use is always a compromise
DC was intended for general resource description, not to replace MARC

The problem with models
Matching the complexity of models to a diverse and evolving problem is challenging, and full of compromises
too much complexity leads to failure (creeping elegance)
too little complexity leads to failure (insufficient richness to solve the problem)
HOW DO YOU KNOW WHEN IT IS RIGHT?
Image: figures from a model in the Kyushu National Museum
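
The cost of moving from a richer model to a simpler one can be sketched as a crosswalk. The example below is a simplified illustration, not an official MARC to Dublin Core crosswalk: the record is a hypothetical MARC-like dictionary keyed by (tag, subfield), and the mapping table is an assumption made for this sketch.

```python
# Hypothetical, simplified MARC-like record: (tag, subfield) -> value.
marc_like = {
    ("245", "a"): "Twenty years of metadata :",
    ("245", "b"): "lessons from the Dublin Core",
    ("100", "a"): "Weibel, Stuart",
    ("260", "c"): "2011",
}

# Illustrative mapping only; title and remainder of title share one target.
CROSSWALK = {
    ("245", "a"): "title",
    ("245", "b"): "title",
    ("100", "a"): "creator",
    ("260", "c"): "date",
}


def to_simple_dc(record):
    """Map a MARC-like record onto a few simple Dublin Core elements."""
    collected = {}
    for key, value in record.items():
        element = CROSSWALK.get(key)
        if element is None:
            continue  # no target element, so the field is dropped
        collected.setdefault(element, []).append(value)
    # Values that land on the same element are merged into one string.
    return {element: " ".join(values) for element, values in collected.items()}


dc = to_simple_dc(marc_like)
```

The resulting `dc["title"]` is a single string, so the distinction between title and subtitle cannot be recovered when mapping back. That is the kind of precision the simpler model gives up in exchange for ease of use.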
4. Conceptual Models in the Library World
FRBR and FRAD: The dominant models for bibliographic and authority data
OAIS: Reference model for an Open Archival Information System
CIDOC CRM: Conceptual Reference Model for cultural heritage documentation
Dublin Core Abstract Model: Largely unintelligible data model for Dublin Core instance data
Singapore Framework: A vague framework describing levels of metadata interoperability
Image: Stone Monk in the Nezu Museum Garden

The Next Chapters in the Web Metadata story...
...are being written in the W3C Incubator Group on Library Linked Data (http://www.w3.org/2005/Incubator/lld/)
Many questions:
Will the data be open?
Who will maintain it?
Is semantic web infrastructure stable?
Can existing metadata be integrated seamlessly into the web?
Can a model be agreed upon?
Will we ever have interoperability across domain silos?
stuart.weibel@gmail.com
http://weibel-lines.typepad.com
@stuartweibel on twitter
stuartweibel on Facebook
all photographs by the author
Image: Lantern overlooking the Irises in the Nezu Museum Garden