This document summarizes a discussion on metadata and content management systems. The discussion examined the value of metadata for effective information retrieval, potential problems with metadata like inconsistent standards and fields, and ways to safeguard against those problems. It also considered where metadata should be stored (embedded, centralized database, or both) and who should be responsible for creating and maintaining metadata (content creators, webmasters, librarians, etc.). Finally, it briefly discussed how content management systems could help address issues around metadata and content management.
IWMW 2002: The Value of Metadata and How to Realise It
1. Parallel Session on Metadata
The Value of Metadata and How
to Realise It
Date: 18th June 2002
Facilitator: Dennis Nicholson
Centre for Digital Library Research
3. Theme: Examine, Discuss:
…the value of using metadata as an aid
to reliable retrieval both within individual
Web sites and across distributed sites
…what the barriers to effective use of
metadata are and how they can be
overcome
…who should be responsible for
creating and maintaining metadata -
resource creators; web-masters;
librarians?
4. Theme: Examine, Discuss:
…whether embedding and harvesting
or a central database is the best
approach.
…plus (if time allows):
A step beyond, the value of Content
Management Systems
Focus: General
My background...
7. Effective Retrieval
What is it?
Balance of precision and recall best
suited to a given problem
High precision and low recall usually
preferred but in some cases (e.g. patents)
there may be an advantage in lowering
precision to boost recall
Level of precision and recall should be
under the user’s control not a side effect
of poor metadata
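The two measures on this slide can be made concrete with a small calculation. This is a minimal sketch, assuming retrieval results are handled as sets of document identifiers; the function name and the sample ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the search returned
    relevant:  ids of the documents that actually answer the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of what was returned that is relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of what is relevant that was returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = {"d1", "d2", "d3", "d9"}
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Broadening the search so that more of the six relevant documents come back would raise recall, usually at the cost of more irrelevant hits and so lower precision.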
8. Effective Retrieval
Why does it matter?
Costs the University, and the public purse,
to create the material - a waste if the
people it is aimed at can’t find it
Strategic/PR considerations - If they
can’t find your courses, expertise
registers or digital images for sale
when you want or need them to,
they won’t use you, talk about you
or write about you
9. Effective Retrieval
When does it matter?
Only if it is ‘stuff’ you want found
The bigger they come, the sooner
they fail…
The more ‘stuff’ you have, the more
campuses, or organisations in a
collaboration, the harder it is to ensure
effective retrieval
Especially with no or poor metadata
10. What is metadata?
Metadata is data about data
Consists of things like:
Author; Title; Subject; Description;
Level; Language; Viewer
Appropriate to function
The route to effective retrieval
Maybe...
11. What can go wrong?
Limited penetration (i.e. only some
available documents covered)
Misleading results for users
Different metadata record formats
Can the software cope? Is there a
cross-walk?
Incompatible core field sets
Cross-walk not possible
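A cross-walk is, at its simplest, a table that maps the fields of one record format onto another. The sketch below assumes a made-up local format being mapped onto Dublin Core element names; the local field names and the sample record are hypothetical.

```python
# Hypothetical crosswalk: local field name -> Dublin Core element.
# The local field names are invented for illustration.
CROSSWALK = {
    "doc_title": "title",
    "written_by": "creator",
    "topic": "subject",
    "summary": "description",
}


def to_dublin_core(record, crosswalk=CROSSWALK):
    """Map a local record onto DC elements and report unmapped fields."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            # No equivalent in the target format: the cross-walk fails here.
            unmapped.append(field)
        else:
            mapped[target] = value
    return mapped, unmapped


local = {"doc_title": "Course Handbook", "written_by": "Registry", "room": "B12"}
mapped, unmapped = to_dublin_core(local)
print(mapped)    # {'title': 'Course Handbook', 'creator': 'Registry'}
print(unmapped)  # ['room']
```

A long list of unmapped fields is a sign that the core field sets are incompatible and a full cross-walk may not be possible.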
12. What can go wrong?
Different field sub-sets used (Both
use DC but different field sets)
Full service limited to common fields
Different fields used for same data
element (I put subject headings in
subject field and free form keywords in
the keyword field but you put subject
headings in the keyword field)
Misleading results
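The point about a full service being limited to common fields can be checked mechanically. A minimal sketch, assuming each site simply declares which Dublin Core elements it populates; the two site profiles are hypothetical.

```python
def common_fields(*field_sets):
    """Fields every site supplies; only these support a full cross-site search."""
    sets = [set(fields) for fields in field_sets]
    return set.intersection(*sets) if sets else set()


# Hypothetical site profiles: both use Dublin Core, but different sub-sets.
site_a = {"title", "creator", "subject", "description"}
site_b = {"title", "creator", "date", "language"}

shared = common_fields(site_a, site_b)
print(sorted(shared))           # ['creator', 'title']
print(sorted(site_a - shared))  # ['description', 'subject']
```

Here a subject search would silently miss everything held at the second site, which is one way users end up with misleading results.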
13. What can go wrong?
Different or no standards applied in
creating data element content (e.g. Darwin,
C. or Charles Darwin)
Reduced retrieval; varied results
Different or no subject schemes and/or
category lists (Educational levels, LCSH v.
UNESCO v. made up)
Reduced retrieval; varied results
Insufficient granularity (If everything physical
is ‘physics’)
Poor precision, high recall
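The "Darwin, C. or Charles Darwin" problem can be reduced, though not solved, by matching on a normalised key. The sketch below is an assumption-laden illustration: it handles only the two simple name forms shown on the slide, and the function name is hypothetical.

```python
def name_key(name):
    """Reduce a personal name to a (surname, first initial) matching key.

    Handles only 'Surname, Forenames' and 'Forenames Surname';
    real authority control needs far more than this.
    """
    name = name.strip()
    if "," in name:
        surname, _, forenames = name.partition(",")
    else:
        forenames, _, surname = name.rpartition(" ")
    surname = surname.strip().lower()
    forenames = forenames.strip()
    initial = forenames[0].lower() if forenames else ""
    return surname, initial


print(name_key("Darwin, C."))      # ('darwin', 'c')
print(name_key("Charles Darwin"))  # ('darwin', 'c')
```

The key is deliberately crude and will also merge different people who share a surname and initial, which is why an agreed standard for entering names at creation time is the better safeguard.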
14. What can go wrong?
Varied or no methods of central
co-ordination (2 sites or campuses)
Can cause some of the other
problems listed above and below
Different sites index different fields
(One has subjects and keywords in one
index, another has them in separate indices)
Misleading for users
15. What can go wrong?
Missing indices (Nothing on the
subject in the index or no subject index?
(2 sites))
Misleading retrieval
Humans can cope but machines
can’t (A machine finds it harder to ‘spot’
different usages of the ‘same’ word or
alternative words for the same thing than
a human does)
Semantic web won’t work
16. Safeguards against:
Limited penetration
Policy? Training? DC Dot? Human monitor?
Different formats
Discover need, agree policy, set standards,
ensure software can cope with formats
Incompatible core field sets
Identify formats (DC, IMS, MARC?) then
agree core set of fields (e.g. 15 in DC base)
17. Safeguards against:
Different field sub-sets used
Agree, monitor, one core set
Different fields used for same data
element
Templates and examples, Central
co-ordination, Guidelines, Training
18. Safeguards against:
Different or no standards applied in
creating data element content
Template with examples
Different or no subject schemes
and/or category lists
Agree single schemes or lists, have
drop down lists, upgrade centrally
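An agreed list behind a drop-down amounts to validating each field against a controlled set of values. A minimal sketch, assuming the lists are held centrally as simple sets; the field names and allowed values are hypothetical.

```python
# Hypothetical agreed lists. A real deployment would load an agreed
# scheme or category list rather than hard-coding values.
ALLOWED = {
    "level": {"undergraduate", "postgraduate", "research"},
    "language": {"en", "cy", "gd"},
}


def validate(record, allowed=ALLOWED):
    """Return (field, value) pairs that are missing or not on the agreed list."""
    problems = []
    for field, values in allowed.items():
        value = record.get(field)
        if value not in values:
            problems.append((field, value))
    return problems


print(validate({"level": "UG", "language": "en"}))
# [('level', 'UG')]
```

Because the lists live in one place, upgrading a scheme centrally changes what every author is offered without retraining anyone.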
19. Safeguards against:
Insufficient granularity
Agree usable level, training, examples
Varied or no methods of central
co-ordination (2 sites or campuses)
Make sure it doesn’t happen!
Different sites index different fields
Agree approach, implement and
monitor standards
20. Safeguards against:
Missing indices
Agree not to do this, and warn users if
you can’t agree
Humans can cope but machines
can’t (semantic web)
Use standard schemes, ontologies in
standard ways and map between
different ones in a way that your
software can process
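Mapping between schemes "in a way that your software can process" can start as a plain lookup table. The sketch below assumes a one-to-one mapping between two schemes; the term pairs are illustrative placeholders and have not been checked against LCSH or UNESCO.

```python
# Hypothetical mapping from scheme A terms to scheme B terms.
# The pairs are placeholders, not verified headings from any real scheme.
SCHEME_A_TO_B = {
    "Physics": "Physical sciences",
    "Optics": "Light",
}


def translate_subject(term, mapping=SCHEME_A_TO_B):
    """Return the equivalent term in the other scheme, or None if unmapped."""
    return mapping.get(term)


def expand_query(term, mapping=SCHEME_A_TO_B):
    """Search terms needed to cover records indexed under either scheme."""
    terms = [term]
    other = translate_subject(term, mapping)
    if other and other != term:
        terms.append(other)
    return terms


print(expand_query("Physics"))    # ['Physics', 'Physical sciences']
print(expand_query("Chemistry"))  # ['Chemistry']
```

Real schemes rarely map one-to-one, so a production mapping would need to handle broader, narrower and partial matches as well.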
21. Where to keep it?
Pros and Cons of:
Embedding and harvesting:
Metadata creation more likely? Harder to
co-ordinate, easier to resource? More
often out of date? Harder to ensure
standardised metadata?
A central database
Easier to co-ordinate, more expensive to
resource? Easier to maintain standards?
How to ensure new stuff notified?
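Harvesting embedded metadata means reading it back out of the pages themselves. This is a minimal sketch using Python's standard `html.parser`, assuming pages carry Dublin Core in `<meta name="DC.element" content="...">` tags; the class name, function name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collect <meta name="DC.*" content="..."> values from one page."""

    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            # A DC element may repeat, so keep a list per element.
            self.record.setdefault(element, []).append(content)


def harvest(html):
    """Return the embedded DC record of one HTML page as a dict of lists."""
    parser = DCMetaHarvester()
    parser.feed(html)
    parser.close()
    return parser.record


page = """<html><head>
<meta name="DC.title" content="Prospectus 2002">
<meta name="DC.creator" content="Darwin, C.">
<meta name="DC.subject" content="Admissions">
<meta name="DC.subject" content="Courses">
</head><body></body></html>"""
print(harvest(page))
# {'title': ['Prospectus 2002'], 'creator': ['Darwin, C.'], 'subject': ['Admissions', 'Courses']}
```

The harvested records could then be loaded into a central database, which is one way of combining the two approaches compared on this slide.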
22. Where to keep it?
Pros and Cons of:
A mix of the two?
Worst of both worlds? Or best? How to
ensure the latter? Optimise author input of
embedded metadata but allow central
upgrades by metadata experts? Is this
feasible? Is it cost-effective?
Depends on other factors?
A question of designing to be fit for
purpose?
23. Whose Responsibility?
Candidates; Their pros and cons:
Resource creators?
Au fait with the resource; Labour saving
Web-masters?
Au fait with the technical landscape
Librarians?
Au fait with knowledge and metadata domains
Public Relations?
Au fait with the needs of the University
Anybody else?
All of the above? Co-ordinated by?
24. Other Related Issues
A CMS would ensure:
Currency; Accuracy; Legality; Authority of
Content retrieved by metadata
Not to mention
Uniform look and feel control; easy total
redesign and global changes; all content
tracked; joint authorship across departments,
units, different institutions; easy repurposing
All who have some responsibility
can be involved in controlled way?
25. Facilities
It would provide:
Content authoring; collaborative
authoring; editing and workflow;
preventing unauthorised editing or
creation; scheduling publication;
tracking changes; personalising;
repurposing; metadata creation;
knowledge management through
semantic control
26. Closing Discussion…
Who has/plans to have a CMS?
What does it/will it cost?
Are they:
Essential? Optional? Impractical?
A threat to academic freedom?
Do they help solve the metadata
problem?
27. Useful URLs
Metadata
http://content.lib.washington.edu/METADATA/ (Why should we care?)
http://www.ukoln.ac.uk/metadata/dcdot/
http://www.ukoln.ac.uk/web-focus/metadata/seminar-materials/exercises/dc-dot/
http://www.ukoln.ac.uk/metadata/dcassist/
Content Management Systems
http://www.ukoln.ac.uk/nof/support/help/papers/cms.htm (what are
they?)
http://www.ariadne.ac.uk/issue30/techwatch/ (Who needs them?)
http://www.cultivate-int.org/issue5/cms/ (CMSs available)