The document discusses metadata, which is data that provides information about other data. It describes how metadata improves search, navigation, and organization of content. The document outlines common metadata standards like Dublin Core and how metadata is applied in formats like DocBook and DITA. It also discusses best practices for working with metadata, such as creating templates and using indexing tools, and predicts future directions for metadata including automated generation and social tagging.
Metadata Primer
1. Metadata Primer
Selvakumar T.S
1 August 9, 2009 Cadence Confidential: Cadence Internal Use Only
2. Source: Siderean Software, Inc.
All of the answers are here. Now, what was the question?
3. Issues with Information Access Today
• Tons of content from disparate sources.
• Cumbersome navigation.
• Keyword search assumes you know what you are
looking for.
• Large number of search results -- most of them irrelevant.
• Lack of context in search results.
• Search engines rely on mathematical algorithms to
determine relevance and ranking of search results.
Fortune 500 companies lost $12 billion due to
inability to find information in 2003.
-IDC
4. A Quick Demo on Information
Access Issues and Possibilities
Source:
5. Agenda
• Understanding Metadata
• Metadata Applications
• Metadata Standards
• Working with Metadata
• Future of Metadata
7. What is Metadata?
Data that provides information about other data.
– Merriam-Webster’s Online Dictionary
Data about data. For example, the title, subject, author, and size of a file constitute metadata about the file.
– Microsoft Computer Dictionary, Fifth Edition
10. Metadata in HTML
<META name="<property>" content="<value>" />
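As a rough illustration of how a tool might read the name/content pairs above, here is a minimal Python sketch using the standard library's HTMLParser. The sample page, the class name MetaTagParser, and the helper extract_metadata are hypothetical and not part of the original slides.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.metadata[name] = content


def extract_metadata(html_text):
    """Return a dict of metadata found in the <meta> tags of html_text."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.metadata


# Hypothetical page used only for this example.
SAMPLE_PAGE = """
<html><head>
<META name="author" content="Jane Doe" />
<META name="keywords" content="metadata, search" />
</head><body>Hello</body></html>
"""

metadata = extract_metadata(SAMPLE_PAGE)
print(metadata)
```

Because the values end up in a plain dictionary, the same record could feed a search filter or a browse index.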
11. Metadata Reflects Content and User Needs
12. Types of Metadata
Intrinsic: metadata that an object holds about itself
File name, file size …
Descriptive: metadata that describes the object
Subject, title, audience, keywords …
Administrative and Rights: metadata used to manage the object
Create date, modify date, expiry
Metadata describes the who, what, when, where and how about every facet of data.
14. Improved Search with Metadata
• Filter search by metadata.
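A minimal sketch of what filtering a keyword search by metadata could look like. The corpus, the field names (audience, text), and the search function are assumptions made up for this example, not a description of any particular search engine.

```python
# Hypothetical in-memory corpus; field names are illustrative assumptions.
DOCUMENTS = [
    {"title": "Installing the license server", "audience": "administrator",
     "text": "Install the license server before you start the tools."},
    {"title": "Checking out a license", "audience": "user",
     "text": "A license is checked out when you start a tool."},
    {"title": "Viewing waveforms", "audience": "user",
     "text": "Open the waveform viewer from the menu."},
]


def search(documents, keyword, **filters):
    """Return titles whose text contains keyword and whose metadata matches every filter."""
    needle = keyword.lower()
    hits = []
    for doc in documents:
        if needle not in doc["text"].lower():
            continue  # plain keyword match failed
        # Keep the hit only if every requested metadata field has the requested value.
        if all(doc.get(field) == value for field, value in filters.items()):
            hits.append(doc["title"])
    return hits


all_hits = search(DOCUMENTS, "license")
admin_hits = search(DOCUMENTS, "license", audience="administrator")
print(all_hits)
print(admin_hits)
```

The keyword alone matches two documents; adding the audience filter narrows the result to the one written for administrators.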
15. Improved Navigation with Metadata
• Aggregate topics with same metadata to create
browseable indexes or categories.
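The aggregation described above can be sketched as a simple grouping step. The topic records, the category field, and the build_browse_index helper are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hypothetical topics; the "category" field is an assumed metadata property.
TOPICS = [
    {"title": "Installing the tools", "category": "Installation"},
    {"title": "Configuring licenses", "category": "Installation"},
    {"title": "Running a simulation", "category": "Simulation"},
]


def build_browse_index(topics, field):
    """Group topic titles by the value of one metadata field."""
    index = defaultdict(list)
    for topic in topics:
        # Topics without the field are collected under a catch-all heading.
        index[topic.get(field, "Uncategorized")].append(topic["title"])
    # Sort headings and titles so the index reads like a browseable list.
    return {value: sorted(titles) for value, titles in sorted(index.items())}


browse_index = build_browse_index(TOPICS, "category")
for heading, titles in browse_index.items():
    print(heading)
    for title in titles:
        print("  " + title)
```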
16. Display Context and Relationships with Metadata
Cross-marketing on amazon.com.
17. Personalization and customization
• Display content according to role or audience.
18. Other Metadata Applications
• Discovery and compliance
– Identify the need to update, retain, protect, and dispose of content
for internal or regulatory requirements.
• Interoperability …
– Content tagged with the same metatags (META name) from
different sources can be easily integrated.
Metadata allows unstructured content to be managed
like structured content.
19. Metadata Standards
20. Need for Metadata Standards
• Different information providers using different metadata
schemas.
• Even metadata schemas of groups within organizations
are different or out of sync.
• The result:
– Inconsistent search results.
– Lack of interoperability.
– Information silos.
– …
A US$2B oil and gas project suffered a loss of US$120M due to an inability to
locate a document or a misunderstanding about which document was needed.
-SchemaLogic, Inc.
21. Some Metadata Standards
• Dublin Core
• Metadata support in DocBook and DITA
• IMS Global Learning Consortium
• LOM (IEEE’s Learning Object Metadata)
• SCORM (ADL) - Learning Objects
• EAD (Encoded Archival Description)
Standard formats and approaches enable interoperability and
the sharing of metadata.
22. Dublin Core
• http://dublincore.org
• General purpose metadata standard for use across
domains.
• 15 core elements.
• Element qualifiers to narrow the meaning of elements.
– Example: A Date Created versus a Date Modified.
• Encoding schemes: Controlled vocabularies or parsing
rules to refine the interpretation of an element.
– Example: A term from a controlled vocabulary such as the
Library of Congress Subject Headings.
• Can be represented in HTML and in XML (RDF).
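As a hedged illustration of the HTML representation, the sketch below renders a small record as meta tags using the common "DC." name prefix together with a schema.DC link to the Dublin Core element namespace. The function name dublin_core_meta_tags and the sample subject value are assumptions; the title, creator, and date are taken from this deck.

```python
from html import escape

# Namespace URI of the 15-element Dublin Core element set.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta_tags(record):
    """Render a dict of Dublin Core element names and values as HTML meta tags."""
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}" />']
    for element, value in record.items():
        # Escape the value so characters such as & and " stay valid inside the attribute.
        lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}" />')
    return "\n".join(lines)


# The subject value is a made-up example.
record = {
    "title": "Metadata Primer",
    "creator": "Selvakumar T.S",
    "date": "2009-08-09",
    "subject": "Search & navigation",
}

dc_html = dublin_core_meta_tags(record)
print(dc_html)
```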
23. Dublin Core Metadata Elements
• Title
• Creator
• Subject
• Description
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
24. Dublin Core Metadata Example
Source: http://www.sics.se/~preben/DC/DC_guide.html
25. Metadata Support in DocBook
• Metadata at different levels
– title, info and bookinfo at book level
– title, info and chapterinfo at chapter level
– title, info and sectioninfo at section level
• DocBook supports Dublin Core schema
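To make the chapter-level case concrete, here is a minimal sketch that reads metadata from a DocBook 4 style chapter with Python's ElementTree. The sample chapter and the chapter_metadata helper are hypothetical; the elements used (chapterinfo, author, keywordset, keyword, title) are standard DocBook elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook 4 style fragment; the content is made up for illustration.
DOCBOOK_CHAPTER = """
<chapter>
  <chapterinfo>
    <author><firstname>Jane</firstname><surname>Doe</surname></author>
    <keywordset>
      <keyword>metadata</keyword>
      <keyword>search</keyword>
    </keywordset>
  </chapterinfo>
  <title>Understanding Metadata</title>
  <para>Metadata is data about data.</para>
</chapter>
"""


def chapter_metadata(xml_text):
    """Extract the title, author name, and keywords from a DocBook chapter."""
    chapter = ET.fromstring(xml_text.strip())
    info = chapter.find("chapterinfo")
    author = info.find("author")
    return {
        "title": chapter.findtext("title"),
        "author": f'{author.findtext("firstname")} {author.findtext("surname")}',
        "keywords": [keyword.text for keyword in info.iter("keyword")],
    }


chapter_info = chapter_metadata(DOCBOOK_CHAPTER)
print(chapter_info)
```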
26. Metadata Support in DITA
• DITA supports a variety of standard and custom
metadata:
– Author information
– Copyright information
– Product information
– Resource IDs for help systems
– Document tracking information
– Audience information
– Keywords
– Custom metadata (otherprops)
• <prolog> element defines metadata at the topic level.
• <topicmeta> element defines metadata that applies to a
topic when it appears in a map.
• Metadata at every level
27. Sample of Metadata elements within <prolog> element
<prolog>
<author> (name of topic’s author)
<copyright>
<critdates> (document tracking information)
<permissions>
<publisher>
<source>
<metadata>
<audience> (intended audience)
type="user | purchaser | administrator | … | other"
othertype=
job="installing | customizing | administering | … | other"
otherjob=
experiencelevel="novice | general | expert"
<category> (content category used for grouping topics)
<keywords> (keywords for search engines)
<prodinfo>
<othermeta>
…
…
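The outline above can be turned into a small concrete instance. This sketch builds a prolog with an author, an audience, and keywords using Python's ElementTree. The build_prolog helper and all sample values are hypothetical, and only a few of the listed elements are shown.

```python
import xml.etree.ElementTree as ET


def build_prolog(author, audience_type, job, experience_level, keywords):
    """Build a minimal DITA <prolog> element (illustrative subset of the elements)."""
    prolog = ET.Element("prolog")
    ET.SubElement(prolog, "author").text = author
    metadata = ET.SubElement(prolog, "metadata")
    # Audience attributes use values from the enumerations listed above.
    ET.SubElement(metadata, "audience", {
        "type": audience_type,
        "job": job,
        "experiencelevel": experience_level,
    })
    keywords_element = ET.SubElement(metadata, "keywords")
    for word in keywords:
        ET.SubElement(keywords_element, "keyword").text = word
    return prolog


# Sample values are made up for this example.
prolog = build_prolog(
    author="Jane Doe",
    audience_type="administrator",
    job="installing",
    experience_level="novice",
    keywords=["license", "installation"],
)
prolog_xml = ET.tostring(prolog, encoding="unicode")
print(prolog_xml)
```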
29. Creating Metadata
• Create it from scratch.
• Reuse existing metadata and build on it.
• Start with a standard.
30. Metadata Process
Diagram: Create Content -> Add Metadata -> Publish, with two Review & Improve feedback loops.
• Identify content that will benefit from metadata using the 80/20 rule.
• Build a controlled vocabulary or use a vocabulary from a
commercial source such as www.taxonomywarehouse.com.
– Example: The Getty Thesaurus of Geographic Names (TGN)
• Apply metadata to content using templates or using indexing tools.
• Get it reviewed.
• Evaluate search logs and user surveys to improve metadata.
• Continuously review metadata.
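One way the template and controlled-vocabulary steps above might be checked automatically before review is sketched below. The required fields, the vocabulary, and the validate function are assumptions chosen for illustration, not a prescribed schema.

```python
# Hypothetical template: fields every record must fill in.
REQUIRED_FIELDS = ("title", "audience", "keywords")

# Hypothetical controlled vocabulary: allowed values per field.
CONTROLLED_VOCABULARY = {
    "audience": {"user", "purchaser", "administrator"},
}


def validate(record):
    """Return a list of problems found in one metadata record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    for field, allowed in CONTROLLED_VOCABULARY.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}={value!r} is not in the controlled vocabulary")
    return problems


good_record = {"title": "Installing the tools", "audience": "administrator",
               "keywords": ["install"]}
bad_record = {"title": "Untitled", "audience": "guru"}

good_problems = validate(good_record)
bad_problems = validate(bad_record)
print(good_problems)
print(bad_problems)
```

A report like this could be handed to reviewers along with the content, so the review step concentrates on whether the values are appropriate and not on whether they are present.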
31. Metadata Template: A Manual Approach
32. Metadata Indexing and Discovery Tools
• Data Harmony http://www.dataharmony.com
• Interwoven MetaTagger http://www.interwoven.com
• Mondeca http://www.mondeca.com
• MultiTes http://www.multites.com/
• Synaptica http://www.synaptica.com
• SchemaLogic http://www.schemalogic.com
• WebChoir http://www.webchoir.com
• WordMap http://www.wordmap.com/
33. Future of Metadata
34. Future of Metadata
• Automated metadata generation.
• Social tagging – tagging by users.
• Geo tagging.
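As a deliberately naive illustration of automated metadata generation, the sketch below suggests keywords by counting word frequency. Real tools use far richer techniques; the stop-word list, the suggest_keywords helper, and the sample text are assumptions made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


def suggest_keywords(text, limit=3):
    """Suggest keywords by picking the most frequent non-trivial words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        word for word in words if len(word) > 2 and word not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(limit)]


sample_text = (
    "Metadata improves search. Metadata improves navigation. "
    "Metadata describes content."
)
suggested = suggest_keywords(sample_text, limit=2)
print(suggested)
```

Suggestions generated this way would still need human review against the controlled vocabulary before being applied.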
35. Social Tagging Example: tagging