Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials

Arwen Hutt, University of Tennessee
Jenn Riley, Indiana University
OAI-PMH
• Open Archives Initiative Protocol for Metadata Harvesting
• Originally developed for sharing metadata about e-prints
• Two players
  - Data providers
  - Service providers
• Requires that unqualified Dublin Core be exposed for all resources, but supplemental metadata formats are allowed
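As a rough illustration of the harvesting side of the protocol, a service provider pages through a data provider's records with the ListRecords verb. The sketch below only builds request URLs; the base URL is a placeholder and `list_records_url` is a hypothetical helper. `verb`, `metadataPrefix`, and `resumptionToken` are OAI-PMH request arguments, and `oai_dc` is the metadata prefix for the required unqualified Dublin Core format.

```python
from urllib.parse import urlencode

def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    The first request names the metadata format; follow-up requests carry
    only the resumptionToken returned by the data provider, because the
    protocol treats resumptionToken as an exclusive argument.
    """
    if resumption_token is None:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return base_url + "?" + urlencode(params)

# Placeholder base URL, not a real data provider.
print(list_records_url("https://example.org/oai"))
# → https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A harvester would repeat the request with each resumptionToken the provider returns until none is given; fetching and XML parsing are left out of this sketch.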
Dublin Core [Unqualified]
• Simple, flexible metadata format
• 15 elements
  - All repeatable
  - None required
• “Core” across all knowledge domains
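A minimal sketch of what "15 elements, all repeatable, none required" means for a single record, assuming the record arrives in the standard oai_dc XML container. `dc_element_counts` and the sample record are hypothetical; the two namespace URIs are the ones used by oai_dc and Dublin Core 1.1.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of unqualified (simple) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dc_element_counts(oai_dc_xml):
    """Count occurrences of each DC element in one oai_dc record.

    Any element may repeat and none is required, so an absent element
    simply has a count of zero in the returned Counter.
    """
    root = ET.fromstring(oai_dc_xml)
    counts = Counter()
    for child in root:
        # Tags look like "{namespace}localname".
        ns, _, local = child.tag[1:].partition("}")
        if ns == DC_NS and local in DC_ELEMENTS:
            counts[local] += 1
        else:
            raise ValueError(f"not a simple DC element: {child.tag}")
    return counts

# Hypothetical record with a repeated creator and no date.
record = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample title</dc:title>
  <dc:creator>Hutt, Arwen</dc:creator>
  <dc:creator>Riley, Jenn</dc:creator>
</oai_dc:dc>"""
print(dc_element_counts(record))  # → Counter({'creator': 2, 'title': 1})
```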
“Cultural heritage” defined
• The intellectual, creative, and material output of society
• Libraries, museums, and archives are generally considered cultural heritage institutions
• Often primary source materials
• Tend to be older analog materials digitized for network access
Significant variability in OAI metadata
• Ward: found that only a small number of DC elements were used in the majority of OAI records
• Liu (Arc service provider): studied controlled vocabulary usage in the DC subject, type, format, language, and date fields
• NSDL: found errors including missing data, incorrect data, confusing data, and insufficient data
• UIUC: date, coverage, format, and type vocabulary varies significantly
Goals of the study
• Focus on the cultural heritage community
• Examined 3 DC fields: date, creator, contributor
  - Semantic content
  - Syntactic form
• Results could inform community best practices
• One step towards improving the overall quality of OAI metadata
Harvesting statistics
• Successfully harvested metadata from 35 data providers
• 750,945 total records harvested
• 5% sample* from each data provider taken for analysis (37,564 records)

* Minimum of 1 record per provider; values rounded up to the nearest whole number
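The sampling rule in the footnote can be written as a small calculation. This is a sketch assuming a per-provider ceiling of 5% with a floor of one record; `sample_size` and `total_sample` are hypothetical helpers, and the provider counts in the usage line are made up, since per-provider totals are not given here.

```python
def sample_size(record_count, percent=5):
    """Records to sample from one provider: `percent` of its records,
    rounded up to a whole number, with a minimum of 1."""
    # -(-a // b) is integer ceiling division, avoiding float rounding.
    return max(1, -(-record_count * percent // 100))

def total_sample(provider_counts, percent=5):
    """Sum of the per-provider samples."""
    return sum(sample_size(n, percent) for n in provider_counts)

print(sample_size(20), sample_size(21), sample_size(1))  # → 1 2 1
print(total_sample([20, 21, 100]))                       # → 8
```

Rounding up per provider is consistent with the sample (37,564 records) being slightly larger than 5% of the full harvest (750,945 × 0.05 = 37,547.25).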
Processing steps
• Date, creator, and contributor elements extracted into “silos”
• Repeated values grouped, keeping connections between elements and the records in which they appeared
• Certain characteristics tracked about each element
• Example
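One way to picture the "silo" step is a mapping from each distinct value to the records that contain it. The sketch assumes records have already been parsed into (element, value) pairs; `build_silos` and the record identifiers are hypothetical.

```python
from collections import defaultdict

SILO_ELEMENTS = ("date", "creator", "contributor")

def build_silos(records):
    """Group element values into per-element silos.

    `records` maps a record identifier to a list of (element, value) pairs.
    Each silo maps a distinct value to the identifiers of the records in
    which it appeared, so a repeated value is analysed once but can still
    be traced back to every record that uses it.
    """
    silos = {name: defaultdict(list) for name in SILO_ELEMENTS}
    for record_id, fields in records.items():
        for element, value in fields:
            if element in silos:  # other elements are ignored
                silos[element][value.strip()].append(record_id)
    return silos

# Hypothetical harvested records.
records = {
    "oai:example:1": [("creator", "Newton, Isaac"), ("date", "1687")],
    "oai:example:2": [("creator", "Newton, Isaac"), ("title", "Opticks")],
}
silos = build_silos(records)
print(dict(silos["creator"]))
# → {'Newton, Isaac': ['oai:example:1', 'oai:example:2']}
```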
Characteristics recorded for all elements
• The presence of multiple discrete values in a single element
  <creator>Hutt, Arwen; Riley, Jenn</creator>
• The presence of pseudo-qualifiers within the value that refined the meaning of the element
  <creator>Berlin, Irving [composer]</creator>
• Whether the value was appropriate within the specified element based on DC rules and usage guidelines
  <date>Las Vegas, Nevada</date>
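The first two characteristics lend themselves to simple pattern matching. The regular expressions below are assumptions chosen to fit the two examples above (semicolon-separated names, bracketed qualifiers), not the rules used in the study; judging whether a value belongs in its element, as in the `Las Vegas, Nevada` case, generally needs more context.

```python
import re

# Hypothetical heuristics fitted to the slide's examples.
MULTI_VALUE = re.compile(r"\s*;\s*")          # "A; B" holds two values
PSEUDO_QUALIFIER = re.compile(r"\[([^\]]+)\]")  # "[composer]" refines a name

def split_values(value):
    """Split a value such as 'Hutt, Arwen; Riley, Jenn' on semicolons."""
    return [part for part in MULTI_VALUE.split(value.strip()) if part]

def pseudo_qualifiers(value):
    """Return bracketed refinements such as 'composer'."""
    return PSEUDO_QUALIFIER.findall(value)

print(split_values("Hutt, Arwen; Riley, Jenn"))       # → ['Hutt, Arwen', 'Riley, Jenn']
print(pseudo_qualifiers("Berlin, Irving [composer]"))  # → ['composer']
```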
Additional characteristics of <date>
• The semantic type of the value (creation, copyright, or digitization)
  <date>2000</date>
• The general specificity of the date (single date, range, or period)
  <date>19th Century</date>
• Indication that a date is not definitive (that it is estimated or approximate)
  <date>ca. 1930</date>
• Whether the value is purely numeric or contains non-numeric text
  <date>March 18, 1902</date>
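A rough sketch of how some of these date characteristics might be detected automatically. The patterns (`APPROXIMATE`, `RANGE`, `PERIOD`) are hypothetical heuristics; the semantic type (creation, copyright, or digitization) usually cannot be read from a bare value such as `2000`, so it is not attempted here.

```python
import re

# Hypothetical patterns, assumed for illustration only.
APPROXIMATE = re.compile(r"\b(ca\.|circa|approx\.?)|\?", re.IGNORECASE)
RANGE = re.compile(r"^\d{3,4}\s*-\s*\d{3,4}$")  # e.g. "1900-1950"
PERIOD = re.compile(r"\bcentury\b|\d{3}0s\b", re.IGNORECASE)  # "19th Century", "1920s"

def date_characteristics(value):
    v = value.strip()
    if RANGE.match(v):
        specificity = "range"
    elif PERIOD.search(v):
        specificity = "period"
    else:
        specificity = "single"
    return {
        "specificity": specificity,
        "approximate": bool(APPROXIMATE.search(v)),
        # Purely numeric: digits plus common date punctuation only.
        "numeric": bool(re.fullmatch(r"[\d\s\-/.]+", v)),
    }

print(date_characteristics("ca. 1930"))
# → {'specificity': 'single', 'approximate': True, 'numeric': False}
```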
Additional characteristics of <creator> and <contributor>
• The semantic type of the value (personal name, corporate name, or other)
  <creator>Newton, Isaac</creator>
• Whether the entity is known, unknown, or ambiguous
  <creator>Vermeer, Johannes, 1632-1675 ?</creator>
• Whether the value is inverted or in direct order
  <creator>Charles Schultz</creator>
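A similarly simplified sketch for name values. It assumes that a comma marks inverted order, that a `?` marks an ambiguous attribution, that "unknown" or "anonymous" marks an unknown entity, and that a short, hypothetical keyword list separates corporate from personal names; it does not attempt the "other" category.

```python
import re

# Hypothetical keyword list; a real project would refine it iteratively.
CORPORATE_HINTS = re.compile(
    r"\b(university|library|museum|society|company|press|department|association)\b",
    re.IGNORECASE,
)
UNKNOWN_FORMS = {"unknown", "anonymous"}  # assumed placeholder values

def name_characteristics(value):
    v = value.strip()
    name_type = "corporate" if CORPORATE_HINTS.search(v) else "personal"
    # "Surname, Forename" suggests inverted order; no comma suggests direct order.
    inverted = name_type == "personal" and "," in v
    if v.lower() in UNKNOWN_FORMS:
        status = "unknown"
    elif "?" in v:
        status = "ambiguous"
    else:
        status = "known"
    return {
        "type": name_type,
        "order": "inverted" if inverted else "direct",
        "status": status,
    }

print(name_characteristics("Vermeer, Johannes, 1632-1675 ?"))
# → {'type': 'personal', 'order': 'inverted', 'status': 'ambiguous'}
```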
Strategies for categorization
• Automatic
  - Iteratively developed
  - Pattern matching
  - Identification of commonly occurring values
• Manual
  - Where feasible
• Not perfect!
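Identifying commonly occurring values can be as simple as counting normalized values, so that the most frequent ones are categorized first. A minimal sketch; `frequent_values` and the sample dates are hypothetical, and lowercasing plus whitespace trimming is an assumed normalization.

```python
from collections import Counter

def frequent_values(values, top_n=3):
    """Return the most common normalized values with their counts.

    Categorizing the frequent values first (by rule or by hand) covers
    many records with little effort; the long tail can then be handled
    by pattern matching or left for manual review.
    """
    normalized = (v.strip().lower() for v in values if v.strip())
    return Counter(normalized).most_common(top_n)

dates = ["n.d.", "1900", "N.D.", "ca. 1930", "1900", "n.d."]
print(frequent_values(dates, 2))  # → [('n.d.', 3), ('1900', 2)]
```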
Findings for <date>
• Values largely appropriate for the element
• Few “pseudo-qualifiers”
• Different events represented
• Values mostly numeric
• Many dates not expressible in W3CDTF
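W3CDTF is the W3C profile of ISO 8601: YYYY, YYYY-MM, or YYYY-MM-DD, optionally followed by a time (hh:mm, hh:mm:ss, or hh:mm:ss.s) that must carry a time zone designator (Z, +hh:mm, or -hh:mm). The check below is a sketch of that profile as a regular expression; `is_w3cdtf` is a hypothetical helper, and it does not verify day-of-month against month length.

```python
import re

W3CDTF = re.compile(
    r"\d{4}"                                  # YYYY
    r"(?:-(?:0[1-9]|1[0-2])"                  # -MM
    r"(?:-(?:0[1-9]|[12]\d|3[01])"            # -DD
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"          # Thh:mm
    r"(?::[0-5]\d(?:\.\d+)?)?"                # optional :ss and fraction
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"    # required time zone designator
    r")?"
    r")?"
    r")?"
)

def is_w3cdtf(value):
    return W3CDTF.fullmatch(value.strip()) is not None

for v in ["1930", "1902-03-18", "1997-07-16T19:20+01:00", "ca. 1930", "19th Century"]:
    print(v, is_w3cdtf(v))
```

Approximate dates, periods, and ranges such as `1900-1950` fall outside the profile, which is one reason many cultural heritage dates cannot be expressed in it.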
Findings for <creator>
• Values largely appropriate for the element
• Most were personal names
• Many “pseudo-qualifiers” in comparison to other elements
• Often included information intended to disambiguate a name
• Some indication of the use of controlled vocabularies, but many different name forms present
Findings for <contributor>
• Used infrequently
• Many values inappropriate for the element
• Majority personal names, but a higher proportion of corporate names than in <creator>
• Few “pseudo-qualifiers”
OAI DC record & intellectual object
• 1:1 principle: each DC record describes only one version of a resource
BUT
• Cultural heritage materials are often digitized from analog originals, resulting in multiple versions of each intellectual object
OAI DC record & intellectual object
• Two choices for data providers
  - Adhere to the 1:1 rule but omit pertinent information
  - Violate the 1:1 rule but create more complete records
• Many data providers in practice violate the 1:1 rule
OAI DC record & aggregated search environment
• Extraction of records from their original collection context
• Aggregation with records from other collections
Moving towards better metadata: some possibilities
• Remove the OAI requirement for simple Dublin Core (or “the Nuclear Option”)
• Develop best practice documentation for cultural heritage materials that deviate from current DC best practice
• Combination of data provider education and service provider normalization
• Improved communication between data and service providers
• Encourage use of other metadata formats supplementing simple DC
Some other relevant initiatives
• Digital Library Federation and NSDL OAI and Shareable Metadata Best Practices Working Group
  - Development of general OAI best practices
  - Development of strategies for communication with vendors
• DLF Aquifer Metadata Working Group
  - Development of a profile for DLF institutions (strong focus on cultural heritage)
  - Recommendations for specific metadata elements
Plans for extension of this research
• Primary analysis of the subject, coverage, and publisher elements
• Analyze temporal information across date, subject, and coverage elements
• Analyze geographic information across subject and coverage elements
• Analyze name information across creator, contributor, and publisher elements
These presentation slides:
http://www.dlib.indiana.edu/~jenlrile/presentations/jcdl2005/jcdl2005.ppt

Arwen Hutt
Metadata Librarian
University of Tennessee
Digital Library Center

Jenn Riley
Metadata Librarian
Indiana University Digital
Library Program

ahutt@utk.edu

jenlrile@indiana.edu

Más contenido relacionado

La actualidad más candente

Data Citation, The Dataverse Network ®, and Contributor Identifiers
Data Citation, The Dataverse Network ®, and Contributor IdentifiersData Citation, The Dataverse Network ®, and Contributor Identifiers
Data Citation, The Dataverse Network ®, and Contributor IdentifiersMicah Altman
 
Digital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementDigital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementNoreen Whysel
 
Sherif Metadata Talk - London (June 25th 2018)
Sherif Metadata Talk - London (June 25th 2018)Sherif Metadata Talk - London (June 25th 2018)
Sherif Metadata Talk - London (June 25th 2018)Getaneh Alemu
 
One Discovery Layer, Eight Front Doors: Implementing Blacklight @ IU
One Discovery Layer, Eight Front Doors: Implementing Blacklight @ IUOne Discovery Layer, Eight Front Doors: Implementing Blacklight @ IU
One Discovery Layer, Eight Front Doors: Implementing Blacklight @ IUCourtney McDonald
 
VRA_2015_CatalogingRoundup_Seneff
VRA_2015_CatalogingRoundup_SeneffVRA_2015_CatalogingRoundup_Seneff
VRA_2015_CatalogingRoundup_SeneffHeather Seneff
 
From the principle of sufficiency and necessity to metadata enriching
From the principle of sufficiency and necessity to metadata enrichingFrom the principle of sufficiency and necessity to metadata enriching
From the principle of sufficiency and necessity to metadata enrichingGetaneh Alemu
 
DLF Aquifer MODS Implementation Guidelines
DLF Aquifer MODS Implementation GuidelinesDLF Aquifer MODS Implementation Guidelines
DLF Aquifer MODS Implementation GuidelinesSarah Shreeves
 
Linked Data Principles and RDF: University of Florida Libraries, BIBFRAME Wor...
Linked Data Principles and RDF: University of Florida Libraries, BIBFRAME Wor...Linked Data Principles and RDF: University of Florida Libraries, BIBFRAME Wor...
Linked Data Principles and RDF: University of Florida Libraries, BIBFRAME Wor...Allison Jai O'Dell
 
MODS and RDA - ALA MidWinter 2007
MODS and RDA - ALA MidWinter 2007MODS and RDA - ALA MidWinter 2007
MODS and RDA - ALA MidWinter 2007Sarah Shreeves
 
Kevin Long - DRI Training Series Day UCC: Organising Your Collection
Kevin Long - DRI Training Series Day UCC: Organising Your CollectionKevin Long - DRI Training Series Day UCC: Organising Your Collection
Kevin Long - DRI Training Series Day UCC: Organising Your Collectiondri_ireland
 
Visualising Dissertations on Electronic Literature (Visualising E-lit seminar...
Visualising Dissertations on Electronic Literature (Visualising E-lit seminar...Visualising Dissertations on Electronic Literature (Visualising E-lit seminar...
Visualising Dissertations on Electronic Literature (Visualising E-lit seminar...Jill Walker Rettberg
 
Metadata for researchers
Metadata for researchers Metadata for researchers
Metadata for researchers Getaneh Alemu
 
Beyond the catalogue : BibFrame, Linked Data and Ending the Invisible Library
Beyond the catalogue : BibFrame, Linked Data and Ending the 	Invisible LibraryBeyond the catalogue : BibFrame, Linked Data and Ending the 	Invisible Library
Beyond the catalogue : BibFrame, Linked Data and Ending the Invisible LibraryKsenija Mincic Obradovic
 
FAST Update
FAST UpdateFAST Update
FAST UpdateOCLC
 
Metadata enriching and discovery at Solent University Library
Metadata enriching and discovery at Solent University Library Metadata enriching and discovery at Solent University Library
Metadata enriching and discovery at Solent University Library Getaneh Alemu
 
DSpace for Cultural Heritage: adding support for images visualization,audio/v...
DSpace for Cultural Heritage: adding support for images visualization,audio/v...DSpace for Cultural Heritage: adding support for images visualization,audio/v...
DSpace for Cultural Heritage: adding support for images visualization,audio/v...Andrea Bollini
 
Establishing the significant properties of digital research
Establishing the significant properties of digital researchEstablishing the significant properties of digital research
Establishing the significant properties of digital researchGarethKnight
 

La actualidad más candente (20)

Data Citation, The Dataverse Network ®, and Contributor Identifiers
Data Citation, The Dataverse Network ®, and Contributor IdentifiersData Citation, The Dataverse Network ®, and Contributor Identifiers
Data Citation, The Dataverse Network ®, and Contributor Identifiers
 
Digital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementDigital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content Management
 
Northwestern digital repository initiative: platform and persistence
Northwestern digital repository initiative: platform and persistence Northwestern digital repository initiative: platform and persistence
Northwestern digital repository initiative: platform and persistence
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
 
Sherif Metadata Talk - London (June 25th 2018)
Sherif Metadata Talk - London (June 25th 2018)Sherif Metadata Talk - London (June 25th 2018)
Sherif Metadata Talk - London (June 25th 2018)
 
One Discovery Layer, Eight Front Doors: Implementing Blacklight @ IU
One Discovery Layer, Eight Front Doors: Implementing Blacklight @ IUOne Discovery Layer, Eight Front Doors: Implementing Blacklight @ IU
One Discovery Layer, Eight Front Doors: Implementing Blacklight @ IU
 
VRA_2015_CatalogingRoundup_Seneff
VRA_2015_CatalogingRoundup_SeneffVRA_2015_CatalogingRoundup_Seneff
VRA_2015_CatalogingRoundup_Seneff
 
Digitization: Utah State Archives
Digitization: Utah State ArchivesDigitization: Utah State Archives
Digitization: Utah State Archives
 
From the principle of sufficiency and necessity to metadata enriching
From the principle of sufficiency and necessity to metadata enrichingFrom the principle of sufficiency and necessity to metadata enriching
From the principle of sufficiency and necessity to metadata enriching
 
DLF Aquifer MODS Implementation Guidelines
DLF Aquifer MODS Implementation GuidelinesDLF Aquifer MODS Implementation Guidelines
DLF Aquifer MODS Implementation Guidelines
 
Linked Data Principles and RDF: University of Florida Libraries, BIBFRAME Wor...
Linked Data Principles and RDF: University of Florida Libraries, BIBFRAME Wor...Linked Data Principles and RDF: University of Florida Libraries, BIBFRAME Wor...
Linked Data Principles and RDF: University of Florida Libraries, BIBFRAME Wor...
 
MODS and RDA - ALA MidWinter 2007
MODS and RDA - ALA MidWinter 2007MODS and RDA - ALA MidWinter 2007
MODS and RDA - ALA MidWinter 2007
 
Kevin Long - DRI Training Series Day UCC: Organising Your Collection
Kevin Long - DRI Training Series Day UCC: Organising Your CollectionKevin Long - DRI Training Series Day UCC: Organising Your Collection
Kevin Long - DRI Training Series Day UCC: Organising Your Collection
 
Visualising Dissertations on Electronic Literature (Visualising E-lit seminar...
Visualising Dissertations on Electronic Literature (Visualising E-lit seminar...Visualising Dissertations on Electronic Literature (Visualising E-lit seminar...
Visualising Dissertations on Electronic Literature (Visualising E-lit seminar...
 
Metadata for researchers
Metadata for researchers Metadata for researchers
Metadata for researchers
 
Beyond the catalogue : BibFrame, Linked Data and Ending the Invisible Library
Beyond the catalogue : BibFrame, Linked Data and Ending the 	Invisible LibraryBeyond the catalogue : BibFrame, Linked Data and Ending the 	Invisible Library
Beyond the catalogue : BibFrame, Linked Data and Ending the Invisible Library
 
FAST Update
FAST UpdateFAST Update
FAST Update
 
Metadata enriching and discovery at Solent University Library
Metadata enriching and discovery at Solent University Library Metadata enriching and discovery at Solent University Library
Metadata enriching and discovery at Solent University Library
 
DSpace for Cultural Heritage: adding support for images visualization,audio/v...
DSpace for Cultural Heritage: adding support for images visualization,audio/v...DSpace for Cultural Heritage: adding support for images visualization,audio/v...
DSpace for Cultural Heritage: adding support for images visualization,audio/v...
 
Establishing the significant properties of digital research
Establishing the significant properties of digital researchEstablishing the significant properties of digital research
Establishing the significant properties of digital research
 

Similar a Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials

Towards OpenURL Quality Metrics: Initial Findings
Towards OpenURL Quality Metrics: Initial FindingsTowards OpenURL Quality Metrics: Initial Findings
Towards OpenURL Quality Metrics: Initial Findingsalc28
 
Collision course presentation (corrrect)
Collision course presentation (corrrect)Collision course presentation (corrrect)
Collision course presentation (corrrect)William Worford
 
A demonstration of transparent and scalable OpenURL quality metrics for use i...
A demonstration of transparent and scalable OpenURL quality metrics for use i...A demonstration of transparent and scalable OpenURL quality metrics for use i...
A demonstration of transparent and scalable OpenURL quality metrics for use i...alc28
 
Linked Data - the Future for Open Repositories?
Linked Data - the Future for Open Repositories?Linked Data - the Future for Open Repositories?
Linked Data - the Future for Open Repositories?Adrian Stevenson
 
Charleston 2012 - The Future of Serials in a Linked Data World
Charleston 2012 - The Future of Serials in a Linked Data WorldCharleston 2012 - The Future of Serials in a Linked Data World
Charleston 2012 - The Future of Serials in a Linked Data WorldProQuest
 
RDA 101: an introduction to RDA (2012)
RDA 101: an introduction to RDA (2012)RDA 101: an introduction to RDA (2012)
RDA 101: an introduction to RDA (2012)Alison Hitchens
 
OCLC Research @ U of Calgary: New directions for metadata workflows across li...
OCLC Research @ U of Calgary: New directions for metadata workflows across li...OCLC Research @ U of Calgary: New directions for metadata workflows across li...
OCLC Research @ U of Calgary: New directions for metadata workflows across li...OCLC Research
 
Knowledge Engineering for TELDAP
Knowledge Engineering for TELDAPKnowledge Engineering for TELDAP
Knowledge Engineering for TELDAPAAT Taiwan
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Cory Lampert
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...Carole Goble
 
How to expose research data in EOSC
How to expose research data in EOSCHow to expose research data in EOSC
How to expose research data in EOSCEUDAT
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersNew York University
 
Identifiers for Researchers and Data: Increasing Attribution and Discovery– J...
Identifiers for Researchers and Data: Increasing Attribution and Discovery– J...Identifiers for Researchers and Data: Increasing Attribution and Discovery– J...
Identifiers for Researchers and Data: Increasing Attribution and Discovery– J...ALISS
 
Metadata standards
Metadata standardsMetadata standards
Metadata standardsmakammer
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital LibrariesJack Eapen
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital LibrariesJack Eapen
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloudNational Institute of Informatics
 
RDA Presentation
RDA PresentationRDA Presentation
RDA Presentationjendibbern
 

Similar a Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials (20)

Towards OpenURL Quality Metrics: Initial Findings
Towards OpenURL Quality Metrics: Initial FindingsTowards OpenURL Quality Metrics: Initial Findings
Towards OpenURL Quality Metrics: Initial Findings
 
Collision course presentation (corrrect)
Collision course presentation (corrrect)Collision course presentation (corrrect)
Collision course presentation (corrrect)
 
A demonstration of transparent and scalable OpenURL quality metrics for use i...
A demonstration of transparent and scalable OpenURL quality metrics for use i...A demonstration of transparent and scalable OpenURL quality metrics for use i...
A demonstration of transparent and scalable OpenURL quality metrics for use i...
 
Linked Data - the Future for Open Repositories?
Linked Data - the Future for Open Repositories?Linked Data - the Future for Open Repositories?
Linked Data - the Future for Open Repositories?
 
Charleston 2012 - The Future of Serials in a Linked Data World
Charleston 2012 - The Future of Serials in a Linked Data WorldCharleston 2012 - The Future of Serials in a Linked Data World
Charleston 2012 - The Future of Serials in a Linked Data World
 
UAEU_MDL_Slides_rev1.ppt
UAEU_MDL_Slides_rev1.pptUAEU_MDL_Slides_rev1.ppt
UAEU_MDL_Slides_rev1.ppt
 
RDA 101: an introduction to RDA (2012)
RDA 101: an introduction to RDA (2012)RDA 101: an introduction to RDA (2012)
RDA 101: an introduction to RDA (2012)
 
OCLC Research @ U of Calgary: New directions for metadata workflows across li...
OCLC Research @ U of Calgary: New directions for metadata workflows across li...OCLC Research @ U of Calgary: New directions for metadata workflows across li...
OCLC Research @ U of Calgary: New directions for metadata workflows across li...
 
Knowledge Engineering for TELDAP
Knowledge Engineering for TELDAPKnowledge Engineering for TELDAP
Knowledge Engineering for TELDAP
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Snac webinar v3
Snac webinar v3Snac webinar v3
Snac webinar v3
 
How to expose research data in EOSC
How to expose research data in EOSCHow to expose research data in EOSC
How to expose research data in EOSC
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
 
Identifiers for Researchers and Data: Increasing Attribution and Discovery– J...
Identifiers for Researchers and Data: Increasing Attribution and Discovery– J...Identifiers for Researchers and Data: Increasing Attribution and Discovery– J...
Identifiers for Researchers and Data: Increasing Attribution and Discovery– J...
 
Metadata standards
Metadata standardsMetadata standards
Metadata standards
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloud
 
RDA Presentation
RDA PresentationRDA Presentation
RDA Presentation
 

Más de Jenn Riley

Understanding Metadata: Looking Forward
Understanding Metadata: Looking ForwardUnderstanding Metadata: Looking Forward
Understanding Metadata: Looking ForwardJenn Riley
 
The future of cataloguing? Future cataloguers!
The future of cataloguing? Future cataloguers!The future of cataloguing? Future cataloguers!
The future of cataloguing? Future cataloguers!Jenn Riley
 
Discovery elsewhere
Discovery elsewhereDiscovery elsewhere
Discovery elsewhereJenn Riley
 
Designing the Garden: Getting Grounded in Linked Data
Designing the Garden: Getting Grounded in Linked DataDesigning the Garden: Getting Grounded in Linked Data
Designing the Garden: Getting Grounded in Linked DataJenn Riley
 
Launching metaware.buzz
Launching metaware.buzzLaunching metaware.buzz
Launching metaware.buzzJenn Riley
 
Getting Comfortable with Metadata Reuse
Getting Comfortable with Metadata ReuseGetting Comfortable with Metadata Reuse
Getting Comfortable with Metadata ReuseJenn Riley
 
Handout for Digital Imaging of Photographs
Handout for Digital Imaging of PhotographsHandout for Digital Imaging of Photographs
Handout for Digital Imaging of PhotographsJenn Riley
 
Digital Imaging of Photographs
Digital Imaging of PhotographsDigital Imaging of Photographs
Digital Imaging of PhotographsJenn Riley
 
The Open Archives Initiative and the Sheet Music Consortium
The Open Archives Initiative and the Sheet Music ConsortiumThe Open Archives Initiative and the Sheet Music Consortium
The Open Archives Initiative and the Sheet Music ConsortiumJenn Riley
 
Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...
Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...
Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...Jenn Riley
 
Handout for FRBR; or, How I learned to stop worrying and love the model
Handout for FRBR; or, How I learned to stop worrying and love the modelHandout for FRBR; or, How I learned to stop worrying and love the model
Handout for FRBR; or, How I learned to stop worrying and love the modelJenn Riley
 
Metadata for Brittle Books Page Turner
Metadata for Brittle Books Page TurnerMetadata for Brittle Books Page Turner
Metadata for Brittle Books Page TurnerJenn Riley
 
Digitizing and Delivering Audio and Video
Digitizing and Delivering Audio and VideoDigitizing and Delivering Audio and Video
Digitizing and Delivering Audio and VideoJenn Riley
 
Handout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Handout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODSHandout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Handout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODSJenn Riley
 
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODSAlphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODSJenn Riley
 
Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...
Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...
Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...Jenn Riley
 
Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...
Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...
Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...Jenn Riley
 
Challenges in the Nursery: Linking a Finding Aid with Online Content
Challenges in the Nursery: Linking a Finding Aid with Online ContentChallenges in the Nursery: Linking a Finding Aid with Online Content
Challenges in the Nursery: Linking a Finding Aid with Online ContentJenn Riley
 
Making Interoperability Easier: Creating Shareable Metadata
Making Interoperability Easier: Creating Shareable MetadataMaking Interoperability Easier: Creating Shareable Metadata
Making Interoperability Easier: Creating Shareable MetadataJenn Riley
 

Más de Jenn Riley (20)

Understanding Metadata: Looking Forward
Understanding Metadata: Looking ForwardUnderstanding Metadata: Looking Forward
Understanding Metadata: Looking Forward
 
The future of cataloguing? Future cataloguers!
The future of cataloguing? Future cataloguers!The future of cataloguing? Future cataloguers!
The future of cataloguing? Future cataloguers!
 
Discovery elsewhere
Discovery elsewhereDiscovery elsewhere
Discovery elsewhere
 
Designing the Garden: Getting Grounded in Linked Data
Designing the Garden: Getting Grounded in Linked DataDesigning the Garden: Getting Grounded in Linked Data
Designing the Garden: Getting Grounded in Linked Data
 
Launching metaware.buzz
Launching metaware.buzzLaunching metaware.buzz
Launching metaware.buzz
 
Getting Comfortable with Metadata Reuse
Getting Comfortable with Metadata ReuseGetting Comfortable with Metadata Reuse
Getting Comfortable with Metadata Reuse
 
Handout for Digital Imaging of Photographs
Handout for Digital Imaging of PhotographsHandout for Digital Imaging of Photographs
Handout for Digital Imaging of Photographs
 
Digital Imaging of Photographs
Digital Imaging of PhotographsDigital Imaging of Photographs
Digital Imaging of Photographs
 
The Open Archives Initiative and the Sheet Music Consortium
The Open Archives Initiative and the Sheet Music ConsortiumThe Open Archives Initiative and the Sheet Music Consortium
The Open Archives Initiative and the Sheet Music Consortium
 
Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...
Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...
Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...
 
Handout for FRBR; or, How I learned to stop worrying and love the model
Handout for FRBR; or, How I learned to stop worrying and love the modelHandout for FRBR; or, How I learned to stop worrying and love the model
Handout for FRBR; or, How I learned to stop worrying and love the model
 
Metadata for Brittle Books Page Turner
Metadata for Brittle Books Page TurnerMetadata for Brittle Books Page Turner
Metadata for Brittle Books Page Turner
 
Digitizing and Delivering Audio and Video
Digitizing and Delivering Audio and VideoDigitizing and Delivering Audio and Video
Digitizing and Delivering Audio and Video
 
Variations2
Variations2Variations2
Variations2
 
Handout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Handout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODSHandout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Handout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
 
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODSAlphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
 
Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...
Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...
Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...
 
Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...
Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...
Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...
 
Challenges in the Nursery: Linking a Finding Aid with Online Content
Challenges in the Nursery: Linking a Finding Aid with Online ContentChallenges in the Nursery: Linking a Finding Aid with Online Content
Challenges in the Nursery: Linking a Finding Aid with Online Content
 
Making Interoperability Easier: Creating Shareable Metadata
Making Interoperability Easier: Creating Shareable MetadataMaking Interoperability Easier: Creating Shareable Metadata
Making Interoperability Easier: Creating Shareable Metadata
 


Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials

  • 1. Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials
      Arwen Hutt, University of Tennessee; Jenn Riley, Indiana University
  • 2. OAI-PMH
      - Open Archives Initiative Protocol for Metadata Harvesting
      - Originally developed for sharing metadata about e-prints
      - Two players: data providers and service providers
      - Requires that unqualified Dublin Core be exposed for all resources, but supplemental metadata formats are allowed
  • 3. Dublin Core [Unqualified]
      - Simple, flexible metadata format
      - 15 elements: all repeatable, none required
      - “Core” across all knowledge domains
  • 4. “Cultural heritage” defined
      - The intellectual, creative, and material output of society
      - Libraries, museums, and archives are generally considered cultural heritage institutions
      - Often primary source materials
      - Tend to be older analog materials digitized for network access
  • 5. Significant variability in OAI metadata
      - Ward: found that only a small number of DC elements were used in the majority of OAI records
      - Liu: studied controlled vocabulary usage in the DC subject, type, format, language, and date fields in the Arc service provider
      - NSDL: found errors, including missing, incorrect, confusing, and insufficient data
      - UIUC: date, coverage, format, and type vocabularies vary significantly
  • 6. Goals of the study
      - Focus on the cultural heritage community
      - Examined 3 DC fields (date, creator, contributor) for semantic content and syntactic form
      - Results could inform community best practices
      - One step towards improving the overall quality of OAI metadata
  • 7. Harvesting statistics
      - Successfully harvested metadata from 35 data providers
      - 750,945 total records harvested
      - 5% sample* from each data provider taken for analysis (37,564 records)
      * Minimum of 1 record per provider; values rounded up to the nearest whole number
  • 8. Processing steps
      - Date, creator, and contributor elements extracted into “silos”
      - Repeated values grouped, keeping connections between elements and the records in which they appeared
      - Certain characteristics tracked for each element
      - Example
  • 9. Characteristics recorded for all elements
      - The presence of multiple discrete values in a single element, e.g. <creator>Hutt, Arwen; Riley, Jenn</creator>
      - The presence of pseudo-qualifiers within the value that refine the meaning of the element, e.g. <creator>Berlin, Irving [composer]</creator>
      - Whether the value is appropriate for the specified element based on DC rules and usage guidelines, e.g. <date>Las Vegas, Nevada</date>
  • 10. Additional characteristics of <date>
      - The semantic type of the value (creation, copyright, or digitization), e.g. <date>2000</date>
      - The general specificity of the date (single date, range, or period), e.g. <date>19th Century</date>
      - Indication that a date is not definitive (that it is estimated or approximate), e.g. <date>ca. 1930</date>
      - Whether the value is purely numeric or contains non-numeric text, e.g. <date>March 18, 1902</date>
  • 11. Additional characteristics of <creator> and <contributor>
      - The semantic type of the value (personal name, corporate name, or other), e.g. <creator>Newton, Isaac</creator>
      - Whether the entity is known, unknown, or ambiguous, e.g. <creator>Vermeer, Johannes, 1632-1675 ?</creator>
      - Whether the value is inverted or in direct order, e.g. <creator>Charles Schultz</creator>
  • 12. Strategies for categorization
      - Automatic: iteratively developed; pattern matching; identification of commonly occurring values
      - Manual: where feasible
      - Not perfect!
  • 13. Findings for <date>
      - Values largely appropriate for the element
      - Few “pseudo-qualifiers”
      - Different events represented
      - Values mostly numeric
      - Many dates not expressible in W3CDTF
  • 14. Findings for <creator>
      - Values largely appropriate for the element
      - Most were personal names
      - Many “pseudo-qualifiers” in comparison to other elements
      - Often included information intended to disambiguate a name
      - Some indication of the use of controlled vocabularies, but many different name forms present
  • 15. Findings for <contributor>
      - Used infrequently
      - Many values inappropriate for the element
      - Majority personal names, but a higher proportion of corporate names than occurred in <creator>
      - Few “pseudo-qualifiers”
  • 16. OAI DC record & intellectual object
      - 1:1 principle: each DC record describes only one version of a resource
      - BUT cultural heritage materials are often digitized from analog originals, resulting in multiple versions of each intellectual object
  • 17. OAI DC record & intellectual object
      - Two choices for data providers:
          - Adhere to the 1:1 rule but omit pertinent information
          - Violate the 1:1 rule but create more complete records
      - In practice, many data providers violate the 1:1 rule
  • 18. OAI DC record & aggregated search environment
      - Extraction of records from their original collection context
      - Aggregation with records from other collections
  • 19. Moving towards better metadata: some possibilities
      - Remove the OAI requirement for simple Dublin Core (or “the Nuclear Option”)
      - Develop best practice documentation for cultural heritage materials that deviate from current DC best practice
      - Combine data provider education with service provider normalization
      - Improve communication between data and service providers
      - Encourage use of other metadata formats to supplement simple DC
  • 20. Some other relevant initiatives
      - Digital Library Federation and NSDL OAI and Shareable Metadata Best Practices Working Group
          - Development of general OAI best practices
          - Development of strategies for communication with vendors
      - DLF Aquifer Metadata Working Group
          - Development of a profile for DLF institutions (strong focus on cultural heritage)
          - Recommendations for specific metadata elements
  • 21. Plans for extension of this research
      - Primary analysis of the subject, coverage, and publisher elements
      - Analyze temporal information across the date, subject, and coverage elements
      - Analyze geographic information across the subject and coverage elements
      - Analyze name information across the creator, contributor, and publisher elements
  • 22. These presentation slides: http://www.dlib.indiana.edu/~jenlrile/presentations/jcdl2005/jcdl2005.ppt
      Arwen Hutt, Metadata Librarian, University of Tennessee Digital Library Center, ahutt@utk.edu
      Jenn Riley, Metadata Librarian, Indiana University Digital Library Program, jenlrile@indiana.edu
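Slides 7 through 12 describe per-provider sampling, extraction of values into silos, and pattern-based categorization. The editor's notes say the authors used Perl and XSLT scripts; the Python below is only a minimal sketch under stated assumptions, not their code. The sampling arithmetic follows the footnote on slide 7 (5%, rounded up, minimum of 1 record). The function names, the sample record, and the heuristics (a semicolon marks multiple values, a trailing bracketed term marks a pseudo-qualifier, a leading "Surname," marks an inverted name) are hypothetical, and personal versus corporate name type is not attempted.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

DC_NS = "http://purl.org/dc/elements/1.1/"
SILO_ELEMENTS = ("date", "creator", "contributor")

# Hypothetical heuristic patterns, not the study's own rules.
PSEUDO_QUALIFIER = re.compile(r"\[[^\]]+\]\s*$")  # e.g. "Berlin, Irving [composer]"
INVERTED_NAME = re.compile(r"^[^,;]+,\s*\S")       # e.g. "Newton, Isaac"


def sample_size(record_count, percent=5):
    """Per-provider sample: percent of records, rounded up, minimum 1."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, (record_count * percent + 99) // 100)


def extract_silos(records):
    """Group date/creator/contributor values into silos.

    `records` maps an OAI record identifier to an oai_dc XML string.
    Returns {element: {value: [record ids]}}, so each distinct value
    keeps its links back to the records it appeared in.
    """
    silos = {name: defaultdict(list) for name in SILO_ELEMENTS}
    for record_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for name in SILO_ELEMENTS:
            for element in root.iter(f"{{{DC_NS}}}{name}"):
                value = (element.text or "").strip()
                if value:
                    silos[name][value].append(record_id)
    return silos


def most_common_values(silo, n=10):
    """Rank distinct values by record count, e.g. to prioritize manual review."""
    return sorted(silo.items(), key=lambda item: len(item[1]), reverse=True)[:n]


def name_characteristics(value):
    """Flag some of the slide 9 and slide 11 characteristics for a name value."""
    return {
        "multiple_values": ";" in value,
        "pseudo_qualifier": bool(PSEUDO_QUALIFIER.search(value)),
        "inverted": bool(INVERTED_NAME.match(value)),
        "uncertain": value.rstrip().endswith("?") or "unknown" in value.lower(),
    }


# Usage with a hypothetical record; "rec1" and "rec2" are made-up identifiers.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Berlin, Irving [composer]</dc:creator>
  <dc:date>ca. 1930</dc:date>
</oai_dc:dc>"""

silos = extract_silos({"rec1": SAMPLE, "rec2": SAMPLE})
print(silos["creator"]["Berlin, Irving [composer]"])  # ['rec1', 'rec2']
print(name_characteristics("Hutt, Arwen; Riley, Jenn"))
```

Grouping identical values first means each distinct string is categorized once, while the stored identifiers still allow every judgment to be traced back to its source records.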

Editor's notes

  1. Our study was performed on metadata shared via OAI-PMH. OAI-PMH is a protocol for sharing metadata, not content. Data providers “expose” metadata for service providers to harvest. Service providers make use of that metadata in some way; currently, by far the most common service provided is cross-repository searching. Our study focused on data providers of cultural heritage materials.
  2. Explain the difference between qualified and simple DC. Example of the 1:1 principle: the Mona Lisa painting, painted by Leonardo many years ago, versus a digital image of it created by Jenn Riley in 2000.
  3. All the research in this area discusses the variability problem. Cover quickly!
  4. What we work with. Better chance of finding patterns within a general community. Semantic content: the meaning of the content of a metadata element. Syntactic form: the structure or format of the value.
  5. Processing performed with Perl and XSLT scripts. Show sample. Grouping. Link back to original record. Attributes: some attributes are used across all silos, but some are specific to individual elements. [We picked characteristics to record based on our experience as OAI data providers and on reports in the OAI literature.]
  6. Date specificity: Ming dynasty, 1980s, 1900-1910, etc.
  7. Perl scripts. Not perfect: there are too many records to check each one manually, and certain characteristics require subjective judgments.
  8. Digitization and creation dates are most prevalent, but there are also a few copyright dates. Ratio of numeric to textual values: 3 to 1. W3CDTF is a profile of ISO 8601 recommended by Dublin Core as a best practice for date encoding, but 17% of dates cannot be represented in it. We are seeing a need for better support for variations on date values.
  9. Extraction: the horse example (Roosevelt). Aggregation: administrative metadata is rarely useful in an aggregated environment. System hacks, such as typing in every year of a date range to make all years in the range searchable.
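Slide 10 and editor's notes 6, 8, and 9 raise date specificity, conformance to W3CDTF, and the hack of typing every year of a range so that each year becomes searchable. A service provider could instead derive that information from the value itself. The sketch below is illustrative only: the regular expressions, the function names, and the convention that "19th Century" means 1800 to 1899 are assumptions, and the semantic type of a date (creation, copyright, digitization) is not attempted because it usually depends on record context.

```python
import re

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or a date-time with hh:mm[:ss[.s]]
# followed by a required time zone designator. Calendar validity
# (e.g. February 30) is deliberately not checked in this sketch.
W3CDTF = re.compile(
    r"\d{4}"
    r"(?:-(?:0[1-9]|1[0-2])"
    r"(?:-(?:0[1-9]|[12]\d|3[01])"
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d(?:\.\d+)?)?"
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d))?)?)?"
)
# Hypothetical heuristics; \u2013 is an en dash and \u2019 a curly apostrophe.
APPROXIMATE = re.compile(r"\b(?:ca|circa|approx(?:imately)?)\b|\?", re.IGNORECASE)
PERIOD = re.compile(r"centur|dynasty|\d{3}0['\u2019]?s\b", re.IGNORECASE)
RANGE = re.compile(r"(\d{4})\s*[-\u2013/]\s*(\d{4})")
DECADE = re.compile(r"(\d{3}0)['\u2019]?s\b")
CENTURY = re.compile(r"(\d{1,2})(?:st|nd|rd|th)\s+century", re.IGNORECASE)
YEAR = re.compile(r"\b(\d{4})\b")


def is_w3cdtf(value):
    """True if the whole value is a W3CDTF date or date-time."""
    return W3CDTF.fullmatch(value.strip()) is not None


def date_characteristics(value):
    """Specificity, approximation, and numeric/textual form of a date value."""
    v = value.strip()
    if PERIOD.search(v):
        specificity = "period"
    elif RANGE.search(v):
        specificity = "range"
    else:
        specificity = "single"
    return {
        "specificity": specificity,
        "approximate": bool(APPROXIMATE.search(v)),
        # Any letter counts as text, except the T/Z inside a W3CDTF date-time.
        "numeric": not re.search(r"[A-Za-z]", v) or is_w3cdtf(v),
        "w3cdtf": is_w3cdtf(v),
    }


def year_span(value):
    """Map a date value to an inclusive (start, end) year span, or None."""
    v = value.strip()
    m = RANGE.search(v)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = DECADE.search(v)
    if m:
        start = int(m.group(1))
        return start, start + 9
    m = CENTURY.search(v)
    if m:
        start = (int(m.group(1)) - 1) * 100  # assumption: 19th century = 1800-1899
        return start, start + 99
    m = YEAR.search(v)
    if m:
        year = int(m.group(1))
        return year, year
    return None  # e.g. "Ming dynasty" needs a lookup table, not a regex


def searchable_years(value):
    """Every year covered by the value, so a search for any of them can match."""
    span = year_span(value)
    return list(range(span[0], span[1] + 1)) if span else []


print(date_characteristics("ca. 1930"))
print(year_span("1900-1910"))  # (1900, 1910)
```

Expanding the span at indexing time keeps the original string intact in the record while still letting a search for 1905 match "1900-1910", which is the effect the typed-in year lists were trying to achieve.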