Webinar from the Mountain West Digital Library
Sandra McIntyre, MWDL Director
Anna Neatrour, MWDL Digital Metadata Librarian
Want to understand what happens behind the scenes with MWDL harvesting? In this webinar, Sandra McIntyre and Anna Neatrour will explain the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and how it makes metadata aggregation possible in the MWDL. They will explain the process of harvesting and how MWDL normalizes your metadata. They will also show you how you can learn from your collections' OAI stream by using the six query verbs (requests) defined in the OAI-PMH.
Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream Can Tell You
1. Harvesting Using the Open Archives Initiative Protocol: What Can Your OAI Stream Tell You?
3. Open Archives Initiative
Open Archives Initiative
http://openarchives.org
“Standards for Web Content Interoperability”
• Facilitate the efficient dissemination of content contained in archives/repositories
• Low-barrier framework and standards
4. Why is a protocol necessary?
OAI Harvester: “I want it.” “Give me...”
OAI Provider: “I have it.” “Here is what you requested.”
8. OAI Harvesters
• Mountain West Digital Library: http://mwdl.org
• OAIster: http://oaister.worldcat.org (and included in WorldCat)
• Digital Public Library of America: http://dp.la/
• Institute of Museum & Library Services Digital Collections and Content: http://imlsdcc.grainger.uiuc.edu
• ...and thousands more
9. Harvesting at MWDL
Diagram: partner repositories harvested into the Mountain West Digital Library
• Utah State Archives
• Utah State Library
• Univ of Nevada Las Vegas
• Univ of Nevada Reno
• Utah Dvsn Arts & Museums
• Salt Lake Comm. College
• Arizona Memory Project
• Snow College
• Northern Arizona Univ
• Weber State Univ
• Univ of Idaho
• Utah State Univ
• Family Search
• Utah Valley Univ
• LDS Church History
• Southern Utah Univ
• Montana Memory Project
• Stacks (Idaho)
• BYU
• Univ of Utah
• Idaho State Archives
• Boise State Univ.
10. Why understand OAI?
• Predict what will happen with your metadata when it is harvested
• Do self-auditing and/or peer auditing of metadata: See patterns and find errors
11. Other metadata harvesting options
• Handing over a hard drive
• Uploading/downloading via file transfer protocol (FTP)
• Other requests of XML (typically application programming interfaces, APIs):
– Web Services
– X-Services
12. Advantages of OAI
• Update at a distance, anytime
• Specify desired records
– By collection
– By date range of last change to record
• Packets, one at a time
• Works fast
• Repeatable
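The "specify desired records" point can be sketched as a small URL builder. This is only an illustration: `set`, `from`, and `until` are the selective-harvesting arguments defined by OAI-PMH, while the function name and the date used here are made up for the example.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, set_spec=None,
                           from_date=None, until_date=None):
    """Build a ListRecords request limited by collection and/or date range."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec        # limit to one collection (set)
    if from_date:
        params["from"] = from_date      # records changed on or after this date
    if until_date:
        params["until"] = until_date    # records changed on or before this date
    return base_url + "?" + urlencode(params)


# Hypothetical request: one set, changed since an example date.
url = build_list_records_url(
    "http://contentdm.li.suu.edu/oai/oai.php", "oai_qdc",
    set_spec="hist_photos", from_date="2013-01-01")
print(url)
```

Because the request is just a URL, it can be repeated at any time to pick up only the records that changed since the last harvest.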
22. ListSets
OAI query (OAI Harvester to OAI Provider): “What sets do you have available?”
http://contentdm.li.suu.edu/oai/oai.php?verb=ListSets
OAI response (OAI Provider to OAI Harvester): “Here is the list of sets.”
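A ListSets response is XML, so the list of sets can be pulled out with a few lines of code. The sketch below parses a hand-written sample response rather than real output from the SUU server; the set names and the second set spec are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample, not actual server output.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>hist_photos</setSpec><setName>Example photo collection</setName></set>
    <set><setSpec>example_maps</setSpec><setName>Example map collection</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (s.findtext("oai:setSpec", namespaces=OAI_NS),
         s.findtext("oai:setName", namespaces=OAI_NS))
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]


for spec, name in parse_sets(SAMPLE_RESPONSE):
    print(spec, "-", name)
```

Comparing this list against the collections you expect to share is a quick way to spot a collection that is missing from, or unintentionally exposed to, harvesting.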
26. ListRecords
OAI query (OAI Harvester to OAI Provider): “Give me the metadata for all records in qualified Dublin Core.”
http://contentdm.li.suu.edu/oai/oai.php?verb=ListRecords&metadataPrefix=oai_qdc
OAI response (OAI Provider to OAI Harvester): “Here are the records.”
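To see what comes back from ListRecords, each record's header and metadata can be read out of the XML. This is a minimal sketch against a hand-written sample: the wrapper element around the Dublin Core fields is simplified, and the title and description values are invented.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written, simplified sample; a real oai_qdc response has more fields.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:contentdm.li.suu.edu:hist_photos/0</identifier>
        <setSpec>hist_photos</setSpec>
      </header>
      <metadata>
        <wrapper xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example photograph title</dc:title>
          <dc:description>Example description</dc:description>
        </wrapper>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def summarize_records(xml_text):
    """Return identifier, set spec, and titles for each record in a response."""
    root = ET.fromstring(xml_text)
    summaries = []
    for record in root.iterfind(".//oai:record", NS):
        summaries.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "set": record.findtext("oai:header/oai:setSpec", namespaces=NS),
            "titles": [t.text for t in record.iterfind(".//dc:title", NS)],
        })
    return summaries


print(summarize_records(SAMPLE_RESPONSE))
```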
28. ListRecords
• One set only:
http://contentdm.li.suu.edu/oai/oai.php?verb=ListRecords&metadataPrefix=oai_qdc&set=hist_photos
• If there is more than one screen of records, use a resumption token to get the additional lists (200 at a time in this example):
http://contentdm.li.suu.edu/oai/oai.php?verb=ListRecords&resumptionToken=hist_photos:200:hist_photos:0000-0000:9999-99-99:oai_qdc
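Following resumption tokens by hand gets tedious, so a harvester does it in a loop. The sketch below shows the general pattern under OAI-PMH rules (a follow-up request carries only the verb and the token, and an empty or missing token ends the list). It is not MWDL's actual harvester, and the function names are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_page(url):
    """Download one OAI response as bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix, set_spec=None, fetcher=fetch_page):
    """Yield every record element, following resumption tokens until done."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(fetcher(base_url + "?" + urlencode(params)))
        for record in root.iter(OAI + "record"):
            yield record
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one, means this was the last batch
        # The token replaces all other arguments on the next request.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Example use (makes real network requests, so it is left commented out):
# for record in harvest("http://contentdm.li.suu.edu/oai/oai.php",
#                       "oai_qdc", set_spec="hist_photos"):
#     print(record.findtext(f"{OAI}header/{OAI}identifier"))
```

The `fetcher` argument exists only so the loop can be tried against canned responses without touching a live server.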
29. GetRecord
• One record only:
http://contentdm.li.suu.edu/oai/oai.php?verb=GetRecord&metadataPrefix=oai_qdc&identifier=oai:contentdm.li.suu.edu:hist_photos/0
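The GetRecord URL above can be assembled from its parts. This sketch assumes the identifier pattern seen in the example (oai:, then the server host, then the set spec and item number); other OAI providers may build identifiers differently, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def get_record_url(base_url, host, set_spec, item_number,
                   metadata_prefix="oai_qdc"):
    """Build a GetRecord request for a single item."""
    # Identifier pattern taken from the slide's example; an assumption elsewhere.
    identifier = f"oai:{host}:{set_spec}/{item_number}"
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ":" and "/" readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=":/")


url = get_record_url("http://contentdm.li.suu.edu/oai/oai.php",
                     "contentdm.li.suu.edu", "hist_photos", 0)
print(url)
```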
30. CONTENTdm’s OAI Provider
• Turning on OAI: Administrative interface in the “Server” tab
• Choosing which collections to share
• Sharing compound object level metadata only
Image from CONTENTdm OAI guide: http://contentdm.org/help6/server-admin/oai.asp
34. Some Final Things to Remember
• Check your own OAI stream and see what it looks like!
– Mapped to none – not in OAI stream
– Hidden set to yes – not in OAI stream
– CONTENTdm field properties template and guide available at: http://mwdl.org/getinvolved/getinvolved.php
– Log in to collection admin, click on tab, go to fields to check and edit properties
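The two rules above (mapped to none, or hidden set to yes, means not in the OAI stream) can be written out as a simple filter. The field list here is invented for illustration, and the dictionary layout is not a CONTENTdm export format.

```python
# Hypothetical field properties for one collection.
FIELDS = [
    {"name": "Title", "dc_map": "title", "hidden": False},
    {"name": "Internal notes", "dc_map": None, "hidden": False},         # mapped to none
    {"name": "Donor contact", "dc_map": "contributor", "hidden": True},  # hidden
    {"name": "Rights", "dc_map": "rights", "hidden": False},
]


def fields_in_oai_stream(fields):
    """Names of fields that are both mapped and not hidden."""
    return [f["name"] for f in fields
            if f["dc_map"] is not None and not f["hidden"]]


def fields_left_out(fields):
    """Names of fields that will not show up in the OAI stream."""
    visible = set(fields_in_oai_stream(fields))
    return [f["name"] for f in fields if f["name"] not in visible]


print("In OAI stream:", fields_in_oai_stream(FIELDS))
print("Left out:", fields_left_out(FIELDS))
```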
37. We’re here to help!
• For additional questions about self-auditing your OAI stream, contact Anna Neatrour:
– anna.neatrour@utah.edu
– 801-587-8883
• Any Questions?
Editor's notes
See this in slideshow view to see the animations!
Registered at http://openarchives.org
Open Access repositories of scholarly communications materials
SANDRA
MOVE UP
• Find your base URL
• Add to OAI page
• Update OAI page with information about base URL for other platforms:
– Omeka
– Contentdm
– BePress
• MWDL Harvesting Log – example to wind up and complete process: what Primo is doing with OAI
• Normalization routines are run
• Counter examples – mapping that is wrong
• May not have set enabled for OAI
• Metadata formats associated with OAI: Dublin Core among others
• OAI provider may or may not be configured to provide qualified Dublin Core
awhof = Arizona Women’s Hall of Fame
ANNA
Use Identify to make sure that the OAI provider is set up and working. This is a great query to use if you are uncertain of the OAI provider URL for your digital asset management system and want to test it to be sure.
This is the information that is returned from an Identify query. Here you can see the repository name and the contact information for the person who administers the server.
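An Identify response can be checked in code as well as in the browser. The sketch below reads the repository name, base URL, and administrator e-mail from a hand-written sample; the element names come from OAI-PMH, but the values and the example.org address are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample with placeholder values.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Digital Library</repositoryName>
    <baseURL>http://example.org/oai/oai.php</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return the basic repository details from an Identify response."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", NS)
    if identify is None:
        raise ValueError("Not an Identify response; check the base URL")
    return {
        "repositoryName": identify.findtext("oai:repositoryName", namespaces=NS),
        "baseURL": identify.findtext("oai:baseURL", namespaces=NS),
        # adminEmail may be repeated, so collect every occurrence.
        "adminEmail": [e.text for e in identify.findall("oai:adminEmail", NS)],
    }


print(parse_identify(SAMPLE_RESPONSE))
```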
ListSets asks what sets are available for harvest. This is a great thing to check yourself to make sure that all the collections are enabled for harvest that you want, or if you have a digital collection with some sort of restriction like on-campus access only, you can check to make sure that it isn’t available for harvest.
The set spec or alias for each collection is listed. If you have a new collection that you want to be added to MWDL, the set spec is one of the pieces of information I’ll need in order to get the harvesting set up.
What metadata formats are available?
Here you will notice that both simple Dublin Core and qualified Dublin Core are available from the SUU server. MWDL prefers to harvest in qualified Dublin Core if possible.
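The same check can be scripted: list the metadata prefixes a provider offers and pick qualified Dublin Core when it is there. The sample response is hand-written, and the fallback to simple Dublin Core (oai_dc) is an assumption for this sketch rather than a statement of MWDL's harvesting rules.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample of a ListMetadataFormats response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_qdc</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def available_prefixes(xml_text):
    """Return every metadataPrefix listed in the response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(OAI + "metadataPrefix")]


def preferred_prefix(prefixes):
    """Prefer qualified Dublin Core, else fall back to simple Dublin Core."""
    for candidate in ("oai_qdc", "oai_dc"):
        if candidate in prefixes:
            return candidate
    return None


prefixes = available_prefixes(SAMPLE_RESPONSE)
print(prefixes, "->", preferred_prefix(prefixes))
```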
In real life, if you are playing around with OAI queries in your browser, you might not run this, because it gives you all of the records from the available collections in qualified Dublin Core. That’s a lot of records! This is typical of the type of request that MWDL would make to harvest records, in whatever size of batch the server is set up to share.
Here we can see some records coming in from SUU. I can see the set spec hist_photos and go down and see the first record coming in, including all the descriptive information that is made available for that record.
I like to check the OAI for one set at a time when I’m checking out metadata to make sure that it matches up with the MWDL Dublin Core Application profile. This is something you can do too, if you want to do a quick check to make sure that all of the required fields are showing up correctly. You can also look at one record at a time by using the identifier associated with that record.
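A quick check for required fields can be sketched as follows. The list of required elements is a placeholder, not the actual MWDL Dublin Core Application Profile, and the check only looks at elements in the basic Dublin Core namespace, so refinements published under other namespaces would need to be added.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Placeholder list; consult the application profile for the real requirements.
REQUIRED = ["title", "date", "rights"]

# Hand-written sample record: title present, date absent, rights blank.
SAMPLE_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:rights>   </dc:rights>
</record>"""


def missing_fields(record_xml, required=REQUIRED):
    """Return required Dublin Core elements that are absent or empty."""
    root = ET.fromstring(record_xml)
    missing = []
    for name in required:
        values = [el.text for el in root.iter(f"{{{DC}}}{name}")
                  if (el.text or "").strip()]
        if not values:
            missing.append(name)
    return missing


print(missing_fields(SAMPLE_RECORD))
```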
CONTENTdm’s OAI provider can be accessed from the server tab; then click on harvesting to see the controls. Here, we’d want “enable OAI” set to YES! This is also where I could look up the base URL for my repository if I wasn’t sure what it was. I could change the name of the repository to something more descriptive and include server admin e-mails. I would want to leave “enable compound object pages” set to “no”. If that’s enabled, all the individual pages would be harvested as single objects, and MWDL would then end up with thousands of items called “page 1” or “page 2”. By default, if no collections are specified, everything is published. You might run into a situation where you want to expose some collections but not others for harvesting, in which case you would need to add the set spec or alias for each collection that should be harvested.
Here we can take a look at what a record with local field labels looks like vs the same record’s information in OAI. Notice how the local field labels disappear, so the classification information from the Western Soundscape Archive is all mapped to dc:description.
Repeated fields are merged into one in the MWDL. For example, the local record had multiple contributors listed; this information is now in one field. The source record also had separate rights statements for the creator of the sound recording of an animal and the creator of a photo of the animal. These statements are now in one field.
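The effect described in these notes (local labels drop away, and repeated fields end up merged) can be illustrated with a toy example. This is not MWDL's normalization routine: the local field labels, the values, and the "; " separator are all assumptions made for the sketch.

```python
# Hypothetical mapping from local field labels to Dublin Core elements.
FIELD_MAP = {
    "Recordist": "contributor",
    "Photographer": "contributor",
    "Recording rights": "rights",
    "Photo rights": "rights",
}

# Hypothetical local record as (label, value) pairs.
LOCAL_RECORD = [
    ("Recordist", "Example Recordist"),
    ("Photographer", "Example Photographer"),
    ("Recording rights", "Rights statement A"),
    ("Photo rights", "Rights statement B"),
    ("Internal note", "not mapped, so it never reaches the OAI stream"),
]


def normalize(local_record, field_map, separator="; "):
    """Map local labels to Dublin Core and merge repeated elements."""
    merged = {}
    for label, value in local_record:
        element = field_map.get(label)
        if element is None:
            continue  # unmapped fields are dropped
        merged.setdefault(element, []).append(value)
    return {element: separator.join(values)
            for element, values in merged.items()}


print(normalize(LOCAL_RECORD, FIELD_MAP))
```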
Here we can see the same record with slightly different information displayed in MWDL and DPLA. DPLA has different normalizing routines. For example, the designation of the digital collection associated with the record as Western Soundscape Archive isn’t in the DPLA record, but people can still click through and view that information at the source record.
You can do some self-auditing to make sure that everything in your local collection is displaying in the manner in which you would like it to be harvested. We have a CONTENTdm field properties template and guide that you can use to help make sure everything is set up correctly.
Western Soundscape Archive field properties. See where things have been set to no for “hide” and mapped to Dublin Core. If some of these fields were unmapped or set to hidden, they would not appear as harvestable in the OAI for the collection.
We have an OAI queries page with quick links to try for everything we went over during this presentation. This is also where you can find the CONTENTdm field properties template.
Thanks for participating in the webinar today! If you have any follow-up questions please feel free to contact me!