Discovery of IIIF Resources
Simeon Warner (Cornell University)
https://orcid.org/0000-0002-7970-7855
IIIF Showcase: Unlocking the World’s Digital Images
6 June 2017
4. ... and just last week
+ 30,000 images from the J. Paul Getty Trust
http://news.getty.edu/article_display.cfm?article_id=6172
+ 70,000 images from the Yale Center for British Art
http://britishart.yale.edu/news
... and I suspect an increasing rate of increase
7. “IIIF only” search not supported
• No way to limit searches in commodity search engines to IIIF content only
9. Scope of Discovery work?
• IIIF-wide aggregators
o Services like PontIIIF
o Large online content vendors (aggregate IIIF content and other content)
• Selective aggregators
o Subject specific (e.g. musiclibs, or a portal for 17th-century manuscripts)
o Virtual collections that span institutions
• Shared interface for tooling
o Support search tools that are loosely coupled to the rest of the ecosystem and aggregate their own content
11. 1. Crawling and Harvesting
How does a service find and get descriptions of all available IIIF resources?
IIIF allows remote content use – no need to harvest images etc., just descriptions
• Format for publishing lists of resources
• Recommendations for processing those lists
• Validation service that checks the lists
• Registry of institutions’ lists
• Reference implementations for list generators and consumers
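One way a service could answer the question above is to walk IIIF Collections, one of the harvesting approaches discussed here. The sketch below assumes Presentation API 2.x JSON-LD (`@id`, `@type` values `sc:Collection` and `sc:Manifest`, children listed under `collections`, `manifests` or `members`); `crawl_collection` and `fetch_json` are hypothetical names. Recording failures along the way hints at how a list validation service can fall out of a crawler.

```python
import json
import urllib.request
from collections import deque
from typing import Callable, List, Tuple


def fetch_json(url: str) -> dict:
    """Fetch and parse a JSON document over HTTP(S)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def crawl_collection(
    root_url: str, fetch: Callable[[str], dict] = fetch_json
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Breadth-first walk of a (hypothetical) Presentation 2.x collection tree.

    Returns (manifest_urls, problems); problems holds (url, reason) pairs,
    which doubles as a crude report on the health of the published lists.
    """
    queue = deque([root_url])
    seen_collections = set()
    seen_manifests = set()
    manifests: List[str] = []
    problems: List[Tuple[str, str]] = []

    while queue:
        url = queue.popleft()
        if url in seen_collections:
            continue  # guard against cycles and repeated references
        seen_collections.add(url)
        try:
            doc = fetch(url)
        except Exception as exc:  # network, HTTP, JSON or lookup errors
            problems.append((url, f"fetch failed: {exc!r}"))
            continue
        if not isinstance(doc, dict):
            problems.append((url, "response is not a JSON object"))
            continue
        # Gather children from whichever list-valued properties are present.
        children = [
            child
            for key in ("collections", "manifests", "members")
            if isinstance(doc.get(key), list)
            for child in doc[key]
        ]
        for child in children:
            child_id = child.get("@id") if isinstance(child, dict) else None
            child_type = child.get("@type") if isinstance(child, dict) else None
            if not child_id:
                problems.append((url, "member without @id"))
            elif child_type == "sc:Collection":
                queue.append(child_id)  # descend into sub-collections
            elif child_type == "sc:Manifest":
                if child_id not in seen_manifests:  # de-duplicate manifests
                    seen_manifests.add(child_id)
                    manifests.append(child_id)
            else:
                problems.append((child_id, f"unexpected @type {child_type!r}"))
    return manifests, problems
```

Passing `fetch` as a parameter keeps the walk testable offline, for example with a dictionary's `__getitem__` standing in for HTTP.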
12. Harvesting approaches...
• IIIF Collections
• Sitemaps
o with ResourceSync
o with IIIF Extensions
o with HTTP layer Extensions
• Schema.org
• ActivityStreams 2.0
All have pros and cons... ongoing discussion in Discovery TSG calls (and later this week...)
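The sitemap-based options share the sitemaps.org `urlset` format, in which each `<url>` carries a `<loc>` and an optional `<lastmod>`. A minimal sketch of the consumer side, assuming a plain sitemap that lists manifest URLs directly; it ignores sitemap indexes as well as ResourceSync and any other extension elements, and `parse_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace defined by the sitemaps.org protocol.
SM_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs from a sitemaps.org urlset document."""
    root = ET.fromstring(xml_text)
    entries = []
    for url_el in root.findall("sm:url", SM_NS):
        loc = url_el.findtext("sm:loc", namespaces=SM_NS)
        lastmod = url_el.findtext("sm:lastmod", namespaces=SM_NS)
        if loc:  # skip malformed entries without a location
            entries.append((loc.strip(), lastmod.strip() if lastmod else None))
    return entries
```

Keeping `lastmod` alongside each URL is what later makes incremental re-harvesting possible.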
13. Harvesting discussions informed by experiments including:
• Europeana and the National Library of Wales
• NCSU
• ResourceSync
https://www.slideshare.net/NunoFreire2/new-approaches-for-data-acquisition-at-europeana-iii
14. 2. Content Indexing
IIIF Presentation API provides information to support presentation only – does not facilitate fielded or advanced search
But... provides facilities for linking to external descriptions of the objects (in native metadata formats)
• What common metadata formats do IIIF communities use?
• What is best linking approach?
• How should one link back to the IIIF resource?
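A harvester could keep the manifest URI as the link back to the IIIF resource and follow `seeAlso` (and `related`) to richer descriptions. The sketch assumes Presentation API 2.x, where these properties may be a string, an object, or a list of either; `index_record` is a hypothetical helper, and what an indexer then does with each linked metadata format is left open, as it is on the slide.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """Presentation 2.x allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def index_record(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Build a minimal index entry: the manifest URI plus its outbound links."""
    record: Dict[str, Any] = {"manifest": manifest.get("@id")}
    for key in ("seeAlso", "related"):
        links = []
        for link in _as_list(manifest.get(key)):
            if isinstance(link, str):
                link = {"@id": link}  # bare URI form
            if isinstance(link, dict) and "@id" in link:
                # keep the URI and any hints about the linked format
                links.append({k: link[k] for k in ("@id", "format", "profile") if k in link})
        record[key] = links
    return record
```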
16. Great instructions for data providers on musiclibs:
• use IIIF descriptions
• use related to link back to the provider site
• use attribution, license, logo
• use seeAlso for structured metadata
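A provider could check a manifest against the guidance above before publishing it. This is a hypothetical check, not musiclibs' actual tooling; it only tests that each named Presentation API 2.x property is present and non-empty, not that its value is sensible.

```python
from typing import Any, Dict, List

# Presentation API 2.x properties named in the musiclibs guidance above.
RECOMMENDED = ("related", "attribution", "license", "logo", "seeAlso")


def missing_recommended(manifest: Dict[str, Any]) -> List[str]:
    """List the recommended properties that are absent or empty."""
    if manifest.get("@type") != "sc:Manifest":
        raise ValueError("not a IIIF Presentation 2.x manifest")
    return [prop for prop in RECOMMENDED if not manifest.get(prop)]
```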
17. 3. Change Notification
After successful initial harvesting, how can services stay up to date with changes?
• Avoid re-crawling
• Easier to stay up to date in a timely fashion
• Easier on the providing organization
• More efficient to index only known changes
Expect work on notifications only after harvesting and indexing; it might involve a central hub for distributing notifications
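Until a notification mechanism exists, a service can approximate "index only known changes" by comparing two harvests. The sketch assumes each harvest is recorded as a mapping from resource URL to a `lastmod` value (for example from a sitemap) and treats a missing `lastmod` as possibly changed; `diff_snapshots` is a hypothetical name. It still requires re-reading the lists, which is exactly the cost notifications aim to remove.

```python
from typing import Dict, Optional, Set, Tuple

# One harvest: resource URL -> lastmod timestamp (None if not published).
Snapshot = Dict[str, Optional[str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (created, updated, deleted) URLs between two harvests."""
    created = set(new) - set(old)
    deleted = set(old) - set(new)
    # Without a lastmod we cannot rule out a change, so re-index those too.
    updated = {
        url for url in set(new) & set(old)
        if new[url] is None or new[url] != old[url]
    }
    return created, updated, deleted
```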
18. 4. Import to Viewers
IIIF is designed to allow re-use in different contexts, with different viewing applications, as appropriate to user needs
• Specification of how content providers and discovery applications can allow the user to import the IIIF content into externally hosted viewers
• Recommendations around consistent UI/UX patterns
• Validation service for the import process
• Reference implementations for generators and consumers
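The drag-and-drop behaviour mentioned in the speaker notes already relies on a simple convention, as commonly implemented at the time: the IIIF icon links to a URL whose query string carries the manifest URL in a `manifest` parameter (optionally with a `canvas` parameter), and the receiving viewer reads it back out. A sketch of both sides, assuming that convention; `https://example.org/viewer` and the function names are hypothetical, and this is not the specification the slide calls for.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def import_link(landing_page: str, manifest_url: str, canvas_id: Optional[str] = None) -> str:
    """Provider side: append the manifest (and optional canvas) to a link."""
    params = {"manifest": manifest_url}
    if canvas_id:
        params["canvas"] = canvas_id
    parts = urlparse(landing_page)
    # Preserve any existing query string; urlencode percent-encodes the URLs.
    query = parts.query + ("&" if parts.query else "") + urlencode(params)
    return urlunparse(parts._replace(query=query))


def manifest_from_link(link: str) -> Optional[str]:
    """Viewer side: recover the manifest URL from a dropped link, if any."""
    values = parse_qs(urlparse(link).query).get("manifest")
    return values[0] if values else None
```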
20. Sub-group (Drew, Ed, Rashmi, Simeon) to:
• Review existing and new use cases
• Consider changes to the current approach
• Draft a v0.1 specification for TSG and broader review before standard editorial process
A few years ago discovery was easy. I’d liken the community to the International Space Station: everyone working in IIIF knew everyone else, what they were doing, and where their IIIF images were.
There really are a huge number of resources available now. This is, I think, the fourth time these numbers have been shown today, but perhaps I’m the first to put them in a bad light: both the sheer number and the distribution of these millions of images can make it rather hard to find the ones you want.
In the last week we’ve seen two major announcements of additional content.
Of course, most people start their search on a commodity web search engine most of the time...
Here the Google result takes you to an IIIF resource, complete with an IIIF icon supporting drag and drop (as demonstrated earlier), and opportunities to bring the content up in two different viewers.
Attempts to find IIIF images in commodity search engines lead to results about IIIF work, not images.
A great example of search of IIIF content is the musiclibs search from McGill University in Canada. This is a community specific search service – information about digitized music scores is harvested from a number of sources. Here the screenshot shows a search for Monteverdi which returns hits from the Internet Archive and Gallica from the BnF. The result shown is from Gallica and you’ll note links to the source website on the right.
Harvesting is done via a number of ad-hoc means. Andrew notes that while there isn’t consistency between how they can harvest from the different sources, each is internally consistent.
Another example is the PontIIIF search service from Brumfield Labs that searches over manifests from multiple sources (currently 13). Manifests are harvested by following the collection hierarchy. The PontIIIF logs provide a nice “collection validation service” as a side effect.