Building the New Open Linked Library
Theory and Practice
…and results!

Keri Thompson, Joel Richard, Trish Rose-Sandler
LITA National Forum, September 30, 2011

Smithsonian Libraries
• Founded in 1846
• 1.5 million volumes in collection, plus
assorted archival collections
• 15,000 volumes scanned and online
• 20 libraries serving ~500
researchers/curators + hundreds of
fellows and interns
• 102 library staff
• 1.5 web staff
• Founding member of the Biodiversity
Heritage Library

Linked Data in our Library

WHY Linked Open Data?
• It’s cool
• “Increase and Diffusion of Knowledge”
• Share, contribute to a global database
• Create context around our data
• Allow data to be reused/repurposed by
ourselves and others
• Improve discoverability of our content


Linked Data
“The Semantic Web isn’t just about putting data on the web. It is about
making links, so that a person or machine can explore the web of data.
With linked data, when you have some of it, you can find other, related,
data.”
Tim Berners-Lee, Linked Data – Design Issues

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover
more things.
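
Principles 2 and 3 amount to an HTTP lookup that asks for machine-readable data instead of a web page. The following is a minimal sketch, assuming the server honours content negotiation through the Accept header (the slides do not confirm this for any particular host). The request is only constructed, not sent.

```python
from urllib.request import Request


def build_rdf_lookup(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF/XML for a URI."""
    # Assumption: the server inspects the Accept header and can return RDF/XML.
    return Request(uri, headers={"Accept": "application/rdf+xml"}, method="GET")


request = build_rdf_lookup("http://viaf.org/viaf/27063124")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup.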


Linked Data
• Publishing structured data on the web
• RDF (Resource Description Framework)
• Enables queries computer 2 computer
• Uses standard ontologies (vocabularies)
• Data in “triples” (“triplestore”)

Open Data
• Freely available to use, reuse, republish with no restrictions
• Made available through various mechanisms such as .csv files, APIs

Subject http://library.si.edu/tl2/author/charles-darwin
Predicate owl:sameAs
Object http://viaf.org/viaf/27063124
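
The triple above can be written out in N-Triples, the simplest line-based RDF syntax. This is a small sketch, not the presenters' code. The `Triple` class and helper names are made up for illustration, and only the standard OWL namespace is assumed.

```python
from typing import NamedTuple

# Prefix table; this is the standard W3C namespace for OWL.
PREFIXES = {"owl": "http://www.w3.org/2002/07/owl#"}


class Triple(NamedTuple):
    subject: str    # full URI
    predicate: str  # prefixed name such as "owl:sameAs"
    obj: str        # full URI


def expand(curie: str) -> str:
    """Expand a prefixed name like 'owl:sameAs' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(t: Triple) -> str:
    """Serialize one URI-to-URI triple as a single N-Triples line."""
    return f"<{t.subject}> <{expand(t.predicate)}> <{t.obj}> ."


darwin = Triple(
    "http://library.si.edu/tl2/author/charles-darwin",
    "owl:sameAs",
    "http://viaf.org/viaf/27063124",
)
print(to_ntriples(darwin))
```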


Our Website
Organically grown since 1995

• 83,000 HTML pages
• 3,700 ColdFusion pages
• 253,000 JPEG files
• 27,000 PNG files
• 46,000 PDFs

No CMS.

Digital Library Planning
1. Analyze and categorize our current
& future online content
2. Create high-level data models for
common content types

Questions:
Where are we metadata-rich?
What do we have that others don’t?
What is feasible right now?


Content Analysis
• 400+ Online “books”
• Exhibitions
• Research Tools
• Image Collections (60,000+ images)
• “Brochure” content (About us, Locations, Hours)
• Bibliographies, Fact Sheets, Subject Guides
• Databases, inventories and database-like books

Collections not on our website:
• ~15,000 digitized volumes, with many more planned
• Other analog collections that will be digitized


Books (and book-like objects)
• expose bibliographic data for reuse
• consume links to other internal content
and external authoritative data
Databases
• expose data previously unavailable
• provide authoritative data
• consume our data and others’ to create
new aggregate websites


Linked Digital Library Planning
1. Decide which data elements
should be exposed as linked data
for each content type
2. Choose appropriate vocabularies
3. Create a rough timeline and plan
for migrating site content (=1
year*)

* Optimism included in this estimate



Implement all this linked open data
goodness (and a shiny new website) by
moving to Drupal 7


Drupal and Linked Data
• Native support for RDFa in Drupal 7.
• RDF Extensions (rdfx) – even more features.
• Vocabularies can be imported and cached for
reuse.
• Few or no modifications to HTML to support
RDFa.

What’s the difference between RDF,
RDF/XML and RDFa?

RDFa Sample
URI: http://library.si.edu/book/origin-of-species
<meta content="The Origin of Species"
about="/book/origin-species" property="dc:title" />
<h1>The Origin of Species</h1>
<img typeof="foaf:Image"
src="http://localhost:8087/images/origin-of-species.png"
alt="The origin of species cover image"
title="The origin of species cover image" />
<div rel="bibo:authorList">
<a href="/content/darwin-charles-1809-1882">
Darwin, Charles, 1809-1882
</a>
</div>
<div property="dc:created">November 24, 1859</div>
<div property="bibo:numPages">1000</div>
<div property="dc:language">english</div>
<div rel="owl:sameAs">
<a href="http://www.worldcat.org/oclc/1184647"
target="_blank">http://www.worldcat.org/oclc/1184647</a>
</div>
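
To show that RDFa is ordinary HTML with extra attributes, the literal properties in a fragment like the one above can be pulled out with Python's standard `html.parser`. This is a deliberately simplified sketch, not a conforming RDFa processor: it ignores `rel`, `about` and `typeof`, and the class name is made up for illustration.

```python
from html.parser import HTMLParser


class RdfaPropertyCollector(HTMLParser):
    """Collect (property, value) pairs from RDFa `property` attributes.

    Simplification: the value comes from a `content` attribute when
    present, otherwise from the first non-blank text inside the element.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            self.pairs.append((prop, attrs["content"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


SAMPLE = """
<meta content="The Origin of Species" about="/book/origin-species" property="dc:title" />
<div property="dc:created">November 24, 1859</div>
<div property="bibo:numPages">1000</div>
<div property="dc:language">english</div>
"""

collector = RdfaPropertyCollector()
collector.feed(SAMPLE)
for prop, value in collector.pairs:
    print(prop, "=", value)
```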


RDF/XML Sample
URI: http://library.si.edu/book/origin-of-species.rdf
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:owl="http://www.w3.org/2002/07/owl#">
<rdf:Description rdf:about="http://localhost:8087/content/origin-species">

<rdf:type rdf:resource="http://purl.org/ontology/bibo/Book"/>
<dc:title>The Origin of Species</dc:title>
<dc:created>November 24, 1859</dc:created>
<bibo:numPages>1000</bibo:numPages>
<dc:language>english</dc:language>
<bibo:authorList
rdf:resource="http://localhost:8087/content/darwin-charles"/>

<owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
</rdf:Description>
</rdf:RDF>
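
RDF/XML carries the same triples as RDFa, just in a standalone XML file. As a rough sketch, a shortened copy of the sample can be read back into (subject, predicate, object) tuples with the standard library. It handles only the flat `rdf:Description` pattern shown here, and a real project would more likely use a dedicated RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the sample, with the owl namespace declared.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://localhost:8087/content/origin-species">
    <dc:title>The Origin of Species</dc:title>
    <owl:sameAs rdf:resource="http://www.worldcat.org/oclc/1184647"/>
  </rdf:Description>
</rdf:RDF>
"""


def triples(xml_text):
    """Return (subject, predicate, object) tuples from flat RDF/XML."""
    root = ET.fromstring(xml_text.strip())
    out = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # ElementTree tags look like "{namespace}local"; joining the
            # two parts yields the full predicate URI.
            predicate = child.tag[1:].replace("}", "")
            obj = child.get(f"{{{RDF_NS}}}resource") or (child.text or "").strip()
            out.append((subject, predicate, obj))
    return out


for triple in triples(RDF_XML):
    print(triple)
```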


What other modules are we using?
• Fields, Views, Views UI
• Node Reference
• SPARQL Endpoint, SPARQL API
• RESTful Web Services
• SPARQL Views
• RDF External Vocabulary Importer

Caveat: Some modules are not ready for Drupal 7
• e.g., Biblio module (no CCK, RDF capabilities)


What about Namespaces/Vocabularies?
• Drupal 7 comes with several namespaces. We
will use: DC Terms, FOAF, SKOS, OWL
• We're working with books, so we
need the Bibliographic Ontology:
• Website: http://bibliontology.com/
• Namespace: http://purl.org/ontology/bibo/
• Prefix: “bibo”

• We may also create our own vocabulary.


Adding a Namespace to Drupal


Setting up RDF Mappings in Drupal


Databases: TL-2
Taxonomic Literature 2 (1977-2009)
• The standard reference work for plant taxonomic
literature from Linnaeus to 1940.
• Contains botanists, authors, biographies, citations,
and species.
• Indexed and cross referenced.
• Should be digitized & on the web!
• SIL aims to be an authority for
botanist names on the Internet.


TL-2 Page Sample

Taxonomic Literature 2 (TL-2). v1., p. 600


TL-2 Page Sample

http://library.si.edu/tl2/author/darwin
    tl2:creatorOf  http://library.si.edu/tl2/book/1313
    owl:sameAs     http://viaf.org/viaf/27063124

http://library.si.edu/tl2/book/1313
    dc:creator     http://library.si.edu/tl2/author/darwin
    owl:sameAs     http://www.archive.org/details/originofspecies00darwuoft


TL-2 Page Sample
http://library.si.edu/tl2/author/darwin

RDF Type = foaf:Person
• foaf:lastName, foaf:familyName
• foaf:firstName, foaf:givenName
• foaf:name, skos:prefLabel
• tl2:birthYear
• tl2:deathYear
• tl2:description
• tl2:personAbbrev

RDF Type = bibo:Book
• tl2:bookNumber
• dc:title
• event:place
• dc:publisher
• tl2:bookAbbreviation
• dc:created


TL-2 Page Sample Results

http://library.si.edu/tl2/author/darwin
tl2:creatorOf “http://library.si.edu/tl2/book/1313”
owl:sameAs “http://viaf.org/viaf/27063124”
foaf:lastName “Darwin”
foaf:familyName “Darwin”
foaf:firstName “Charles”
foaf:givenName “Charles”
foaf:name “Darwin, Charles Robert”
skos:prefLabel “Darwin, Charles Robert”
tl2:birthYear “1809”
tl2:deathYear “1882”
tl2:description “British evolutionary biologist”
tl2:personAbbrev “Darwin”

http://library.si.edu/tl2/book/1313
dc:creator “http://library.si.edu/tl2/author/darwin”
owl:sameAs “http://www.archive.org/details/originofspecies00darwuoft”
tl2:bookNumber “1313”
bibo:shortTitle “On the origin of species”
dc:title “On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life.”
event:place “London”
dc:publisher “John Murray”
dc:created “1859”
tl2:bookAbbreviation “Origin sp.”
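
Once the author and book records exist as triples, links can be followed in either direction. The sketch below holds a few of the triples from this slide in a plain Python list and walks from author to book titles. It also checks that every `tl2:creatorOf` link has a matching `dc:creator` link back. The helper names are illustrative, not part of any TL-2 or Drupal API.

```python
AUTHOR = "http://library.si.edu/tl2/author/darwin"
BOOK = "http://library.si.edu/tl2/book/1313"

# A subset of the triples shown on the slide.
TRIPLES = [
    (AUTHOR, "tl2:creatorOf", BOOK),
    (AUTHOR, "owl:sameAs", "http://viaf.org/viaf/27063124"),
    (AUTHOR, "foaf:name", "Darwin, Charles Robert"),
    (BOOK, "dc:creator", AUTHOR),
    (BOOK, "owl:sameAs", "http://www.archive.org/details/originofspecies00darwuoft"),
    (BOOK, "bibo:shortTitle", "On the origin of species"),
]


def objects(subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def titles_by(author):
    """Follow tl2:creatorOf links, then read each book's short title."""
    titles = []
    for book in objects(author, "tl2:creatorOf"):
        titles.extend(objects(book, "bibo:shortTitle"))
    return titles


def missing_back_links():
    """Author-to-book links that lack the matching dc:creator link back."""
    return [
        (s, o)
        for s, p, o in TRIPLES
        if p == "tl2:creatorOf" and s not in objects(o, "dc:creator")
    ]


print(titles_by(AUTHOR))
print(missing_back_links())
```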


Setting up TL-2 in Drupal
• Two Content Types: Authors (Botanists) and Publications
• Node Reference between Authors and Publications
based on the TL-2 index.
• Other data is available when it's parsed:
• Herbaria
• Institutions
• Species names
• Bibliographies
• Handwriting Samples
• Postage Stamps


Image Credits: Database: eponas-deeway (http://eponas-deeway.deviantart.com); Magnifying Glass: Flahorn (http://flahorn.deviantart.com/)
Getting Data into Drupal
• Create Content Types (Digital Library books & TL-2)
• Create import process
• May be able to use the Feeds module for import
• Must create node references during the import.
• Must accommodate the blocks of unparsed
information in TL-2
• Create a search interface specifically for TL-2
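
The requirement to create node references during import can be sketched as follows: create author nodes first, then attach each publication to its author and set aside anything that cannot be matched. This is a hypothetical model in plain Python, not Drupal or Feeds module code. The `Node` class and the input shapes are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    nid: int
    node_type: str
    title: str
    references: list = field(default_factory=list)  # nids of linked nodes


def import_tl2(authors, publications):
    """Create author nodes, then publication nodes with references.

    `authors` is a list of names; `publications` is a list of
    (title, author_name) pairs. Both shapes are hypothetical.
    """
    nodes, author_nid = {}, {}
    next_nid = 1
    for name in authors:
        nodes[next_nid] = Node(next_nid, "author", name)
        author_nid[name] = next_nid
        next_nid += 1

    unresolved = []  # entries whose author could not be matched
    for title, author_name in publications:
        pub = Node(next_nid, "publication", title)
        nodes[next_nid] = pub
        next_nid += 1
        nid = author_nid.get(author_name)
        if nid is None:
            unresolved.append((title, author_name))
        else:
            pub.references.append(nid)             # publication -> author
            nodes[nid].references.append(pub.nid)  # author -> publication
    return nodes, unresolved


nodes, unresolved = import_tl2(
    ["Darwin, Charles Robert"],
    [
        ("On the origin of species", "Darwin, Charles Robert"),
        ("Unparsed entry", "Unknown"),
    ],
)
print(nodes[1].references, nodes[2].references, unresolved)
```

Keeping unmatched entries in a separate list is one way to accommodate the unparsed blocks of TL-2 without losing them.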


What else is there to do?
Resolve /node/22365.rdf
and /tl2/author/charles-darwin
Handling "See also" and "Same as" entries in the TL-2
indexes.
Can we search our own data using SPARQL?
• Should we? Does it make sense?
Discuss/Extend vocabulary for our special needs.
Set up linked data within our site
• image collections
• trade literature
• Exhibitions
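
As an illustration of what searching the site's own data with SPARQL could look like, the sketch below builds a query for people with the family name "Darwin" and encodes it as a SPARQL Protocol GET request URL. The endpoint URL is a placeholder (the slides do not give one), and only the standard FOAF namespace is assumed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL; the slides do not name one.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:familyName "Darwin" ;
          foaf:name ?name .
}
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL; the query goes in the `query` parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```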

Other Resources
LinkedData.org
http://linkeddata.org/guides-and-tutorials
http://linkeddatabook.com/editions/1.0/

Drupal Groups
http://groups.drupal.org/semantic-web
http://groups.drupal.org/libraries

Tim Berners-Lee, TED talks
Tim Berners-Lee on the next Web (2009)
The year open data went worldwide (2010)


BHL is…
• A consortium of 13 natural history and
botanical libraries and research institutions
• An open access digital library for legacy
biodiversity literature.
• An open data repository of taxonomic names
and bibliographic information


Benefits of open data

Allows data which was created for a
specific purpose and audience to interact
with other data to serve new, previously
unimagined roles.


What information have we
opened up?
Essentially, everything – our metadata
(descriptive, rights, structural), our image files,
scientific names, OCR’d files


Technical methods for opening data
• Data exports
• APIs
• OpenURL
• OAI-PMH
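
Of these methods, OAI-PMH is the most formulaic: a harvester sends an HTTP request with a `verb` and a few standard parameters. The following is a minimal sketch of building a `ListRecords` request URL. The base URL is a placeholder, not BHL's actual endpoint.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the repository's real OAI-PMH endpoint.
BASE_URL = "https://example.org/oai"


def list_records_url(metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    `verb`, `metadataPrefix`, `set` and `from` are parameter names
    defined by the OAI-PMH protocol.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    return f"{BASE_URL}?{urlencode(params)}"


print(list_records_url(from_date="2011-01-01"))
```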


Who is reusing our data?
• Tropicos
• Rod Page – BioGUID, BioStor
• Encyclopedia of Life
• Ryan Schenk – Visualizing taxonomic
synonyms


Tropicos


Rod Page – BioGUID – http://bioguid.info/bhl/


Rod Page – BioStor – http://biostor.org/


Encyclopedia of Life – http://eol.org/


Ryan Schenk – http://ryanschenk.com/2011/02/visualizing-taxonomic-synoymns/


Making open data successful
• Promote it!


Do a code challenge


Publicly display your data’s copyright/licensing
and API terms of service


Thank You!
Building the New Open Linked Library
Keri Thompson, Head of Web Services
Smithsonian Institution Libraries
thompsonk@si.edu , @DigiKeri_SIL

Joel Richard, Lead Developer
Smithsonian Institution Libraries
richardjm@si.edu

Trish Rose-Sandler, Data Analyst
Biodiversity Heritage Library
trisha.rose-sandler@mobot.org
