An Inordinate Fondness for Data: The Biodiversity Heritage Library. Martin R. Kalfatovic. OCLC Digital Forum East 2009. November 5, 2009. Arlington, VA.
1. An Inordinate Fondness for Data
The Biodiversity Heritage Library
OCLC Digital Forum East 2009
5 November 2009
Arlington, VA
Martin R. Kalfatovic
Smithsonian Institution Libraries
3. American Museum of Natural History (New York)
Academy of Natural Sciences Philadelphia
California Academy of Sciences (San Francisco)
Field Museum (Chicago)
Natural History Museum (London)
Smithsonian Institution Libraries (Washington)
Missouri Botanical Garden (St. Louis)
New York Botanical Garden (New York)
Royal Botanic Gardens, Kew
Botany Libraries, Harvard University
Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
Marine Biological Laboratory / Woods Hole Oceanographic Institution
5. Education and Outreach: Smithsonian & Harvard
Synthesis Center: Field Museum
Species Pages & Secretariat: Smithsonian
Informatics: Marine Biological Laboratory, Missouri Botanical Garden
8. How much is there:
Core literature pre-1923: 100 million pages (?)
All pre-1923: 120-150 million pages
All literature: 280-320 million pages
9. • Northeast Regional Scanning Facility (Boston)
• Jersey City Facility
• University of Illinois
• Natural History Museum, London
• Missouri Botanical Garden (Non-Scribe operation)
• Fedscan (Library of Congress)
• Smithsonian Libraries
10. BHL Members: BHL-Europe
• Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin
• Natural History Museum, UK
• Narodni muzeum NMP CZ
• Angewandte Informationstechnik Forschungsgesellschaft mbH
• Freie Universität Berlin FUBBGBM
• Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts
• Naturhistorisches Museum Wien
• Hungarian Natural History Museum
• Museum and Institute of Zoology, Polish Academy of Sciences
• University of Copenhagen
• Stichting Nationaal Natuurhistorisch Museum, Naturalis
• National Botanic Garden of Belgium
• Royal Museum for Central Africa
• Royal Belgian Institute of Natural Sciences
• Bibliothèque nationale de France
• Museum national d’histoire naturelle
• Consejo Superior de Investigaciones Cientificas
• Università degli Studi di Firenze
• Royal Botanic Garden, Edinburgh
• Species 2000
• John Wiley & Sons limited
• Helsingin yliopisto UH-Viikki
12. Now Online
More than:
40,000 volumes
16 million pages
Only 290 million to go!
Avg. monthly growth rate
1,500 volumes
600,000 pages
See you in 2048!
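The "See you in 2048!" estimate can be roughly checked with a constant-rate projection. The sketch below is not from the talk; it assumes the 600,000 pages per month stays flat and that 290 million pages remain as of 2009, which lands within a year or two of the slide's round figure.

```python
import math

def months_to_finish(remaining_pages: float, pages_per_month: float) -> int:
    """Whole months needed to scan the remaining pages at a constant rate."""
    return math.ceil(remaining_pages / pages_per_month)

REMAINING_PAGES = 290_000_000   # "Only 290 million to go!"
PAGES_PER_MONTH = 600_000       # average monthly growth quoted on the slide
START_YEAR = 2009               # year of the talk (assumed starting point)

months = months_to_finish(REMAINING_PAGES, PAGES_PER_MONTH)
years = months / 12
finish_year = START_YEAR + years

print(f"{months} months, about {years:.1f} years, finishing around {finish_year:.0f}")
```

Any change in the scanning rate, or a different estimate of the total literature, shifts the finish year accordingly.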
14. Acquiring other content ...
Researchers scanning their own work or literature relevant to their work
Journals that have scanned their content, but do not have a robust platform to host it
15. Biodiversity Heritage Library Permission Process
Working with non-profit publishers for sharing with the BHL
To digitize and mount works under copyright, BHL must obtain permission from the copyright holders.
Many biodiversity journals and monographs are published by non-profit institutions or learned societies whose mission is to promote research and learning.
Some of these institutions have not sold their rights to commercial publishers and are open to sharing with the BHL.
16. So what? Does [fill in blank] do that?
… and more and faster?
20. An inordinate fondness for data
Access
Putting biodiversity literature in the hands of researchers
Set the data free
Suck it; mash it; broadcast it
Increase
Reuse, recycle, expand
21. Stats: Usage
• Jan – Sep 2009
– 266,000 visitors
– 436,000 visits
– 2.1 million pageviews
• Daily average
– 970 visitors
– 1,600 visits / day
– 7,700 pageviews / day
Jan – Sep 2009
Launch to 30 Sep 2009
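The daily averages follow from dividing the period totals by the number of days. A minimal sketch, assuming the period runs from 1 January to 30 September 2009 inclusive (the slide only says "Jan – Sep 2009"):

```python
from datetime import date

def daily_average(total: float, start: date, end: date) -> float:
    """Average per day over an inclusive date range."""
    days = (end - start).days + 1
    return total / days

START = date(2009, 1, 1)   # assumed first day of the reporting period
END = date(2009, 9, 30)    # assumed last day of the reporting period

totals = {"visitors": 266_000, "visits": 436_000, "pageviews": 2_100_000}
averages = {name: daily_average(total, START, END) for name, total in totals.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:,.0f} per day")
```

The results round to the slide's figures of about 970 visitors, 1,600 visits, and 7,700 pageviews per day.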
22. Global, coordinated development
New functionality from BHL-Europe
Improved deduplication tools
Semantic interface
OAIS-compliant preservation infrastructure
Building a community of developers
Funded & volunteer
RubyBHL: http://github.com/mjy/rubyBHL
PyBHL: http://linux.softpedia.com/get/Programming/Libraries/pybhl-51612.shtml
New partners, new content
23. Open Software & Development
BHL Bits:
Portal code, utilities, services
http://code.google.com/p/bhl-bits/
Taxonomic Literature Group
Google Group for discussion of “taxonomic literature & the services required to make literature interoperable within biodiversity research and biodiversity informatics.”
http://groups.google.com/group/taxonlit
24. Open Data
Downloads
Simple tab-delimited exports of core data
http://www.biodiversitylibrary.org/data/BHLExportSchema.pdf
Data model
DB schema as ERD
http://bhl-bits.googlecode.com/files/20090930_BHLDataModel.pdf
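A tab-delimited export can be read with nothing more than a standard CSV parser. The sketch below is illustrative only: the column names (`TitleID`, `FullTitle`, `StartYear`) and the sample rows are hypothetical stand-ins, and the real columns are described in the export schema linked above.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded export file.
SAMPLE_EXPORT = (
    "TitleID\tFullTitle\tStartYear\n"
    "1\tExample flora of a region\t1890\n"
    "2\tExample journal of zoology\t1905\n"
)

def read_export(handle):
    """Parse a tab-delimited export into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_export(io.StringIO(SAMPLE_EXPORT))

# Example query: titles that started publication before 1900.
pre_1900 = [row["FullTitle"] for row in rows if int(row["StartYear"]) < 1900]
print(pre_1900)
```

For a real download, `io.StringIO(SAMPLE_EXPORT)` would be replaced by an opened file, for example `open(path, newline="", encoding="utf-8")`.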
28. Services
Names Service
Return all occurrences of a name throughout the BHL digitized corpus
Documentation: http://bit.ly/2e6sg9
Access to 51 million name strings using TaxonFinder
1.4 million unique names
Working out a strategy for obscure species
Algorithm improvements to detect nomenclatural & taxonomic acts
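The gap between 51 million name strings and 1.4 million unique names is what an index from name to pages looks like: each unique name points to every page where it was found. The toy sketch below is not the BHL service or TaxonFinder; the page IDs and the `build_name_index` helper are hypothetical.

```python
from collections import defaultdict

def build_name_index(found_names):
    """Map each unique name to the list of page IDs where it occurs."""
    index = defaultdict(list)
    for page_id, name in found_names:
        index[name].append(page_id)
    return index

# Hypothetical (page ID, name string) pairs, as a name finder might emit them.
found = [
    (101, "Quercus alba"),
    (102, "Quercus alba"),
    (102, "Apis mellifera"),
    (250, "Quercus alba"),
]

index = build_name_index(found)
total_strings = sum(len(pages) for pages in index.values())
unique_names = len(index)

print(index["Quercus alba"])        # every page where the name occurs
print(total_strings, unique_names)  # name strings vs. unique names
```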
OpenURL
Facilitate links to citations: protologues, articles, references
Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx
Useful to Nomenclators, Reference Systems
IPNI
Tropicos
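A nomenclator or reference system would typically turn a citation into an OpenURL query string. The sketch below only builds the URL and makes no request. The resolver path in `BASE_URL` and the choice of keys (`genre`, `title`, `volume`, `spage`, `date`, in the style of OpenURL 0.1) are assumptions, and the citation values are made up; the documentation page above is the authority on what the resolver accepts.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed resolver address; check the OpenURL documentation for the real one.
BASE_URL = "http://www.biodiversitylibrary.org/openurl"

def build_openurl(citation: dict) -> str:
    """Encode the non-empty citation fields as an OpenURL-style query string."""
    query = urlencode({key: value for key, value in citation.items() if value})
    return f"{BASE_URL}?{query}"

# Hypothetical citation for illustration only.
url = build_openurl({
    "genre": "article",
    "title": "Example journal",
    "volume": "12",
    "spage": "34",
    "date": "1905",
    "issue": "",  # empty fields are dropped
})

parsed = parse_qs(urlsplit(url).query)  # round-trip check of the encoding
print(url)
```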
34. Other Consumers
EarthCape Labs
Sort/Search capabilities with harvested names
YouTube demo: http://www.youtube.com/watch?v=qw7qw87JTOs
BioGUID
BHL Name Timeline
http://bioguid.info/bhl/
BHL Name Comparison
http://bioguid.info/bhl/compare.php
39. Global BHL
Based on open access
Open content
Collaboration
Shared development
40. Uh, so what's it mean to me?
1.9 million known species … most described once in a hard to find article … wouldn't it be nice to know more about your neighbors ...