This demo was given by Trish Rose-Sandler and Kyle Jaebker at the Museums and the Web Conference on April 20th, 2013. It covers how BHL is improving access to its natural history illustrations via Flickr and via the Art of Life project. Authors of the poster and handouts: Gilbert Borrego, Grace Costantino, Bianca Crowley, Kyle Jaebker, and Trish Rose-Sandler.
More than just a pretty picture: improving the discoverability of illustrations in the Biodiversity Heritage Library
1. Image access via Flickr
The Biodiversity Heritage Library (BHL) is....
Art of Life project
More than just a pretty picture: improving the discoverability of
illustrations in the Biodiversity Heritage Library
by Gilbert Borrego, Grace Costantino, Bianca Crowley, Kyle Jaebker, Trish Rose-Sandler
Hidden within BHL literature are
millions of rich illustrations
• An open access digital library for historic biodiversity literature
• An open data repository of taxonomic names and bibliographic information
BHL staff manually identify and push BHL images to a
Flickr stream (www.flickr.com/photos/biodivlibrary) but
the process does not scale to the millions of images
available
The Art of Life project, enabled by a grant from NEH,
aims to automate the process of identifying and
tagging images via algorithms.
Users can add tags to images
in Flickr so that they are
searchable. They are also
encouraged to add species
names via machine tags so
BHL can automatically share
these images with the
Encyclopedia of Life
(http://eol.org/collections/53002)
The project defined a metadata schema for natural history
illustrations that will help crowdsource more detailed
descriptions via image portals such as Wikimedia Commons
(http://tinyurl.com/9hm7nsb)
www.biodiversitylibrary.org
2. Uploading Images to Flickr
The Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org) provides access to thousands of scientific
illustrations through the social media site, Flickr. To expedite the process of uploading these images to Flickr, a
workflow was developed within BHL’s backend database. When paginating, or enhancing a book’s page metadata,
staff can click a single button to upload all illustrations within that book to Flickr. Bibliographic information
and a link to the image in BHL are also embedded during the process.
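The upload step described above can be pictured with a minimal sketch. It is not BHL's backend code: the Page class, the dictionary keys, and the page URL pattern are all assumptions made for illustration. It only shows the idea of selecting a book's illustrated pages and attaching bibliographic information and a link back to BHL to each one.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for one scanned page of a BHL book."""
    page_id: int
    has_illustration: bool


def flickr_payloads(book_title, pages,
                    base_url="https://www.biodiversitylibrary.org/page/"):
    """Build one upload record per illustrated page of a book.

    The base_url pattern and the dict keys are assumptions, not BHL's schema.
    """
    payloads = []
    for page in pages:
        if not page.has_illustration:
            continue  # text-only pages are not sent to Flickr
        link = f"{base_url}{page.page_id}"
        payloads.append({
            "title": book_title,
            # Bibliographic information plus a link back to the page in BHL.
            "description": f"{book_title}\n{link}",
            "source_link": link,
        })
    return payloads


book = [Page(101, False), Page(102, True), Page(103, True)]
records = flickr_payloads("Example Flora", book)
print(len(records))  # 2
```

In a real workflow each record would then be handed to whatever upload mechanism the backend uses; that part is deliberately left out here.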
This workflow was internally documented in the form of a tutorial to ensure that all BHL partners can contribute
to this effort and be part of the program’s expanding outreach efforts.
The use of Flickr as an outreach platform exposes our rich image collection to search engines and new users.
Additionally, it allows us to provide images of species to include on the Encyclopedia of Life’s taxon pages. While
the original intention of BHL’s Flickr account was to provide easy access to scientific figures, plates and illustrations,
the site has taken on a life of its own, and users all around the world are repurposing the images in the most
imaginative ways.
From BHL’s backend dashboard, staff select the
pages to upload to Flickr.
Final view in Flickr.
Once images are uploaded, staff can create sets, add
additional bibliographic information, and assign
sets to collections.
Visit the BHL Flickr today! http://www.flickr.com/photos/biodivlibrary
Learn how you can help add species names to BHL Images:
http://www.flickr.com/groups/encyclopedia_of_life/discuss/72157629515768640/
3. The Flickr Tagging Process
Crowdsourcing Species Identification and Image Tagging
The Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org), an open access digital library consortium
for biodiversity literature, utilizes Flickr to provide access to thousands of images extracted from its digital
collections. In order to improve discoverability and usability of these images, BHL crowdsources the task of
adding species name machine tags to images in Flickr.
Tags are searchable keywords that users can apply to images in Flickr. Machine tags are specially formatted to be
read by computers: taxonomy:binomial="Genus species"
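As a rough illustration of the namespace:predicate=value pattern shown above, the following sketch builds and parses a species machine tag. The function names are hypothetical, and the regular expression is a simplified assumption about which characters a namespace and predicate may contain, not Flickr's official grammar.

```python
import re

# Simplified pattern for namespace:predicate=value (an assumption, not Flickr's spec).
MACHINE_TAG = re.compile(
    r'^(?P<namespace>[A-Za-z][A-Za-z0-9_]*):'
    r'(?P<predicate>[A-Za-z][A-Za-z0-9_]*)='
    r'(?P<value>.+)$'
)


def binomial_tag(genus, species):
    """Format a species name as a taxonomy:binomial machine tag."""
    name = f"{genus.strip().capitalize()} {species.strip().lower()}"
    return f'taxonomy:binomial="{name}"'


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    match = MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    # Accept straight or typographic quotes around the value.
    value = match.group("value").strip().strip('"“”')
    return match.group("namespace").lower(), match.group("predicate").lower(), value


tag = binomial_tag("quercus", "Alba")
print(tag)                     # taxonomy:binomial="Quercus alba"
print(parse_machine_tag(tag))  # ('taxonomy', 'binomial', 'Quercus alba')
print(parse_machine_tag("oak"))  # None
```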
BHL encourages its users to identify the species depicted in an image using the book’s image descriptions and
add that species name to the image as a machine tag. Once these tags are added to BHL images, users can search
within Flickr for images of specific species, and BHL can automatically share these images with the Encyclopedia
of Life (EOL, www.eol.org).
EOL is an open access project dedicated to providing a webpage for every species. EOL harvests machine-tagged
images from the BHL Flickr, uploads them to a BHL Image Collection in EOL, and automatically associates the
images with the matching species page. To date, thousands of machine-tagged images have been added to EOL.
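The matching step can be sketched in a few lines. This is not EOL's harvester: the record layout (an "id" and a list of "tags") and the sample data are invented for illustration. The sketch only shows how images carrying a taxonomy:binomial machine tag could be grouped by species name, which is the information needed to associate each image with a species page.

```python
from collections import defaultdict

PREFIX = "taxonomy:binomial="


def images_by_species(images):
    """Group image ids by the species named in their machine tags.

    Each image is a dict with hypothetical keys "id" and "tags".
    Images without a species machine tag are ignored.
    """
    index = defaultdict(list)
    for image in images:
        for tag in image["tags"]:
            if tag.lower().startswith(PREFIX):
                name = tag[len(PREFIX):].strip().strip('"')
                if name:
                    index[name].append(image["id"])
    return dict(index)


# Invented sample records, for illustration only.
sample = [
    {"id": "a1", "tags": ["birds", 'taxonomy:binomial="Pica pica"']},
    {"id": "b2", "tags": ["plate"]},
    {"id": "c3", "tags": ['taxonomy:binomial="Pica pica"', "magpie"]},
]
print(images_by_species(sample))  # {'Pica pica': ['a1', 'c3']}
```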
Find an image in Flickr
Add a species name machine tag
The image is automatically ingested into the BHL Image Collection in EOL
And automatically associated with the corresponding species page in EOL
4. Users clamor for the Art of Life
The Art of Life project evolved out of a need to improve access to the rich corpus of natural history illustrations
hidden within the digitized pages of books and journals in the Biodiversity Heritage Library (BHL,
www.biodiversitylibrary.org). Currently, these illustrations have no descriptive metadata such as title, creator or
subject matter that can be searched. The only way to uncover these gems is by opening up a BHL book or volume
and scrolling through page by page.
One solution has been for BHL staff to manually identify pages that contain illustrations and to push those pages
into a BHL Flickr stream which allows for discovery through themed collections and in some cases species
names. While this approach has resulted in improved access to some of BHL’s illustrations, it requires significant
staff time and the process does not scale well to the millions of images that are present within the BHL pages.
Example of an illustration described using Art of Life schema.
Illustration schema elements.
Read more about the Art of Life project:
http://biodivlib.wikispaces.com/Art+of+Life
Elements chosen were a mix of VRA Core 4.0 and Darwin Core.
Workflow diagram that outlines how each illustration will move through the Art of Life processes.
Thus, the Art of Life project was designed to automate the identification of images and to crowdsource their
descriptions. The project is a partnership between the Missouri Botanical Garden and the Indianapolis Museum of
Art and is supported by the National Endowment for the Humanities. It runs from May 2012 to April 2014. The Art
of Life has five primary objectives: 1) define a metadata schema appropriate for natural history illustrations;
2) build algorithms to automatically identify BHL pages with illustrations; 3) sort and classify the illustrations;
4) crowdsource descriptions through tagging applications; and 5) integrate descriptive metadata back into BHL and
share images and descriptions with audiences outside of BHL. These illustrations will be of interest to a diverse
range of audiences, including artists, biologists, humanities scholars, librarians, educators, and citizen scientists.
5. Automating the Heavy Lifting
Using Algorithms to Identify Images in BHL
In the Art of Life project, the Indianapolis Museum of
Art (IMA) and the Biodiversity Heritage Library (BHL,
www.biodiversitylibrary.org) have been working to
develop algorithms to identify images on the pages
of books and journals digitized in BHL. Multiple
algorithms are being developed, including ones based on
ABBYY OCR, contrast, color, and compression. These
algorithms are being tested to determine the most
efficient and accurate means of identifying images.
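The source does not spell out how the compression algorithm works, so the following is only one plausible reading, offered as a sketch: a page that is mostly blank paper and text compresses far better than a page dominated by a detailed illustration, so the compression ratio can serve as a rough signal. The function names, the 0.5 threshold, and the synthetic pixel data are all assumptions for illustration.

```python
import random
import zlib


def compression_ratio(pixels: bytes) -> float:
    """Compressed size divided by raw size (lower means more compressible)."""
    if not pixels:
        raise ValueError("page has no pixel data")
    return len(zlib.compress(pixels, 9)) / len(pixels)


def looks_illustrated(pixels: bytes, threshold: float = 0.5) -> bool:
    """Flag a page as a likely illustration when it compresses poorly.

    The threshold is a made-up value; a real one would be tuned on test pages.
    """
    return compression_ratio(pixels) > threshold


# Synthetic grayscale "pages": uniform white paper vs. dense varied detail.
rng = random.Random(0)
blank_page = bytes([255] * 10_000)
detailed_page = bytes(rng.randrange(256) for _ in range(10_000))

print(looks_illustrated(blank_page))     # False
print(looks_illustrated(detailed_page))  # True
```

Real scans are far less clear-cut than these two extremes, which is one reason a single signal is unlikely to be enough on its own.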
The IMA developed a set of software tools for running
the algorithms and analyzing their results. This
software allows for the import of publications and
journals determined to be good test samples for the
algorithms. These samples, termed the “Gold Standard,”
are being used to evaluate how useful each algorithm
will be in determining whether a scan contains a
sketch or drawing. A custom-built interface for
reviewing the results shows both accurate detections
and false positives. In addition to the visual
review of results, analysis across the entire “Gold
Standard” is ongoing to determine the best combination of
algorithms.
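Scoring an algorithm against a hand-labeled test set is commonly done with precision and recall; the sketch below assumes that approach, which the source does not confirm. The page ids, labels, and function name are invented for illustration.

```python
def evaluate(gold, predicted):
    """Compare predicted illustration flags with hand-labeled gold flags.

    Both arguments map a page id to True (illustrated) or False.
    Pages missing from `predicted` count as predicted False.
    """
    tp = sum(1 for p in gold if gold[p] and predicted.get(p, False))
    fp = sum(1 for p in gold if not gold[p] and predicted.get(p, False))
    fn = sum(1 for p in gold if gold[p] and not predicted.get(p, False))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"true_positives": tp, "false_positives": fp,
            "false_negatives": fn, "precision": precision, "recall": recall}


# Invented labels for four pages.
gold = {1: True, 2: True, 3: False, 4: False}
predicted = {1: True, 2: False, 3: True, 4: False}
print(evaluate(gold, predicted))
# {'true_positives': 1, 'false_positives': 1, 'false_negatives': 1,
#  'precision': 0.5, 'recall': 0.5}
```

Running the same scoring for each algorithm, and for combinations of them, gives a simple basis for comparing which mix performs best on the test set.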
Once completed, the algorithms will be deployed on a
cluster to process the entire BHL collection. After the
processing has been completed, the resulting metadata will be
used to add descriptive information and finding aids.
This will allow users to discover and use
illustrations from the books and journals that used to
be very hard to find.
Algorithm Results Viewer
Compression Ratio Algorithm Analysis
Close-up Algorithm Result