1. A Mass Scanning
Workflow
Discussion
Sociological aspects of a global
mass scanning project
Biodiversity Heritage Library
Suzanne C. Pilsk – Smithsonian Institution Libraries
Matthew Person – MBLWHOI Library, Woods Hole
June 28, 2010
2. Biodiversity Heritage Library
In any well-appointed Natural History
Library there should be found every book
and every edition of every book dealing in
the remotest way with the subjects
concerned. One never knows wherein one
edition differs from or supplements the
other and unless these are on the same
table at the same time it is not possible to
collate them properly. Moreover for
accurate work it is necessary for the
student to verify every reference he may
find; it is not enough to copy from a
previous author; he must verify each
reference itself from the original.
Charles Davies Sherborn (1861-1942)
Charles Davies Sherborn, Epilogue to Index Animalium, March
1922
6. You Meet… and discuss… and
meet… and discuss…
● 2003. Telluride. Encyclopedia of Life meeting
● February 2005. London. Library and Laboratory:
the Marriage of Research, Data and Taxonomic
Literature
● May 2005. Washington. Ground work for the
Biodiversity Heritage Library
● June 2006. Washington. Organizational and
Technical meeting
● August 2006. New York Botanical Garden. BHL
Director's Meeting.
● October 2006. St. Louis/San Francisco. Technical
meetings
● February 2007. Museum of Comparative Zoology.
Organizational meeting
● May 2007. Encyclopedia of Life and BHL Portal
Launch. Washington DC.
…and you follow through..
7. Members
● American Museum of Natural History (New York)
● Botany Libraries, Harvard University
● Ernst Mayr Library of the Museum of
Comparative Zoology, Harvard University
● Field Museum (Chicago)
● Marine Biological Laboratory / Woods Hole
Oceanographic Institution Library
● Missouri Botanical Garden (St. Louis)
● Natural History Museum (London)
● New York Botanical Garden (New York)
● Royal Botanic Gardens, Kew
● Smithsonian Institution Libraries (Washington)
● Academy of Natural Sciences (Philadelphia)
● California Academy of Sciences (San Francisco)
8. Contributing Members and
Partners
● Internet Archive
● California Digital Library
● University Library of the
University of Illinois at Urbana-
Champaign
11. *Who has what?
*What should we scan and
when?
*Monographs vs Serials
*Series treated as separates
*Can it be found and used
once scanned?
12. Initial Metadata Analysis:
We have 1.3 million
catalogue records
73% are monographs
(remainder are serials at title-
level)
63% is English language
material. The next most
popular language (9%) is
German.
About 30% of material was
published before 1923.
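A tally of this kind could be produced along the following lines. This is a minimal sketch, not the analysis actually run for BHL: the record fields (`format`, `language`, `year`) and the sample values are hypothetical stand-ins for whatever the members' catalogue exports contain.

```python
from collections import Counter


def summarize(records):
    """Summarize catalogue records by format, language, and publication date.

    Each record is assumed (hypothetically) to be a dict with the keys
    "format", "language", and "year" (an int, or None when unknown).
    """
    total = len(records)
    if total == 0:
        return {"monograph_share": 0.0, "top_languages": [], "pre_1923_share": 0.0}

    formats = Counter(r["format"] for r in records)
    languages = Counter(r["language"] for r in records)
    # Records with an unknown year are not counted as pre-1923.
    pre_1923 = sum(1 for r in records if r["year"] is not None and r["year"] < 1923)

    return {
        "monograph_share": formats["monograph"] / total,
        # The two most common languages with their share of all records.
        "top_languages": [(lang, n / total) for lang, n in languages.most_common(2)],
        "pre_1923_share": pre_1923 / total,
    }


if __name__ == "__main__":
    sample = [
        {"format": "monograph", "language": "eng", "year": 1878},
        {"format": "serial", "language": "ger", "year": 1950},
    ]
    print(summarize(sample))
```

Shares are computed against all records, so a record with a missing year lowers the pre-1923 share; a real analysis would need to decide how to treat such gaps.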
13. The
Worker Bees
● Telephone conversations
● Email strings
● Working documents
● bhl.wikispaces.com
● Face to face meetings
● Presentations
● Articles
● Going beyond self
expectations was the norm
15. Mass Scanning Workflow
Local data flow
Vendor data flow
WonderFetch™
Return of data
Return of material
Quality Assurance
Billing
16. [Flowchart: title selection and pre-scan workflow. Recoverable steps and decision points:]
● Starting points: EOL curator species request; bibliographic data from SIRIS; goin' down the rows of the stacks
● Evaluate need. Need is… "gap-fill" for other title.
● Picklist database stores BHL library select/reject/ship state and supplies item metadata to IA
● Select title in picklist, upload to de-duper
● Serial or monograph?
● Duplicate? (yes/no)
● Other library "bid"? (yes/no)
● Reject in picklist, return to stacks
● "Bid" on title, select in picklist
● Pull from stacks, circ in ILS
● Preliminary metadata check and physical check (pass/fail)
● Preservation review (pass/fail)
● Metadata check (pass/fail)
● Circ to cataloging for MARC editing
● Update picklist if item record has been changed during cataloging touch-up
● Circ to scanner, circ in Horizon
● Put on shipping cart, generate "packing list" invoice
● Carts delivered to scanner
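The duplicate and "bid" checks in this flowchart could be sketched as follows. This is an illustration under stated assumptions, not the actual de-duper or picklist database: `BidRegistry`, `normalize_key`, and the title-plus-year matching rule are hypothetical, and the sketch assumes that a title already bid on by another library is rejected.

```python
def normalize_key(title, year):
    """Build a crude match key: lowercased title with collapsed whitespace, plus year.

    Real de-duplication of serials and series would need much richer matching;
    this key is an assumption made for the sketch.
    """
    return (" ".join(title.lower().split()), year)


class BidRegistry:
    """Tracks which library has bid on which title (hypothetical structure)."""

    def __init__(self):
        self._bids = {}  # match key -> name of the library holding the bid

    def decide(self, library, title, year):
        """Return "select" if the library may scan the title, else "reject"."""
        key = normalize_key(title, year)
        holder = self._bids.get(key)
        if holder is not None and holder != library:
            # Another library already bid: reject in picklist, return to stacks.
            return "reject"
        # No competing bid: record this library's bid and select the title.
        self._bids[key] = library
        return "select"


if __name__ == "__main__":
    registry = BidRegistry()
    print(registry.decide("Library A", "The Fishes of India", 1878))
    print(registry.decide("Library B", "the fishes  of india", 1878))
```

Keeping the bid state in one shared registry is what lets each library check a title before pulling it from the stacks.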
17. IA scanning process
[Flowchart: shipping, scanning at the Internet Archive, portal harvest, and return QA. Recoverable steps:]
● Put on shipping cart, generate "packing list" invoice, alert scanning center
● Carts delivered to scanner
● Unique IA id is assigned
● Metadata is gathered from SIRIS and the picklist db and associated with the scan
● JP2000s generated and transformed, served on archive.org
● QA is done by IA on 10%
● BHL Portal periodically harvests Marc.xml (bib) and item records, along with JP2000 from archive.org, to index and display in the portal
● Books are returned, cart contents are verified against invoice
● SIL does 20% QA, checking for metadata matching with item, scan quality, etc.
● Pass QA? If no, update picklist to indicate rescan. If yes, updated in picklist as scanned, circ in Horizon, place BHL sticker near barcode, return to stacks
● Download .csv from portal with SIL barcodes and portal URLs, send URLs to SIRIS Office for batch updates
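The 20% QA step and the resulting picklist update could be sketched as follows. This is a minimal illustration, assuming the sample is drawn at random from the returned barcodes; the slides do not say how items are chosen, and the function names and status strings here are hypothetical.

```python
import random


def qa_sample(barcodes, percent=20, seed=None):
    """Pick roughly `percent` percent of returned items for QA, at least one.

    Random selection is an assumption of this sketch.
    """
    items = list(barcodes)
    if not items:
        return []
    # Integer ceiling of len(items) * percent / 100, avoiding float rounding.
    k = max(1, (len(items) * percent + 99) // 100)
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


def update_picklist(picklist, qa_results):
    """Mark each checked item as "scanned" (passed QA) or "rescan" (failed QA)."""
    for barcode, passed in qa_results.items():
        picklist[barcode] = "scanned" if passed else "rescan"
    return picklist


if __name__ == "__main__":
    returned = [f"barcode-{i}" for i in range(10)]
    to_check = qa_sample(returned, percent=20, seed=1)
    # Pretend every sampled item passed the metadata and scan-quality checks.
    print(update_picklist({}, {b: True for b in to_check}))
```

Passing `percent=10` gives the same sketch for the 10% check that IA performs before the books are returned.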
18. The Workflow Process
● Select Book ~Pull from Shelf
● Review Physically,
and check Metadata
● Establish viability and create
pick/pack list / WonderFetch™
● Send to IA scanning center
48. 24 April 2009
The following came from a public librarian in Falmouth, Massachusetts:
"We recently were asked the question: who discovered the zebra fish?
In searching the Encyclopedia of Life I kept seeing the phrase
“Hamilton, 1822” next to the “danio rerio”. Wondering who Hamilton
was, I searched WorldCat and discovered that Hamilton was Francis
Hamilton who had published in 1822 An account of the fishes found in
the river Ganges and its branches. I looked at the EOL record and
clicked on the Biodiversity Heritage Library link. One of the links was to
a Hamilton book! In 1878 the book The Fishes of India was published
which included a description and an image of the danio rerio. Links were
provided to the exact place in the text where the fish was mentioned, as
well as to the plate with the fish itself illustrated. Not only that, but I
could send the patron the exact link to both pages which described her
fish. How remarkable it was to find this Harvard University book
available so easily through the Biodiversity Heritage Library. A great
success for our patron, and we looked like magicians bringing the
book to her."
49. ● Gary Anderson, Professor Emeritus at the University of Southern
Mississippi. He used to make an annual trip to our stacks to xerox hundreds
of articles at a time.
● “The Biodiversity Heritage Library is a valuable resource for
acquiring crustacean literature. At present, a search there
(http://www.biodiversitylibrary.org/Search.aspx?searchTerm=pycnogonid&searchCat=)
will turn up 5 publications (one of
which was not contributed by the Smithsonian). Also note that the BHL
has scanned these and additional literature at the site for taxonomic
terms, and provides links to those documents. There are 1592 "hits"
for Pycnogonida. It is likely that you could turn up a lot of
additional articles within larger works that way. Alternatively, you
could perform searches for volumes of interest (if you know of
specific references), to home in on the papers you want. There will
be A LOT of additional material becoming available at that site.”
50. “Yesterday whilst reading the latest edition of The
Entomologist's Record I was pleased to find that early
editions of this invaluable publication, edited by the
seminal entomologist James Tutt (no relation to Elvis's
drummer as far as I am aware) are available digitised
[…]
So I went there, and was amazed at what I found.
They even have a blog. What a fantastic project!!!”
From the blog:
http://forteanzoology.blogspot.com/2009/03/fantastic-
resource.html
51. […]Michael, a colleague researching wasps, was excited that he had
discovered in the Biodiversity Heritage Library a copy of an obscure 1860s
book:
Saussure, H. de & Sichel, J. (1864). Catalogue des espèces de l'ancien
genre Scolia, contenant les diagnoses, les descriptions et la synonymie des
espèces, avec des remarques explicatives et critiques. Genève & Paris :
Henri Georg & V. Masson et Fils pp. 1–350
This book was not in our library, probably not in Australia, and almost
impossible to get hold of without travelling to the northern hemisphere.
Thanks to the BHL for their work in providing access to works of importance.
Michael is now able to use detailed content of this book in his work.
John Tann, Australian Museum
52. The Biodiversity Heritage Library : Advancing Metadata
Practices in a Collaborative Digital Library
Suzanne C. Pilsk, Smithsonian Institution Libraries; Matthew Person, MBLWHOI
Library; Joseph deVeer, Ernst Mayr Library, Museum of Comparative Zoology; John F.
Furfey, MBLWHOI Library; Martin R. Kalfatovic, Smithsonian Institution Libraries
Abstract:
The Biodiversity Heritage Library is an open access digital library of
taxonomic literature, forming a single point of access to this collection for
use by a worldwide audience of professional taxonomists, as well as
“citizen scientists.” A successful mass scanning digitization program, one
that creates functional and findable digital objects, requires thoughtful
metadata workflow that parallels the workflow of the physical items from
shelf to scanner. This article examines the needs of users of taxonomic
literature, specifically in relation to the transformation of traditional library
material to digital form. It details the issues that arise in determining
scanning priorities, avoiding duplication of scanning across the founding
twelve natural history and botanical garden library collections, and the
problems related to the complexity of serials, monographs, and series.
Highlighted are the tools, procedures, and methodology for addressing the
details of a mass scanning operation.
54. A Mass Scanning
Workflow
Discussion
Pilsks@si.edu
mperson@mbl.edu
Thanks to All Staff of the
BHL Members