I gave this talk at the Smithsonian National Museum of Natural History in April 2013. It deals with ZooBank and the registration of scientific names of animals, and with the role of type specimens and of archives for both specimens and literature. It should be of interest to taxonomists and to people working on biodiversity bioinformatics and scientific bibliography.
The talk had significant input from several co-authors: Richard Pyle, David Patterson, Daphne Fautin and Jon Todd. The Smithsonian presentation was hosted by the AAZN (American Association for Zoological Nomenclature). I gave a similar talk in November 2012 at the invitation of the Field Museum, Chicago, which is available in full online here (54 minutes): http://vimeo.com/55796036 and linked with a short promo piece on scientific nomenclature here (2.8 minutes): http://vimeo.com/54956625
The power of names smithsonian talk-2013-iczn_nomenclature&bioinformatics-v2
1. Nomenclature for the Future: The power and challenges for stable and sensible scientific names for animals
Ellinor Michel (1,2,3)
Richard Pyle (1,3,4)
Daphne Fautin (1,3,5)
David Patterson (1,3,6)
Jon Todd (2,3)
1 Int’l Commission on Zoological Nomenclature
2 The Natural History Museum, London UK
3 Int’l Committee on Bionomenclature
4 Bishop Museum, HI, USA
5 University of Kansas, USA
6 Arizona State Univ, AZ, USA
2. Names and the information revolution
All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.
- Grimaldi & Engel, 2005
Note: they don’t say THE scientific name (i.e., singular)
3. Equivalent of 318 volumes of Systema Naturae
Estimated 2-6 names for every valid (=currently
considered definable and ‘real’) species
4,398 Species
8. Scientific concept of biodiversity
Name (Scientific, common, provisional or open)
Type specimen (objective standard)
Data & Bibliography
9. Stability, transparency and testability
Name (Scientific, common, provisional or open)
Type specimen (objective standard)
Data & Bibliography
10. Archives for Scientific concepts of biodiversity
Name (Scientific, common, provisional or open)
Type specimen (objective standard)
Data & Bibliography
Stable archives needed: Natural History Collections, Libraries, Publications, Data sources
14. E-only publication amendment
to ICZN Code published 4 Sept
ZooBank improved version
released, meeting
requirements of the
Amendment
From zoobank.org/statistics
15. Registration in ZooBank
• Now required for e-only publications
• Has general community support
• Registration of all names and nomenclatural
acts is strongly encouraged and being rapidly
implemented
• Next Step: ALL names (historical and future)
registered and cross-linked!
16. A name = ‘computer’ readable
code that links information
17. “Archaeopteryx”
Easy for a human;
Hard for a computer
A5B835CF-BB3A-4CC9-BCBD-38BA253C8374
Easy for a computer;
Hard for a human
23. Zoological Names in the Future
• Global mandatory registration for all new
names – next edition of the Code?
• Ultimate Goal: Registered = Available
(Pyle & Michel, 2008; Minelli, 2013)
24. Logistics of populating ZooBank
• 16,000-20,000 new animal species described
each year
• 1.9 million described extant species
• 5-50 million estimated total extant species (R.
May, E.O. Wilson, T. Erwin)
• Fossil species multiply this by some factor
Strategic approaches required
Publishers highly supportive and beginning
to require ZooBank registration
Authors & databases contributing now
25. Logistics of populating ZooBank
Building tools to streamline the capture of prospective
content
• Publisher pipelines with XML tools
• Requested and required ZooBank registration by authors of new papers
• (all e-only publications must be registered to be available)
Populating with retrospective content
• Major sources – Sherborn, Hymenoptera Names Server, Hexacorallians of the World, etc.
• Committed individuals – Rod Bray, Takafumi Nakano
• Lists of Available Names (LANs)
30. LANs – Lists of Available Names
• Critical assembly of large
numbers of names
• Community debate
• Commission authoritative
ruling
Article 79 - An international body of
zoologists… in consultation with the
Commission may propose that the Commission
adopt for a major taxonomic field (or related
fields) a Part of the List of Available Names in
Zoology. The Commission will consider the
proposal and may adopt the Part subject to the
proposing body and the Commission meeting
the requirements of this Article.
31. LANs – Lists of Available Names
1) Ensures a candidate Part of the LAN is thoroughly vetted
2) Pares away dubious names
• like the Approved Lists of Bacterial Names that took effect on 1 January 1980 – taxonomically recognizable as well as nomenclaturally available
3) Prevents “nomenclatural archeology”
• long-forgotten names displacing accepted names
Creates a definitive nomenclatural inventory (a new zero point) for a portion of the taxonomic spectrum
Source of names for ZooBank
32. Two Possibilities
STRICTLY NOMENCLATURAL: to document every available name within the scope of the Part
TAXONOMIC COMPONENT: to pare the inventory of names within the scope of the Part
34. Conclusions
• Names are the anchor and link for biodiversity information
exchange
• Types provide stability and meaning for taxon names
• A stable archive of names is critical taxonomic
infrastructure
• ZooBank aims to be the authoritative source for scientific
names of animals and is growing rapidly
• The future of nomenclature includes a harmonization of
biological codes, especially through technical tools such as
ZooBank and the Global Names Architecture
35. THANKS
Natural History Museum, London
Bishop Museum, Hawaii
ICZN/ITZN supporting institutions
(MNHN (France), Senckenberg
(Germany), Naturalis, RBINS
(Belgium), AAZN (USA))
The Commissioners & Trustees of
ICZN / ITZN
ICB – International Committee on
Bionomenclature
Everyone pitching in on building
ZooBank content
In 1758 it was feasible to create a catalog of life using ink on paper.
<click>
Today, it would require the equivalent of nearly 264 volumes of Systema Naturae to achieve the same thing.
Or, you could fit the whole thing on a tiny memory card.
It’s no surprise that the biodiversity community is going digital.
<click>
Many Natural History Museums are databasing their collections.
<click>
Historical literature is being digitized by the Biodiversity Heritage Library and others…
<click>
…and many modern scientific journals are embracing the digital age directly.
<click>
Authoritative Nomenclators have been built,
<click>
and a variety of groups are working to distill the taxonomic concepts from the sea of names.
<click>
Databases of observation records are growing at a fast rate,
<click>
as are genomic databases.
<click>
The internet has made feasible the cheap and easy dissemination of multimedia files related to biodiversity.
<click>
And, of course, there is biodiversity content spread across the billions of web pages indexed by Google.
<click>
To make sense of it all, several organizations serve as aggregators of all this diverse content.
And this is just a small sample of icons that could fit on a slide.
This massive effort to digitize biodiversity information is a great step in the right direction.
But it is only one step.
We must now focus our energies on integrating all of this information in a coordinated, cohesive way.
<click>
The critical informatic piece to this puzzle is Taxonomy, because almost all of these data providers link their content to taxon names one way or another.
To overcome these and other problems, we need to build a Global Names Architecture.
The first and perhaps most critical component to integrating all of this biodiversity information is the broader adoption of Globally Unique Identifiers, or GUIDs.
<click>
If you already know about GUIDs, then the question of whether we should use LSIDs, DOIs, Handles, PURLs, or UUIDs is of secondary importance to their more general implementation.
If you don’t already know about GUIDs, then learn about them, or trust your IT staff when they say they need support to implement them.
Also, always keep in mind that they are intended for use by computers, not humans, so don’t worry about how ugly they may appear.
But while GUIDs make things a lot easier, they do not, by themselves, solve the problem of linking the world’s biodiversity information.
GUIDs are globally unique identifiers, intended to be read by computers, not humans.
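As a minimal sketch of why the ugly form is the convenient one for a machine, the Python below parses the identifier shown on the slide with the standard-library `uuid` module and uses it as a lookup key. The `records` store and its placeholder content are hypothetical; the slide does not say which record the identifier denotes, and nothing here is a ZooBank API.

```python
import uuid

# Identifier string as it appears on the slide.
raw = "A5B835CF-BB3A-4CC9-BCBD-38BA253C8374"

# Parsing validates the string; a malformed value raises ValueError.
guid = uuid.UUID(raw)

# Hypothetical in-memory store keyed by identifier (illustration only).
records = {guid: {"label": "placeholder record"}}

# Differences in letter case or surrounding braces disappear after parsing,
# so every written form of the identifier maps to the same key.
same = uuid.UUID("{a5b835cf-bb3a-4cc9-bcbd-38ba253c8374}")

print(guid == same)
print(records[same]["label"])
print(guid.version)
```

The point of the sketch is only that equality between identifiers is exact and cheap to test, which is not true of free-text names typed by people.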
The two major components of the Global Names Architecture currently under development are the Global Names Index, or “GNI”…
<click>
And the Global Names Usage Bank, or “GNUB”.
<click>
The GNI is optimized to manage information from content providers that treat names as text-string attributes of other data objects. For example, it provides a species-level index of content within databases and facilitates linking of disparate data sources through species names.
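The idea of indexing content by name string can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a description of how the GNI is built: the provider names, record ids, and the `normalize` and `build_index` helpers are all hypothetical, and the normalization only collapses whitespace and letter case.

```python
from collections import defaultdict

# Hypothetical provider records: (source, record id, name string as recorded).
provider_records = [
    ("museum_db", "spec-001", "Archaeopteryx lithographica"),
    ("literature_index", "page-77", "archaeopteryx  lithographica"),
    ("image_bank", "img-9", "Archaeopteryx lithographica "),
]

def normalize(name: str) -> str:
    """Collapse whitespace and case so trivially different strings match."""
    return " ".join(name.split()).lower()

def build_index(records):
    """Map each normalized name string to the records that mention it."""
    index = defaultdict(list)
    for source, record_id, name in records:
        index[normalize(name)].append((source, record_id))
    return index

index = build_index(provider_records)

# One lookup by name string now reaches records in all three sources.
hits = index[normalize("Archaeopteryx lithographica")]
for source, record_id in hits:
    print(source, record_id)
```

A string index like this links sources cheaply, but it cannot tell whether two different strings refer to the same name, which is the gap the curated side of the architecture is meant to fill.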
<click>
The GNUB is designed to manage taxon names and their usages as curated data objects in and of themselves.
These components will not only help build links among their own contributing data providers, but also bridge the gap between them.
<click>
Services will allow the GNI to serve as a “gateway” into the GNUB.
And, of course, the GNUB will serve as a source of validated taxon name strings back to GNI.
The Global Names Architecture is currently in development with support from GBIF, the Encyclopedia of Life, and the National Biological Information Infrastructure.
A prototype of GNI is already available at globalnames.org, and the GNUB is currently being populated with content from Index Fungorum and ZooBank, the latter of which will include content from many of these taxon-specific nomenclators.
<click>
Ultimately, all of these content providers will be plugged into the Global Names Architecture, and the biodiversity data content will start to flow.
The good news is that taxonomic names represent one of the greatest and longest-lasting examples of true international cooperation in all of science, if not all of human history.
<click>
This is a result of the various Codes of Scientific Nomenclature, two of which have been in place for more than a century and apply to all names going back to Linnaeus.
In a sense, the Codes of Nomenclature represent our saving grace for organizing biodiversity information.
Without their existence, longevity, and near-universal adoption, the prospects for integrating biodiversity information would be orders of magnitude more difficult today.
Unfortunately, as important as these Codes of nomenclature have been and continue to be, there are still some issues to overcome.