Linking biodiversity data for ecology
1. Linking Biodiversity Data for
Ecology: Case Studies in what we
can do with existing tools
Anne E. Thessen
http://www.slideshare.net/athessen
2. Acknowledgements
• David Patterson
• Dima Mozzherin
• David Shorthouse
• Cyndy Parr
• Paula Mabee
• Wasila Dahdul
• Sami Domisch
• Global Names Project
• Phenoscape
• Encyclopedia of Life
• Map of Life
• RDA/US Scholars
Program
• US National Science
Foundation
3. Case Studies
• Capturing species interactions using the
Encyclopedia of Life (EOL) and Global Names
Recognition and Discovery (GNRD)
• Linking traits and habitats using the
Phenoscape knowledgebase, Global
Biodiversity Information Facility (GBIF), and
Map of Life
4. Species Interactions
• Mine data about interactions from text
objects in EOL
• Create a “digital ecosystem”
• PLoS ONE, doi:10.1371/journal.pone.0089550
5. Workflow and Methods
• EOL provided us with a list of ID numbers for
their species
• Use the EOL API to access text objects under
the “Associations”, “Trophic Strategy”, etc.
headings (http://eol.org/api)
• Use the GNRD API to find names in the text
objects (http://gnrd.globalnames.org/)
• Use Resolver to get the EOL ID corresponding
to each name found in the text
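The four steps above can be sketched as a small pipeline. This is an illustration only, not the code used in the study: the three callables (`fetch_texts`, `find_names`, `resolve_id`) are hypothetical stand-ins for calls to the EOL API, GNRD, and Resolver, and the rule that drops a taxon's own name from its results is an assumption.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def extract_interactions(
    taxon_ids: Iterable[str],
    fetch_texts: Callable[[str], List[str]],
    find_names: Callable[[str], List[str]],
    resolve_id: Callable[[str], Optional[str]],
) -> Set[Tuple[str, str]]:
    """Return (source taxon ID, mentioned taxon ID) pairs.

    fetch_texts, find_names and resolve_id are hypothetical wrappers around
    the EOL API, GNRD and Resolver; their signatures are assumptions.
    """
    pairs: Set[Tuple[str, str]] = set()
    for source_id in taxon_ids:
        for text in fetch_texts(source_id):        # text objects for one taxon
            for name in find_names(text):          # names found in the text
                target_id = resolve_id(name)       # name -> taxon ID, or None
                # Assumption: skip unresolved names and self-mentions.
                if target_id is not None and target_id != source_id:
                    pairs.add((source_id, target_id))
    return pairs


# Toy stand-ins so the sketch runs offline; the IDs and text are made up.
TEXTS = {"1": ["Preyed upon by Ardea herodias."]}
IDS = {"Ardea herodias": "2"}

demo = extract_interactions(
    ["1"],
    fetch_texts=lambda taxon_id: TEXTS.get(taxon_id, []),
    find_names=lambda text: [n for n in IDS if n in text],
    resolve_id=IDS.get,
)
```

Passing the three services in as functions keeps the loop testable without network access; real wrappers would replace the lambdas.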
6. Workflow and Methods
(Workflow diagram; boxes: EOL, GNRD, Results, EOL TraitBank)
• Interaction mediated between EOL and GNRD APIs
• GNRD API returns results resolved to EOL IDs
• Machine-readable results are added to GloBI and visible in
TraitBank
7. GNRD
• Tool for finding taxonomic names in text
• Combination of TaxonFinder and Neti Neti
• Capable of some name reconciliation
• Recent overall performance evaluation on
published manuscripts and data files
– Precision = 0.880
– Recall = 0.642
– F1 Score = 0.742
• Largest sources of error were caused by table and
figure formatting and unusual punctuation
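As a quick check, the reported F1 score follows from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_score` is ours, not part of GNRD.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported on this slide for GNRD.
gnrd_f1 = f1_score(0.880, 0.642)
print(round(gnrd_f1, 3))  # 0.742
```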
8. Resolver
• Tool for resolving taxonomic names in text
against an authority
• User can turn resolver on or off
• User can choose all or one of eight authorities
including CoL, IPNI, GBIF, etc.
• We resolved against EOL to get the
corresponding taxon IDs
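The idea of resolving a name against one or several authorities can be illustrated with a toy lookup. This is not how Resolver works internally: the authority tables, the IDs, and the normalization rule (collapse whitespace, ignore case) are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical authority tables mapping a normalized name to a taxon ID.
# The IDs are invented for illustration.
AUTHORITIES: Dict[str, Dict[str, str]] = {
    "EOL": {"danio rerio": "eol:0001", "paedocypris progenetica": "eol:0002"},
    "GBIF": {"danio rerio": "gbif:0101"},
}


def resolve(name: str, authorities: Optional[List[str]] = None) -> Dict[str, str]:
    """Look a name up in the chosen authorities (all of them if none given)."""
    key = " ".join(name.split()).lower()   # collapse whitespace, ignore case
    chosen = authorities if authorities is not None else list(AUTHORITIES)
    return {a: AUTHORITIES[a][key] for a in chosen if key in AUTHORITIES[a]}


eol_only = resolve("Danio  rerio", ["EOL"])
everything = resolve("Danio rerio")
```

Restricting `authorities` to `["EOL"]` mirrors the choice described above of resolving against EOL alone to obtain its taxon IDs.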
9. Results
• Association detection performance
– Precision = 0.844
– Recall = 0.930
– F1 Score = 0.885
• Information was extracted from the entirety of EOL,
and the data set is part of GloBI
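Scores like these are typically computed by comparing detected associations with a hand-checked reference set. A minimal sketch, assuming associations are stored as (taxon, taxon) pairs; the pairs below are invented and are not the study's evaluation data.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def evaluate(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted pairs against gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical reference and detected associations.
gold = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
predicted = {("a", "b"), ("a", "c"), ("b", "d"), ("x", "y")}

precision, recall, f1 = evaluate(predicted, gold)
```

Here three of the four detected pairs are correct and three of the four reference pairs are found, so all three scores are 0.75.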
10. Linking Phenotypes to Environment
• Phenoscape and TraitBank link phenotypes to
taxa
• GBIF and Map of Life link taxa to locations and
habitats
• Can we take the extra step and link phenotypes
to environments?
11. Workflow and Methods
• Phenoscape provided us with a list of fish with
the miniaturization phenotype and a list of
sister taxa that are not miniaturized
• A search of these taxa in GBIF returned
geographic coordinates
• Those coordinates were given to Map of Life,
which provided environmental data
• Used a two-tailed t-test to analyze the results
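The last step compares an environmental variable between the two groups of taxa. The slides do not say which variant of the t test was used; the sketch below assumes Welch's unequal-variance form and computes only the statistic and degrees of freedom with the standard library. The temperature values are invented. For a two-tailed p value, `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical site temperatures for miniature and non-miniature taxa.
mini = [25.1, 24.6, 24.9, 24.7]
non_mini = [22.9, 22.4, 22.7, 22.3]

t_stat, dof = welch_t(mini, non_mini)
```

A positive `t_stat` means the first group has the higher mean; the test is two-tailed when the p value counts deviations in both directions.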
13. Results
• We got back temperature, precipitation, slope,
land cover, and geology data
• Miniature fishes occur in wetter, warmer
environments
Variable       Type      Mean        p value
Temperature    mini      24.8        0.002
               non-mini  22.6
Precipitation  mini      6.9 × 10^7  0.008
               non-mini  1.8 × 10^7
14. Relevant References
• Midford, P., P. Mabee, T. Vision, H. Lapp, J. Balhoff, W. Dahdul, C. Kothari, J.
Lundberg, and M. Westerfield. 2009. Phenoscape: Ontologies for large
multi-species phenotype datasets. Zoological Journal of the Linnean
Society 151:691-757.
• Parr, C.S., N. Wilson, P. Leary, K.S. Schulz, K. Lans, L. Walley, J.A. Hammock,
A. Goddard, J. Rice, M. Studer, J.T.G. Holmes, and R.J. Corrigan Jr. 2014. The
Encyclopedia of Life v 2: Providing global access to knowledge about life
on Earth. Biodiversity Data Journal 2:e1079.
• Thessen, A.E. and C.S. Parr. 2014. Knowledge extraction and semantic
annotation of text from the Encyclopedia of Life. PLoS ONE.
http://dx.plos.org/10.1371/journal.pone.0089550
• Thessen, A.E., D.P. Shorthouse, D. Mozzherin, and D.J. Patterson. In prep.
Taxonomic names discovery to improve data discoverability.
• Thessen, A.E. et al. In prep. Linking phenotypes to environments.
• Tuanmu, M.N. and W. Jetz. 2014. A global 1-km consensus land-cover
product for biodiversity and ecosystem modeling. Global Ecology and
Biogeography 23:1031-1045.