Smith, V.S. 2013. Delivering biodiversity knowledge in the information age. Hellenic Botanical Society, Thessaloniki, Greece, 3-6 Oct. 2013. [Delivered via video link through Google Hangouts]
2. Overview
1. Background – biodiversity data diversity
• An introduction to me (lice to data infrastructures)
• The problem (integrating biodiversity research)
2. Example tools to manage biodiversity data
• Scratchpads (a platform to manage data)
• Biodiversity Data Journal (incentives to work digitally)
• eMonocot (aggregating data across communities)
3. Big community challenges – three examples
• Social issues (openness)
• Data issues (mobilizing existing data)
• Synthetic issues (modeling data)
4. Next steps
• Toward an integrated view for H2020 (strategy)
4. Lice to data infrastructures (1997-2004)
Systematics (circa 1998)
- No high level keys
- Poor high level taxonomy
- Just one phylogeny
- Few living experts!
Circa 5,000 spp.
Mammals & birds
12,000 associations
15,000 potential hosts
5. Lousy data infrastructures (circa 2004)
Images
Specimens
(SID)
LouseBASE, Glasgow version at: http://darwin.zoology.gla.ac.uk/~rpage/LouseBase/2/
Lab Notebook
Literature
http://darwin.zoology.gla.ac.uk/~SID/
Host-Parasite Checklists
PHPBib
http://www2.flmnh.ufl.edu/pdb/
http://myphpbib.sourceforge.net/
http://www2.flmnh.ufl.edu/adb/
6. The problem – integrating biodiversity research (2004>)
How do we join up these activities?
What infrastructures do we need? (technologies, tools, standards…)
What processes do we need? (modelling, workflows…)
What data do we need? (genes, localities…)
How do we use this as a tool?
Species conservation & protected areas
Impacts of human development
Biodiversity & human health
Impacts of climate change
Food, farming & biofuels
Invasive alien species
8. Scratchpads – a space for your data
• Hosted websites for biodiversity data
• Virtual research environments
• Completely open access & open source
• Modular & flexible
• Running since 2007
• Making taxonomy digital, open & linked
http://scratchpads.eu
9. Scratchpads – a space for your data
544 Scratchpad Communities by 6,644 active registered users, covering 91,631 taxa in 535,317 pages.
Taxa, Projects, Regions, Societies
In total more than 1,300,000 visitors
81 paper citations in 2012
http://scratchpads.eu
10. Biodiversity Data Journal – incentivising data publishing
• New, Open Access data journal
• Linked to Scratchpads via Publication Module
• Supports the life cycle of a manuscript
• Writing, submission, review, publication & dissemination, all in one place
• Structured, reusable, standardised data
• Launched in Sept 2013 with 24 articles
http://biodiversitydatajournal.com
11. Biodiversity Data Journal – easy manuscript assembly
Structured data
Select, describe & annotate data
Publication module
Review, publish, cite & disseminate
http://biodiversitydatajournal.com
EOL
Dryad
GBIF
Wiki Species-Id
PubMed
Plazi
12. eMonocot – aggregating data across communities
• Online resource for monocot plants
• Collaboration between Kew, Oxford University and NHM
• Data to be open and usable by other scientists
http://e-monocot.org
13. eMonocot – aggregating data across communities
• Linking monocot communities
• Identification, checklist & taxonomic data for:
- 275,000 taxa
- 8,300 images
- 15 identification keys
- 3 phylogenies
• A sustainable digital portal
• A source of data for analysis
http://e-monocot.org
14. 3. Example challenges
- Social issues (openness)
- Data issues (mobilising existing data)
- Synthetic issues (modelling)
15. Social challenges: openness
“A piece of data or content is open if anyone is free to use, reuse, and redistribute it subject, at most, to the requirement to attribute and/or share-alike.” http://opendefinition.org/
• Sharing data is a foundation for our activities
• Normal practice in some communities (molecular)
• Mandated by some funders & governments
Many kinds of openness:
• Open Access
• Open Data
• Open Science
• Open Source
E. Archambault et al., Proportion of Open Access Peer-Reviewed Papers at the European and World Levels--2004-2011, June 2013, Science-Metrix Inc.
“One-half of all papers are now freely available within a year or two of publication”
Need to continue to incentivise openness
16. Data challenges: mobilising existing data
Collections, literature & metadata
How can we quickly, efficiently and cost-effectively mobilise biological data at scale?
Collections
• 1.5-3B specimens in collections worldwide
• Fragmented efforts / need coordination
Biodiversity literature
• >300M pages, BHL scanned 41M to date
• Copyright post-1923 & article metadata
NHM Digitisation
BHL literature
Informatics challenges
• Automation & annotation
• Storage & persistence
• Business models to sustain activity
Bibliography of Life (RefFinder & RefBank)
17. Synthetic challenges: Modeling the biosphere
Reasoning across large, linked biodiversity datasets
A clear, singular, long-term vision to which biodiversity data can contribute
Conceptually has many potential uses
• Identifying trends
• Explaining patterns
• Making predictions
• Real time alerts
- when data contradicts current knowledge
• The ultimate policy tool
Major informatics challenges
• Technically very difficult (many years off)
• Needs effective prototypes & platforms
• Some first steps, e.g. Local Ecological Footprint Tool (Nature 2013, doi:10.1038/493295a)