2. Overview
1. (my) Background
• Lice to data infrastructures!
• Why data infrastructures at the NHM
2. Building data infrastructures
• Recent core investment in NHM infrastructures
• Leveraging external investment in NHM infrastructures
• Infrastructure design principles & coordination
3. NHM 5-year data infrastructure horizons
• Collections digitisation
• Large-scale use of collections data
• New approaches to biodiversity discovery
4. Decadal community infrastructure challenges
• The long view – science data strategies
• Data modeling and real time monitoring as a unifying theme
4. Lice to data infrastructures!
Systematics (circa 1998)
- No high level keys
- Poor high level taxonomy
- Just one phylogeny
- Few living experts!
Circa 5,000 spp.
Mammals & birds
12,000 associations
15,000 potential hosts
5. My data infrastructure (circa 1998)
- Taxonomic names
- Authorities (name concepts)
- Citations
- Collection data
- Morphological characters
- Textual descriptions
- Diagnostic keys
- Illustrations
- Photographs
Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.
142 pieces of “raw” data in 4 of 54 pages, in 1 of 9,110 taxonomic papers on lice
6. “The bane of my existence is doing things that I know the computer could do for me”
-- Dan Connolly, The XML Revolution (Nature, 1998)
7. My data infrastructures (circa 2004)
Images
Specimens (SID): http://darwin.zoology.gla.ac.uk/~SID/
LouseBASE (Glasgow version at: http://darwin.zoology.gla.ac.uk/~rpage/LouseBase/2/)
Lab Notebook
Literature (PHPBib): http://myphpbib.sourceforge.net/
Host-Parasite Checklists
http://www2.flmnh.ufl.edu/pdb/
http://www2.flmnh.ufl.edu/adb/
8. My publications in 2004 (enabled by these infrastructures)
Making louse research more efficient, more collaborative and more productive
Publications: Biol. Letters, Zoo. Scripta, Syst. Biol., Grzimek’s Ency., Mol. Phyl. Evol., Ent. Abh., Proc. R. Soc. B, PLoS Biology, Science
Infrastructures: Specimens, Images, Lab Notebooks, Literature, Checklists
9. Why data infrastructures at the NHM: lots of potential
Card indices
Library
Archives
Staff
Frozen Tissue
Labels
Slides
Spirit
Dry
11. Recent NHM investment in science data infrastructures
1. KE EMu (collections data)
• Improved interface (speed, complexity, data quality, support)
• Rapid Data Entry Web-Interface
• Improved import & export functionality (CLD & data portal)
2. DAMS (multimedia) ?
• Review (Digital Strategy Group)
3. NHM Virtual Library (literature)
• Integrated search & discovery of NHM resources
• Better integration with external resources
4. NHM Data Portal (access, citation & archival)
• Discovery & visualisation of collections data on the Web
• Web exposure & archival of NHM research datasets
• Sub-portals for collaborative projects
• As strategically important as the Web in 3 years' time!
Enabling the NHM mission?
Collections, Public Engagement, Research
12. External investment in science data infrastructures
1. ViBRANT (EU FP7 Infrastructures, 17 partners, €4.75M)
• Virtual Biodiversity Research & Access Network for Taxonomy
• Building & integrating tools supporting biodiversity research communities
(publishing, literature & vocabulary management, ID keys, conservation assessments, mapping & visualisation tools, citizen science support)
2. e-Monocot (NERC Consortium; Kew, Oxford & NHM, £2.38M)
• Sustainable, integrated resource on Monocot plants
• Content and supporting digital infrastructure
(Complete family level keys & taxon pages; generic keys & pages for 8 families; select species-level resources from European Monocots, Red-list species and Slipper orchids)
3. SYNTHESYS 1, 2 & 3 (EU FP5/6/7 Infrastructures, 18 partners, €10M)
• Support for physical access to participating collections
• JRA: Research into mass collections digitisation
(Image analysis, segmentation, transcription & crowdsourcing)
4. Others
• Open-UP
• BHL-EUROPE
13. What are Scratchpads? (http://scratchpads.eu)
Scratchpad VRE: foundation for ViBRANT & eMonocot
Taxa (Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
Conservation
Projects
Regions
Societies
14. Impact: Scratchpad usage (July 2013)
525 Scratchpad Communities by 6,550 active registered users covering 73,444 taxa in 535,317 pages.
In total more than 1,300,000 visitors
81 paper citations in 2012
119 NHM staff, 83 sites
[Chart: unique visitors per month to Scratchpad sites]
65,000 unique visitors/month
16. Digital Ambition: NHM Science Strategy 2013-2017
A New Voyage of Discovery
Three Focal Areas
1. Scientific discovery
2. Scientific infrastructure
3. Scientific engagement
Five Challenges
1. The digital NHM
2. Origins, evolution & futures
3. Biodiversity discovery
4. Natural resources & hazards
5. Science, society & skills
Resources & funding
Measuring success
17. Digital Ambition: NHM Science Strategy 2013-2017
Collections digitisation
Large-scale use of collections data
New approaches to biodiversity discovery
18. Collections digitisation (data mobilisation)
Target
• 20M specimens available digitally in 5 years
Challenges
• Current fragmented efforts
• Heterogeneity of process
• Existing data (2.8M lots; 400k geo.; 120k images)
• Scale of operation (iCollections, 130k in 1 year)
• Transcription (Citizen Sci. / crowdsourcing)
• Data quality, annotation & feedback
Resources & funding
• Expensive (£20-£60M @ £1-3 per specimen)
• Linked to our public offer
Next steps (Sept. 2013)
• Coll. Descriptions & protocols
• Greater coordination of effort
• Programme group with project portfolio?
• Planning of digital access via NHM Data Portal
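The cost range quoted on this slide follows directly from the 20M target and the £1-3 per-specimen figure. Below is a minimal Python sketch of that arithmetic. It also compares the target with the iCollections throughput of 130k specimens in one year. Treating that throughput as a baseline rate is an assumption made here for illustration, not a claim from the slide, and the function names are hypothetical.

```python
# Back-of-envelope digitisation estimates using the figures quoted on the slide.
# Function names are illustrative; using the iCollections throughput as a
# baseline rate is an assumption for comparison only.

TARGET_SPECIMENS = 20_000_000      # target: 20M specimens available digitally
TARGET_YEARS = 5                   # target horizon: 5 years
COST_PER_SPECIMEN_GBP = (1, 3)     # quoted unit cost: £1-3 per specimen
ICOLLECTIONS_PER_YEAR = 130_000    # iCollections: 130k specimens in 1 year


def cost_range_gbp(specimens, unit_cost_range):
    """Return (low, high) total cost in GBP for a given unit cost range."""
    low, high = unit_cost_range
    return specimens * low, specimens * high


def years_at_rate(specimens, per_year):
    """Years needed to digitise `specimens` at a fixed yearly throughput."""
    return specimens / per_year


def required_rate(specimens, years):
    """Yearly throughput needed to digitise `specimens` within `years`."""
    return specimens / years


if __name__ == "__main__":
    low, high = cost_range_gbp(TARGET_SPECIMENS, COST_PER_SPECIMEN_GBP)
    print(f"Total cost: £{low / 1e6:.0f}M to £{high / 1e6:.0f}M")

    years = years_at_rate(TARGET_SPECIMENS, ICOLLECTIONS_PER_YEAR)
    print(f"Years at iCollections rate: {years:.0f}")

    rate = required_rate(TARGET_SPECIMENS, TARGET_YEARS)
    print(f"Required rate: {rate:,.0f} specimens/year")
    print(f"Scale-up over iCollections: {rate / ICOLLECTIONS_PER_YEAR:.1f}x")
```

Under these assumptions, the gap between the required yearly rate and a single project's throughput is one way to read the "scale of operation" challenge listed above.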
19. Large scale use of collections data (or why digitise)
Data applications help set digitisation priorities
[Chart: Crop Wild Relatives by plant family (Poaceae, Rosaceae, Solanaceae, ...); y-axis ticks 0, 500, 1000]
Potential applications for NHM data
• Invasive alien species
• Impacts of climate change
• Species conservation & protected areas
• Impacts of human development
• Biodiversity & human health
• Food, farming & biofuels
Sustainable delivery of data
NHM Data Portal
• Promote access & reuse of data
• Sub-portals for specific themes
• Delivering content to third parties (e.g. GBIF)
Next steps (requirements)
• Storage (Access, backup & archival)
• Citation, linking & measuring impact (identifiers)
• Data layering & visualisation
• H.P.C. (Ecol. niche modeling & analysis)
20. New approaches to biodiversity discovery (new types of data)
Take home messages from NHM Tropical Biodiversity Symposium (3-4 June 2013, NHM)
Molecular approaches
Molecular detection & monitoring of organisms is routine
Metagenomics (env. sequencing) commonplace
Whole genomes are normal
The primary route to understanding biodiversity for many
Ecological observatories
Automated biodiversity detection
Remote sensing (e.g. satellite & acoustic data, drones, camera traps)
Monitoring conspicuous, rare or invasive spp. (algal blooms, palms)
Monitoring human activity
Supplements field research, fills in gaps & scales
Digital infrastructure requirements
Very large quantities of data (2.5-10TB per researcher per yr.)
Doesn’t map to existing NHM collections infrastructures
Challenges current networking & storage capacity
Digital and physical collections become equally important?
22 July, 2013
22. The long view: community informatics challenges
GBIF GBIC Report (Coming soon)
EU Biodiversity Strategy (2011)
Biodiv. Inf. Challenges (2013)
23. Modeling the biosphere: a (the) 30 year goal?
A clear, singular long-term vision that NHM data can contribute to
Nature 2013, doi:10.1038/493295a
25. Infrastructure design principles*
= experience from 7 years with the Scratchpads (http://scratchpads.eu)
= lessons for building NHM data infrastructures?
1. Start with needs - focus on real user needs (not just the ‘official process’)
2. Do less - if someone else is doing it, link to it or use it
3. Design with data - prototype and test with real users on the live website
4. Do the hard work to make it simple - let the computer take the strain
5. Iterate. Then iterate again. - iteration reduces risk & is more sustainable
6. Build for inclusion – it’s easier in the long run
7. Understand context - we are designing for people, not a screen or a brand
8. Build digital services, not websites - there is life beyond the website
9. Be consistent, not uniform - every circumstance is different
10. Make things open: it makes things better - it’s more sustainable
*https://www.gov.uk/designprinciples
26. Better NHM digital coordination from 2013
Digital Strategy Group
Developing common vision
High level strategy
Director level engagement (Science, PEG & Corp. Services)
Digital Design Group
Digital Programme Group
Delivering & leading digital activities
Fund raising (internal & external)
Prioritisation
Administrative support
Resource management
Analysis of impact