Making your data work for you: Scratchpads, publishing & the Biodiversity Data Journal
1. Making your data work for you: Scratchpads, publishing & the Biodiversity Data Journal
Vince Smith1, Dave Roberts1 & Lyubomir Penev2
1) Natural History Museum, London
2) Pensoft Publishers, Sofia, Bulgaria
vince@vsmith.info
Linnean Society, UK, 20 September 2012
2. Our informatics grand challenge…
“Link together evolutionary
data… by developing
analytical tools and proper
documentation and then
use this framework to
conduct comparative
analyses, studies of
evolutionary process and
biodiversity analyses”
Cyndy Parr, Rob Guralnick, Nico
Cellinese and Rod Page. TREE.
doi:10.1016/j.tree.2011.11.001
3. Our informatics grand challenge…
“Link together evolutionary
data… by developing
analytical tools and proper
documentation and then
use this framework to
conduct comparative
analyses, studies of
evolutionary process and
biodiversity analyses”
Cyndy Parr, Rob Guralnick, Nico
Cellinese and Rod Page. TREE.
doi:10.1016/j.tree.2011.11.001
This requires data, information & knowledge to be…
• Digital
- Not printed paper
• Openly accessible
- Not behind barriers
• Linked-up
- Not in silos
4. Most of our output is not digital, open or linked
• 15-20k new spp. described annually (2M total)1
• 30k nomenclatural acts (12M total)1
• 20k phylogenies (750k total)2
• 31k taxa sequenced (360k taxa total)3
• 800k BioMed papers (40M total pp. of taxonomy)4
• Countless specimens, images, maps, keys…
Typically generated by small
communities for “local” research
projects
Figures from 1) Zhang, Zootaxa 2011 4, 1-4; 2) Web-of-Science; 3) Genbank and 4) PubMed.
6. What is a Scratchpad?
A website for you & your community
1) Your data
2) Uploaded & tagged
3) “Published” & reviewed on your site
Fast, intuitive, fit for use
7. Scratchpads
• EDIT (07-11), ViBRANT / eMonocot (11-13)
• Hosted websites for taxonomists
• Taxonomic, regional or societal
• Research & publication platform
• Supports the taxonomic workflow
• Modular (Drupal) & flexible
• Two full-time developers
• Ecosystem of communities (~450)
http://scratchpads.eu
9. What can Scratchpads do?
+Administration
-Change your site information
-Change your front page
-Change your logo
-Activity and access logs
+Backup
-Backing up your data
-Restoring your data
+Bibliography
-Creating a record
-Importing from a ref. manager
-Exporting to a reference manager
+Blog
-Creating and adding a blog
+Custom Content
-Defining a CCK
-Importing from a spreadsheet
-Creating a custom view
+Fileshare
-Creating and using a fileshare
+Forum
-Altering the forum settings
-Creating a container for a forum
-Creating a new forum
-Creating a new topic inside a forum
+Groups
-Creating a group
-Subscribing to a group
+Image
-Uploading & basic annotation
-Linking image & location records
-Linking image & specimen records
-Linking image & publication records
-Overlay annotations on images
+Layout
-Change your theme
-Menus
-Blocks and sidebars
+Locations
-Creating a record
-Importing from a spreadsheet
+Pages
-Creating, editing, cloning & deleting
-Configuring the panels template
+Panels
-Adding & configuring content
-Creating a new panel
-Citing a Panels page
+Phylogeny
-Adding a phylogenetic tree
+Specimens
-Creating a record
-Importing from a spreadsheet
-Linking specimen & location records
-Linking specimen & pub. records
+Tasks
-Creating a tasklist
+Taxonomy
-Importing from a spreadsheet
-Importing from ClassificationBank
-Starting from scratch
-Taxonomy manager
-Displaying a classification
-Adding names
-Deleting names
-Taxonomy & panels
+Users
-Your settings
-Adding a new user
-User roles and permissions
-Adding and editing user profile fields
-Logging in
+Webform
-Creating and using webforms
10. Summary of what Scratchpads can do
• Taxon pages, generated from tagged content (plant/animal)
• Bibliography management
• Character matrixes
• Specimen records
• Distribution maps (from specimens and regional)
• Images, video and sound (bulk import)
• Excel spreadsheet import (dynamically generated)
• Darwin Core Archive export
• Tabular data editing
• Custom content
• User management
• Custom webforms
• EOL data import (taxonomy, species information)
• GBIF Map integration
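As a rough illustration of how a taxon page might be generated from tagged content, the sketch below groups content items by the taxon names they are tagged with. The `ContentItem` type, its fields and the sample records are hypothetical; nothing here is taken from the Scratchpads codebase, which is built on Drupal.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """A piece of site content tagged with taxon names (hypothetical model)."""
    title: str
    kind: str  # e.g. "image", "specimen", "bibliography"
    taxa: list = field(default_factory=list)


def build_taxon_pages(items):
    """Group content titles by taxon name, then by content kind."""
    pages = defaultdict(lambda: defaultdict(list))
    for item in items:
        for taxon in item.taxa:
            pages[taxon][item.kind].append(item.title)
    # Convert to plain dicts so the result is easy to inspect or serialise.
    return {taxon: dict(kinds) for taxon, kinds in pages.items()}


# Made-up sample content, for illustration only.
items = [
    ContentItem("Habitus photo", "image", ["Milichiidae"]),
    ContentItem("Specimen record A", "specimen", ["Milichiidae"]),
    ContentItem("Revision paper", "bibliography", ["Milichiidae", "Sciaroidea"]),
]
pages = build_taxon_pages(items)
```

An item tagged with two taxa shows up on both pages, which is the point of tagging: content is entered once and reused wherever it is relevant.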
11. Scratchpad v.1 usage (2007- Mar. 2012)
• Nodes: 430,948
• Sites: 326
• Users: 6809
• Active users: 5733 (273 w / 759 m)
• Users (range: 1-1049; mean: 15; mode: 1)
• Prof. scientists
• Amateur naturalists
• Citizen scientists
[Chart: Users and Sites, annotated with ViBRANT and SP 2]
12. Scratchpad 2 – the new version of Scratchpads
• Launched March 2012
• 120 sites to date
• EOL Fellows
• SP1 migration ongoing
• More professional
• Easier to…
- configure (workflows)
- navigate (facets)
- & populate (MS Excel templates)
• Greater standardisation
• Still highly flexible
• Project profiles (eMonocot)
• Framework for integration
e.g. http://ihs.myspecies.info/
14. Sustainable training, support & development
• Wiki
- Training manuals, videos & glossary
• In-site Support
- One click help within your site
• Training Courses (12 in 2012)
- UK (6), Sweden (2), Greece (1),
Bulgaria (1), South Africa (1), Brazil (1)
• Ambassadors Programme
- Enthusiastic experienced users
- Local support
• Embedded Issues Queue
- Bug reports
- Feature requests
• Sandbox Site
- http://sandbox.scratchpad.eu
• Open Source Development
http://scratchpad.eu/help - http://scratchpad.eu/develop
15. Online community revision
• Taxonomy is in perpetual beta
- Constantly evolving
- Changing contributors
- Small granular contributions
• Sustainability
- A permanent space to work
- Guaranteed access (2016)
- Easy ways to get the data out
• Open science
- Beyond Open Access
- New ways of working
- Data management plans
[Image: Freeloader flies, http://milichiidae.info]
• Need incentives to use
- More efficient (functions & reuse)
- Attribution & provenance
- Credit via citation
• New forms of publication
16. Publishing observations & taxon data
http://scratchpads.eu > http://gbif.org & http://eol.org
• Specimen records & species pages on Scratchpads
- >19k specimen records
- >122k species pages
• Exported as a Darwin Core Archive (DwCA)
• Pushed to GBIF & EOL (requires site registration with GBIF & EOL)
- >377M specimen records in GBIF
- >1M species pages in EOL
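For readers unfamiliar with the format, here is a minimal sketch of what a Darwin Core Archive looks like: a zip file holding a delimited data file plus a `meta.xml` descriptor that maps columns to Darwin Core terms. This is not the Scratchpads export code; the function name `build_dwca`, the choice of five terms and the sample record are assumptions made for illustration.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName",
          "decimalLatitude", "decimalLongitude"]


def build_dwca(records):
    """Return the bytes of a minimal Darwin Core Archive (occurrence core only)."""
    # Core data file: a header row, then one tab-separated row per record.
    data = io.StringIO()
    writer = csv.writer(data, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for rec in records:
        writer.writerow([rec[name] for name in FIELDS])

    # meta.xml maps column indexes to Darwin Core terms.
    # Column 0 is declared as the record identifier via <id>.
    field_xml = "\n".join(
        f'    <field index="{i}" term="{DWC}{name}"/>'
        for i, name in enumerate(FIELDS) if i > 0
    )
    meta = f"""<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="{DWC}Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
{field_xml}
  </core>
</archive>
"""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", data.getvalue())
        archive.writestr("meta.xml", meta)
    return buffer.getvalue()


# Made-up sample record, for illustration only.
records = [{
    "occurrenceID": "example-1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Milichiidae",
    "decimalLatitude": "51.4967",
    "decimalLongitude": "-0.1764",
}]
archive_bytes = build_dwca(records)
```

A real archive would normally also carry a dataset-level metadata document (for example EML), which is omitted here to keep the sketch short.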
17. Experiments with article publishing
http://scratchpads.eu > http://pensoft.net
• Paper assembled from Scratchpad database
- 5-step workflow for selecting data, adding metadata & previewing
• XML submission, peer review & marked-up publication by Pensoft
- Formats: XML, HTML, PDF
- Published in ZooKeys & PhytoKeys (worldwide coverage)
doi:10.3897/zookeys.50.539
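The idea of assembling a paper from database records into an XML submission can be sketched as follows. The element names (`article`, `treatment`, `description`) and the function `assemble_manuscript` are hypothetical and do not reflect the schema actually exchanged between Scratchpads and Pensoft; the sample values are placeholders.

```python
import xml.etree.ElementTree as ET


def assemble_manuscript(title, authors, treatments):
    """Build a simple XML manuscript from database-style records."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title

    authors_el = ET.SubElement(article, "authors")
    for name in authors:
        ET.SubElement(authors_el, "author").text = name

    # One <treatment> element per selected taxon record.
    body = ET.SubElement(article, "body")
    for record in treatments:
        treatment = ET.SubElement(body, "treatment", taxon=record["taxon"])
        ET.SubElement(treatment, "description").text = record["description"]

    return ET.tostring(article, encoding="unicode")


# Placeholder values, for illustration only.
xml_text = assemble_manuscript(
    "Example paper",
    ["A. Author"],
    [{"taxon": "Example taxon", "description": "Placeholder description."}],
)
```

Because the manuscript is generated from structured records, the same data can be re-rendered as HTML or PDF without retyping, which is what makes the marked-up publication possible.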
18. Example papers via Scratchpads…
• Blagoderov V, Hippa H, Nel A (2010). ZooKeys 50: 79–90. doi: 10.3897/zookeys.50.506
- http://sciaroidea.info/node/44428
• Faulwetter S, Chatzigeorgiou G, Galil BS, Nicolaidou A, Arvanitidis C (2011). ZooKeys 150: 327–345. doi: 10.3897/zookeys.150.1877
- http://polychaetes.marbigen.org/node/35
• Brake I, von Tschirnhaus M (2010). ZooKeys 50: 91–96. doi: 10.3897/zookeys.50.505
- http://milichiidae.info/node/14995
The URLs point to live (updated) versions of these papers
19. But…
• Limited uptake in 2 years
- 1 genus
- 6 n. spp
- 11 re-descriptions
• Software bugs
- Pushing the boundaries of SP1
- Fixed in SP2
• Focused on synthetic papers
- Not suited to small papers
- Less emphasis on data
- Hard to properly link in the data
• More effort than MS Word
- Especially for new SP users
21. Why do we need another new journal!!!
Taxonomy needs less fragmentation, not more!
BUT…
• We need to encourage taxonomists to mobilize & describe their data
• This takes considerable effort (e.g. Scratchpads)
• “Arguably” this is best rewarded through credit
• This means papers and citations
• Process must be very easy for authors
• Process must facilitate data reuse
• Meet “Open Data” policy commitments
• The Biodiversity Data Journal is very different…
22. Biodiversity Data Journal (BDJ)
• All data matters: No lower or upper limit of manuscript size!
• Multiple publishing routes (not just Scratchpads)
• ALL within a single online collaborative platform, including
the writing of the manuscript!
• New collaborative article authoring tool
• Community peer review with “open” & “public” options
• This is in addition to conventional peer-review
• Online editorial process and version control
• Standards-compliant (Darwin Core, Dublin Core, NLM etc.)
• Pre-defined Code-compliant article templates
23. BDJ publication & dissemination workflow
Manuscript sources:
• GBIF-generated manuscripts from metadata descriptions
• Scratchpads-generated manuscripts
• Manuscripts generated from authors’ databases
• Conventional manuscripts from authors (MS Word, Open Office)
Processing:
• Pensoft Writing Tool (PWT)
• Pensoft Journal System (PJS)
Output:
• Marked up final publication in PDF, HTML and XML formats
24. Pensoft manuscript writing tool
• Collaborative online editing
• Rich text capabilities
• Various templates for taxon treatments
• Identification keys builder
• Species occurrence data import (Darwin Core compliant)
• Smart citation for figures, tables, references & automated positioning
• Assembling plates from single figures
• References import (CrossRef, PubMed Central, etc.)
[Diagram: the lead author creates a template-based manuscript (taxon treatment, interactive key, checklist or data paper), invites co-authors for authoring and invites contributors (mentor, linguistic editor, copy editor, potential reviewer, colleague/friend) for contributing]
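A Darwin Core compliant occurrence import implies checking that incoming column names are recognised terms. The sketch below shows one way such a check could look; it is not the Pensoft Writing Tool's implementation. The term names are real Darwin Core terms, but the required/optional split, the function `read_occurrences` and the sample row are assumptions made for this example.

```python
import csv
import io

# A small subset of Darwin Core term names. Treating the first set as
# required is an assumption for this example, not a rule of the standard.
REQUIRED = {"occurrenceID", "scientificName", "basisOfRecord"}
OPTIONAL = {"decimalLatitude", "decimalLongitude", "eventDate", "recordedBy"}


def read_occurrences(text):
    """Parse CSV text, rejecting headers with missing or unrecognised columns."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    missing = REQUIRED - header
    unknown = header - REQUIRED - OPTIONAL
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    return list(reader)


# Made-up sample data, for illustration only.
sample = (
    "occurrenceID,scientificName,basisOfRecord,eventDate\n"
    "example-1,Example taxon,PreservedSpecimen,2012-09-20\n"
)
rows = read_occurrences(sample)
```

Rejecting unknown columns up front keeps misspelt headers (for example `scientific_name`) from being silently dropped during import.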
25. Testing screenshots of the writing tool
• Manuscript preview
• Multi-figure plates
• Plate layout
• ID Key preview
• ID Key builder
26. Why publish in the BDJ?
• Joining (small) data into a large data pool
• Open-access, archiving and re-using your data
through data aggregators
• Providing a citation record and credit for data in
the form of peer-reviewed publications
• Facilitating online article authoring and editorial
process for authors, reviewers and editors
• Using a truly innovative dissemination of atomized
content
• Very low-cost. Free in the launch phase, thereafter at
a fee that anyone can afford!
27. What will BDJ publish?
• Single taxon treatments and nomenclatural acts
• Local or regional checklists
• Sampling reports and occasional inventories
• Habitat-based checklists and inventories
• Ecological and biological observations of species
and communities?
• Single identification keys
• ANY KIND of biodiversity-related database, including
genomic, ecological and environmental data (data
papers)
• Biodiversity-related software tools
Starting late 2012, early 2013
Recruiting editors now
28. Acknowledgements
• Scratchpad technical development
- Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boulton
• Scratchpad outreach
- Irina Brake, Laurence Livermore, Dimitris Koureas
• eMonocot
- Paul Wilkin & the Kew team, Charles Godfray & the Oxford team
• ViBRANT
- Dave Roberts, Lucy Reeve & many many more
• Pensoft
- Lyubomir Penev, Teodor Georgiev & colleagues
• Our 7,000+ users
31. Peer-review options
• Pensoft Writing Tool (PWT): collaborative online writing
• Pensoft Journal System (PJS): online editing
• Peer-review options: community, public, closed
• Editor sends review requests; reviews come from:
- Nominated reviewers
- Panel reviewers
- Public reviewers
• All reviews assembled into a single online version
• Editorial decision & feedback to authors
• Author’s revised manuscript (online editing)
• Publication & dissemination
32. Why we need new methods of publishing…
RE-USE of CONTENT
Publishing and sharing of primary data
Primary data
Drawings: Slavena Peneva