Making your data work for you:
Scratchpads, publishing & the Biodiversity Data Journal

Vince Smith1, Dave Roberts1 & Lyubomir Penev2
1. Natural History Museum, London
2. Pensoft Publishers, Sofia, Bulgaria
vince@vsmith.info

Linnean Society, UK, 20 September 2012
Our informatics grand challenge…

 “Link together evolutionary
 data… by developing
 analytical tools and proper
 documentation and then
 use this framework to
 conduct comparative
 analyses, studies of
 evolutionary process and
 biodiversity analyses”


         Cyndy Parr, Rob Guralnick, Nico
         Cellinese and Rod Page. TREE.
         doi:10.1016/j.tree.2011.11.001
This requires data, information & knowledge to be…
 • Digital
   - Not printed paper
 • Openly accessible
   - Not behind barriers
 • Linked-up
   - Not in silos
Most of our output is not digital, open or linked
 •      15-20k new spp. described annually (2M total)1
 •      30k nomenclatural acts (12M total) 1
 •      20k phylogenies (750k total)2
 •      31k taxa sequenced (360k taxa total)3
 •      800k BioMed papers (40M total pp. of taxonomy) 4
 •      Countless specimens, images, maps, keys…


     Typically generated by small
     communities for “local” research
     projects

     Figures from 1) Zhang, Zootaxa 2011 4, 1-4; 2) Web-of-Science; 3) Genbank and 4) PubMed.
Scratchpad
Virtual Research Environments

    Making taxonomy digital, open & linked
What is a Scratchpad?
 A website for you & your community




 1. Your data
 2. Uploaded & tagged
 3. “Published” & reviewed on your site

 Fast, intuitive, fit for use
Scratchpads
                        • EDIT (07-11), ViBRANT / eMonocot (11-13)
                        • Hosted websites for taxonomists
                        • Taxonomic, regional or societal
                        • Research & publication platform
                        • Supports the taxonomic workflow
                        • Modular (Drupal) & flexible
                        • Two full time developers
                        • Ecosystem of communities (~450)




http://scratchpads.eu
Categories of Scratchpads




 • Taxa (classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
 • Conservation
 • Projects
 • Regions
 • Societies
What can Scratchpads do?
 +Administration
  -Change your site information
  -Change your front page
  -Change your logo
  -Activity and access logs
 +Backup
  -Backing up your data
  -Restoring your data
 +Bibliography
  -Creating a record
  -Importing from a ref. manager
  -Exporting to a reference manager
 +Blog
  -Creating and adding a blog
 +Custom Content
  -Defining a CCK
  -Importing from a spreadsheet
  -Creating a custom view
 +Fileshare
  -Creating and using a fileshare
 +Forum
  -Altering the forum settings
  -Creating a container for a forum
  -Creating a new forum
  -Creating a new topic inside a forum
 +Groups
  -Creating a group
  -Subscribing to a group
 +Image
  -Uploading & basic annotation
  -Linking image & location records
  -Linking image & specimen records
  -Linking image & publication records
  -Overlay annotations on images
 +Layout
  -Change your theme
  -Menus
  -Blocks and sidebars
 +Locations
  -Creating a record
  -Importing from a spreadsheet
 +Pages
  -Creating, editing, cloning & deleting
  -Configuring the panels template
 +Panels
  -Adding & configuring content
  -Creating a new panel
  -Citing a Panels page
 +Phylogeny
  -Adding a phylogenetic tree
 +Specimens
  -Creating a record
  -Importing from a spreadsheet
  -Linking specimen & location records
  -Linking specimen & pub. records
 +Tasks
  -Creating a tasklist
 +Taxonomy
  -Importing from a spreadsheet
  -Importing from ClassificationBank
  -Starting from scratch
  -Taxonomy manager
  -Displaying a classification
  -Adding names
  -Deleting names
  -Taxonomy & panels
 +Users
  -Your settings
  -Adding a new user
  -User roles and permissions
  -Adding and editing user profile fields
  -Logging in
 +Webform
  -Creating and using webforms
Summary of what Scratchpads can do
  •   Taxon pages, generated from tagged content (plant/animal)
  •   Bibliography management
  •   Character matrixes
  •   Specimen records
  •   Distribution maps (from specimens and regional)
  •   Images, video and sound (bulk import)
  •   Excel spreadsheet import (dynamically generated)
  •   Darwin Core Archive export
  •   Tabular data editing
  •   Custom content
  •   User management
  •   Custom webforms
  •   EOL data import (taxonomy, species information)
  •   GBIF Map integration
Scratchpad v.1 usage (2007- Mar. 2012)


   Nodes: 430,948
   Sites: 326
   Users: 6809
   Active users: 5733
   (273 w / 759 m)

   [Chart: Users, Sites; annotations: ViBRANT, SP 2]
   Range: 1-1049
   Mean: 15
   Mode: 1

 • Prof. scientists
 • Amateur naturalists
 • Citizen scientists
Scratchpad 2 – the new version of Scratchpads
                                     • Launched March 2012
                                     • 120 sites to date
                                     • EOL Fellows
                                     • SP1 migration ongoing

                                     • More professional
                                     • Easier to…
                                         - configure (workflows)
                                         - navigate (facets)
                                         - & populate (MS Excel templates)
                                     •   Greater standardisation
                                     •   Still highly flexible
                                     •   Project profiles (eMonocot)
                                     •   Framework for integration
e.g. http://ihs.myspecies.info/
Getting data in and out of Scratchpads 2
Sustainable training, support & development
                            • Wiki
                              - Training manuals, videos & glossary
                            • In-site Support
                              - One click help within your site
                            • Training Courses (12 in 2012)
                              - UK (6), Sweden (2), Greece (1),
                                Bulgaria (1), South Africa (1), Brazil (1)
                            • Ambassadors Programme
                              - Enthusiastic experienced users
                              - Local support
                            • Embedded Issues Queue
                              - Bug reports
                              - Feature requests
                            • Sandbox Site
                              - http://sandbox.scratchpad.eu
                            • Open Source Development
                              - http://scratchpad.eu/develop
http://scratchpad.eu/help
Online community revision
                          • Taxonomy is in perpetual beta
                            - Constantly evolving
                            - Changing contributors
                            - Small granular contributions
                          • Sustainability
                            - A permanent space to work
                            - Guaranteed access (2016)
                            - Easy ways to get the data out
                          • Open science
                            - Beyond Open Access
                            - New ways of working
                            - Data management plans
                          • Need incentives to use
                            - More efficient (functions & reuse)
                            - Attribution & provenance
                            - Credit via citation
                          • New forms of publication
Freeloader flies: http://milichiidae.info
Publishing observations & taxon data
http://scratchpads.eu > http://gbif.org & http://eol.org

   Specimen records & species pages on Scratchpads
     > Darwin Core Archive (DwCA)
     > Pushed to GBIF & EOL (requires site registration with GBIF & EOL)

   Scratchpads: >19k specimen records, >122k species pages
   GBIF: >377M specimen records
   EOL: >1M species pages
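
As a rough illustration of the Darwin Core Archive step above: a DwCA is a zip file holding one or more delimited data files plus a meta.xml descriptor that maps each column to a Darwin Core term. The sketch below packs a few occurrence records that way. It is not the Scratchpads export code; the function names, the chosen columns and the file name occurrence.csv are assumptions made for this example, and the result has not been checked against GBIF or EOL ingestion rules.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
# Columns of the core data file (an illustrative subset of Darwin Core terms).
# The first column doubles as the record identifier.
FIELDS = ["occurrenceID", "scientificName", "decimalLatitude",
          "decimalLongitude", "basisOfRecord"]


def build_meta_xml(fields=FIELDS, data_file="occurrence.csv"):
    """Return a meta.xml descriptor mapping CSV columns to Darwin Core terms."""
    archive = ET.Element("archive", xmlns="http://rs.tdwg.org/dwc/text/")
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",  # literal backslash-n, as the descriptor expects
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
        "rowType": DWC + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = data_file
    ET.SubElement(core, "id", index="0")
    for i, name in enumerate(fields):
        ET.SubElement(core, "field", index=str(i), term=DWC + name)
    return ET.tostring(archive, encoding="unicode")


def build_dwca(records, out):
    """Write records (dicts keyed by FIELDS) to a zip archive.

    `out` may be a file path or a writable binary file object.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.csv", buf.getvalue())
        zf.writestr("meta.xml", build_meta_xml())
    return out


if __name__ == "__main__":
    # Placeholder record for demonstration only, not real specimen data.
    demo = [{"occurrenceID": "demo-1", "scientificName": "Milichiidae",
             "decimalLatitude": "51.5", "decimalLongitude": "-0.17",
             "basisOfRecord": "PreservedSpecimen"}]
    build_dwca(demo, "demo-dwca.zip")
```

A production archive would normally also carry an EML metadata file describing the dataset; that is left out here to keep the sketch short.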
Experiments with article publishing
http://scratchpads.eu > http://pensoft.net

   Paper assembled from Scratchpad database
     - 5-step workflow for selecting data, adding metadata & previewing

   XML submission, peer review & marked-up publication by Pensoft
     - doi:10.3897/zookeys.50.539
     - XML, HTML, PDF
     - Published in ZooKeys & PhytoKeys (worldwide coverage)
Example papers via Scratchpads…
  Blagoderov V, Hippa H, Nel A (2010). ZooKeys 50: 79–90. doi: 10.3897/zookeys.50.506
    http://sciaroidea.info/node/44428
  Faulwetter S, Chatzigeorgiou G, Galil BS, Nicolaidou A, Arvanitidis C (2011). ZooKeys 150: 327–345. doi: 10.3897/zookeys.150.1877
    http://polychaetes.marbigen.org/node/35
  Brake I, von Tschirnhaus M (2010). ZooKeys 50: 91–96. doi: 10.3897/zookeys.50.505
    http://milichiidae.info/node/14995

  Live (updated) versions of these papers are at the URLs above
But…

       • Limited uptake in 2 years
        - 1 genus
        - 6 n. spp
        - 11 re-descriptions
       • Software bugs
        - Pushing the boundaries of SP1
        - Fixed in SP2
       • Focused on synthetic papers
        - Not suited to small papers
        - Less emphasis on data
        - Hard to properly link in the data
       • More effort than MS Word
        - Especially for new SP users
BDJ
The Biodiversity Data Journal

        Making small data big!
Why do we need another new journal!!!
    Taxonomy needs less fragmentation, not more!

 BUT…
 • We need to encourage taxonomists to mobilize & describe their data
 • This takes considerable effort (e.g. Scratchpads)
 • “Arguably” this is best rewarded through credit
 • This means papers and citations
 • Process must be very easy for authors
 • Process must facilitate data reuse
 • Meet “Open Data” policy commitments

 • The Biodiversity Data Journal is very different…
Biodiversity Data Journal (BDJ)

• All data matters: No lower or upper limit of manuscript size!
• Multiple publishing routes (not just Scratchpads)
• ALL within a single online collaborative platform, including
  the writing of the manuscript!
• New collaborative article authoring tool
• Community peer review with “open” & “public” options
• This is in addition to conventional peer-review
• Online editorial process and version control
• Standards-compliant (Darwin Core, Dublin Core, NLM etc.)
• Pre-defined Code-compliant article templates
BDJ publication & dissemination workflow
   Authors' inputs:
    - Conventional manuscripts (MS Word, Open Office)
    - GBIF-generated manuscripts from metadata descriptions
    - Scratchpads-generated manuscripts
    - Manuscripts generated from authors’ databases
   Systems:
    - Pensoft Journal System (PJS)
    - Pensoft Writing Tool (PWT)
   Output:
    - Marked up final publication in PDF, HTML and XML formats
Pensoft manuscript writing tool

   Lead author
    - Template-based manuscript creation
    - Invites contributors (mentor, linguistic editor, copy editor, potential reviewer, colleague/friend): contributing
    - Invites co-authors: authoring
   Templates
    - Taxon treatment
    - Interactive key
    - Checklist
    - Data paper

   • Collaborative online editing
   • Rich text capabilities
   • Various templates for taxon treatments
   • Identification keys builder
   • Species occurrence data import (Darwin Core compliant)
   • Smart citation for figures, tables, references & automated positioning
   • Assembling plates from single figures
   • References import (CrossRef, PubMed Central, etc.)
Testing screenshots of the writing tool




  [Screenshots: manuscript preview; multi-figure plates; plate layout; ID key preview; ID key builder]
Why publish in the BDJ?

• Joining (small) data into a large data pool
• Open-access, archiving and re-using your data
  through data aggregators
• Providing citation record and creditability for data in
  the form of peer-reviewed publications
• Facilitating online article authoring and editorial
  process for authors, reviewers and editors
• Using a truly innovative dissemination of atomized
  content
• Very low-cost. Free in the launch phase, thereafter at a
  fee that anyone can afford!
What will BDJ publish?

• Single taxon treatments and nomenclatural acts
• Local or regional checklists
• Sampling reports and occasional inventories
• Habitat-based checklists and inventories
• Ecological and biological observations of species
  and communities?
• Single identification keys
• ANY KIND of biodiversity-related database, including
  genomic, ecological and environmental data (data
  papers)
• Biodiversity-related software tools

    Starting late 2012, early 2013
    Recruiting editors now
Acknowledgements
  • Scratchpad technical development
   - Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boulton
  • Scratchpad outreach
   - Irina Brake, Laurence Livermore, Dimitris Koureas
  • eMonocot
   - Paul Wilkin & the Kew team, Charles Godfray & the Oxford team
  • ViBRANT
   - Dave Roberts, Lucy Reeve & many many more
  • Pensoft
   - Lyubomir Penev, Teodor Georgiev & colleagues


  • Our 7,000+ users
Pensoft Writing Tool (PWT) & Pensoft Journal System (PJS)

   Peer-review options: Public, Community, Closed

   • Authors: collaborative online writing
   • Editor: online editing & review requests
   • Review by nominated reviewers, panel reviewers & public reviewers
   • All reviews assembled into a single online version
   • Editorial decision & feedback
   • Author’s revised manuscript: online editing
   • Publication & dissemination
Why we need new methods of publishing…



   [Figure: primary data; publishing and sharing of primary data; re-use of content]

   Drawings: Slavena Peneva
   Source: Wikipedia

  • 2. Our informatics grand challenge… “Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001
  • 3. Our informatics grand challenge… “Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001 This requires data, information & knowledge to be… • Digital (not printed paper) • Openly accessible (not behind barriers) • Linked-up (not in silos)
  • 4. Most of our output is not digital, open or linked • 15-20k new spp. described annually (2M total)1 • 30k nomenclatural acts (12M total) 1 • 20k phylogenies (750k total)2 • 31k taxa sequenced (360k taxa total)3 • 800k BioMed papers (40M total pp. of taxonomy) 4 • Countless specimens, images, maps, keys… Typically generated by small communities for “local” research projects Figures from 1) Zhang, Zootaxa 2011 4, 1-4; 2) Web-of-Science; 3) Genbank and 4) PubMed.
  • 5. Scratchpad Virtual Research Environments Making taxonomy digital, open & linked
  • 6. What is a Scratchpad? A website for you & your community 1 2 3 Your data Uploaded & “Published” & reviewed tagged on your site Fast Intuitive Fit for use
  • 7. Scratchpads • EDIT (07-11), ViBRANT / eMonocot (11-13) • Hosted websites for taxonomists • Taxonomic, regional or societal • Research & publication platform • Supports the taxonomic workflow • Modular (Drupal) & flexible • Two full time developers • Ecosystem of communities (~450) http://scratchpads.eu
  • 8. Categories of Scratchpads Taxa (Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies) Conservation Projects Regions Societies
  • 9. What can Scratchpads do? +Administration +Groups +Specimens -Change your site information -Creating a group -Creating a record -Change your front page -Subscribing to a group -Importing from a spreadsheet -Change your logo +Image -Linking specimen & location records -Activity and access logs -Uploading & basic annotation -Linking specimen & pub. records +Backup -Linking image & location records +Tasks -Backing up your data -Linking image & specimen records -Creating a tasklist -Restoring your data -Linking image & publication records +Taxonomy +Bibliography -Overlay annotations on images -Importing from a spreadsheet -Creating a record +Layout -Importing from ClassificationBank -Importing from a ref. manager -Change your theme -Starting from scratch -Exporting to a reference manager -Menus -Taxonomy manager +Blog -Blocks and sidebars -Displaying a classification -Creating and adding a blog +Locations -Adding names +Custom Content -Creating a record -Deleting names -Defining a CCK -Importing from a spreadsheet -Taxonomy & panels -Importing from a spreadsheet +Pages +Users -Creating a custom view -Creating, editing, cloning & deleting -Your settings +Fileshare -Configuring the panels template -Adding a new user -Creating and using a fileshare +Panels -User roles and permissions +Forum -Adding & configuring content -Adding and editing user profile fields -Altering the forum settings -Creating a new panel -Logging in -Creating a container for a forum -Citing a Panels page +Webform -Creating a new forum +Phylogeny -Creating and using webforms -Creating a new topic inside a forum -Adding a phylogenetic tree
  • 10. Summary of what Scratchpads can do • Taxon pages, generated from tagged content (plant/animal) • Bibliography management • Character matrixes • Specimen records • Distribution maps (from specimens and regional) • Images, video and sound (bulk import) • Excel spreadsheet import (dynamically generated) • Darwin Core Archive export • Tabular data editing • Custom content • User management • Custom webforms • EOL data import (taxonomy, species information) • GBIF Map integration
  • 11. Scratchpad v.1 usage (2007 - Mar. 2012) Nodes 430,948 Sites 326 Users 6809 Active Users 5733 (273 w / 759 m) Users Range: 1-1049 Sites Mean: 15 Mode: 1 • Prof. scientists • Amateur naturalists • Citizen scientists ViBRANT SP 2
  • 12. Scratchpad 2 – the new version of Scratchpads • Launched March 2012 • 120 sites to date • EOL Fellows • SP1 migration ongoing • More professional • Easier to… - configure (workflows) - navigate (facets) - & populate (MS Excel templates) • Greater standardisation • Still highly flexible • Project profiles (eMonocot) • Framework for integration e.g. http://ihs.myspecies.info/
  • 13. Getting data in and out of Scratchpads 2
  • 14. Sustainable training, support & development • Wiki - Training manuals, videos & glossary • In-site Support - One click help within your site • Training Courses (12 in 2012) - UK (6), Sweden (2), Greece (1), Bulgaria (1), South Africa (1), Brazil (1) • Ambassadors Programme - Enthusiastic experienced users - Local support • Embedded Issues Queue - Bug reports - Feature requests • Sandbox Site - http://sandbox.scratchpad.eu • Open Source Development http://scratchpad.eu/help - http://scratchpad.eu/develop
  • 15. Online community revision • Taxonomy is in perpetual beta - Constantly evolving - Changing contributors - Small granular contributions • Sustainability - A permanent space to work - Guaranteed access (2016) - Easy ways to get the data out • Open science - Beyond Open Access - New ways of working - Data management plans Freeloader flies http://milichiidae.info • Need incentives to use - More efficient (functions & reuse) - Attribution & provenance - Credit via citation • New forms of publication
  • 16. Publishing observations & taxon data http://scratchpads.eu > http://gbif.org & http://eol.org Specimen records & species pages on Scratchpads, pushed to GBIF & EOL as Darwin Core Archive (DwCA) (requires site registration with GBIF & EOL) Scratchpads: >19k specimen records, >122k species pages GBIF: >377M specimen records EOL: >1M species pages
  • 17. Experiments with article publishing http://scratchpads.eu > http://pensoft.net Paper assembled from Scratchpad database (5-step workflow for selecting data, adding metadata & previewing) XML submission, peer review & marked-up publication by Pensoft (XML, HTML, PDF) Published in ZooKeys & PhytoKeys (worldwide coverage) doi:10.3897/zookeys.50.539
  • 18. Example papers via Scratchpads… Blagoderov V, Hippa H, Nel A (2010). ZooKeys 50: 79–90. doi: 10.3897/zookeys.50.506 http://sciaroidea.info/node/44428; Faulwetter S, Chatzigeorgiou G, Galil BS, Nicolaidou A, Arvanitidis C (2011). ZooKeys 150: 327–345. doi: 10.3897/zookeys.150.1877 http://polychaetes.marbigen.org/node/35; Brake I, von Tschirnhaus M (2010). ZooKeys 50: 91–96. doi: 10.3897/zookeys.50.505 http://milichiidae.info/node/14995 Live (updated) versions of these papers
  • 19. But… • Limited uptake in 2 years - 1 genus - 6 n. spp - 11 re-descriptions • Software bugs - Pushing the boundaries of SP1 - Fixed in SP2 • Focused on synthetic papers - Not suited to small papers - Less emphasis on data - Hard to properly link in the data • More effort than MS Word - Especially for new SP users
  • 20. BDJ The Biodiversity Data Journal Making small data big!
  • 21. Why do we need another new journal!!! Taxonomy needs less fragmentation, not more! BUT… • We need to encourage taxonomists to mobilize & describe their data • This takes considerable effort (e.g. Scratchpads) • “Arguably” this is best rewarded through credit • This means papers and citations • Process must be very easy for authors • Process must facilitate data reuse • Meet “Open Data” policy commitments • The Biodiversity Data Journal is very different…
  • 22. Biodiversity Data Journal (BDJ) • All data matters: No lower or upper limit of manuscript size! • Multiple publishing routes (not just Scratchpads) • ALL within a single online collaborative platform, including the writing of the manuscript! • New collaborative article authoring tool • Community peer review with “open” & “public” options • This is in addition to conventional peer-review • Online editorial process and version control • Standards-compliant (Darwin Core, Dublin Core, NLM etc.) • Pre-defined Code-compliant article templates
  • 23. BDJ publication & dissemination workflow Inputs: GBIF-generated manuscripts from metadata descriptions; manuscripts generated from authors’ databases; Scratchpads-generated manuscripts; conventional manuscripts from authors (MS Word, Open Office) Processing: Pensoft Writing Tool (PWT), Pensoft Journal System (PJS) Output: marked up final publication in PDF, HTML and XML formats
  • 24. Pensoft manuscript writing tool • Collaborative online editing • Rich text capabilities • Various templates for taxon treatments • Identification keys builder • Species occurrence data import (Darwin Core compliant) • Smart citation for figures, tables, references & automated positioning • Assembling plates from single figures • References import (CrossRef, PubMed Central, etc.) Diagram: lead author creates a template-based manuscript (taxon treatment, interactive key, checklist, data paper) and invites co-authors (authoring) and contributors (contributing: mentor, linguistic editor, copy editor, potential reviewer, colleague/friend)
  • 25. Testing screenshots of the writing tool Manuscript preview Multi-figure plates Plate layout ID Key ID Key preview builder
  • 26. Why publish in the BDJ? • Joining (small) data into a large data pool • Open-access, archiving and re-using your data through data aggregators • Providing citation record and credibility for data in the form of peer-reviewed publications • Facilitating online article authoring and editorial process for authors, reviewers and editors • Using a truly innovative dissemination of atomized content • Very low-cost. Free in the launch phase, thereafter at a fee that anyone can afford!
  • 27. What will BDJ publish? • Single taxon treatments and nomenclatural acts • Local or regional checklists • Sampling reports and occasional inventories • Habitat-based checklists and inventories • Ecological and biological observations of species and communities? • Single identification keys • ANY KIND of biodiversity-related database, including genomic, ecological and environmental data (data papers) • Biodiversity-related software tools Starting late 2012, early 2013 Recruiting editors now
  • 28. Acknowledgements • Scratchpad technical development - Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boulton • Scratchpad outreach - Irina Brake, Laurence Livermore, Dimitris Koureas • E-Monocot - Paul Wilkin & the Kew team, Charles Godfray & the Oxford team • ViBRANT - Dave Roberts, Lucy Reeve & many many more • Pensoft - Lyubomir Penev, Teodor Georgiev & colleagues • Our 7,000+ users
  • 29.
  • 30.
  • 31. Peer-review options: Closed Review, Community Review, Public Review. Pensoft Writing Tool (PWT): collaborative online writing (authors). Pensoft Journal System (PJS): online editing, review requests (editor), nominated reviewers, panel reviewers, public reviewers, all reviews assembled into a single online version, editorial decision & feedback, author’s revised manuscript, online editing, publication & dissemination
  • 32. Why we need new methods of publishing… RE-USE of CONTENT Publishing and sharing of primary data Primary data Drawings: Slavena Peneva