brian.hole@ubiquitypress.com www.ubiquitypress.com / @ubiquitypress
Brian Hole, Founder and CEO
ISI CODATA Workshop, Bangalore, 9th March 2015
Preparing data for (open) publication
Overview
• Think open
• Plan early and well
• Examples
• Follow best practices
think open
Plan early & well
Data management plans
• Encourage adherence to best practices
• Provide a structured approach to data throughout its lifecycle
• Now mandated by many funders
• US:
  • National Science Foundation (NSF)
  • National Endowment for the Humanities (NEH)
  • National Aeronautics and Space Administration (NASA)
  • National Oceanic and Atmospheric Administration (NOAA)
  • Institute of Museum and Library Services (IMLS)
  • Agency for Healthcare Research and Quality (AHRQ)
  • Gordon & Betty Moore Foundation
  • Alfred P. Sloan Foundation
• UK: Economic and Social Research Council (ESRC)
• Europe: Horizon 2020
• Other international mandates: http://www.sherpa.ac.uk
DMP structure
1. Administrative data
2. Data collection
3. Documentation and metadata
4. Ethics and legal compliance
5. Storage and backup
6. Selection and preservation
7. Data sharing
8. Responsibilities and resources
Source: DCC. (2013). Checklist for a Data Management Plan. v.4.0. Edinburgh: Digital Curation Centre.
Available online: http://www.dcc.ac.uk/resources/data-management-plans
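The eight-part structure above can be used as a simple completeness check on a draft plan. The following is a minimal sketch, not part of the DCC checklist itself: the section names come from the slide, while the dictionary layout of the draft and the function name `missing_sections` are assumptions made for illustration.

```python
# Section names follow the DCC checklist cited above.
# The dict-based draft layout and the helper name are illustrative assumptions.
DMP_SECTIONS = [
    "Administrative data",
    "Data collection",
    "Documentation and metadata",
    "Ethics and legal compliance",
    "Storage and backup",
    "Selection and preservation",
    "Data sharing",
    "Responsibilities and resources",
]


def missing_sections(plan):
    """Return checklist sections that are absent or empty in a draft plan."""
    return [name for name in DMP_SECTIONS if not plan.get(name, "").strip()]


# Hypothetical draft with only two sections answered.
draft = {
    "Administrative data": "Project title, PI name, grant reference.",
    "Data collection": "Survey responses exported as CSV.",
    "Data sharing": "   ",  # whitespace only, so treated as unanswered
}
print(missing_sections(draft))
```

A check like this only confirms that every section has some text; it says nothing about whether the answers are adequate.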
1. Administrative data
Here you should record basic information to identify and contextualise your plan. Identifiers may help to link your DMP with information held in other systems. You should include:
• Basic information e.g. project title, your name, contact details, reference numbers / IDs
• A summary of the research to explain the purpose for which data are being collected
• Details of related policies and procedures e.g. institutional data policy or departmental guidelines
Source: XKCD, http://xkcd.com/97/
2. Data collection
Here you should consider what data you will collect and how.
• How will you structure and name your folders and files?
• What quality assurance processes will you adopt?
• What standards or methodologies will you use to create data?
• Do your chosen formats and software enable sharing and long-term access to the data?
• Are there any existing data that you can reuse?
Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=1849
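One way to answer the question about naming folders and files is to generate names from a fixed pattern rather than typing them by hand. The sketch below assumes a project_description_date_version pattern; the pattern, the function name `data_file_name` and the sample values are all hypothetical and are not prescribed by the DCC checklist.

```python
import re
from datetime import date


def data_file_name(project, description, collected, version, ext):
    """Build a name like 'soilsurvey_ph-readings_2015-03-09_v02.csv'."""
    def slug(text):
        # Lowercase, then collapse every run of non-alphanumeric characters into '-'.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO 8601 dates sort chronologically; a zero-padded version sorts numerically.
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected.isoformat()}_v{version:02d}.{ext}"
    )


# Hypothetical example values.
print(data_file_name("SoilSurvey", "pH readings", date(2015, 3, 9), 2, "csv"))
```

Whatever pattern is chosen, recording it in the plan matters more than the pattern itself, so that collaborators and later reusers can decode the names.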
3. Documentation and metadata
Here you should consider what information is needed for the data to be read and interpreted in the future. Estimate how much time and effort will be needed to create this supporting documentation and ensure that you allow for sufficient resource.
• How will you capture / create this documentation and metadata?
• What documentation and metadata will accompany the data?
• What metadata standards will you use and why?
Source: Gary Larson
4. Ethics and legal compliance
Here you should consider any ethical or legal issues, particularly in terms of restrictions they may place on data sharing.
• How will you protect the identity of participants if required? e.g. via anonymisation
• Will data sharing be postponed / restricted? e.g. to publish or seek patents
• Have you gained consent for data sharing and preservation?
• How will the data be licensed for reuse?
Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=1957
5. Storage and backup
Here you should consider where the data will be stored and any implications this has for backup, access and security.
• What are the risks to data security and how will these be managed?
• Who will be responsible for backup and recovery?
• Do you have sufficient storage or will you need to include charges for additional services?
• How will you ensure that collaborators can access your data securely?
Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=2237
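Whoever is responsible for backup and recovery also needs a way to confirm that a backup actually matches the working copy. The sketch below is one possible approach and is not taken from the slides: it compares SHA-256 checksums of every file in two directories. The function names and the directory layout are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its SHA-256 checksum."""
    root = Path(data_dir)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(original_dir, backup_dir):
    """Return relative paths that are missing from the backup or differ from it."""
    original = build_manifest(original_dir)
    backup = build_manifest(backup_dir)
    return sorted(
        name for name, digest in original.items() if backup.get(name) != digest
    )
```

Calling `changed_files("data", "backup/data")` with two hypothetical directories would return an empty list when every file in the working copy has an identical counterpart in the backup.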
6. Selection and preservation
Here you should determine which data are of long-term value and should be preserved. Decide how best to preserve those data, for example by depositing in repositories.
• What are the foreseeable research uses for your data?
• Which data should be preserved and potentially shared?
• Which data must be retained or destroyed for contractual, legal, or regulatory purposes?
• What is the long-term preservation plan for the dataset?
• Have you costed in the time and effort required to prepare the data for preservation and sharing?
Source: XKCD, http://xkcd.com/309/
7. Data sharing
Here you should consider which data you will share and how. The methods used will depend on a number of factors such as the type, size, complexity and sensitivity of the data. Also consider how people might acknowledge the reuse of your data (e.g. via citations) so you gain impact.
• When will you make the data available?
• Are any restrictions on data sharing required?
• With whom will you share the data, and under what conditions?
• What action will you take to overcome or minimise restrictions?
• How will potential users find out about your data?
Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=100
8. Responsibilities and resources
Here you should assign roles and responsibilities for all data management activities. Also carefully consider any resources needed to deliver your plan. These costs can usually be written into grant applications but need to be clearly outlined and justified.
• Who is responsible for implementing the DMP, and ensuring it is reviewed and revised?
• How will responsibilities be split across partner sites in collaborative research projects?
• What resources will you require to deliver your plan?
• Is additional specialist expertise or equipment required?
Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=1893
Online tools
• DMPOnline (UK Digital Curation Centre)
https://dmponline.dcc.ac.uk/
• DMPTool (California Digital Library)
https://dmp.cdlib.org/
examples
Repositories
Modified from: XKCD
GenBank
• Two upload tools – BankIt for short sequences, Sequin for complex or multiple sequences
• Sequence data uploaded as a FASTA file
• Immediate or future release instruction
• Citation of a reference paper
• Names of source organisms and any related descriptive data
• Sequence features (e.g. CDS, gene, rRNA, tRNA, with nucleotide intervals and
product names) and topology
• Organism name, applicable source modifiers, location
• Genus and species names (if not previously provided in FASTA file)
• If name is new or unrecognized, provide best known taxonomic lineage
• If genus and/or species names are not known, provide most specific name known (for example: Bacillus sp., Uncultured bacterium, Uncultured archaeon)
• Most complete name for any synthetic vector (for example: Cloning vector
pAB234, Transfer vector p789Abc)
• Source modifiers include: strain, clone, isolate, specimen-voucher, isolation-source, country
• Location: organelle (mitochondrion, chloroplast, etc); map and/or chromosome
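Since the slide notes that organism names can be supplied in the FASTA file itself, the following sketch shows how a record with source modifiers might be assembled. It assumes the bracketed `[modifier=value]` definition-line convention used with NCBI's submission tools; check the current GenBank submission guidelines before relying on it. The function name `fasta_record`, the sequence identifier, the strain and the sequence are made-up illustrations.

```python
def fasta_record(seq_id, sequence, modifiers, title="", width=70):
    """Format one nucleotide record with bracketed source modifiers."""
    # Assumed convention: '>SeqID [organism=...] [strain=...] optional title'.
    tags = " ".join(f"[{key}={value}]" for key, value in modifiers.items())
    definition = " ".join(part for part in (f">{seq_id}", tags, title) if part)
    # Break the sequence into fixed-width lines; the width is an arbitrary choice.
    seq = sequence.upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([definition] + lines) + "\n"


# Hypothetical record: identifier, modifiers and sequence are invented.
record = fasta_record(
    "Seq1",
    "atgc" * 20,
    {"organism": "Bacillus sp.", "strain": "AB12", "country": "India"},
)
print(record)
```

Keeping the modifiers in a dictionary makes it straightforward to generate many records from a spreadsheet of sample metadata instead of editing definition lines by hand.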
ClinicalTrials.gov
• Web-based data entry system called the Protocol Registration and Results System
(PRS)
• Section 801 of the US Food and Drug Administration Amendments Act of 2007
requires clinical trial registration and the submission of results
• Standard format
• Study Type: ‘Observational’ or ‘Interventional’
• Outcome Measures: The Primary and Secondary Outcome Measure Titles and
Descriptions
• Outcome Measure Time Frame
• Conditions or Focus of the Study
• Intervention Information: Each intervention is entered separately using the
Intervention Type, Name, and Description data elements
• Eligibility: List of key inclusion and exclusion criteria
• Locations
FlyBase
• FlyBase contains a complete annotation of the Drosophila melanogaster genome
• It also includes a searchable bibliography of research on Drosophila genetics
• Detail which genes feature in your paper, and FlyBase will link your paper to those genes for the next release cycle.
• Provide additional information about your publication during the submission process to help the curators speed up your curation.
• The whole process takes about 5 minutes!
Figshare
• Accepts all types and formats of data, no restrictions
• Concise metadata, takes around 5 minutes
• Multiple licenses, no cost.
Dryad
Typical process:
• Authors submit their manuscripts to the journal for consideration.
• Journal provides information about manuscripts to Dryad through automated notices
from the manuscript processing system, which creates a provisional Dryad record for the
data.
• Journal invites authors to archive data in Dryad, through a custom submission link that
brings the author to the provisional record.
• Authors upload their files to Dryad through the submission link supplied by the journal; no
redundant information need be entered and the article details are correct.
• Dryad Curators process and approve the data files and register the Digital Object Identifier
(DOI), a permanent identifier that allows the data to be cited and tracked; curators convey
the DOI to the journal.
• Journal and publisher add the Dryad DOI to all forms of the final article, enabling readers
of the article to access the data.
• Dryad can also provide links to data in other repositories, including sequences in GenBank
and phylogenetic trees in TreeBASE.
• License: CC0
• Cost: $80 / ₹5,000
Nature Scientific Data
• Papers are called “data descriptors”
• Fill out and submit a paper template
• Requires an ISA-tab metadata file
• Quality of data a major focus
• CC-BY/NC
• APC £890 / ₹84,000
GigaScience
• Data submitted to public databases, complemented by its citable
form in GigaDB
• License: CC0
• APC: £650 / ₹61,000
Ubiquity Press
1. The paper contents
a. The methods section of the paper must provide
sufficient detail that a reader can understand how
the resource was created.
b. The resource must be correctly described.
c. The reuse section must provide concrete and useful
suggestions for reuse of the resource.
2. The deposited resource
a. The repository must be suitable for the resource
and have a sustainability model.
b. Open license permits unrestricted access (e.g. CC0),
or access guaranteed if criteria met (must qualify)
c. A version in an open, non-proprietary format.
d. Labeled in such a way that a 3rd party can make
sense of it.
e. Must be actionable.
The basics of the model
Data papers are short
1) Low barrier data publication
Peer review is quick and objective
2) Online authoring
Low APC: £100 / ₹1,000
Lower cost (straight to XML)
Encourages shorter form
3) Open access only (CC-BY)
4) The publisher is not the repository
No-questions-asked waivers
Follow best practices
11 Best practices for data publication
• Record your methodology well – think about reproducibility
• Make sure you can export to open formats
• Record your selection and QA processes well
• Choose appropriate metadata standards, record from the beginning
• Ensure you obtain proper consent, and that it allows for open
publication if possible
• Consider the timing of data publication – e.g. to coincide with research
papers
• Consider potential reuse scenarios from the start
• Choose an appropriate repository
• Think about possible restrictions and access conditions early – justify
and seek to minimise
• Plan to publish with maximum dissemination – data paper?
• Allocate time and funding for data publication in grant proposals
Any questions?
Or please feel free to contact
brian.hole@ubiquitypress.com

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Building a scalable, sustainable service with OJS
Building a scalable, sustainable service with OJSBuilding a scalable, sustainable service with OJS
Building a scalable, sustainable service with OJS
 
Research Data Publishing
Research Data PublishingResearch Data Publishing
Research Data Publishing
 
Brian Hole Open Access - LSE 2013 talk
Brian Hole Open Access - LSE 2013 talkBrian Hole Open Access - LSE 2013 talk
Brian Hole Open Access - LSE 2013 talk
 
Introducing PRIME:Publisher, Repository and Institutional Metadata Exchange
Introducing PRIME:Publisher, Repository and Institutional Metadata ExchangeIntroducing PRIME:Publisher, Repository and Institutional Metadata Exchange
Introducing PRIME:Publisher, Repository and Institutional Metadata Exchange
 
Open Access: Advantages, Funding, Opportunities
Open Access: Advantages, Funding, Opportunities Open Access: Advantages, Funding, Opportunities
Open Access: Advantages, Funding, Opportunities
 
PRIME: Achievements, Challenges & Recommendations
PRIME: Achievements, Challenges & RecommendationsPRIME: Achievements, Challenges & Recommendations
PRIME: Achievements, Challenges & Recommendations
 
Obtaining Credit for Research Software
Obtaining Credit for Research SoftwareObtaining Credit for Research Software
Obtaining Credit for Research Software
 
Brian Hole - The Shift to Open Access Publishing, UCL DH 2013
Brian Hole - The Shift to Open Access Publishing, UCL DH 2013Brian Hole - The Shift to Open Access Publishing, UCL DH 2013
Brian Hole - The Shift to Open Access Publishing, UCL DH 2013
 
Data Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersData Citation: A Critical Role for Publishers
Data Citation: A Critical Role for Publishers
 
The Ubiquity Partner Network: Enabling Library-Based Publishing
The Ubiquity Partner Network: Enabling Library-Based PublishingThe Ubiquity Partner Network: Enabling Library-Based Publishing
The Ubiquity Partner Network: Enabling Library-Based Publishing
 
Overcoming Obstacles to Sharing Research Data
Overcoming Obstacles to Sharing Research DataOvercoming Obstacles to Sharing Research Data
Overcoming Obstacles to Sharing Research Data
 
The Shift to Open Access Publishing
The Shift to Open Access PublishingThe Shift to Open Access Publishing
The Shift to Open Access Publishing
 
Data availability policies and licensing
Data availability policies and licensingData availability policies and licensing
Data availability policies and licensing
 
PRIME: Publisher, Repository & Institutional Metadata Exchange
PRIME: Publisher, Repository & Institutional Metadata ExchangePRIME: Publisher, Repository & Institutional Metadata Exchange
PRIME: Publisher, Repository & Institutional Metadata Exchange
 
Disrupting Academic Publishing: Returning Control to Universities
Disrupting Academic Publishing: Returning Control to UniversitiesDisrupting Academic Publishing: Returning Control to Universities
Disrupting Academic Publishing: Returning Control to Universities
 
Brian Hole - Text and Data Mining - European Parliament presentation
Brian Hole - Text and Data Mining - European Parliament presentationBrian Hole - Text and Data Mining - European Parliament presentation
Brian Hole - Text and Data Mining - European Parliament presentation
 
Disrupting Academic Publishing
Disrupting Academic PublishingDisrupting Academic Publishing
Disrupting Academic Publishing
 
Ubiquity Press: open scholarship
Ubiquity Press: open scholarshipUbiquity Press: open scholarship
Ubiquity Press: open scholarship
 
The Journal of Open Economics Data
The Journal of Open Economics DataThe Journal of Open Economics Data
The Journal of Open Economics Data
 
Open Access is Just the Beginning: Disrupting Publishing
Open Access is Just the Beginning: Disrupting PublishingOpen Access is Just the Beginning: Disrupting Publishing
Open Access is Just the Beginning: Disrupting Publishing
 

Destacado

PTSD Dec 2013 Publication Uwo Dev Dis Newsletter
PTSD Dec 2013 Publication Uwo Dev Dis NewsletterPTSD Dec 2013 Publication Uwo Dev Dis Newsletter
PTSD Dec 2013 Publication Uwo Dev Dis Newsletter
Bob King
 
Xiance Thesis-China’s Foreign Policy under Xi Jinping
Xiance Thesis-China’s Foreign Policy under Xi JinpingXiance Thesis-China’s Foreign Policy under Xi Jinping
Xiance Thesis-China’s Foreign Policy under Xi Jinping
Xiance Wang
 

Destacado (15)

Test
TestTest
Test
 
Tomorrowland
TomorrowlandTomorrowland
Tomorrowland
 
TE QUEDASTE SIN SABER QUE HACER CUANDO
TE QUEDASTE SIN SABER QUE HACER CUANDOTE QUEDASTE SIN SABER QUE HACER CUANDO
TE QUEDASTE SIN SABER QUE HACER CUANDO
 
PTSD Dec 2013 Publication Uwo Dev Dis Newsletter
PTSD Dec 2013 Publication Uwo Dev Dis NewsletterPTSD Dec 2013 Publication Uwo Dev Dis Newsletter
PTSD Dec 2013 Publication Uwo Dev Dis Newsletter
 
Promotional keyring about us
Promotional keyring about usPromotional keyring about us
Promotional keyring about us
 
Reglamento estudiantil
Reglamento estudiantilReglamento estudiantil
Reglamento estudiantil
 
AAUP 2016: Accessibility Presentation (J. Axelrod)
AAUP 2016: Accessibility Presentation (J. Axelrod)AAUP 2016: Accessibility Presentation (J. Axelrod)
AAUP 2016: Accessibility Presentation (J. Axelrod)
 
AAUP 2012: PDA and Libraries (R. Anderson)
AAUP 2012: PDA and Libraries (R. Anderson)AAUP 2012: PDA and Libraries (R. Anderson)
AAUP 2012: PDA and Libraries (R. Anderson)
 
Trudow lager
Trudow lagerTrudow lager
Trudow lager
 
Actividades
ActividadesActividades
Actividades
 
Manual de gmail
Manual de gmailManual de gmail
Manual de gmail
 
Организация процесса самообразования в педагогической деятельности учителей ...
Организация процесса самообразования  в педагогической деятельности учителей ...Организация процесса самообразования  в педагогической деятельности учителей ...
Организация процесса самообразования в педагогической деятельности учителей ...
 
Valentine´s Day (dia dos namorados
Valentine´s Day (dia dos namorados Valentine´s Day (dia dos namorados
Valentine´s Day (dia dos namorados
 
Xiance Thesis-China’s Foreign Policy under Xi Jinping
Xiance Thesis-China’s Foreign Policy under Xi JinpingXiance Thesis-China’s Foreign Policy under Xi Jinping
Xiance Thesis-China’s Foreign Policy under Xi Jinping
 
Изобразительное искусство и музыка в странах Европы и США
Изобразительное искусство и музыка в странах Европы и СШАИзобразительное искусство и музыка в странах Европы и США
Изобразительное искусство и музыка в странах Европы и США
 

Similar a Preparing Data for (Open) Publication

Similar a Preparing Data for (Open) Publication (20)

Cal Poly - Data Management and the DMPTool
Cal Poly - Data Management and the DMPToolCal Poly - Data Management and the DMPTool
Cal Poly - Data Management and the DMPTool
 
DMP health sciences
DMP health sciencesDMP health sciences
DMP health sciences
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Creating a Data Management Plan for your Research
Creating a Data Management Plan for your ResearchCreating a Data Management Plan for your Research
Creating a Data Management Plan for your Research
 
Demography pro sem
Demography pro semDemography pro sem
Demography pro sem
 
Guidelines for OSTP Data Access Plans
Guidelines for OSTP Data Access PlansGuidelines for OSTP Data Access Plans
Guidelines for OSTP Data Access Plans
 
Data and Donuts: How to write a data management plan
Data and Donuts: How to write a data management planData and Donuts: How to write a data management plan
Data and Donuts: How to write a data management plan
 
Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librarians
 
Digital Curation 101 - Taster
Digital Curation 101 - TasterDigital Curation 101 - Taster
Digital Curation 101 - Taster
 
Data Management and Horizon 2020
Data Management and Horizon 2020Data Management and Horizon 2020
Data Management and Horizon 2020
 
Data Management Planning for Engineers
Data Management Planning for EngineersData Management Planning for Engineers
Data Management Planning for Engineers
 
Data Management Planning
Data Management PlanningData Management Planning
Data Management Planning
 
Datat and donuts: how to write a data management plan
Datat and donuts: how to write a data management planDatat and donuts: how to write a data management plan
Datat and donuts: how to write a data management plan
 
Data Management Lab: Data management plan instructions
Data Management Lab: Data management plan instructionsData Management Lab: Data management plan instructions
Data Management Lab: Data management plan instructions
 
Meeting Federal Research Requirements
Meeting Federal Research RequirementsMeeting Federal Research Requirements
Meeting Federal Research Requirements
 
BLC & Digital Science: Mark Hahnel, Figshare
BLC & Digital Science: Mark Hahnel, FigshareBLC & Digital Science: Mark Hahnel, Figshare
BLC & Digital Science: Mark Hahnel, Figshare
 
Workshop - finding and accessing data - Cambridge August 22 2016
Workshop - finding and accessing data - Cambridge August 22 2016Workshop - finding and accessing data - Cambridge August 22 2016
Workshop - finding and accessing data - Cambridge August 22 2016
 
Responsible Conduct of Research: Data Management
Responsible Conduct of Research: Data ManagementResponsible Conduct of Research: Data Management
Responsible Conduct of Research: Data Management
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data Stewardship
 
DataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data Sharing
 

Más de Brian Hole

Más de Brian Hole (20)

For-Profit and Unconditionally Open
For-Profit and Unconditionally OpenFor-Profit and Unconditionally Open
For-Profit and Unconditionally Open
 
Up levy 20181024
Up levy 20181024Up levy 20181024
Up levy 20181024
 
Up lpf 20180523
Up lpf 20180523Up lpf 20180523
Up lpf 20180523
 
Open Scholarship: more important than ever. OA week 2018
Open Scholarship: more important than ever. OA week 2018Open Scholarship: more important than ever. OA week 2018
Open Scholarship: more important than ever. OA week 2018
 
Researcher-led Open Access Publishing
Researcher-led Open Access PublishingResearcher-led Open Access Publishing
Researcher-led Open Access Publishing
 
Developments in Researcher-led, Open Access Publishing
Developments in Researcher-led, Open Access PublishingDevelopments in Researcher-led, Open Access Publishing
Developments in Researcher-led, Open Access Publishing
 
FutureTDM: Increasing Uptake of Text and Data Mining in the EU
FutureTDM: Increasing Uptake of Text and Data Mining in the EUFutureTDM: Increasing Uptake of Text and Data Mining in the EU
FutureTDM: Increasing Uptake of Text and Data Mining in the EU
 
Open Access via Open Source
Open Access via Open SourceOpen Access via Open Source
Open Access via Open Source
 
Ubiquity Press
Ubiquity PressUbiquity Press
Ubiquity Press
 
New models for Open Access Monograph funding
New models for Open Access Monograph fundingNew models for Open Access Monograph funding
New models for Open Access Monograph funding
 
The Growing Role of Libraries in Publishing
The Growing Role of Libraries in PublishingThe Growing Role of Libraries in Publishing
The Growing Role of Libraries in Publishing
 
Revolution by 1000 cuts: University Presses are the Future of Publishing
Revolution by 1000 cuts: University Presses are the Future of PublishingRevolution by 1000 cuts: University Presses are the Future of Publishing
Revolution by 1000 cuts: University Presses are the Future of Publishing
 
Publishing for a truly global research community
Publishing for a truly global research communityPublishing for a truly global research community
Publishing for a truly global research community
 
Open Access Publishing
Open Access PublishingOpen Access Publishing
Open Access Publishing
 
Disrupting Academic Publishing
Disrupting Academic PublishingDisrupting Academic Publishing
Disrupting Academic Publishing
 
Disrupting Academic Publishing
Disrupting Academic PublishingDisrupting Academic Publishing
Disrupting Academic Publishing
 
Innovation in Open Access Monographs, Archives and Journals
Innovation in Open Access Monographs, Archives and JournalsInnovation in Open Access Monographs, Archives and Journals
Innovation in Open Access Monographs, Archives and Journals
 
Emerging models in digital scholarship, research, publication and open science
Emerging models in digital scholarship, research, publication and open scienceEmerging models in digital scholarship, research, publication and open science
Emerging models in digital scholarship, research, publication and open science
 
The Shift to Open Access Publishing
The Shift to Open Access PublishingThe Shift to Open Access Publishing
The Shift to Open Access Publishing
 
Open Science: A New Publisher Perspective
Open Science: A New Publisher PerspectiveOpen Science: A New Publisher Perspective
Open Science: A New Publisher Perspective
 

Último

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 

Último (20)

Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b

Preparing Data for (Open) Publication

  • 1. brian.hole@ubiquitypress.com www.ubiquitypress.com / @ubiquitypress Brian Hole, Founder and CEO ISI CODATA Workshop, Bangalore, 9th March 2015 Preparing data for (open) publication
  • 2. Overview
      • Think open
      • Plan early and well
      • Examples
      • Follow best practices
  • 5. Plan early & well
  • 6. Data management plans
      • Encourage researchers to follow best practices
      • Provide a structured approach to data throughout its lifecycle
      • Now mandated by many funders:
          • US:
              • National Science Foundation (NSF)
              • National Endowment for the Humanities (NEH)
              • National Aeronautics and Space Administration (NASA)
              • National Oceanic and Atmospheric Administration (NOAA)
              • Institute of Museum and Library Services (IMLS)
              • Agency for Healthcare Research and Quality (AHRQ)
              • Gordon & Betty Moore Foundation
              • Alfred P. Sloan Foundation
          • UK: Economic and Social Research Council (ESRC)
          • Europe: Horizon 2020
          • Other international mandates: http://www.sherpa.ac.uk
  • 7. DMP structure
      1. Administrative data
      2. Data collection
      3. Documentation and metadata
      4. Ethics and legal compliance
      5. Storage and backup
      6. Selection and preservation
      7. Data sharing
      8. Responsibilities and resources
      Source: DCC. (2013). Checklist for a Data Management Plan. v.4.0. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/data-management-plans
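
The eight checklist sections on the DMP structure slide can double as a simple completeness check for a draft plan. The following is a minimal Python sketch, assuming the draft is held as a dictionary keyed by section title; the `missing_sections` helper and the dictionary layout are illustrative assumptions, not part of the DCC checklist or of any DMP tool.

```python
# Section titles follow the DCC checklist (v4.0) cited on the slide.
DMP_SECTIONS = [
    "Administrative data",
    "Data collection",
    "Documentation and metadata",
    "Ethics and legal compliance",
    "Storage and backup",
    "Selection and preservation",
    "Data sharing",
    "Responsibilities and resources",
]


def missing_sections(plan):
    """Return checklist sections that are absent or empty in a draft plan.

    `plan` is a hypothetical dict mapping section title -> free-text answer.
    """
    return [s for s in DMP_SECTIONS if not plan.get(s, "").strip()]


if __name__ == "__main__":
    # Made-up draft for illustration only.
    draft = {
        "Administrative data": "Project title, PI name, grant reference",
        "Data collection": "Survey responses exported as CSV",
        "Data sharing": "   ",  # whitespace only counts as unanswered
    }
    todo = missing_sections(draft)
    for number, title in enumerate(DMP_SECTIONS, start=1):
        status = "TODO" if title in todo else "ok"
        print(f"{number}. {title}: {status}")
```

Keeping the section list in one place means the same check can be rerun whenever the plan is reviewed and revised.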
  • 8. 1. Administrative data
      Here you should record basic information to identify and contextualise your plan. Identifiers may help to link your DMP with information held in other systems. You should include:
      • Basic information, e.g. project title, your name, contact details, reference numbers / IDs
      • A summary of the research to explain the purpose for which data are being collected
      • Details of related policies and procedures, e.g. institutional data policy or departmental guidelines
      Source: XKCD, http://xkcd.com/97/
  • 9. 2. Data collection
      Here you should consider what data you will collect and how.
      • How will you structure and name your folders and files?
      • What quality assurance processes will you adopt?
      • What standards or methodologies will you use to create data?
      • Do your chosen formats and software enable sharing and long-term access to the data?
      • Are there any existing data that you can reuse?
      Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=1849
  • 10. 3. Documentation and metadata
      Here you should consider what information is needed for the data to be read and interpreted in the future. Estimate how much time and effort will be needed to create this supporting documentation and ensure that you allow for sufficient resource.
      • What documentation and metadata will accompany the data?
      • How will you capture / create this documentation and metadata?
      • What metadata standards will you use and why?
      Source: Gary Larson
  • 11. 4. Ethics and legal compliance
      Here you should consider any ethical or legal issues, particularly in terms of restrictions they may place on data sharing.
      • Have you gained consent for data sharing and preservation?
      • How will you protect the identity of participants if required? e.g. via anonymisation
      • How will the data be licensed for reuse?
      • Will data sharing be postponed / restricted? e.g. to publish or seek patents
      Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=1957
  • 12. 5. Storage and backup
      Here you should consider where the data will be stored and any implications this has for backup, access and security.
      • What are the risks to data security and how will these be managed?
      • Who will be responsible for backup and recovery?
      • Do you have sufficient storage or will you need to include charges for additional services?
      • How will you ensure that collaborators can access your data securely?
      Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=2237
  • 13. 6. Selection and preservation
      Here you should determine which data are of long-term value and should be preserved. Decide how best to preserve those data, for example by depositing in repositories.
      • What are the foreseeable research uses for your data?
      • Which data should be preserved and potentially shared?
      • Which data must be retained or destroyed for contractual, legal, or regulatory purposes?
      • What is the long-term preservation plan for the dataset?
      • Have you costed in the time and effort required to prepare the data for preservation and sharing?
      Source: XKCD, http://xkcd.com/309/
  • 14. 7. Data sharing
      Here you should consider which data you will share and how. The methods used will depend on a number of factors such as the type, size, complexity and sensitivity of the data. Also consider how people might acknowledge the reuse of your data (e.g. via citations) so you gain impact.
      • When will you make the data available?
      • Are any restrictions on data sharing required?
      • With whom will you share the data, and under what conditions?
      • What action will you take to overcome or minimise restrictions?
      • How will potential users find out about your data?
      Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=100
  • 15. 8. Responsibilities and resources
      Here you should assign roles and responsibilities for all data management activities. Also carefully consider any resources needed to deliver your plan. These costs can usually be written into grant applications but need to be clearly outlined and justified.
      • Who is responsible for implementing the DMP, and ensuring it is reviewed and revised?
      • How will responsibilities be split across partner sites in collaborative research projects?
      • What resources will you require to deliver your plan?
      • Is additional specialist expertise or equipment required?
      Source: SMBC, http://smbc-comics.com/index.php?db=comics&id=1893
  • 16. Online tools
      • DMPOnline (UK Digital Curation Centre): https://dmponline.dcc.ac.uk/
      • DMPTool (California Digital Library): https://dmp.cdlib.org/
  • 18. Repositories. Modified from: XKCD
  • 19. GenBank
      • Two upload tools: BankIt for short sequences, Sequin for complex or multiple sequences
      • Sequence data uploaded as a FASTA file
      • Immediate or future release instruction
      • Citation of a reference paper
      • Names of source organisms and any related descriptive data
      • Sequence features (e.g. CDS, gene, rRNA, tRNA, with nucleotide intervals and product names) and topology
      • Organism name, applicable source modifiers, location
      • Genus and species names (if not previously provided in FASTA file)
      • If name is new or unrecognized, provide best known taxonomic lineage
      • If genus and/or species names are not known, provide most specific name known (for example: Bacillus sp., Uncultured bacterium, Uncultured archaeon)
      • Most complete name for any synthetic vector (for example: Cloning vector pAB234, Transfer vector p789Abc)
      • Source modifiers include: strain, clone, isolate, specimen-voucher, isolation-source, country
      • Location: organelle (mitochondrion, chloroplast, etc.); map and/or chromosome
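
To make the GenBank slide's FASTA requirement concrete, the sketch below formats one record with the organism name and source modifiers on the definition line. It assumes the bracketed `[name=value]` definition-line style; the exact modifier names a given upload tool accepts should be checked against its current documentation. The `fasta_record` helper, the sequence ID, the strain and the sequence itself are made up for illustration.

```python
def fasta_record(seq_id, sequence, modifiers=None, width=70):
    """Format one FASTA record with bracketed source modifiers.

    The "[name=value]" definition-line style is an assumption here;
    verify accepted modifier names with the submission tool you use.
    """
    if not seq_id or any(ch.isspace() for ch in seq_id):
        raise ValueError("sequence ID must be non-empty and contain no spaces")
    # Drop any whitespace or line breaks inside the raw sequence.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    tags = " ".join(f"[{name}={value}]" for name, value in (modifiers or {}).items())
    header = f">{seq_id} {tags}".rstrip()
    # Wrap the sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([header] + lines) + "\n"


if __name__ == "__main__":
    record = fasta_record(
        "Seq1",
        "atgc" * 20,  # placeholder sequence, 80 bases
        {"organism": "Bacillus sp.", "strain": "AB1", "country": "India"},
        width=60,
    )
    print(record, end="")
```

Recording the organism and modifiers alongside the sequence from the start avoids having to re-enter them by hand during submission.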
  • 20. ClinicalTrials.gov
      • Web-based data entry system called the Protocol Registration and Results System (PRS)
      • Section 801 of the US Food and Drug Administration Amendments Act of 2007 requires clinical trial registration and the submission of results
      • Standard format:
          • Study Type: ‘Observational’ or ‘Interventional’
          • Outcome Measures: The Primary and Secondary Outcome Measure Titles and Descriptions
          • Outcome Measure Time Frame
          • Conditions or Focus of the Study
          • Intervention Information: Each intervention is entered separately using the Intervention Type, Name, and Description data elements
          • Eligibility: List of key inclusion and exclusion criteria
          • Locations
  • 21. FlyBase
      • FlyBase contains a complete annotation of the Drosophila melanogaster genome
      • It also includes a searchable bibliography of research on Drosophila genetics
      • Detail which genes feature in your paper, and FlyBase will link your paper to those genes for the next release cycle
      • Provide additional information about your publication during the submission process to help the curators speed up your curation
      • The whole process takes about 5 minutes!
  • 22. Figshare
      • Accepts all types and formats of data, no restrictions
      • Concise metadata, takes around 5 minutes
      • Multiple licenses, no cost
  • 23. Dryad
      Typical process:
      • Authors submit their manuscripts to the journal for consideration.
      • Journal provides information about manuscripts to Dryad through automated notices from the manuscript processing system, which creates a provisional Dryad record for the data.
      • Journal invites authors to archive data in Dryad, through a custom submission link that brings the author to the provisional record.
      • Authors upload their files to Dryad through the submission link supplied by the journal; no redundant information need be entered and the article details are correct.
      • Dryad curators process and approve the data files and register the Digital Object Identifier (DOI), a permanent identifier that allows the data to be cited and tracked; curators convey the DOI to the journal.
      • Journal and publisher add the Dryad DOI to all forms of the final article, enabling readers of the article to access the data.
      • Dryad can also provide links to data in other repositories, including sequences in GenBank and phylogenetic trees in TreeBASE.
      • License: CC0
      • Cost: $80 / ₹5,000
  • 25. Nature Scientific Data
      • Papers are called “data descriptors”
      • Fill out and submit a paper template
      • Requires an ISA-tab metadata file
      • Quality of data a major focus
      • License: CC-BY/NC
      • APC: £890 / ₹84,000
  • 26. GigaScience
      • Data submitted to public databases, complemented by its citable form in GigaDB
      • License: CC0
      • APC: £650 / ₹61,000
  • 27. Ubiquity Press
      1. The paper contents
          a. The methods section of the paper must provide sufficient detail that a reader can understand how the resource was created.
          b. The resource must be correctly described.
          c. The reuse section must provide concrete and useful suggestions for reuse of the resource.
      2. The deposited resource
          a. The repository must be suitable for the resource and have a sustainability model.
          b. Open license permits unrestricted access (e.g. CC0), or access guaranteed if criteria met (must qualify).
          c. A version in an open, non-proprietary format.
          d. Labeled in such a way that a 3rd party can make sense of it.
          e. Must be actionable.
  • 28. The basics of the model
      1) Low barrier data publication
      2) Online authoring
      3) Open access only (CC-BY)
      4) The publisher is not the repository
      • Data papers are short
      • Peer review is quick and objective
      • Low APC: £100 / ₹1,000
      • Lower cost (straight to XML)
      • Encourages shorter form
      • No-questions-asked waivers
  • 29. Follow best practices
  • 30. 11 Best practices for data publication
      • Record your methodology well – think about reproducibility
      • Make sure you can export to open formats
      • Record your selection and QA processes well
      • Choose appropriate metadata standards, record from the beginning
      • Ensure you obtain proper consent, and that it allows for open publication if possible
      • Consider the timing of data publication – e.g. to coincide with research papers
      • Consider potential reuse scenarios from the start
      • Choose an appropriate repository
      • Think about possible restrictions and access conditions early – justify and seek to minimise
      • Plan to publish with maximum dissemination – data paper?
      • Allocate time and funding for data publication in grant proposals
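
Two of the best practices above, exporting to open formats and recording metadata from the beginning, can be combined in a single export step. The following is a minimal Python sketch under stated assumptions: the `export_open` helper, the file names `data.csv` and `metadata.json`, and the metadata keys are all hypothetical and do not follow any particular metadata standard; a real deposit should use the standard chosen for the discipline and repository.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_open(rows, fieldnames, metadata, directory):
    """Write tabular data as CSV plus a JSON metadata file in `directory`.

    File names and metadata keys are illustrative assumptions only.
    """
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)

    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    meta_path = out / "metadata.json"
    meta_path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return data_path, meta_path


if __name__ == "__main__":
    # Made-up observations and metadata for illustration.
    rows = [{"site": "A", "count": 3}, {"site": "B", "count": 7}]
    metadata = {
        "title": "Example survey counts",
        "license": "CC0",
        "methodology": "Counts recorded per site; see methods section",
        "columns": {"site": "Site label", "count": "Number of individuals"},
    }
    with tempfile.TemporaryDirectory() as tmp:
        data_path, meta_path = export_open(rows, ["site", "count"], metadata, tmp)
        print(data_path.read_text(encoding="utf-8"))
        print(meta_path.read_text(encoding="utf-8"))
```

Writing the metadata at the same moment as the data keeps the two from drifting apart before deposit.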
  • 31. Any questions? Or please feel free to contact brian.hole@ubiquitypress.com