The Rise of the Data Journal
1. IASSIST, Cologne, May 2013.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland
License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or,
(b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
IASSIST, Cologne, Germany, May 31, 2013
Marieke Guy & Monica Duke
DCC, University of Bath
m.guy@ukoln.ac.uk
Digital Curation Centre (DCC)
• Consortium comprising units from the Universities of Bath
(UKOLN), Edinburgh (DCC Centre) and Glasgow (HATII)
• Launched 1st March 2004 as a national centre for solving
challenges in digital curation that could not be tackled by any
single institution or discipline
• Funded by JISC with additional HEFCE funding from 2011 for
targeted institutional development
• Supports a selection of tools: DAF, CARDIO, DMP Online, tools
and metadata schema catalogues
• Offers advice and support through ‘How to Guides’, ‘Briefing
papers’ and the website
Support from the DCC
• Assess needs
• Make the case
• Develop support and services
• …and support policy implementation
DCC support team:
• RDM policy development
• DAF & CARDIO assessments
• Guidance and training
• Workflow assessment
• Advocacy with senior management
• Institutional data catalogues
• Pilot RDM tools
• Customised Data Management Plans
A history of journals and data
• Data publication in journals is not new
• Earliest academic scientific journal is Journal
des sçavans first published on 5 Jan 1665
• Data usually structured
• Recently data has grown significantly in volume
and more data is digital
• Data published in supplementary materials
• As supplementary files have become too big, journals
have begun to stop accepting them, e.g. Journal
of Neuroscience
• Journals become ‘data dumping grounds’
Changing face of journals
• Image from V. Kiemer, Nature Publishing Group
Enhanced publications
• A publication enhanced with:
• research data (evidence of the research)
• extra materials (to illustrate or clarify)
• post-publication data (commentaries, ranking) - Driver II
• Extra materials:
• Audio files, illustrative images and video fragments
• GIS or interactive maps
• Models, algorithms
• Metadata sets
• Issues around who creates enhancements, who manages them?
• http://xposre.nl/epfeatures/
Why publish data?
“Data that underpin a journal article
should be made concurrently available
in an accessible database”
• Science as an open enterprise, report by the Royal
Society, June 2012
• Data should be accessible, intelligible,
assessable and usable
Other reasons…
BENEFITS
• Avoid duplication
• Scientific integrity
• More collaboration
• Better research
• More reuse & value
• Increased citation: 9-30% increase depending
on e.g. discipline (Piwowar et al., 2007, 2013)
DRIVERS
• Public expectations
• Government agenda
• Funder policy
• Institutional policy
• EU expectations
• Preservation of data
Research data vs journal articles
Research data
• Difficult to manage after
funding stops
• Who has it?
• Where is it?
• Who does it belong to?
• How do I make it available?
• Where’s the reward?
Journal articles
• Held by libraries
• Well preserved
• Impact monitored
• Easy to find
• Is published
• Promotion and tenure
processes recognise it
• In the past data have been a “second-class citizen
in the scholarly record”
What is a data paper/article?
• A paper that describes a data set – usually stored in a repository
• Gives details of the data collection (when, why, how)
• Gives details of processing, software, file formats etc.
• Has a cover sheet and set of links to archived artefacts
• The cover sheet contains familiar elements such as title, date,
authors, abstract, persistent identifiers (DOI, ARK)
• There are no novel analyses or groundbreaking conclusions
• Authors could include those involved in data management and
processing
• The data paper/article format is widened out into a data
journal
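As a rough illustration of the cover sheet described in this list, its familiar elements could be held in a small record. This is only a sketch: the class name DataPaperCoverSheet, its field names, the missing_elements helper and all example values (including the DOI) are hypothetical and not taken from any journal's schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class DataPaperCoverSheet:
    """Hypothetical cover sheet record; field names are illustrative only."""

    title: str
    published: date
    authors: List[str]
    abstract: str
    # Persistent identifiers keyed by scheme, e.g. "doi" or "ark".
    identifiers: Dict[str, str] = field(default_factory=dict)
    # Links to the archived artefacts held in a repository.
    artefact_links: List[str] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Return the names of cover sheet elements that are still empty."""
        checks = {
            "title": bool(self.title.strip()),
            "authors": bool(self.authors),
            "abstract": bool(self.abstract.strip()),
            "identifiers": bool(self.identifiers),
            "artefact_links": bool(self.artefact_links),
        }
        return [name for name, present in checks.items() if not present]


# Made-up example values, for illustration only.
sheet = DataPaperCoverSheet(
    title="Example survey dataset",
    published=date(2013, 5, 31),
    authors=["A. Researcher", "B. Curator"],
    abstract="Describes when, why and how the data were collected.",
    identifiers={"doi": "10.1234/example"},
)
print(sheet.missing_elements())  # prints ['artefact_links']
```

A check like missing_elements could be run before submission to spot a cover sheet that does not yet link to its archived artefacts.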
Data journal benefits
• Academic credit for data scientists and curators
• Data likely to be uploaded to a trusted repository
• Data available for peer review, integrity of data checked
• Data journals helpful for those wanting to reuse the data
• Use of data journals shows transparency in the process
• Process allows collaboration with others working in data area
• The result is more than just a metadata landing page!
Data journal challenges
• Linking issues - problems when linking data to the scientific
record e.g. issues with persistence, granularity, attribution
• Validation issues - validation of data sets puts a burden on the
peer review process. Who reviews the data?
• Effort issues - a need to use already submitted metadata, use
tried and tested approaches e.g. DOIs
• Access issues - need trusted repository (?), open access
• Consistency issues - journal workflows vary: ‘engaged submitter’,
‘data dumper’, ‘third party requester’, variations in wording,
approach, across disciplines
• Responsibility issues - Who is responsible for the data? Data
Availability Policy (DAP) Is the data checked?
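On the linking and effort points above, one tried and tested approach is to cite data by DOI and link through the public resolver at https://doi.org/. The sketch below normalises a few common ways of writing a DOI into a resolver URL. The function name doi_to_url, the list of accepted prefixes and the example DOI are assumptions made for illustration, and the check is deliberately loose (it does not URL-encode unusual characters).

```python
DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in reference lists (an illustrative, not exhaustive, list).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(raw: str) -> str:
    """Turn a DOI written in one of several common forms into a resolver URL."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Loose sanity check: DOI names start with "10." and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return DOI_RESOLVER + doi


# Made-up DOI, for illustration only.
print(doi_to_url("doi:10.1234/example"))  # prints https://doi.org/10.1234/example
```

Storing the bare DOI and building the link on demand keeps the citation stable even if the preferred resolver address changes.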
Jisc MRD programme projects
• Projects looking at innovative research data publication
• What policies would achieve greater levels of data sharing,
citation and linkages between publications and datasets?
• What partnerships between journals, data centres and
research organisations are necessary?
• How can costs of long term data archiving be met?
• What characterises a suitable repository?
• What peer review of data is appropriate before publication?
• Projects: Peer REview for Publication & Accreditation of Research
data in the Earth sciences (PREPARDE), Journal Research Data
Policy Bank (JoRD) and Publisher, Repository and Institutional
Metadata Exchange (PRIME)
PREPARDE project
• 12 month JISC-funded activity, 7 partners from academia,
publishing and libraries
• Peer REview for Publication & Accreditation of Research data
in the Earth sciences (PREPARDE) project
• Aiming to capture the processes and procedures required to
publish a scientific dataset, ranging from ingestion into a data
repository, through to formal publication in a data journal.
• Looking at key issues arising in the data publication paradigm:
• How does one peer-review a dataset?
• How can datasets and journal publications be cross-linked
for the benefit of the wider research community?
PREPARDE list of data journals
• Very varied
• Lots of earth science, many
disciplines missing
• Repository criteria vary
• Some hold data set too
• Majority require OA for
article, some for data set
too
•http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList
Current data journals
• GigaScience – BioMed Central - publishes 'big-data' studies from
the entire spectrum of life and biomedical sciences
• Standard manuscript publication linked to a database that hosts
all associated data and provides data analysis tools and
cloud-computing resources
• Journal of Open Archaeology Data (JOAD) – Ubiquity Press -
features peer reviewed data papers describing archaeology
datasets with high reuse potential
• Works with institutional data repositories to ensure associated
data are professionally archived, preserved, and openly
available
• Geoscience Data Journal - Wiley-Blackwell
Forthcoming data journals
• Scientific Data - Nature journal - focuses on the life,
biomedical and environmental science communities.
• Launching in Spring 2014, and open for submissions in
Autumn 2013; open-access, online-only publication
Journal Research Data Policy Bank
• JoRD conducted a feasibility study into the scope and shape
of a sustainable service to collate and summarise journal
policies on research data (JoRD policy bank service)
• Carried out by the Centre for Research Communications
at Nottingham University (UK), the Research Information Network
and Mark Ware Consulting Ltd.
• Has carried out a literature review and a study of journal policies
(400 international and national journals)
•“Although idea of making scientific data openly accessible for share
is widely accepted in the scientific community, the practice
confronts serious obstacles. The most immediate of these obstacles
is the lack of a consolidated infrastructure for the easy sharing of
data.” JoRD
PRIME project
• Publisher, Repository and Institutional Metadata Exchange
(PRIME) aims to enable the automated exchange of metadata
between publishers and repositories
• Partners: UCL, Ubiquity Press and the Archaeology Data Service
• Builds upon the work of three other Jisc-funded projects:
DryadUK, REWARD, and SWORD-ARM
• Plans to enable the exchange of metadata between UCL
Discovery, the ADS, and JOAD
• Will release a metadata schema, open source plugins and
case studies
Joint Data Archiving Policy (JDAP)
• JDAP describes a requirement by a journal that supporting
data be publicly available
• Emerged in 2011 in the field of evolution and has since been adopted
by other journals across various disciplines
• Journals that adopt JDAP often recommend Dryad as a data
repository; however, the JDAP initiative is distinct from Dryad
(Dryad uses CC0 public domain dedication)
• << Journal >> requires, as a condition for publication, that data supporting the results in
the paper should be archived in an appropriate public archive, such as << list of approved
archives here >>. Data are important products of the scientific enterprise, and they
should be preserved and usable for decades in the future. Authors may elect to have the
data publicly available at time of publication, or, if the technology of the archive allows,
may opt to embargo access to the data for a period up to a year after publication.
Exceptions may be granted at the discretion of the editor, especially for sensitive
information such as human subject data or the location of endangered species.
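The embargo clause in the policy text above can be expressed as a simple date check. The sketch below is illustrative only: the function names latest_release_date and embargo_is_permitted are hypothetical, and it assumes "up to a year" means the same calendar date in the following year (with 29 February falling back to 28 February). It does not model the editor's discretion to grant exceptions.

```python
from datetime import date


def latest_release_date(publication: date) -> date:
    """Latest date by which embargoed data would need to be public.

    Assumption: "up to a year" is read as the same calendar date one year later.
    """
    try:
        return publication.replace(year=publication.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return publication.replace(year=publication.year + 1, day=28)


def embargo_is_permitted(publication: date, release: date) -> bool:
    """True if the planned data release falls within the one-year embargo window."""
    return publication <= release <= latest_release_date(publication)


# Made-up dates, for illustration only.
published = date(2013, 5, 31)
print(latest_release_date(published))                      # prints 2014-05-31
print(embargo_is_permitted(published, date(2014, 6, 1)))   # prints False
```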
DCC support
• The DCC will continue to support institutions in the data
publication area
• We will do this by:
• Writing briefing papers in this area
• Supporting stakeholder engagement e.g. workshops
• Raising awareness of the pros and cons of the different models
• Engaging with institutions on data publication issues
Final thoughts…
• More journals are encouraging publication of data in some way
• Often data are becoming the focus of the publication alongside a
supporting narrative
• Is there a role for the data journal when traditional journals are also
coming on board? What will be cited?
• Mixed market – lots of different approaches working alongside
each other
• Changing landscape offers opportunities and challenges for the
publisher, author and data manager
• Collaboration & communication are the most effective way forward
e.g. the ‘Now and Future of Data Publishing’ meeting
Thanks - any questions?
Acknowledgements:
Thanks to Sarah Callaghan and Angus Whyte (PREPARDE project),
Brian Hole (Ubiquity Press) & presenters from the IDCC data
publishing workshop for help with slides
Editor's notes
Science is about reproducibility: if we don't have the data we can't do that. The Internet allows us to link to things easily. In science you want a fixed thing, and there are still problems when linking data to the scientific record: data persistence, data and metadata quality, attribution and credit for data producers.
You may be aware of the history of journals: historically, data were published in journals, but data have grown in volume and in the extent to which they are digital, so less appears on paper. The Journal des sçavans contained scientific material as well as obituaries, church history and legal reports. Philosophical Transactions of the Royal Society was the first journal in the world exclusively devoted to science (6 March 1665). Journals have always published data.
Setting the scene
Opportunities for Data Exchange (ODE): the Data Publications Pyramid illustrates the most common ways to make data accessible. Research data comes in many different forms. Publications have always contained data, usually in a very condensed, processed and summarised way via graphs, tables and illustrations. At the other end of the spectrum are raw data and original data sets, which too often remain inaccessible on people's computers, hard disks or in drawers. Many authors add their underlying research data in supplements to journal articles. In disciplines with community supported data archives (examples are Genbank, World Protein Database and Pangaea) researchers can deposit their data in a safe and reliable way and publishers can ensure persistent links between the data and related publications.
Challenge that data journals address: turning supplementary material into something that can be mined
Better example: Journal of Open Psychology Data, linked with DANS. Addresses issues around fraud in the Netherlands.
GigaScience: no article processing costs.
Rapid peer review: Our insistence on fast and thorough peer review enables us to process manuscripts quickly; we aim to reach initial decisions within 6 weeks.
Rapid publication: Following the acceptance of a manuscript, it is published, with final citation details, as a provisional PDF file with minimal delay (subject to formatting checks, copy-editing and author verification). Fully formatted versions of the article replace the accepted manuscript within 4 weeks.
Open access: All articles published in GigaScience are open access (freely available on the journal website, with the copyright retained by the author). Research articles are deposited in at least one widely and internationally-recognized open access repository complying with the NIH Public Access Policy and the Wellcome Trust Open Access Policy. To cover the cost of open access publishing we levy an article-processing charge.
High visibility within the field: Your work is freely accessible to a global audience. In addition, articles are available through INIST in France and in e-Depot, the National Library of the Netherlands' digital archive of all electronic publications. We are also in discussion with other permanent digital archives including the British Library.
Permanence: All articles published in GigaScience are archived in a number of safe open access archives so permanent accessibility is assured.