1. Natasha Simons
What’s coming next? The future of
research data management
Australian National Data Service
iSchools Data Science Winter Institute
Hong Kong, 7-8 December 2017
5. NCRIS
• National Collaborative Research Infrastructure
Strategy (NCRIS)
• Australian government program
• Drives research excellence and collaboration
between researchers, government and industry
to deliver practical outcomes
• Funds research infrastructure projects
including ANDS, Nectar and RDS
• 2016 National Research Infrastructure
Roadmap outlines Australian research
infrastructure required over next decade
6. ANDS/Nectar/RDS
Aligned set of joint investments to deliver four key
transformations in the research sector:
1. A world leading data advantage
2. Accelerated innovation
3. Collaboration for borderless research
4. Enhanced translation of research
7. Our approach
Building on and leveraging previous investments and
relationships:
1. Research domain program
2. Research data platforms
3. Sector-wide support and engagement
8. What is the
future of
Research Data
Management?
Photo by Michal Lomza on Unsplash
9. Trend #1 Data policies
Funder data sharing policies are on the rise.
Examples:
Data sharing is essential for expedited translation
of research results into knowledge, products and
procedures to improve human health... [and it]
should be made as widely and freely available
as possible... - National Institutes of Health USA
(1)
Publicly funded research data are a public
good...which should be made openly available
with as few restrictions as possible in a timely
and responsible manner - Research Councils UK
(2)
(1) National Institutes of Health. 2003. Data Sharing
Policy and Implementation Guidance.
(2) Research Councils UK. 2011 (revised 2015). RCUK
Common Principles on Data Policy.
Photo by Christine Roy on Unsplash
10. Trend #1 Data policies
Research data principle: as open as possible, as closed as necessary - European Commission
Horizon 2020 Guidelines (1)
We expect our researchers to maximise the availability of research data, software and materials
with as few restrictions as possible - Wellcome Trust (2)
The ARC is committed to maximising the benefits from ARC-funded research, including by
ensuring greater access to research data. Since 2007, the ARC has encouraged researchers to
deposit data arising from research projects in publicly accessible repositories. The ARC’s
position reflects an increased focus in Australian and international research policy and
practice on open access to data generated through publicly funded research. - Australian
Research Council (3)
(1) European Commission. Horizon 2020 Guidelines.
(2) Wellcome Trust.
(3) Australian Research Council. http://www.arc.gov.au/research-data-management.
11. Trend #1 Data policies
Government open data policies are on the rise.
Examples:
Newly-generated [USA] government data is required to
be made available in open, machine-readable formats,
while continuing to ensure privacy and security (1)
All EU institutions are invited to make their data publicly
available whenever possible (2)
The Japanese government is promoting the Open Data
initiative, in which the government widely discloses public
data (3)
(1) USA Federal Government. 2013. Memorandum - Open Data
Policy - Managing Information as an Asset.
(2) European Union. Open Data Portal - About.
(3) Japan Open Data Initiative
12. Trend #1 Data policies
The Australian Government Open Data Declaration is
about making more government information available to
the public online (4)
The public sector information portal of the Government
of the Hong Kong Special Administrative Region with
datasets from different government departments and
public/private organisations (5)
Keywords: transparency, openness, return on
investment, economy, industry/government/research
collaborations, innovation
(4) Australian Government. 2010. Declaration of Open
Government.
(5) Hong Kong Open Data Portal. data.gov.hk.
13. Trend #1 Data policies
Publisher/Journal data policies and initiatives are on the
rise. Examples:
PLOS journals require authors to make all data underlying
the findings described in their manuscript fully available
without restriction, with rare exception - PLOS (1)
A condition of publication in a Nature Research journal
is that authors are required to make materials, data, code,
and associated protocols promptly available to readers
without undue qualifications. - Nature (2)
(1) PLOS. Data availability.
(2) Nature. Availability of data, materials and methods.
14. Trend #1 Data policies
Publisher signed statement examples:
The ultimate measure of success is in the replicability of
science, generation of new discoveries, and in progress
on the grand challenges facing society that depend on
the integration of open data, tools, and models from
multiple sources. This statement of commitment signals
important progress and a continuing commitment by
publishers and data facilities to enable open data in
the Earth and space sciences - COPDESS (1)
Transparency, open sharing, and reproducibility are
core values of science. Over 5,000 journals and
organizations have already become signatories of the
TOP Guidelines. - TOP Guidelines (2)
(1) COPDESS. Statement of Commitment.
(2) Center for Open Science. Transparency and Openness
Promotion (TOP) Guidelines.
Photo by Drew Hays on Unsplash
15. Policy challenges
● Existence of data policy, e.g. the higher a journal's Impact Factor, the
more likely it is to have a data availability policy and to enforce it (1)
● Data policies vary widely: content; discoverability; ease of interpretation;
infrastructure providers; support for compliance (2)
● Most journal data sharing policies do not provide specific guidance on the
practices that ensure data is maximally available and reusable (3)
(1) Piwowar, HA and Chapman, WW (2010) Public sharing of research datasets: A pilot study of associations. Journal of Informetrics, 4 (2).
148 - 156. ISSN 1751-1577
(2) Naughton, L. & Kernohan, D., (2016). Making sense of journal research data policies. Insights. 29(1), pp.84–89. DOI:
http://doi.org/10.1629/uksg.284
(3) Vasilevsky NA, Minnier J, Haendel MA, Champieux RE. (2017) Reproducible and reusable research: are journal data sharing policies
meeting the mark? PeerJ 5:e3208 https://doi.org/10.7717/peerj.3208
16. Policy challenges
● Data availability declines over time (1)
● The most effective journal data policies mandate data sharing in a repository
and a data availability statement with a link to the data (2)
● Data availability from authors on request has been found wanting in several
studies/case studies (3-5)
● The introduction of a data availability policy can polarize the research
community e.g. PLOS, ICMJE
(1) Vines et al. (2013) Current Biology. DOI: http://dx.doi.org/10.1016/j.cub.2013.11.014
(2) Vines, et al. (2013) FASEB J doi: 10.1096/fj.12-218164
(3) Systematic Reviews 2014, 3:97 doi:10.1186/2046-4053-3-97
(4) American Psychologist, Vol 61(7), Oct 2006, 726-728. doi:10.1037/0003-066X.61.7.726
(5) PLoS ONE 4(9): e7078. doi:10.1371/journal.pone.0007078
Thanks to Iain Hrynaszkiewicz, Springer Nature, for dot points 1-3 above.
17. Trend #2 Data sharing
Figshare open data survey 2017:
● 82% aware of open data sets
● 80% willing to reuse open data sets in own research
● 60% routinely share their data (frequently or
sometimes)
● 21% have never made a data set openly available
● 74% are now curating their data for sharing
● 77% value a data citation the same as an article
Digital Science (2017): The State of Open Data 2017 Report -
Infographic. figshare.
https://doi.org/10.6084/m9.figshare.5519155.v1 pp. 7-11
18. Trend #2 Data sharing
We can see strong signals that
open data is becoming more
embedded [but] there is still a lack
of confidence around open data.
Figshare open data survey 2017
19. Trend #2 Data sharing
A 2011 study of 500 papers that were published in 2009 from 50 top-ranked
research journals showed that only 47 papers (9%) of those reviewed had
deposited full primary raw data online.
As another study notes, the number of datasets being shared annually has
increased by more than 400% from 2011 to 2015, and this pace will likely
continue.
What Constitutes Peer Review of Data? A Survey of Peer Review Guidelines by Todd A. Carpenter. Scholarly
Kitchen blog post 11 April 2017.
https://scholarlykitchen.sspnet.org/2017/04/11/what-constitutes-peer-review-research-data/
20. Trend #2 Data sharing
More than two thirds of Wiley
researchers reported they are now
sharing their data. Though this
varies geographically and across
research disciplines we are seeing
that more researchers are sharing
their data and taking efforts to make
it reproducible.
Wiley Global Data Sharing
Infographic June 2017.
https://authorservices.wiley.com/author-resources/Journal-Authors/licensing-open-access/open-access/data-sharing.html
21. Data sharing challenges
Lack of understanding of the open/shared/closed model.
Lack of skills/understanding about how to share sensitive data.
Still too few “rewards” for data sharing.
Researchers may lack skills needed to manage and share data.
Wiley survey - Top 4 reasons why researchers
are hesitant to share their data:
● 50% Intellectual Property or confidentiality
issues
● 31% Ethics concerns
● 23% Concerns about misinterpretation or
misuse of my research
● 22% Concerns that my research will be
scooped
Photo by rawpixel.com on Unsplash
22. Trend #3 Connected research/data
Connected research (researchers, research organisations, publications, data, grants,
software, methods and more) is important:
● for better discovery of research (data)
● to assist the ability to reproduce research
● to support research transparency
● to aid attribution and credit
● to track use and impact
Persistent Identifiers (PIDs) and global standards play a key role in connecting research.
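A minimal sketch of why PIDs work as connectors: they follow published rules, so software can check and normalise them before linking records. The sketch assumes the ORCID check character is computed with ISO 7064 MOD 11-2 and that DOIs resolve under https://doi.org/. The function names are illustrative only and do not come from any ORCID or DOI library.

```python
import re

# ORCID iDs are written as four blocks of four characters; the last may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

# Common ways a DOI is written in reference lists (illustrative, not exhaustive).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


def doi_to_url(doi: str) -> str:
    """Normalise a DOI written in one of the styles above to an https://doi.org/ URL."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi.strip()
```

For example, `is_valid_orcid("0000-0003-0635-1998")` checks the presenter's iD, and `doi_to_url("doi:10.7717/peerj.175")` gives one consistent link form for a cited paper.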
23. Looks something like this..
Research Graph is an open collaborative project that builds the capability for connecting researchers, publications,
research grants and research datasets (data in research).
http://researchgraph.org/
24. Trend #3 Connected research/data
Examples of progress:
The ability to access and review the data behind research is a well sought after, but
often elusive, resource. In recognition of this, Scopus has been working to incorporate
new tools that can make it easier to search and share data - Scopus makes strides
in data linking
Major publishers have committed to requiring ORCID iDs in the publishing
process for their journals and invite other publishers to do the same - Requiring
ORCID in Publication Workflows: Open Letter
Approximately 148 million DOIs have been assigned [to publications, data,
software and more] through a federation of Registration Agencies world-wide -
Frequently asked questions about the DOI system
25. Connected data challenges
Photo by William Bout on Unsplash
● Raise PID adoption levels
e.g. THOR Project
● ORCIDs - need to be
populated and used
● Increasing PIDs in research
workflows
● Need standard ways to
exchange information e.g.
Scholix initiative to link data
and publications
● Data Citation practice
challenges
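A rough sketch of what a standard way to exchange link information could look like: each link is a small record holding two PIDs and a relationship, serialised to JSON so that any party can read it. The field names and the example DOIs are hypothetical and made up for illustration. This is not the Scholix schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    """One link between two persistent identifiers (hypothetical field names)."""
    source_pid: str
    target_pid: str
    relationship: str


def links_for(pid: str, links: List[Link]) -> List[Link]:
    """Return every link in which the given PID appears as source or target."""
    return [link for link in links if pid in (link.source_pid, link.target_pid)]


def to_json(links: List[Link]) -> str:
    """Serialise links to JSON so they can be exchanged between systems."""
    return json.dumps([asdict(link) for link in links], indent=2)


# Made-up identifiers, for illustration only.
EXAMPLE_LINKS = [
    Link(
        source_pid="https://doi.org/10.1234/example-article",
        target_pid="https://doi.org/10.5678/example-dataset",
        relationship="references",
    ),
]
```

Because both ends of a link are PIDs, a publisher and a data repository can each hold the same record and find it from either side with `links_for`.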
26. Trend #4 Data reuse
There is a push for reusable research data. Examples:
Why enable reuse? The UK Data Archive provides many reasons, including: encouraging
scientific enquiry and debate; promoting innovation and potential new data uses.
2013 study: “We further conclude that, at least for gene expression microarray data, a substantial
fraction of archived datasets are reused, and that the intensity of dataset reuse has been
steadily increasing since 2003” - Piwowar HA, Vision TJ. (2013) Data reuse and the open data
citation advantage. PeerJ 1:e175 https://doi.org/10.7717/peerj.175.
Early 2017: Springer Nature responded to the US National Institutes of Health’s request for
information on Strategies for NIH Data Management, Sharing, and Citation. They made a number
of recommendations to the NIH, and funding organisations, including: Encouraging researchers
to share and describe datasets in a way that facilitates reuse and reproducibility.
27. Data reuse challenges
“87% of researchers don’t know what licence to apply to their data” - Daniel
Hook, CEO Digital Science, 3/1/17
There is a quality issue: (a) sharing data is necessary but not sufficient for future
reuse, (b) ensuring that data is “independently understandable” is crucial, and (c)
incorporating a data review process is feasible - Peer et al. Committing to a Data
Quality review. IDCC14 Practice Paper.
Other issues: Geographic differences and differences across age groups: younger
respondents feel more favorably toward data sharing and reuse, yet make less of
their data available than older respondents - Tenopir et al. Changes in Data Sharing
and Data Reuse Practices and Perceptions among Scientists Worldwide
28. So the future of RDM is...FAIR Data
#1 Data policies help support Findable data
#2 Data sharing helps create Accessible data
#3 Connected research/data is part of Interoperable data
#4 Data reuse is enabled by Reusable data
FORCE11 FAIR Data Principles
By SangyaPundir (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
29. FAIR Data...
● Requires good data
management across the whole
lifecycle.
● Requires many stakeholders to
work together
iProfessionals have:
● a challenge
● an opportunity
● an incredible amount of skills
and knowledge to contribute!
ANDS FAIR Data flyer
30. Research lifecycle - traditional
University of Bournemouth - http://blogs.bournemouth.ac.uk/research/tag/rkeo/
DATA
31. Research lifecycle - data infused
Find data
Plan to manage data
Publish data
Collect, store, analyse, visualise data
Cite data
32. The future is an opportunity
"The challenge of the
unknown future is so
much more exciting
than the stories of the
accomplished past."
- Simon Sinek
Photo by Warren Wong on Unsplash
33. With the exception of third party images or where otherwise indicated, this work is licensed under the Creative
Commons 4.0 International Attribution Licence.
ANDS, Nectar and RDS are supported by the Australian Government through the National Collaborative Research
Infrastructure Strategy Program (NCRIS).
Natasha.simons@ands.org.au
orcid.org/0000-0003-0635-1998
@n_simons
Natasha Simons