Research data management in UK universities: A collaborative venture
1. Rachel Bruce, Deputy Chief Innovation Officer, Jisc
Research data management in UK universities: a collaborative venture
04/06/2015
2. What I will cover
» What is Jisc?
» Growing pressure of data
» The promise of open data
» Where the UK is on this journey
» What is the role of the library (and others)?
04/06/2015 Congress of University and Research Libraries, Research Data Management in UK Universities, a collaborative venture 2
3. Jisc mission and vision
Mission
To enable people in higher education,
further education and skills in the UK
to perform at the forefront of
international practice by exploiting
fully the possibilities of modern digital
empowerment, content and
connectivity
Vision
To make the UK the most digitally
advanced education and research
nation in the world
4. Four key pillars to Jisc activities
5. Growing pressure of data 2009
research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf
6. 2010
» Develop an international framework for a Collaborative Data Infrastructure
» Earmark additional funds for scientific e-infrastructure
» Develop and use new ways to measure data value, and reward those who contribute it
» Train a new generation of data scientists, and broaden public understanding
» Create incentives for green technologies in the data infrastructure
» Establish a high-level, inter-ministerial group on a global level to plan for data infrastructure
7. Science as an Open Enterprise report, 2012
» ‘how the conduct and communication of science needs to adapt to this new era of information technology’.
» ‘As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database. We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable.’
» Royal Society, June 2012, Science as an Open Enterprise, royalsociety.org/policy/projects/science-public-enterprise/report/
8. Data for Discovery – RCUK
9. Definitions
» “Research data is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created.” (EPSRC)
» “‘Research data’ refers to information, in particular facts or numbers, collected to be examined and considered as a basis for reasoning, discussion or calculation. … Examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images. The focus is on research data that is available in digital form.” (H2020)
10. Open Research Data
» Research integrity and transparency
» Re-use, new research and innovation
11. Research Funder Policies
» Public good: Publicly funded research data are produced in the public interest and should be made openly available with as few restrictions as possible
» Planning for preservation: Institutional and project-specific data management policies and plans are needed to ensure valued data remains usable
» Discovery: Metadata should be available and discoverable; published results should indicate how to access supporting data
» Confidentiality: Research organisation policies and practices should ensure that legal, ethical and commercial constraints are assessed; the research process should not be damaged by inappropriate release
» First use: Provision for a period of exclusive use, to enable research teams to publish results
» Recognition: Data users should acknowledge data sources and terms & conditions of access
» Public funding: Use of public funds for RDM infrastructure is appropriate and must be efficient and cost-effective
RCUK Common Principles on Data Policy
rcuk.ac.uk/research/datapolicy/
12. Archival data counts
» The number of science papers written based on Hubble archival data … has eclipsed the number of papers from new observations
NASA, ESA and J. Hester (ASU)
13. Open data counts
» An examination of citations to cancer microarray clinical trial publications: the 48% with publicly available data received 85% of the aggregate citations.
14. DUDs
» The data centre under the desk (or in a backpack) is not adequate.
Not duds
15. On the ground
Informatics researcher
“Yes, I will share my data, but people should register; and why change to the Open Data Commons licence? I have a bespoke licence I have always used.”
Engineer
“I have lots of data, but you need a licence for this bespoke software to use it.”
Bio-scientist
“Why would I put my data in a repository? I share it informally with my peers; no-one else would understand it.”
Philosopher
“I don’t have data, I annotate books.”
Social scientist
“I have notes, photos, and video and audio of subjects; it would take way too long to anonymise it.”
16. Promoting and Supporting good RDM
5 Strands
1. Research Data Management Infrastructure (RDMI) Projects
2. Research Data Management Planning (RDMP) Projects
3. Support and Tools Projects
4. Citing, Linking, Integrating and Publishing Research Data (CLIP) Projects
5. Research Data Management Training Materials Projects
Jisc first MRD Programme, 2009-2011 (but also 2004 DCC)
17. Building Institutional Capacity
Jisc second MRD Programme, 2011-2013
» Ownership: High-level ownership of the problem.
» Sustainability: Develop business cases to sustain work.
» Encouraged to reuse outputs from first programme and elsewhere.
» Mix of pilot projects and embedding projects.
» Holistic institutional approach to RDM.
Institutional RDM infrastructure services: 17 projects
Data publication: 3 projects
RDM planning: 10 projects
RDM training: 5 projects
18. Research Data Management Case Studies, 2015
19. A multi-pathway approach to training
» Content: module and course profiles; level; themes
» Mode: workshops; labs; online
» People: PGR Directors; programme and module leaders; supervisors; PGR & ECR researchers
» Time: point of need; progression pathway; modular
20. Survey of 61 Universities on RDM
[Chart legend: Russell Group (39); Others 10%+ (35); Others (13)]
21. Most advanced areas
[Bar chart: % indicating piloting or live. Categories: RDM skills training and consultancy; data management and sharing plans; policy development]
22. Least progress
[Bar chart: % indicating piloting or live (axis 19 to 26). Categories: governance of data and access reuse; digital preservation and continuity planning; business planning and sustainability]
23. Barriers to progress
[Bar chart: % citing each barrier. Values shown: 71, 64, 59. Categories: low priority for researchers; availability of funding; lack of appropriate resources and infrastructure]
25. Data infrastructure for sensitive data
Safe share:
» Encrypted VPN infrastructure between organisations; enhanced confidentiality and integrity per ISO 27001
» Requirement to move electronic health data securely and support research collaboration
» Working with biomedical researchers at Farr Institute, MRC Medical Bioinformatics initiative, ESRC Administrative Data Centres
26. Storage # 1
File sync and share:
» Dynamic purchasing system
» Offering file sharing products typified by vendors such as Dropbox
» Enhanced products available offering:
› EEA (European Economic Area) storage - for Data Protection compliance
› user managed encryption - encryption keys are managed by the customer
› integrated federated access - end users can access services using institutionally issued credentials
» Box; Microsoft; Q Associates; Capito currently on the system
27. Storage # 2
Data archiving:
» A single supplier framework with Arkivum
» Secure, easy-to-use and cost-effective data archiving service for research and education
› 100% data integrity guarantee
› 2 UK stored data copies accessible online
› 1 UK stored copy held with a third-party escrow data holding company
› £5m - £100m professional indemnity insurance
› ISO 27001 compliance
› 10 year framework to December 2023
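As an illustration of what checking data integrity across several stored copies can involve, here is a minimal Python sketch that compares SHA-256 checksums of copies against a reference file. It is not a description of how Arkivum or any Jisc service works; the function names (`sha256_of`, `check_copies`, `demo`) and the file names are hypothetical and exist only for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(reference: Path, copies: list[Path]) -> dict[str, bool]:
    """Map each copy's file name to True if its digest matches the reference."""
    expected = sha256_of(reference)
    return {copy.name: sha256_of(copy) == expected for copy in copies}


def demo() -> dict[str, bool]:
    """Build one intact copy and one altered copy (hypothetical data), then check both."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "dataset.csv"
        original.write_bytes(b"sample,value\nA,1\nB,2\n")
        intact = root / "copy_intact.csv"
        intact.write_bytes(original.read_bytes())
        damaged = root / "copy_damaged.csv"
        damaged.write_bytes(b"sample,value\nA,1\nB,3\n")  # one value changed
        return check_copies(original, [intact, damaged])


if __name__ == "__main__":
    print(demo())
```

A check like this only detects that a copy differs from the reference; repairing it from another good copy is a separate step.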
28. Storage (& preservation) #3
A gap?
» Many different research domain and institution approaches to storage provision
» Is there a requirement for national scale storage facilities and services?
» E.g. a national large scale, on-net, ‘on-line / near on-line’, possibly complementing close-coupled on-site storage, preservation?
» To complement the focus on data standards, interoperability and discoverability
29. Research systems
[Charts, axes 0 to 100: ‘Research Systems Provided’ (institutional repositories; CRIS; file sync/share; data registry/catalogue; archival/preservation systems; software repositories) and ‘Outputs Registered in CRIS’ (traditional research outputs (articles etc.); archived data (%); active data (%))]
30. Research data storage
[Charts: ‘RD storage capacity’ (current active; planned active (3 yrs); current archive; planned archive (3 yrs); bands <100 TB, 100-499 TB, 500-999 TB, 1 PB+) and ‘RD storage provision’ for archived and active data (centrally; other; disciplinary faculty; unsure)]
32. Functions of an Institutional RDM Service
Institutional coordination and partnerships
1. Requirements
2. Planning
3. Informatics
4. Citation
5. Training
6. Licensing
7. Appraisal
8. Storage
9. Access
10. Impact
Liz Lyon, ‘The Informatics Transform: Re-Engineering Libraries for the Data Decade’, International Journal of Digital Curation (2012), 7(1), 126–138; dx.doi.org/10.2218/ijdc.v7i1.220
33. Research Data Ecology
Professor Geoffrey Boulton
34. And we need to go further
“Science is broken” Examples:
» psychology academics making up data
» anaesthesiologist Yoshitaka Fujii with 172 faked articles
» Nature - rise in biomedical retraction rates overtakes rise in published papers
“Economists have been astonished to find that a famous academic paper often used to make the case for austerity cuts contains major errors. Another surprise is that the mistakes, by two eminent Harvard professors, were spotted by a student doing his homework.”
bbc.co.uk/news/magazine-22223190
37. A journey together
38. Find out more…
Thank you!
Rachel Bruce
Email: email@jisc.ac.uk
Twitter: jisc.ac.uk
Blog: researchdata.jiscinvolve.org/wp/
Except where otherwise noted, this
work is licensed under CC-BY-NC-ND
Editor’s notes
Digital content: We provide institutions, their students and researchers secure, cost-effective access to the UK’s richest collection of digital resources.
Network and IT: We provide the HE sector with Janet, the Jisc network, advanced infrastructure services and collaboration platforms.
Advice: We listen to our members to ensure we offer them quality support, guidance and tools - estimated to save the sector £122 million in cost efficiencies each year.
R&D: Identifying emerging technologies and developing them around our members’ needs. Testing and learning on their behalf to ensure they are ready to take advantage
Published 2009
Swan (Omega) Nebula
Resembling the fury of a raging sea, this 13th anniversary Hubble image actually shows a bubbly ocean of glowing hydrogen gas and small amounts of other elements such as oxygen and sulfur. The photograph captures a small region within M17, also known as the Omega or Swan Nebula, a hotbed of star formation
Since the earliest days of astronomy, since the time of Galileo, astronomers have shared a single goal — to see more, see farther, see deeper.
The Hubble Space Telescope's launch in 1990 sped humanity to one of its greatest advances in that journey. Hubble is a telescope that orbits Earth. Its position above the atmosphere, which distorts and blocks the light that reaches our planet, gives it a view of the universe that typically far surpasses that of ground-based telescopes.
Hubble is one of NASA's most successful and long-lasting science missions. It has beamed hundreds of thousands of images back to Earth, shedding light on many of the great mysteries of astronomy. Its gaze has helped determine the age of the universe, the identity of quasars, and the existence of dark energy.
access the sensitive and potentially disclosive data from the Community Innovation Survey (in a dataset known as the UK Innovation Survey) remotely via a secure server. The researchers developed the hypothesis that knowledgeable individuals (such as inventors) with the flexibility to change geographical location, would positively impact the innovative behaviour of firms in the areas they moved to. The empirical results demonstrated that the relocation of these individuals does not have a direct impact on the firms' innovation: a positive effect emerges only after controlling for the capability of local firms to exploit external sources of information (such as suppliers, clients, customers, competitors, other businesses in the industry, consultants, commercial labs and private R&D institutes). Only those firms who complemented their internal knowledge with external sources were able to benefit from the arrival of highly-skilled individuals into the local labour market, improving their innovative performance.
Hubble Telescope
"The number of science papers written based on Hubble archival data has increased to the point where it has eclipsed the number of papers resulting from new observations."
Hubble Racks up 10,000 Science Papers (Dec 2011) http://hubblesite.org/newscenter/archive/releases/2011/40/image/a/
Astronomers Find Elusive Planets in Decade-Old Hubble Data
http://www.nasa.gov/mission_pages/hubble/science/elusive-planets.html
The number of science papers written based on Hubble archival data has increased to the point where it has eclipsed the number of papers resulting from new observations. Hubble's archive contains data from over 1 million exposures. This astronomical treasure trove will serve as a key "data mine" serving generations of astronomers for decades to come, long after Hubble has stopped operations. The first science paper from a Hubble observation was submitted on October 1, 1990, by Tod Lauer of the National Optical Astronomy Observatory in Tucson, Ariz. This paper reported observations of the environment around a suspected black hole in the core of galaxy NGC 7457. Data from Hubble's longest operating camera, the Wide Field Planetary Camera 2 (which was active from 1994 to 2009), was used for nearly half of the papers. The next most highly ranking instrument is the Advanced Camera for Surveys, which was installed in 2002 and is still operating. This is followed by three other top-ranking instruments: the Space Telescope Imaging Spectrograph, the Near Infrared and Multi-Object Spectrograph, and the Faint Object Spectrograph.
Sharing Detailed Research Data Is Associated with Increased Citation Rate
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000308
Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. The authors examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin, using linear regression. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.
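To make the two headline shares easier to read, the short Python sketch below works out the raw citations-per-paper ratio they imply. It is plain arithmetic on the 48% and 85% figures quoted above; the function name is hypothetical. The result is an unadjusted ratio, so it is not the same quantity as the 69% increase, which the study reports after controlling for journal impact factor, publication date and author country.

```python
def citations_per_paper_ratio(share_of_papers: float, share_of_citations: float) -> float:
    """Ratio of mean citations per paper: data-sharing group vs. the rest.

    Both arguments are fractions between 0 and 1 for the data-sharing group.
    """
    # Relative citation rate of the group that shared data.
    open_rate = share_of_citations / share_of_papers
    # Relative citation rate of the remaining papers.
    closed_rate = (1 - share_of_citations) / (1 - share_of_papers)
    return open_rate / closed_rate


# Shares quoted in the note above: 48% of trials, 85% of aggregate citations.
ratio = citations_per_paper_ratio(0.48, 0.85)
print(round(ratio, 1))  # prints 6.1
```

On the raw shares alone, papers with public data average roughly six times the citations of the others; the regression-adjusted figure in the study is much smaller because it accounts for the other factors listed.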
http://blogs.bournemouth.ac.uk/research/files/2014/05/bibliometrics.jpg
“People will ask questions”
So use a data centre or repository
“It will be misinterpreted”
Stuff happens. Also, openness encourages correction
“It’s not interesting”
Let others be the judge – your noise is my signal
“I might get another paper out of it”
Up to a point. We might get more research out of it
“I don’t have permission”
A real problem. But solvable at senior level
“It’s too bad/complicated” – see above
“It’s not a priority”
Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
With thanks to Kevin Ashley (DCC) for responses
Talk about the overall benefits.
Survey of universities in 2014; the results were collated in April 2014. It aimed to give a national picture of UK university progress and to identify gaps.
It targeted VCs for Research, Heads of IT, Libraries and Research Support in universities where research income was 10% or more of overall income.
It also told us that RDM was mainly led via the Library, but there were significant roles for IT and Research Offices, and it is a truly cross-HEI endeavour.
Researchers are involved in the high-level oversight groups and are consulted (75% for both), but RDM is a low priority for researchers.
There is a contrast in storage provision: around a third provide a quota to all active research staff, and around a third haven’t settled this yet and it is still being debated. Half of the respondents were not clear whether they would be providing free-at-the-point-of-use storage to researchers by May 2015. By May 2015 HEIs expected between 3 and 5 FTEs dedicated to RDM across many areas: library, IT, faculties.
Mission: Realising a robust and sustainable research data management infrastructure and services to enrich UK research
With a focus on: open agenda, collaboration and internationalisation, digital standards and policies
“File sync and share” is shorthand for sharing files among multiple users and devices, and synchronizing the shared files to retain file integrity. The process is very familiar to most users as a consumer-level file sharing application, typified by vendors like Dropbox.
However, consumer-level file sharing is not nearly as popular with enterprise IT. The lack of IT management is a big issue: IT knows that it should establish control over corporate data that is flying around the globe on personal devices. They need to be able to supervise data retention, versions, security, controlled access, and policies to administer the management environment. Consumer-level file sharing does not support centralized control over distributed file sharing.
The lack of security and compliance with consumer-grade services is another legitimate issue. When files are stored in the cloud, IT’s security concerns ratchet up. And when business data is stored on thousands of mobile devices – you do the math. Consumer-grade file sharing products have some security but usually lack strong encryption and user access control. They also lack the ability for IT to set and enforce security policies for shared files.
Data is the universal scientific material
Software is the universal scientific research instrument
And a material. And a method.
The Lab and the eLab, together
Software is an instrument, a material, a method.
Materials (data) + Methods (software)