This slideshow was used in an Introduction to Research Data Management course taught for the Medical Sciences Division, University of Oxford, on 2014-03-03. It provides an overview of some key issues, looking at both day-to-day data management and longer-term issues, including sharing and curation.
Introduction to Research Data Management - 2014-03-03 - Medical Sciences Division, University of Oxford
1. Introduction to research data management
Slides provided by the Research Support Team, IT Services, University of Oxford
2. WHAT IS RESEARCH DATA MANAGEMENT?
Introduction to research
data management
3. What is data?
“A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.”
Digital Curation Centre
Slide adapted from the PrePARe Project
4. What is data?
Any information you use in your research
Slide adapted from the PrePARe Project
5. Introductions
What sort of data do you use?
Where does it come from?
Are you creating new data?
Are you working with pre-existing data?
Where is your data stored?
6. What is data management?
A general term covering how you organize, structure, store, and care for the information used or generated during a research project
How you deal with information on a day-to-day basis over the lifetime of a project
What happens to data in the longer term – what you do with it after the project concludes
7. Carrots and sticks
Work efficiently and with minimum hassle now
Save time and avoid problems in the future
Make it easy to share your data
University of Oxford Policy on the Management of Research Data and Records
Funding body requirements
8. University of Oxford policy
Introduced July 2012
9. University of Oxford policy
The full policy can be viewed on the University of Oxford Research Data Management website
Research data defined as the information needed ‘to support or validate a research project’s observations, findings or outputs’
Research data should be:
Accurate, complete, identifiable, retrievable, and securely stored
Able to be made available to others
10. University of Oxford policy
Research data should be retained for ‘as long as they are of continuing value to the researcher and the wider research community’ – but for a minimum of three years
Specific requirements from funders take precedence
Researchers are responsible for:
Planning for the ongoing custodianship of their data
Developing and documenting clear data management procedures
Ensuring that legal, ethical, and funding body requirements are met
Policy applies to University staff and doctoral students
Depositing relevant research data may ultimately become a condition of award for doctorates
11. Funders’ requirements
Funding bodies are taking an increasing interest in what happens to research data
You may be required to make your data publicly available at the end of a project
Check the small print in your grant conditions
Many funders require a data management plan as part of grant applications
Oxford’s RDM website provides a summary of requirements
13. Can you find what you need, when you need it?
‘What a mess’ by .pst, via Flickr: http://www.flickr.com/photos/psteichen/3915657914/.
14. Hierarchical systems vs. tagging
Hierarchical organization uses nested folders
Default option for most operating systems
Tagging allows more flexibility
Some operating systems support tagging
Items can be in multiple categories
File tagging software is also available
Sort… or search?
15. Adding tags in Windows 7
16. Hyperlinks and shortcuts
Hyperlinks are not just for websites – they can also lead to other files on your computer
Use shortcuts to avoid duplicating files
Create project folders as an easy way to access related material
17. File naming
Aim for concise but informative names
Ideally, you should be able to tell what’s in a file without opening it
Think about the ordering of elements within a filename
YYYY-MM-DD dates allow chronological sorting
You can force an order by adding a number at the beginning of the name
Consider including version information
18. File naming strategies – examples
Order by date:
2012-12-15_analysis_JARID1A.xlsx
2012-12-15_raw-data_JARID1A.txt
2013-04-12_analysis_ASPH.xlsx
2013-04-12_raw-data_ASPH.txt
Order by type:
Analysis_ASPH_2012-12-15.xlsx
Analysis_JARID1A_2013-04-12.xlsx
Raw-data_ASPH_2012-12-15.txt
Raw-data_JARID1A_2013-04-12.txt
Order by subject:
ASPH_analysis_2012-12-15.xlsx
ASPH_raw-data_2012-12-15.txt
JARID1A_analysis_2013-04-12.xlsx
JARID1A_raw-data_2013-04-12.txt
Forced order with numbering:
01_JARID1A_raw-data_2013-04-12.txt
02_JARID1A_analysis_2013-04-12.xlsx
03_ASPH_raw-data_2012-12-15.txt
04_ASPH_analysis_2012-12-15.xlsx
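The sorting behaviour these examples rely on can be checked with a short Python sketch. It assumes Python 3 and a hypothetical helper, `make_filename`, that follows the ‘order by date’ pattern above (date, then type, then subject, with an optional zero-padded version number). It is an illustration, not a University naming standard.

```python
from datetime import date
from typing import Optional


def make_filename(when: date, kind: str, subject: str, ext: str,
                  version: Optional[int] = None) -> str:
    """Build a date-first name such as 2013-04-12_analysis_ASPH.xlsx."""
    # date.isoformat() gives YYYY-MM-DD, so alphabetical order is chronological
    parts = [when.isoformat(), kind, subject]
    if version is not None:
        # Zero-padding (v02 rather than v2) keeps v02 ahead of v10 in a listing
        parts.append(f"v{version:02d}")
    return "_".join(parts) + "." + ext


names = [
    make_filename(date(2013, 4, 12), "analysis", "ASPH", "xlsx"),
    make_filename(date(2012, 12, 15), "raw-data", "JARID1A", "txt"),
    make_filename(date(2012, 12, 15), "analysis", "JARID1A", "xlsx", version=2),
]

for name in sorted(names):
    print(name)
# 2012-12-15_analysis_JARID1A_v02.xlsx
# 2012-12-15_raw-data_JARID1A.txt
# 2013-04-12_analysis_ASPH.xlsx

# Day-first dates do not sort chronologically: the 2013 date lands first here
print(sorted(["15-12-2012", "12-04-2013"]))  # ['12-04-2013', '15-12-2012']
```

Because `date.isoformat()` always produces a four-digit year followed by a two-digit month and day, a plain alphabetical listing of these names is also a chronological one; the last line shows that a day-first date does not have this property.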
19. File naming strategies – examples
In retrospect I am not very happy with the method I used for naming files. The biggest problem was with the newspaper articles I downloaded… I named the files only based on the topic of the article, without mentioning the name of the periodical and the year of publication, which would have been very useful later, when I began writing the thesis.
– Doctoral student researching communication history
20. Are you using the right tools for the job?
Take time to assess whether your current software and methods are meeting your needs
Sticking with old familiars can be false economy
Ask friends and colleagues for recommendations
21. Research Skills Toolkit
Website and hands-on workshops
A guide to software, University services, and other tools and resources for research
Requires SSO login
http://www.skillstoolkit.ox.ac.uk/
22. IT Learning Programme
Over 200 different IT courses
Covering software, skills, and new technologies
http://www.oucs.ox.ac.uk/itlp/
ITLP Portfolio offers course materials and other resources
http://portfolio.it.ox.ac.uk/
23. ORDS – Online Research Database Service
Specifically designed for academic research data
Cloud-hosted and automatically backed up
Web interface makes collaboration straightforward
If desired, databases can easily be made public
Designed to permit easy archiving
Currently being used by a small group of test users – will become more widely available later in 2014
http://ords.ox.ac.uk/
25. Backing up is easier than replacing lost data…
http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan/
Slide adapted from the PrePARe Project
26. Make multiple copies…
…and keep them in different places
Automate the process if you can
Slide adapted from the PrePARe Project
27. Example back-up plan
Raw data from instruments are stored on the instrument PC, which is backed up every couple of months to DVDs
Much raw data also transferred to desktop computers – usually stored on external hard drives
Analysed data (e.g. Excel spreadsheets and PowerPoint files) are stored in a shared folder on a departmental server which is backed up daily
Lab books are stored inside the laboratory in locked cupboards
28. IT Services: Data Back-up on the HFS
HFS is Oxford’s central back-up and archiving service
Free of charge to University staff and postgraduates
Automated back-ups of machines connected to University network
Copies kept in multiple places
29. Data security
If you’re working with sensitive data, it’s essential to ensure that every copy kept has appropriate security
InfoSec at IT Services can provide advice – see http://www.it.ox.ac.uk/infosec/ for more details
30. Think about your storage media…
… and about file formats
Slide adapted from the PrePARe Project
32. Documentation and metadata
Documentation is the contextual information required to make data intelligible and aid interpretation
A users’ guide to your data
May be given at study level or data level
Metadata is similar, but usually more structured
Conforms to set standards
Machine readable
33. Make material understandable
What’s obvious now might not be in a few months, years, decades…
MAKE SURE YOU CAN UNDERSTAND IT LATER
Adapted from ‘Clay Tablets with Linear B Script’ by Dennis, via Flickr: http://www.flickr.com/photos/archer10/5692813531/
Slide adapted from the PrePARe Project
34. Make material verifiable
Image by woodleywonderworks, via Flickr: http://www.flickr.com/photos/wwworks/4588700881/
• Detailing your methods helps people understand what you did
• Reduces risk of misinterpretation
• Helps make your work reproducible
• Conclusions can be verified
Slide adapted from the PrePARe Project
36. Exercise
In small groups, look at the sample data sheet
Imagine you have just downloaded this dataset from an archive
What contextual or explanatory information is missing?
What additional documentation would you like to see supplied?
At the data level?
At the study level?
37. Documentation – what to include
• Who created it, when and why
• Description of the item
• Methodology and methods
• Units of measurement
• Definitions of jargon, acronyms and code
• References to related data
Slide adapted from the PrePARe Project
38. Metadata – data about data
A formal, structured description of a dataset
Used by archives to create catalogue records
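What separates metadata from free-text documentation is that it is structured enough for software to process. As a hedged illustration, the Python sketch below stores the items from the ‘what to include’ checklist as a JSON record and checks it for missing fields. The field names and the `REQUIRED_FIELDS` list are hypothetical and do not follow any particular metadata standard; an archive will define its own schema, and every value shown is a placeholder.

```python
import json
from datetime import date

# Hypothetical field names, mirroring the documentation checklist;
# a real archive or metadata standard will define its own schema.
REQUIRED_FIELDS = ("title", "creator", "created", "description",
                   "methodology", "units")


def missing_fields(record):
    """List required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# All values below are placeholders for illustration only
record = {
    "title": "JARID1A raw instrument data",
    "creator": ["Example Researcher"],
    "created": date(2012, 12, 15).isoformat(),
    "purpose": "Placeholder: why the data were collected",
    "description": "Placeholder: what each file contains",
    "methodology": "Placeholder: instrument, settings, protocol",
    "units": {"wavelength": "nm"},
    "definitions": {"JARID1A": "Placeholder: expand the gene symbol here"},
    "related": ["2012-12-15_analysis_JARID1A.xlsx"],
}

print(missing_fields(record))        # []
print(json.dumps(record, indent=2))  # machine-readable form of the same record
```

Because the record is plain JSON, the same file can be read back by another program or checked automatically before deposit, which is the practical sense of ‘machine readable’ on the earlier slide.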
39. ISA tools software suite
Open source metadata tracking tools for the life sciences
http://isa-tools.org/
40. Missing metadata – or the riddle of the sixth toe
This painting shows Georgiana, Duchess of Devonshire as Diana… or maybe Cynthia
She has six toes – but no one knows why
Public domain image from Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Georgiana_Cavendish,_Duchess_of_Devonshire_as_Diana.jpg
41. For discussion
What data management challenges have you
encountered?
What strategies have you personally found
useful?
Be ready to feed back to the group
42. WHAT HAPPENS AT THE END OF THE PROJECT?
43. Data archiving
Data generated during a research project is valuable
Don’t leave it languishing on your hard drive
Consider depositing it in an archive or repository
A number of national disciplinary archives exist
DataBib provides a catalogue: http://databib.org/
Oxford will soon have its own data archive
If possible, make it available for others to re-use
44. Why share data? Reputation
Get credit for high quality research
Recognition for contribution to research community
Open data leads to increased citations
Of the data itself
Of associated papers
Slide adapted from the PrePARe Project
45. Why share data? Reuse
Reduces duplication of effort
Allows public research funding to be used more effectively
Extend research beyond your discipline
Perhaps into contexts not currently envisaged
Slide adapted from the PrePARe Project
46. Why share data? Be a trailblazer!
A paradigm shift in how research outputs are viewed is occurring
Data outputs are of increasing importance – and are likely to become even more so
Major journals are increasingly looking to publish datasets alongside articles
Be at the forefront of an important shift in the academic world
47. Figshare
Free online data sharing platform
Shared research is allocated a DataCite DOI
A possible alternative to conventional repositories
If no suitable repository is available
If you need a data sharing solution in a hurry
48. Video by NYU Health Sciences Libraries: http://www.youtube.com/watch?v=N2zK3sAtr-4
49. Data sharing – concerns
Ethical concerns
Confidential or sensitive data
Legal concerns
Third party data
Professional concerns
Intended publication
Commercial issues (e.g. patent protection)
50. Plan with sharing in mind from the beginning
Appropriate consent from human subjects
Distinguish third party and new data
51. Share – but maybe not everything
• Redact or embargo if there is good reason
Slide adapted from the PrePARe Project
52. Research Integrity courses on WebLearn
https://weblearn.ox.ac.uk/portal/hierarchy/skills/ricourses
53. Data licensing
A licence clarifies the conditions for accessing and making use of a dataset
User knows what’s allowed without asking further permission
Doesn’t exclude the possibility of specific requests to go beyond the terms of the licence
For databases, structure and content may be covered by separate rights
54. Data licences - examples
Creative Commons licences
Six different flavours, plus CC0 public domain dedication
Widely used and recognized
http://creativecommons.org/
Open Data Commons
Specifically designed for datasets
Recognizes the structure/content distinction
http://opendatacommons.org/
55. Data licensing - guidance
‘How to License Research Data’
A guide from the Digital Curation Centre
http://www.dcc.ac.uk/resources/how-guides/license-research-data
57. Data management plans
A document which may be created in the early stages of a project
While planning, applying for funding, or setting up
An initial plan may be expanded later
Details plans and expectations for data
Nature of data and its creation or acquisition
Storage and security
Preservation and sharing
58. Exercise
Using the resources available, have a go at drafting a data management plan for your own research
If there are questions you can’t answer at this stage, make a note of:
What you need to find out
Decisions you need to make
59. Digital Curation Centre
A national service providing advice and resources
Create a data management plan using the DMPonline tool
http://www.dcc.ac.uk/
https://dmponline.dcc.ac.uk/
60. ‘In preparing for battle, I have always found that plans are useless but planning is indispensable.’
Dwight D. Eisenhower
62. ORA-Data and DataFinder
Two forthcoming University of Oxford services
Launch date TBC
63. ORA-Data (formerly DataBank)
University of Oxford’s institutional data archive
Long-term preservation for datasets without another natural home
Datasets will be assigned DOIs
Will work alongside ORA-Publications to form a composite University archive
In some cases, may be a suitable home for DPhil data
Possible to link publications and datasets in ORA
Depositors can opt to make datasets publicly available, embargoed for a fixed period, or hidden
64. DataFinder
A catalogue of datasets
Will harvest metadata from ORA-Data and other compatible data stores
Information on the nature, location, and availability of the data
So anything in ORA-Data will have a record in DataFinder
Researchers depositing data elsewhere are strongly encouraged to add a record to DataFinder
Should provide a substantial resource for researchers seeking datasets for reuse
66. Research data management website
Oxford’s central advisory website
University policy is available
Questions? Email researchdata@ox.ac.uk
http://researchdata.ox.ac.uk/
67. IT Services: Research Support Team
Can assist with technical aspects of research projects at all stages of the project lifecycle
Help with DMPs, selecting software or storage, etc.
But the earlier you seek advice, the better
For more information, see our website: http://research.it.ox.ac.uk
68. Research Data MANTRA
Free online interactive training modules
Aimed at postgraduates and early career researchers
http://datalib.edina.ac.uk/mantra/
69. Any questions?
Ask now, or email us on
researchdata@ox.ac.uk
70. Rights and re-use
This slideshow is part of a series of research data management training resources prepared by the IT Services Research Support team at the University of Oxford
With the exception of clip art used with permission from Microsoft, commercial logos and trademarks, and images credited to other sources, the slideshow is made available under a Creative Commons Attribution Non-Commercial Share-Alike License
Parts of this slideshow draw on teaching materials produced by the DaMaRO Project, the PrePARe Project, DATUM for Health, and DataTrain Archaeology
Within the terms of this licence, we actively encourage sharing, adaptation, and re-use of this material
Editor's Notes
The first question to address is what the term ‘data’ actually refers to. Definitions vary, and to some extent what counts as data will depend on the field of study. For many people, their initial association with the word ‘data’ will be numerical information (statistics, spreadsheets, or experimental results, for example), or perhaps the contents of highly structured information sources such as relational databases. However, data is far from being limited to these. Other examples include:
• Textual sources (literary or historical works that are being analysed, or interview transcripts)
• Websites (including all sorts of sites such as social media sites, as well as established academic sources)
• Works of art and other images
• Audio files (e.g. oral history, recordings of interviews or focus groups)
• Videos
• Emails
• Computer source code
• Books
• Papers
• Catalogues, concordances and indexes
The Digital Curation Centre suggests that data is “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.”
Image montage adapted from PrePARe Project slideshow “What is data?”: http://www.lib.cam.ac.uk/dataman/training.html
A very broad definition – such as ‘any information you use in your research’ – works well for thinking about data management: it helps make sure you don’t miss out something important!
Whatever your area of research is, you will be dealing with data in one form or another. Bear in mind that not all data is digital: print resources, handwritten notes, tape recordings, and hard copies of images may also be important sources.
In addition to the data you collect or generate and analyse as part of a research project, it’s also worth thinking about the data you will create. This might include very structured collections of information, such as a relational database – or it might be something much more informal, such as a file of your own notes, summaries you create for your own reference, or a list of items to be examined.
Image montage adapted from PrePARe Project slideshow “What is data?”: http://www.lib.cam.ac.uk/dataman/training.html
Discussion exercise for small groups.
The length of this exercise can be varied depending on the time available – the idea is just to get participants to introduce themselves, and get them thinking about their own data.
Most of us find that we have many calls on our time, and that packing everything that needs to be done into the week is often a challenge. That being the case, it’s easy to feel as though research data management is simply one more thing to add to an already endless to-do list – or worse, that it’s a distraction from real work. However, there are a number of key reasons that it’s worth paying some attention to it.
Good data management does require an investment of effort – but ultimately it’s something that can actually save you time, by helping you work more efficiently. You want to complete your research project to the best of your ability, but with minimum stress – and good research data management is one of the tools that can help you to do that.
Many of us are all too well acquainted with the frustration of trying to track down a fact or a document we know we have somewhere. Setting up an organizational system that works for you, and ensuring everything is properly filed or labelled to enable re-identification and retrieval, can make life a lot easier. And it’s not just a matter of saving time and reducing unnecessary effort (though clearly that’s a major benefit): having everything well ordered can also help you get a better feel for the shape and scope of your research material, which in turn can enable you to spot patterns or connections that might otherwise get missed.
As well as being useful for your own research, the data might ultimately be of use to other researchers. Having everything well organized and properly labelled also has the potential to save you a lot of time at the end of a research project, when it comes to deciding what to do with your data – but more of that later.
Oxford now has a policy on data management, which sets out the various responsibilities of researchers working in the University.
There may also be requirements imposed by your funding body which you need to meet.
Image credit: Microsoft clip art.
As of summer 2012, the University of Oxford has an official policy on the management of research data and records.
Note that the policy uses a specific definition of research data as the information that supports or validates research outputs. The policy only applies explicitly to data in this category. However, it’s still well worth thinking about the management of data construed more broadly, both to make life easier for yourself, and because you may produce data that isn’t needed to back up an output from this particular project, but which might nevertheless be of use if shared with other researchers.
The policy outlines two broad types of responsibility that researchers have:
The first is about data integrity – data should be correct and well stored.
The second is about data sharing – as far as is reasonably possible, data should be made available for others to use.
The key question for day-to-day data management is whether you can locate the material you want quickly and easily. It’s important to have a good system in place for dealing with new data when you acquire it – so you know where and how to store it so you’ll be able to retrieve it again without difficulty.
By default, most operating systems will organize things in a hierarchical file structure – files inside folders, which may be nested inside other folders. This is great if your material can easily be grouped into relatively discrete categories. In planning a hierarchical folder structure, aim for a balance between breadth and depth – so no one category gets too big, but also so that you don’t have to click through endless folders to find a file.
In some cases, it may be more helpful to use a tag-based system, where each file is assigned one or more tags, or labels. This makes it easier to have overlapping categories, and files can be categorised in multiple ways simultaneously (by subject, by author, and by the project it relates to, for example). Some modern operating systems will allow you to add tags to files; file tagging software is also available.
Sometimes it can be quicker to find a file using the desktop search function than to look through your folder or tag structure. Windows and Mac both have decent built-in search utilities.
It’s also worth taking time every now and then to reassess your folder or tag structure, perhaps moving old, unused items to a folder called ‘Archive’ or something similar so they don’t clutter up the screen.
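To make the difference from folders concrete, here is a minimal Python sketch of the idea behind tagging: an index in which one file can sit under several labels at once. The `tag_index`, `tag_file`, and `find` names are hypothetical, and real operating-system tags (for example those added through Windows file properties or macOS Finder) are stored by the system itself rather than in a structure like this.

```python
from collections import defaultdict

# Hypothetical in-memory tag index: tag -> set of filenames
tag_index = defaultdict(set)


def tag_file(filename, *tags):
    """Attach one or more tags to a file; a file may carry any number of tags."""
    for tag in tags:
        tag_index[tag].add(filename)


def find(*tags):
    """Return, sorted by name, the files that carry every requested tag."""
    if not tags:
        return []
    # .get() avoids creating empty entries for tags that were never used
    sets = [tag_index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*sets))


tag_file("2012-12-15_raw-data_JARID1A.txt", "JARID1A", "raw-data", "thesis-ch2")
tag_file("2012-12-15_analysis_JARID1A.xlsx", "JARID1A", "analysis", "conference-talk")
tag_file("2013-04-12_analysis_ASPH.xlsx", "ASPH", "analysis", "thesis-ch2")

print(find("thesis-ch2"))
# ['2012-12-15_raw-data_JARID1A.txt', '2013-04-12_analysis_ASPH.xlsx']
print(find("JARID1A", "analysis"))
# ['2012-12-15_analysis_JARID1A.xlsx']
```

In a folder hierarchy each of these files would have to live in exactly one place; here the same file answers a ‘thesis chapter’ query and a ‘gene’ query without being copied.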
More recent versions of Windows and Mac OS allow you to add tags to files as you save them. These can be used to retrieve the file at a later point.
Even within a hierarchical structure, there are ways of linking related material.
Hyperlinks can be used to link to another file on your computer (or a particular place within a file). So you could, for example, create a document listing all the data files which relate to a particular project, with some notes about them, and add hyperlinks to each data file so you can open them from within the document.
If you want to be able to put a file in multiple places without duplicating it, try using a shortcut. Recognizable by the small curved arrow on the icon, these allow you to open a file that’s stored elsewhere on your computer.
One use of this is to create project folders. If you have a collection of material which is relevant to a particular piece of work – a conference presentation, for example – but which is scattered around your file system because it also relates to other projects, you can create a shortcut to each file, and group these together in a project folder. You’ll have a quick way to access everything you need for that piece of work, without disturbing your original arrangement of material.
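A project index of this kind can also be generated rather than typed by hand. This hedged Python sketch writes a simple HTML page of `file://` links to files that stay where they are; `write_project_index` and the example paths are hypothetical, and a word-processor document with hyperlinks serves the same purpose. How a given application opens `file://` links may vary.

```python
import tempfile
from html import escape
from pathlib import Path


def write_project_index(files, index_path, notes=None):
    """Write an HTML page linking to each file where it already lives."""
    notes = notes or {}
    items = []
    for f in files:
        path = Path(f).resolve()          # as_uri() requires an absolute path
        href = escape(path.as_uri())      # a file:// URL pointing at the original
        note = escape(notes.get(f, ""))
        items.append(f'<li><a href="{href}">{escape(path.name)}</a> {note}</li>')
    page = ("<!DOCTYPE html>\n<html><body>\n<h1>Project files</h1>\n<ul>\n"
            + "\n".join(items) + "\n</ul>\n</body></html>\n")
    Path(index_path).write_text(page, encoding="utf-8")
    return page


# Demonstration in a throwaway folder; nothing is copied or moved
with tempfile.TemporaryDirectory() as tmp:
    raw = str(Path(tmp) / "data" / "2012-12-15_raw-data_JARID1A.txt")
    page = write_project_index(
        [raw],
        Path(tmp) / "conference-talk_index.html",
        notes={raw: "(raw instrument export)"},
    )

print(page)
```

As with a shortcut, the index only points at the files, so the original folder arrangement is left untouched.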
An ideal filename is concise yet informative: you should be able to tell what’s in a file without opening it.
The order of elements in a filename will also usually make a difference to the order of files within a folder, so a bit of planning can help ensure similar items are grouped together. Using the year-month-day (YYYY-MM-DD) format at the beginning of a filename makes it easy to sort files into chronological order. (The dates a file was created and last edited will often be recorded automatically, but you may sometimes want to associate a file with a different date, e.g. when a particular meeting happened.)
You can also force a particular order by adding a number to the start of a filename, or by adding a leading underscore to a file you want to appear at the top of the list in many file browsers. Filenames can also be used to record version information, so you can be sure you’re using the most recent one.
It’s worth taking some time now and then to assess whether your current methods of handling information are meeting your needs.
It can be tempting to stick with a software package because you’re familiar with it and don’t have to spend time learning something new, but if it doesn’t do what you need it to (or doesn’t do it easily), this is likely to cost you time (and cause additional hassle and frustration) in the medium and long term.
One good way of finding out about new ways of working is by asking friends and colleagues for their recommendations. What do they use for similar tasks? How helpful do they find it?
Image credit: Microsoft clip art.
The Research Skills Toolkit website provides a guide to useful software, tools, University services, and other resources for researchers, including a substantial section on managing information. The Toolkit team also holds a series of hands-on workshops each year.
The site is hosted on WebLearn, and you’ll need to log in using your SSO credentials – the same username and password you use for Nexus email.
The IT Learning Programme offers an extensive range of IT courses. These cover learning how to use specific pieces of software, IT-related skills (such as database design or programming), and how to make use of new technologies (such as social media or podcasting). Software courses can be a great way of trying out a software package before committing yourself to buying a copy.
The ITLP Portfolio website offers the course materials, which you can use for self-study, and access to a range of other related resources.
A new University service which will become available later this year is ORDS – the Online Research Database Service. It’s designed to allow academic researchers to create relational databases, so it might be used as an alternative to something like Microsoft Access or FileMaker Pro.
The service uses cloud storage: rather than your database being stored on your own computer, it’s hosted on a server, and you access it via a Web interface. This means you can access it from any computer with Internet access, and back-up is taken care of automatically, without you needing to worry about it.
The system is also set up to make collaboration – with people both in and outside Oxford – easy. All members of a project team can access the same version of the database, so there are no worries about whether you’re working with the latest version.
If they wish, users will also be able to make their databases publicly available. This might happen at the end of a project – or you might want to publish a specific subset of the data to accompany a research publication.
For the longer term, if ORDS isn’t the most appropriate home for your data, the system will be set up to allow easy transfer to the University’s new data archive (ORA-Data – more of that later) or elsewhere.
The system is currently being tested by a small group of early adopters, but will become more widely available later in 2014. ORDS will be a paid-for service – the hope is that people will cost it into a research proposal from the beginning. If you’d be interested in finding out more, please email ords@it.ox.ac.uk.
Losing crucial research material is the stuff of nightmares… but nightmares come true sometimes. This is a genuine poster from a pub in Cambridge [the picture has only been altered to straighten it, change the contrast to make it easier to read, and remove some of the details, e.g. the address of the pub and the person’s contact information].
You might think ‘Ah, but I would take more care of my laptop/external hard drive/back-up disks’, but sometimes things are out of your control – fires, floods, and burglaries can all deprive you of your hard-won research data.
Slide adapted from PrePARe Project slideshow “What is data?”: http://www.lib.cam.ac.uk/dataman/training.html
Back-up is probably the one data management task that most people are aware they should be doing, or doing better. It’s actually a good idea to have more than one back-up copy, particularly of important and/or irreplaceable material; this is part of the LOCKSS principle (Lots Of Copies Keeps Stuff Safe).
It’s also a good idea to keep these copies in different places: for example, you might keep a copy of some material in a cloud-based service (WARNING: if your research deals with sensitive data you may not be able to do this), on an external hard drive, or on DVDs/CDs. Consider asking a friend, colleague, or family member to look after one copy, or keep one copy at home and one in your office, so your material is physically in separate places. This minimises the risk of data loss in the case of flood, fire or theft.
But remember that back-up isn’t the same as preservation – it’s just one aspect of it! If you have made a back-up copy of your data, that means you now have two copies in total to look after (and if you’re working with sensitive data, you need to make sure that an appropriate level of security is applied to each copy). But the good news is that this greatly reduces the risks to your data, and goes a long way to helping it stay safe over time.
Slide adapted from PrePARe Project slideshow “Store it Safely”: http://www.lib.cam.ac.uk/dataman/training.html
Image credits: Microsoft clip art
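Keeping several copies up to date is easiest when the process is automated. The Python sketch below is a minimal illustration under stated assumptions: the destination folders are hypothetical (temporary folders here stand in for, say, an external drive and a network share) and must lie outside the source folder. It copies every file and then compares SHA-256 checksums, so a failed copy is noticed straight away. It could be run on a schedule with your operating system’s task scheduler, but it is not a substitute for a managed service such as HFS, and sensitive data still needs appropriate security on every copy.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir, destinations):
    """Copy every file under source_dir into each destination, then verify it.

    Returns a list of copies whose checksum does not match the original.
    Destinations are assumed to lie outside source_dir.
    """
    source_dir = Path(source_dir)
    failures = []
    for dest_root in destinations:
        for src in source_dir.rglob("*"):
            if not src.is_file():
                continue
            target = Path(dest_root) / src.relative_to(source_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also keeps modification times
            if sha256(src) != sha256(target):
                failures.append(target)
    return failures


# Demonstration with throwaway folders standing in for real locations
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "data" / "raw").mkdir(parents=True)
    (tmp / "data" / "raw" / "2012-12-15_raw-data_JARID1A.txt").write_text("1.0\n2.0\n")
    copies = [tmp / "external_drive", tmp / "network_share"]
    problems = back_up(tmp / "data", copies)
    copied = [c / "raw" / "2012-12-15_raw-data_JARID1A.txt" for c in copies]
    all_present = all(p.read_text() == "1.0\n2.0\n" for p in copied)

print("problems:", problems, "all copies present:", all_present)
```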
This is the storage and back-up plan of an analytical chemistry postdoc working in Oxford. Storing data on a computer connected to the instruments used to collect it is often a practical and convenient option. However, it places a lot of reliance on one machine, so it’s important to have proper back-ups. In this case, two types of additional copy are made of most of the material: to DVDs, and to external hard drives attached to desktops. (DVDs are a relatively hard-wearing storage solution for medium-term preservation. However, if data is being kept long-term, it’s good practice to refresh the storage media (e.g. by copying the data to new disks) every three years. DVDs may last significantly longer, but unfortunately you generally only find out that something’s gone wrong after the event, when it’s too late to do anything about it.) A shared folder on a departmental server is a handy way of allowing everyone in a particular research group to access files. Using a server which is automatically backed up every day takes a lot of the stress out of keeping data safe, as it doesn’t rely on individual researchers remembering to make a back-up copy. It’s worth remembering that not all data that needs to be taken care of is digital. In this case, hard copies of lab books also make up an important part of the research record.
Oxford has a central back-up and archiving service called HFS, provided by IT Services. (You may also hear people refer to it as TSM, which is the name of the client software used to run back-ups.) The service is free to University staff and postgraduates. You can set up the system to perform automated back-ups of computers connected to the University network (these usually happen overnight). If that’s not convenient, you can run a manual back-up. (If you’ve had trouble with automated back-ups, contact the HFS team, who should be able to help.) Three copies of your data will be made. One of these is stored outside Oxford, so even if there were a flood or a fire at IT Services, your data would still be safe.
It’s worth thinking about what you’re storing your data on. Storage media don’t last forever: they can degrade over time, get broken, or, in the case of small, portable storage such as USB drives, simply get lost down the back of the sofa. Ultimately, they may become obsolete: when was the last time you saw a computer with a floppy disk drive? Think also about the file formats you’re using. Software changes rapidly, and over the long term, proprietary formats can also become obsolete: you might go back to an old file and find that you can’t open it. Some formats are better choices because they are open (i.e. not controlled by one particular software company), in widespread use, or conform to standards. It’s best to use these for the long-term preservation of a file once you’re no longer working on it (even if you use a different format while you’re actually working on the data). Many proprietary software packages have a ‘Save As’ or ‘Export’ function that you can use to make a copy of your data which will be readable by other applications (in plain text or .csv format, for example), though be aware that this probably won’t preserve formatting. Slide adapted from PrePARe Project slideshow “Store it Safely”: http://www.lib.cam.ac.uk/dataman/training.html Additional image credits: CD-ROM and floppy disk images are Microsoft clip art
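If your data happens to be handled in Python rather than in a desktop package, the standard-library csv module can play the role of an ‘Export’ function. This is a minimal sketch under that assumption; the field names and values are hypothetical. Like a ‘Save As’ to .csv, it keeps only the values, not fonts, colours, or formulas.

```python
import csv
from pathlib import Path


def export_to_csv(rows: list[dict], path: Path) -> None:
    """Write a list of records (one dict per row) to a UTF-8 CSV file.

    Column order follows the keys of the first record. Only the values are
    written: fonts, colours, and formulas are not carried across.
    """
    if not rows:
        raise ValueError("no rows to export")
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical records, e.g. loaded from a proprietary file earlier on.
    records = [
        {"subject_id": "S01", "weight_change_kg": -1.5},
        {"subject_id": "S02", "weight_change_kg": 0.8},
    ]
    export_to_csv(records, Path("weights.csv"))
    print(Path("weights.csv").read_text(encoding="utf-8"))
```

Plain CSV with a header row can be opened by almost any spreadsheet, statistics package, or text editor, which is what makes it a reasonable long-term copy alongside the working file.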
Documentation is an umbrella term covering the contextual information that a user would need to make sense of a dataset. Sometimes this will be given at the study level – perhaps a text document that accompanies the data, giving information about when, where, and by whom the study was conducted, what its aims were, the methods used, and so forth. Sometimes it will be at the data level – labels or other information which ensure data can be properly interpreted (this might include giving helpful variable or field names, or including the units for measurements).
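Data-level documentation is often kept as a data dictionary (sometimes called a codebook) stored alongside the data. The Python sketch below shows one possible way to do this for a simple table; every field name, unit, and code in it (including what ‘1’ and ‘2’ mean) is hypothetical. Keeping descriptions, units, coded values, and missing-value codes in one place means the same source can both decode raw values and produce a plain-text description for a README file.

```python
# A hypothetical data dictionary (codebook) for a small study table.
# Every field name, unit, and code below is invented for illustration.
DATA_DICTIONARY = {
    "subject_id": {"description": "Anonymised participant identifier"},
    "sex": {
        "description": "Participant sex",
        "codes": {1: "female", 2: "male"},
    },
    "weight_change_kg": {
        "description": "Weight at end of study minus weight at start",
        "unit": "kg",
        "missing": [-99],
    },
}


def interpret(field: str, value):
    """Turn a raw value into something a reader can understand unaided."""
    entry = DATA_DICTIONARY[field]
    if value in entry.get("missing", []):
        return None  # a documented missing-value code, not a measurement
    if "codes" in entry:
        return entry["codes"][value]
    if "unit" in entry:
        return f"{value} {entry['unit']}"
    return value


def codebook_lines() -> list[str]:
    """Render the dictionary as plain text, e.g. for a README file."""
    lines = []
    for name, entry in DATA_DICTIONARY.items():
        unit = f" ({entry['unit']})" if "unit" in entry else ""
        lines.append(f"{name}{unit}: {entry['description']}")
        for code, meaning in entry.get("codes", {}).items():
            lines.append(f"    {code} = {meaning}")
        for code in entry.get("missing", []):
            lines.append(f"    {code} = missing value")
    return lines


if __name__ == "__main__":
    print("\n".join(codebook_lines()))
    print(interpret("sex", 2), interpret("weight_change_kg", -1.5))
```

Recording an explicit missing-value code, rather than leaving blanks or zeros, avoids the ambiguity of a 0 that might be either a real measurement or an absent one.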
First of all, because documentation should be thorough, it will contain a lot of information that might seem obvious. But will that same information still be obvious in a few months’, years’, decades’, or centuries’ time? It’s very easy to assume that you will remember everything, but in fact it’s all too easy to lose track of crucial information if it’s not recorded somewhere. Having that information accessible also means that other people can understand what you’ve done and why. It’s important to include context (why you did your research, how it fits into other contemporary research, or how it follows on from previous work), as well as explaining your methods and analytical techniques. This is related to the next point… Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
Documentation lets you record the methodology you used to generate, collect, or produce your data (for example, collection strategies, interview methods, survey techniques, algorithms, or database searches), and how you reached your conclusions from your data (for example, any statistical methods you used). This is useful if you need to replicate, adapt, or re-purpose an aspect of your research method later on. It also means that other people can reproduce your research, either to verify your conclusions or as a starting point for developing your work further. In many research groups, this could be a student or postdoc who continues work started by a previous group member. Replicating methodology can also be a useful training tool. Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
Proper documentation helps avoid this sort of situation… From http://twentytwowords.com/scientists-explain-their-processes-with-a-little-too-much-honesty-17-pictures/
Possible answers include:
Data level:
An explanation of the column name ‘E?’ needs to be provided.
The gender column seems to be using coding: what do ‘1’ and ‘2’ stand for?
Units are needed for the numerical information.
More helpful, informative column names would generally be useful: what’s actually being measured here? (Is this weight loss data? If so, what’s the significance of positive and negative values? Does a negative value indicate an actual weight loss, or a negative weight loss, i.e. a weight gain?)
Date information could also be provided: when did this study take place?
Study level:
Does the dataset have a title?
What is the nature of the study being recorded here?
Who did the work? When? Where? What were they setting out to achieve?
What methods were used? Specifically, what do the four programmes (GI, Mediterranean, Low fat, Cal count) denote? (One might guess that these are all diets, but we need more information for this to be useful.)
How were missing values indicated within the dataset? (Lines 7, 14, and 21 have a 0 in the E? column, and line 15 is blank. Is 0 a missing value, or something else?)
Has data from another set been incorporated here? Lines 7, 14, and 21 differ from the other entries in several other ways (full first names rather than initials, an age of 3, different precision for numerical values). Is this an indication that data has been pulled in from somewhere else? Does age 3 actually mean three years old, or is this a coded variable that’s been copied across from another dataset?
Did the subjects consent to be included in the study? What exactly did they consent to? Does this place any restrictions on how the data might be used?
Slide adapted from PrePARe Project slideshow “Explain It”: http://www.lib.cam.ac.uk/dataman/training.html
Metadata is a specific type of documentation: a formal description of a dataset which conforms to a particular structure. One typical use of metadata is to create a catalogue record for a dataset held in an archive. (Note: at the time of writing, the data had not actually been published in the BMRB, although the researcher planned to deposit it there after the research based on the data had been published.) The image shows metadata for a dataset from the research project. It follows the Dublin Core metadata standard, a straightforward, widely used structure which is not tied to any specific discipline. The metadata (in blue in this image) is enclosed in tags, much like HTML. This makes the metadata machine-readable: because a standard set of tags is used, an automatic system can tell where the information about the title, creator, description, and so forth begins and ends. Dublin Core is only one of many metadata standards; others may be more appropriate in specific disciplines where other information (location details, for example) needs to be recorded. (Individual researchers may or may not need to create formal metadata of this sort for a given dataset. However, for all datasets, it’s important that researchers preserve all the contextual information needed to enable proper interpretation of the data. If this is recorded, creating metadata if and when it’s needed should be straightforward.)
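To make the idea of machine-readable tags concrete, here is a minimal Python sketch that uses the standard-library xml.etree.ElementTree module to write a record with the fifteen Dublin Core element names in their usual namespace, http://purl.org/dc/elements/1.1/. It is an illustration only: the wrapper element name and the sample values are hypothetical, and an actual archive will usually specify its own wrapper schema and required fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise elements as dc:title etc.

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, str]) -> str:
    """Build a simple Dublin Core record as an XML string.

    `fields` maps element names (e.g. "title") to values. The wrapper
    element name "metadata" is an arbitrary choice for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values, not the record shown on the slide.
    xml = dublin_core_record({
        "title": "Example dataset",
        "creator": "A. Researcher",
        "date": "2014",
    })
    print(xml)
    # Because the tags are standard, a program can pull out any field.
    title = ET.fromstring(xml).find(f"{{{DC_NS}}}title").text
    print(title)
```

Reading the title back out at the end is the point of the exercise: a harvester or catalogue can locate each field without a human having to interpret the record.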
“The open source ISA metadata tracking tools facilitate standards compliant collection, curation, local management and reuse of datasets in an increasingly diverse set of life-science domains.” Image from http://isa-tools.org/
This 18th-century painting by Maria Cosway is part of a collection on display at Chatsworth House in Derbyshire. The subject is Georgiana Cavendish, Duchess of Devonshire (portrayed by Keira Knightley in the 2008 film The Duchess). It shows her as Diana, goddess of the moon. Some sources, however, say she’s depicted as Cynthia from Spenser’s Faerie Queene. (At the time of writing, the Wikimedia Commons metadata is itself inconsistent: the image title says she’s Diana, but the image description says she’s Cynthia.) In fact, Diana and Cynthia are different names for the same figure, so this isn’t as much of a contradiction as it might appear. However, there’s plenty of potential for confusion here! If you look closely, you can see that Georgiana has six toes. There are various theories about why: perhaps she really did have six toes (though there’s a lack of other evidence to support this), perhaps it’s an artistic shorthand hinting that the subject had supernatural abilities or a sixth sense, or perhaps the artist simply couldn’t count! However, no one really knows: there’s no surviving record of the artist’s intention in giving her subject this unusual feature. A symbolic message, or just a mistake? Without the relevant metadata, we’ll never know. Image credit: Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Georgiana_Cavendish,_Duchess_of_Devonshire_as_Diana.jpg
Discussion exercise for small groups, or for people to chat about over coffee.The length of this exercise can be varied depending on the time available. If time permits, it may be useful to ask the small groups to feed back to the group as a whole, and in particular to encourage sharing of hints, tips, and solutions to specific problems.
As you’ve probably put a lot of effort into creating data in the course of your research, it’s worth thinking about how that data can be preserved for the long term after your project ends. As mentioned previously, many funders now require this. The best way to do this is to deposit it in an archive or repository. There may be an appropriate archive devoted to data in your discipline. For datasets that don’t have another natural home, Oxford will soon have its own data archive (more of that later). Ideally, data should also be made available for others to re-use.
Sharing data can build your reputation in a number of ways. Laying your work open to scrutiny means that you can get credit for high-quality research, that others gain a better understanding of your methods, and that your work can be verified. Sharing allows you to make a greater contribution to your community, and to be recognized for doing so. It can also help extend your reputation beyond that community. There is also substantial evidence that making your data openly available leads to increased citations, both of the datasets themselves and of the papers or other publications based on the data. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
Sharing your data allows it to be re-used. This might be within your field, for example using the data as a starting point for a complementary study, or as test data for new software and algorithms. It might also be useful for teaching. Sharing data means that someone else working in a similar area doesn’t have to waste time duplicating the work you’ve already done. If datasets can be used in multiple research projects, the funding that allowed them to be created is being used more effectively: a key reason that many funding bodies now require data to be shared where possible. Data might even be re-used in contexts that can’t currently be envisaged, for example in new developments several years down the line, or in completely different fields. And you will get credit each time your work is cited. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
A major change is happening within academia at the moment. Data outputs are increasingly viewed as important, and this trend is likely to continue: for example, major journals are increasingly looking to publish (or provide access to) datasets alongside the articles reporting on and interpreting the data. This provides an exciting opportunity for researchers: a chance to be at the forefront of a new movement. It’s well worth embracing this change; if you start getting your data out into the public sphere now, you’ll have a head start. Image credit: Microsoft clip art
Figshare is a free online data sharing platform: anyone can upload a dataset (or other research objects, such as charts, visualizations, or posters) and make it available on the Web, with a DOI to enable consistent citation. There are advantages and disadvantages to using a service like Figshare. It’s quick, convenient, and doesn’t require you to meet the sometimes extensive requirements of conventional repositories. On the other hand, your data won’t be as easily discoverable, and because of the lack of quality control, people who do find your data this way won’t have any assurance that the dataset is one they can trust. Figshare may, however, be useful:
where you don’t have easy access to a conventional repository, or where no suitable repository exists for your discipline;
if you need to share some data rapidly: for example, you’re giving a conference presentation and want to make the underlying data available as well. Figshare will let you do this, and you can then reference the data’s DOI in your presentation.
Link to video from http://www.youtube.com/watch?v=N2zK3sAtr-4: a data management horror story by Karen Hanson, Alisa Surkis and Karen Yacobucci, of NYU Health Sciences Libraries. This short and entertaining video highlights a few potential pitfalls in the process of data sharing, through a vivid example of how not to do it.
In some cases, there may be concerns about sharing data, or reasons why all or part of a dataset needs to be kept private. These may be ethical (the data is confidential), legal (the dataset includes third-party material with restrictions on usage), or professional (you intend to publish the results, and don’t want someone else to get there first). Image credit: Microsoft clip art
Many potential difficulties or concerns regarding data sharing can be alleviated by forward planning. If your research involves human subjects (e.g. in clinical trials, interviews, or surveys), it’s worth thinking about the consent you ask them for, and whether they are happy for their data to be shared (perhaps in an anonymised form): it’s much easier to ask about this at the beginning of the process than to try to do it retrospectively. The UK Data Archive provides some example consent forms. Clearly it’s important not to pressure people into sharing more than they’re comfortable with, but in some cases, people may be perfectly happy for their data to be shared with other researchers; you won’t know unless you ask them. If you’re using third-party data which has restrictions on its use, and will be combining this with new data that you’re gathering, make sure that you keep separate copies of each. When the time comes to share your data, it’ll then be easy to distinguish between the data that’s under your control and the data that belongs to someone else. Proper documentation is also important here: it will help you keep track of what you’re allowed to do with the data, and what’s happened to it in the course of the project. Image credit: Microsoft clip art
You can also redact material (for example, third-party copyrighted material in a PhD thesis), or place an embargo on it so that it cannot be accessed for a certain period (for example, because of publisher requirements or a patent application). Such measures may also be necessary for some confidential information. Slide adapted from PrePARe Project slideshow “Share It”: http://www.lib.cam.ac.uk/dataman/training.html
For more guidance about good practice in research, have a look at the online Research Integrity courses, provided via WebLearn. These will introduce some key principles, and provide some prompts for thinking about how you might apply these in your own work.
A data management plan is, as the name suggests, a document which outlines how data will be managed over the course of a project. One may be created when a project is still in the initial planning stages, as part of a funding application (this may be a requirement), or when the project is getting underway. It’s common for there to be more than one version of a plan: an initial outline might be produced for the funding application, then fleshed out if the application is successful. The plan gives details of what sort of data the project expects to be dealing with, and what will be done with it. This might include:
A description of the type of data that will be used and where it will come from: how it will be created, or where it will be obtained from if pre-existing datasets are being used
How the data will be stored and kept safe during the project
What plans there are for preserving the data after the end of the project, and for sharing it with other researchers
Practical exercise which can last a flexible amount of time. The resources available will include David Shotton’s ‘Twenty Questions for Research Data Management’, the DCC’s checklist leaflet, and a very basic data management plan template based on one developed by DataTrain. Participants can make use of whichever of these they find most helpful. If it seems appropriate, this may be followed by a brief discussion session, in which participants are invited to give feedback on their experience of trying to draft a data management plan.
The Digital Curation Centre is a national service providing advice and resources to researchers and their institutions. Although its primary focus is (as the name suggests) on longer-term curation and preservation of research data, it offers information relating to the whole data lifecycle. One particularly helpful resource is its online data management planning tool. When building a plan, you can select a template which reflects the requirements of your particular funding body.
A final thought on the subject of plans and planning. A research project isn’t (or shouldn’t be) a battle, but President Eisenhower’s words nevertheless have some relevance in this context. It is almost inevitable that unexpected events will arise: it’s very rare that everything goes exactly as anticipated. But although this means you may often have to adapt your plan on the fly, it makes having created a plan in the first place more essential, not less. If you’ve thought through all the relevant issues, you’re less likely to be taken by surprise, and you’ll be better placed to respond when the unexpected does crop up. Public domain image, from http://commons.wikimedia.org/wiki/File:Dwight_D._Eisenhower,_official_Presidential_portrait.jpg
ORA-Data (formerly known as DataBank) and DataFinder are two forthcoming University of Oxford services. They will be key parts of a larger research data management infrastructure that the University is in the process of developing. These services are being offered in part to enable researchers to comply with funder requirements and the demands of the new University policy. The launch date of these services is still to be determined: at the moment, the plans are being reviewed by the relevant University committees. (The DataFinder screenshot is taken from the development version, and the ORA screenshot from the current repository home page; the final versions will look slightly different. It’s also still possible that the names of the services will change.)
ORA-Data will be the University of Oxford’s institutional data archive. This is the new name for the planned service formerly known as DataBank. It is intended to provide a long-term preservation option for datasets without another natural home: where, for example, no suitable national or discipline-based repository is available. Once depositing DPhil data becomes a condition of award for the degree, ORA-Data may be a suitable place for some DPhil data to be deposited. DOIs (Digital Object Identifiers) can be assigned to datasets deposited in ORA-Data. A DOI is a unique, permanent identifier for an electronic object such as a document, Web page, or dataset, and it can be set to point to wherever the object is currently hosted. This means a DOI can be used to refer to the dataset in publications and so forth, and as long as the DOI metadata is kept up to date, it will always send the reader to the right place. (This is preferable to using a URL, as URLs frequently change.) ORA-Data will operate in parallel with ORA-Publications, which is what the University’s existing archive for research publications will become known as. It will be possible to create a link between a publication in ORA and the underlying dataset. Researchers depositing datasets in ORA-Data will have control over the availability of their data. They may choose to make a dataset publicly available, or to embargo it for a fixed period (so, for example, the data might become available one or three years after being placed in ORA-Data). Sensitive data may be kept hidden permanently; in this case, the data owner may choose either to make a record for the data available (so others can see that it exists, and perhaps contact the data owner to ask questions about it), or to make both data and record invisible.
DataFinder is a catalogue of datasets held by the University of Oxford and elsewhere. DataFinder records will provide information about the nature of the dataset, where it is hosted, and (if details are given by the source) the availability of the data. Records for non-digital data can also be created in DataFinder: in this case, the record will include a description of the data and contact details for the data holder. DataFinder will harvest metadata about datasets from ORA-Data, and from other repositories or data stores that make their metadata available in a suitable form. These include ORDS, the database service mentioned earlier. This means that if a dataset is deposited in ORA-Data, a record for it will automatically be created in DataFinder (unless, of course, the ORA-Data record is set to be invisible). It will also be possible to add records to DataFinder manually, and researchers depositing data elsewhere are strongly encouraged to do this. The aim is for DataFinder to include a comprehensive listing of datasets created or owned by members of the University of Oxford. Once populated, DataFinder will be a substantial resource for researchers who want to find datasets they might be able to reuse in their own research, or who are looking for information about research that has already been conducted.
The University of Oxford has a central Research Data Management website, which provides a single information source on this subject. A copy of the University Policy on the Management of Research Data and Records can be downloaded from it. The site was relaunched (with a new URL) in February 2014.
IT Services has a team of people who provide support to researchers. They can assist with various aspects of the technical side of a research project throughout the project lifecycle: planning, setting up, doing the work, and what happens at the end of the project. If you need some help setting up a database, building a website, or working out where and how to store your data, the Research Support Team may be able to help. The earlier in the research process you seek advice, the better: preferably while things are still in the planning stages. You can find more information on the team’s website, http://blogs.it.ox.ac.uk/acit-rs-team/about/, or by emailing researchsupport@it.ox.ac.uk
Research Data MANTRA is a series of free interactive online training modules covering key research data management issues. The modules are designed for postgraduates and early career researchers. The course describes itself as particularly geared towards people working in geosciences, social and political sciences, and clinical psychology, but don’t be put off by this: much of the course material is relevant to all research disciplines.