My Data, Our Data, Your Data:
data reuse through data management
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY
The DCC is supported by Jisc
A summary
• Why data reuse?
• What stops us?
• How data management helps
• Harmonising the goals of research administration and research
• Barriers again
• The case for reuse - again
2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY 2
My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
What is data curation?
• “Maintaining, preserving and adding value to research data throughout its lifecycle”
• More than preservation:
– Active management – dealing with change
• Less than preservation:
– Lifecycle sometimes involves destruction
DCC guidance
SWEDEN
DENMARK
CANADA
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
What a palaeontologist looks at
[Figure: timeline running from 100 million years ago to now, marked at 75m, 50m, 25m and 1m]
What a palaeontologist looks at
[Figure: the 100-million-year timeline, plus a second timeline from 1 million years ago to now, marked at 750,000, 500,000 and 100,000]
What an archaeologist looks at
[Figure: the 1-million-year timeline, plus a second timeline from 100,000 years ago to now, marked at 75,000, 50,000 and 25,000]
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships’ logs that help us model climate change
The Old Weather project
Data for research, not from research
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships’ logs that help us model climate change
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
Data reuse - messages
• Often your data tells stories that your publications do not
• Not all data comes from other researchers
• One person’s noise is another person’s signal
• Discipline-bounded data discovery doesn’t give us all we need or want
Why care?
• Data is expensive – an investment
• Reuse:
– More research
– Teaching & Learning
– Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements
Why does this matter?
• Research quality
– How close can we get to the truth?
• Research speed
– How quickly can we get to the truth?
• Research finance
– How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society
G8 UK – endorses open access
Open Data Charter
Policy paper, 18 June 2013
Funder requirements
• UK
• USA – NSF, NEH, NIH
• Europe
• Most place the burden on the researcher – some on the institution
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
RCUK policy - The 1-minute version
• Research data are a public good – make openly available in a timely & responsible way
• Have policies & plans. Data with long-term value should be preserved & usable
• Metadata for discovery & reuse. Link publications & data
• Sometimes law, ethics get in the way. We understand.
• Limited embargoes OK. Recognition is important – always cite data sources
• OK to use public money to do this. Do it efficiently.
EPSRC policy points
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use
Compliance expected by 2015
DCC Policy Summary
http://www.dcc.ac.uk/resources/policy-and-legal
Findable, citable data has value
• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication, if data is:
– Archived
– Citable
– Discoverable
MORAL: build a data registry
What stops data reuse
• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential
“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”
Incremental Project Report, June 2010
http://www.flickr.com/photos/mattimattila/3003324844/
How people talk about data
• I put my data in figshare and I got a DOI for it
• Not our data; the university’s data; my funder’s data; the data; the people’s data; your data.
Data ownership – it’s messy
• You need ownership to make data free
• Governments may assert this
• Industrial collaborators – understanding role of public funding
• Research admin tracks the rules
ON METADATA
Disciplines – current state
• Typically specialised
• Focussed on discipline-specific concerns
• Frequently embedded – hence processing required to expose independently
• Historic failure to express generic concepts generically
– Place
– Time
Understanding Data Requirements
http://www.dcc.ac.uk/
Data centres are good value!
• See Jisc reports on ADS, BADC, UKDA:
• Returns on investment between 400% and 1200%
Integrity
• Not everyone publishes here
• Almost all fraud connected to unavailable data
• People suffer & die due to research fraud
• When your research is reproducible – it gets cited
Integrity – not without data
• Cyril Burt
– Twin studies on intelligence.
– Questioned 1976; now discredited
• Duke case
– Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
• Dutch cases
– Stapel – 55 publications – “fictitious data”
– Poldermans – fabricated data or negligence?
“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256
Citability
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240% (social sciences) *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
* Pienta A, Alter G, Lyle J. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Henneken E, Accomazzi A. (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
How to cite data
What data to keep
The Data Deluge is upon us
Sensors’ ability to produce data outstrips IT’s ability to process it
Roles and Responsibilities
What data to keep
Excuses – and responses
• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It’s not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don’t have permission”
– A real problem. But solvable at senior level
• “It’s too bad/complicated”
– See above
• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
See e.g. Carly Strasser’s blog:
http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
Should all data be open?
• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication
Some conundrums
• Releasing genome data is OK when it’s:
– An identified human subject
– An anonymous human subject
– Your pet dog
– Another mammal
– An insect
– A plant
– A virus
It’s amazing what people will share…
Data reuse from Hubble
Pimp your data – make it findable & reusable
2014-04-25 Kevin Ashley, DCC – SocSciScot14 - CC-BY 49
Gking.harvard.edu/data
Data is variable
• Not always textual
• Not always tabular
• Not always fixed – continual change
• Not always clearly authored – think of archival provenance
• Not always associated with publication
• Often with indistinct boundaries
• Multi-dimensional and non-linear
Some messages for you
• Some things we need to know about data:
– When/where/what is it about?
– Who owns it?
– What rights apply?
– What is it derived from, and how?
– What software may be associated?
– What data management plan applies?
– How do I gain access?
– Where is it?
– When was/will it be destroyed?
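The questions above could be captured as one record per dataset in a data registry. What follows is only a rough sketch under stated assumptions, not a DCC or CERIF schema: the class name, every field name and the example values (including the repository URL) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    """One hypothetical registry entry; all field names are illustrative."""
    subject: Optional[str] = None           # what the data is about
    coverage_time: Optional[str] = None     # when it is about
    coverage_place: Optional[str] = None    # where it is about
    owner: Optional[str] = None             # who owns it
    rights: Optional[str] = None            # what rights apply
    derived_from: Optional[str] = None      # source data and how it was derived
    software: Optional[str] = None          # software associated with the data
    dmp: Optional[str] = None               # data management plan that applies
    access: Optional[str] = None            # how to gain access
    location: Optional[str] = None          # where the data is held
    destruction_date: Optional[str] = None  # when it was or will be destroyed

    def unanswered(self) -> List[str]:
        """Return the names of the fields that are still empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Example values are invented for illustration only.
record = DatasetRecord(
    subject="Ship logbook weather observations",
    owner="Example University",
    rights="CC-BY",
    location="https://repository.example.org/datasets/1234",
)
print(record.unanswered())
```

Listing the unanswered fields makes the gaps in a record visible, which fits the point that the existence and terms of data should be discoverable even when the data itself is not open.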
What about your data?
• If administrative data isn’t freely available, why not?
• Expose it in bulk – not just as a web page
• Gain the value from your overheads!
What about collaboration?
• Collaborate within the university
• Collaborate with partners
• Collaborate with regional, national services
• Not everything can be done well locally
• Some examples…
http://dataintelligence.3tu.nl/en/home/
Choice of RDM training materials for librarians
Up-skilling for data
http://datalib.edina.ac.uk/mantra/libtraining.html
My message to researchers
• The credit belongs to you
• The data belongs to all of us
• Share, and we all reap the benefits

In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 

My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

  • 1. My Data, Our Data, Your Data: data reuse through data management Kevin Ashley Digital Curation Centre www.dcc.ac.uk @kevingashley Kevin.ashley@ed.ac.uk Reusable with attribution: CC-BY The DCC is supported by Jisc
  • 2. A summary • Why data reuse ? • What stops us ? • How data management helps • Harmonising the goals of research administration and research • Barriers again • The case for reuse - again 2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY 2
  • 3. My home – the DCC • Mission – to increase capability and capacity for research data services in UK institutions • Not just a UK problem – an international one • Training, shared services, guidance, policy, standards, futures
  • 4. What is data curation ? • “Maintaining, preserving and adding value to research data throughout its lifecycle” • More than preservation: – Active management – dealing with change • Less than preservation: – Lifecycle sometimes involves destruction
  • 5. DCC guidance
  • 6. SWEDEN DENMARK CANADA
  • 7. Data reuse stories • The palaeontologist who saved years of work with archaeological data
  • 8. What a palaeontologist looks at: a timeline from now to 100 million years ago (marked at 1m, 25m, 50m, 75m)
  • 9. What a palaeontologist looks at: the timeline from now to 100 million years ago (marked at 1m, 25m, 50m, 75m), with the most recent 1 million years expanded (marked at 100,000, 500,000, 750,000)
  • 10. What an archaeologist looks at: the timeline from now to 1 million years ago (marked at 100,000, 500,000, 750,000), with the most recent 100,000 years expanded (marked at 25,000, 50,000, 75,000)
  • 11. Data reuse stories • The palaeontologist who saved years of work with archaeological data • The 19th-century ships’ logs that help us model climate change
  • 12. The Old Weather project • Data for research, not from research
  • 13. [No text on this slide]
  • 14. Data reuse stories • The palaeontologist who saved years of work with archaeological data • The 19th-century ships’ logs that help us model climate change • The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
  • 15. Data reuse - messages • Often your data tells stories that your publications do not • Not all data comes from other researchers • One person’s noise is another person’s signal • Discipline-bounded data discovery doesn’t give us all we need or want
  • 16. Why care? • Data is expensive – an investment • Reuse: – More research – Teaching & Learning – Planning • Impact – with or without publication • Accountability • Legal & regulatory requirements
  • 17. Why does this matter? • Research quality – How close can we get to the truth? • Research speed – How quickly can we get to the truth? • Research finance – How much does the truth cost? • Improving one or more of these is of interest to all actors: • Researchers as data creators • Researchers as data reusers • Research institutions • Funders – hence government and society
  • 18. G8UK - Endorses OA • Open Data Charter • Policy Paper, 18 June 2013
  • 19. Funder requirements • UK • USA – NSF, NEH, NIH • Europe • Most place burden on researcher – some on the institution • http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
  • 20. RCUK policy - The 1-minute version • Research data are a public good – make openly available in timely & responsible way • Have policies & plans. Data with long-term value should be preserved & usable • Metadata for discovery & reuse. Link publications & data • Sometimes law, ethics get in the way. We understand. • Limited embargos OK. Recognition is important – always cite data sources • OK to use public money to do this. Do it efficiently.
  • 21. EPSRC policy points • Awareness of regulatory environment • Data access statement • Policies and processes • Data storage • Structured metadata descriptions • DOIs for data • Securely preserved for a minimum of 10 years from last use • Compliance expected by 2015
  • 22. DCC Policy Summary • http://www.dcc.ac.uk/resources/policy-and-legal
  • 23. Findable, citable data has value • Important to link publications to data (and vice versa) • Increases citations – of data & publication • Increases reuse (hence value) • But effects exist even without publication, if data is: – Archived – Citable – Discoverable • MORAL: build a data registry
  • 24. What stops data reuse • Loss • Destruction • Pride • Gluttony • Ineptitude • Concealment • Bureaucracy • Complexity • Procrastination • Lack of potential
  • 25. “Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss” – Incremental Project Report, June 2010 • http://www.flickr.com/photos/mattimattila/3003324844/
  • 26. What stops data reuse • Loss • Destruction • Pride • Gluttony • Ineptitude • Concealment • Bureaucracy • Complexity • Procrastination • Lack of potential
  • 27. How people talk about data • I put my data in figshare and I got a DOI for it • Not our data; the university’s data; my funder’s data; the data; the people’s data; your data.
  • 28. Data ownership – it’s messy • You need ownership to make data free • Governments may assert this • Industrial collaborators – understanding role of public funding • Research admin tracks the rules
  • 29. ON METADATA
  • 30. Disciplines – current state • Typically specialised • Focussed on discipline-specific concerns • Frequently embedded – hence processing required to expose independently • Historic failure to express generic concepts generically – Place – Time
  • 31. [No text on this slide]
  • 32. Understanding Data Requirements • http://www.dcc.ac.uk/
  • 33. [No text on this slide]
  • 34. Data centres are good value! • See Jisc reports on ADS, BADC, UKDA: • Returns on investment between 400% and 1200%
  • 35. [No text on this slide]
  • 36. Integrity • Not everyone publishes here • Almost all fraud connected to unavailable data • People suffer & die due to research fraud • When your research is reproducible – it gets cited
  • 37. Integrity – not without data • Cyril Burt – Twin studies on intelligence. – Questioned 1976; now discredited • Duke case – Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits • Dutch cases – Stapel – 55 publications – “fictitious data” – Poldermans – fabricated data or negligence? • “The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials • “Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256
  • 38. Citability • Making data available increases citations • Everyone – academic, funder, institution – loves citations • Want evidence? – Alter, Pienta, Lyle – 240%, social sciences * – Piwowar, Vision – 9% (microarray data) † – Henneken, Accomazzi – 20% (astronomy) # • † Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1 • * Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307 • # Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
  • 39. How to cite data • What data to keep
  • 40. The Data Deluge is upon us • Sensors’ ability to produce data outstrips IT’s ability to process it
  • 41. [No text on this slide]
  • 42. Roles and Responsibilities • What data to keep
  • 43. Excuses – and responses • “People will ask questions” – So use a data centre or repository • “It will be misinterpreted” – Stuff happens. Also, openness encourages correction • “It’s not interesting” – Let others be the judge – your noise is my signal • “I might get another paper out of it” – Up to a point. We might get more research out of it • “I don’t have permission” – A real problem. But solvable at senior level • “It’s too bad/complicated” – see above • “It’s not a priority” – Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well • See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
  • 44. Should all data be open? • NO • Many reasons – most to do with human subjects • But data existence should always be open • Allows discovery & negotiation on use • Avoids pointless replication
  • 45. Some conundrums • Releasing genome data is OK when it’s: – An identified human subject – An anonymous human subject – Your pet dog – Another mammal – An insect – A plant – A virus
  • 46. It’s amazing what people will share…
  • 47. Data reuse from Hubble
  • 48. [No text on this slide]
  • 49. Pimp your data – make it findable & reusable 2014-04-25 Kevin Ashley, DCC – SocSciScot14 - CC-BY 49 Gking.harvard.edu/data
  • 50. Data is variable • Not always textual • Not always tabular • Not always fixed – continual change • Not always clearly authored – think of archival provenance • Not always associated with publication • Often with indistinct boundaries • Multi-dimensional and non-linear
  • 51. Some messages for you • Some things we need to know about data: – When/where/what is it about? – Who owns it – What rights apply – What it is derived from & how – What software may be associated – What data management plan applies – How do I gain access ? – Where is it ? – When was/will it be destroyed?
  • 52. What about your data? • If administrative data isn’t freely available, why not? • Expose it in bulk – not just as a web page • Gain the value from your overheads!
  • 53. What about collaboration? • Collaborate within the university • Collaborate with partners • Collaborate with regional, national services • Not everything can be done well locally • Some examples…
  • 54. http://dataintelligence.3tu.nl/en/home/ Choice of RDM training materials for librarians Up-skilling for data http://datalib.edina.ac.uk/mantra/libtraining.html
  • 55. My message to researchers • The credit belongs to you • The data belongs to all of us • Share, and we all reap the benefits
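
The questions on slide 51, together with the point on slide 44 that the existence of data should always be open, can be illustrated with a small registry record. What follows is only a sketch under stated assumptions: the class name `DatasetRecord`, its field names, the helper `public_view` and the example identifier are all hypothetical and do not come from the talk or from any metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical registry entry; one field per question on slide 51."""
    identifier: str                        # e.g. a DOI (assumed identifier scheme)
    title: str
    coverage: str                          # when/where/what is it about?
    owner: str                             # who owns it
    rights: str                            # what rights apply
    derived_from: List[str] = field(default_factory=list)  # what it is derived from
    software: List[str] = field(default_factory=list)      # associated software
    data_management_plan: Optional[str] = None             # which plan applies
    access_procedure: str = "contact the owner"            # how do I gain access?
    location: Optional[str] = None         # where is it?
    destruction_date: Optional[str] = None # when was/will it be destroyed? (ISO date)
    open_access: bool = False              # is the data itself open?


def public_view(record: DatasetRecord) -> dict:
    """Return the description that is always published.

    The record's existence is never hidden; only the storage location
    is withheld when the data itself is not open.
    """
    view = asdict(record)
    if not record.open_access:
        view["location"] = None
    return view


# Illustrative values only; none of these describe a real dataset.
example = DatasetRecord(
    identifier="10.1234/example",
    title="Interview transcripts (restricted)",
    coverage="2012-2013, Scotland, patient interviews",
    owner="Example University",
    rights="Access by agreement only",
    location="internal-store/interviews",
)
print(json.dumps(public_view(example), indent=2))
```

The design choice here is that a closed dataset still yields a full public description, so others can discover it and negotiate access, while the location stays private.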

Editor's notes

  1. This is a keynote delivered at CRIS2014 in Rome, 2014-05-14
  2. This is an outline of what I’ll be talking about. I hope to persuade you of the value of data reuse and, having done that, to examine why we sometimes find it difficult. I’ll look at how we can overcome these barriers and what you as research administrators can do. I’ll then return to my opening themes in reverse order. Returning to the themes in this way means that this talk follows sonata form, the classic structure of the first movement of a symphony. I don’t claim my talk will be as beautiful but I do hope it will be enlightening.
  3. I’m from an organisation in the UK called the Digital Curation Centre. We are funded to help UK universities improve their own research data management practices. But that task is international in nature, so we do much work in collaboration with others outside the UK.
  4. Data curation is an odd term, combining vocabulary from the world of museums and science. It doesn’t always translate well to languages other than English. So this is an attempt to define what it means. It’s about preservation, but it’s more than that. It involves destruction and it involves adding value.
  5. There are many such stories of unexpected data reuse; these are a few examples. The last, exemplified in the Old Weather project, is seeing the original data being reused for at least the third time and in doing so is helping both climatologists and family historians through a single piece of transcription work. An impressive result.
  9. For an audience such as this, I shouldn’t have to explain why data reuse is important. But just in case, and to explain why some things have happened the way they have, I’ll describe some of the drivers. Ensuring that all research data is discoverable and reusable increases the quality of the research that we do. It can add to the data we collect ourselves and can improve the statistical rigour of our results. Exposing data to scrutiny makes it more straightforward to validate or challenge the findings of others. Making data available also improves the speed with which we can do research. If someone else has already gathered the data we need (perhaps for a different end use), we can move directly to the analysis stage of our work, saving both time and money. And saving money increases the efficiency of research. We hope that the money saved lets us do more research, but even if it doesn’t, society as a whole will gain. There’s evidence behind this that I’ll come to later, but it is an effective counter to those in some universities who feel that increasing funder requirements for data management simply lead to additional costs with no gain. There is a gain in all these areas, and hence every one of the actors (researchers, their employers, their funders, and society) should be motivated to make this happen.
  10. Much as I enjoy the JIR, it isn’t the publication most of us aim for. But it brings home one compelling argument for making data available, that of research integrity. Almost all fraud, and other less clear-cut cases of bad research, can be associated with the unavailability of research data. There are real consequences, including human suffering and death, of which more later. And did I mention that making your data available makes it more likely to be cited? Don’t worry, I will again.
  11. These are just a few examples, some of outright fraud and others of simply dodgy research, all of which would have been uncovered far more quickly had the data been made routinely available. The Duke case in particular roused the suspicions of many in the field but took many years to get to the bottom of because data was locked away. It is just one example of a set of practices described very clearly by Ben Goldacre in ‘Bad Pharma’. Missing data is the largest section in his book, although he has other justified concerns with research relating to medical treatments. It has led to a global movement to ensure that all clinical trial data is made available. But medicine is by no means the only area affected.
  12. Did I mention that making data available increases citations? This is a win all round. If you don’t believe me, here are three studies from three different areas that all show robust, positive correlations. The effect size varies with discipline, but we have enough evidence now that anyone who says that their area is different needs to come up with evidence to show why.
  13. Getting better at managing research data isn’t just about keeping more stuff for longer. It’s also about being more selective about what we do keep and documenting the decision-making process that we use. Reports such as this make clear that technological advances mean that the cost of producing data is dropping more rapidly than the cost of retention. Some arguments show that if we attempt to retain everything it won’t be long before we’re spending the entire GDP purely on data storage. That’s an extreme analysis, but the problem is real, as CERN know well. In some disciplines it really is wiser to just generate the data again when it is needed. But for many observational disciplines, that opportunity isn’t open to us.
  14. Yet some researchers still aren’t convinced by the rhetoric. Carly Strasser at CDL has listed some of the reasons for not sharing data that she’s encountered – and here are some of my one-line responses. I’m not saying that the concerns aren’t sincere or reasonable but they can all be dealt with and some are positively misguided. The purpose of data centres, for instance, is to make data independently reusable (as stated in the OAIS standard) which relieves researchers of the burden of dealing with questions about it, at the same time as increasing the likelihood that their data will be cited.
  15. Medicine does, however, provide some clear reasons why we can’t just stick all research data on the internet for anyone to trawl through. When human subjects are involved there are real concerns about confidentiality. Yet what alltrials.net and other initiatives make clear is that the *existence* of the data should never be hidden. That allows it to be discovered and for negotiations to take place about its use. It avoids costly replication, which can delay scientific discovery and involve human suffering when the replication takes the form of a clinical trial.
  16. Many of you may be familiar with this graph from the Hubble Space Telescope data archive. It tells the same story in a different way, and also tells a story about the transformation of astronomy as a discipline. In the days of photographic plates, sharing (analogue) astronomical data was difficult. Digital instruments transformed this, and some time around 2000, more research was being done with old data than with new data. I could be more specific about this if the data behind this graph were made available, incidentally!