Open data for open scholarship - where we are
1. Open data for open scholarship: where are we?
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY
The DCC is supported by Jisc
2. My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
2014-10-07 Kevin Ashley –Confoa-2014 - CC-BY 2
3. Before where - WHY?
4. Data reuse stories
• The palaeontologist who saved years of work with archaeological data
5. What a palaeontologist looks at
[Figure: timeline running from now back to 100 million years ago, marked at 1m, 25m, 50m and 75m years]
6. What a palaeontologist looks at
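[Figure: the same timeline from now back to 100 million years ago (marked at 1m, 25m, 50m and 75m years), with the most recent 1 million years expanded and marked at 100,000, 500,000 and 750,000 years]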
7. What an archaeologist looks at
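[Figure: timeline of the most recent 1 million years (marked at 100,000, 500,000 and 750,000 years), with the most recent 100,000 years expanded and marked at 25,000, 50,000 and 75,000 years]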
8. Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships' logs that help us model climate change
9. The Old Weather project
Data for research, not from research
10. Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships' logs that help us model climate change
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
11. Data reuse - messages
Often your data tells stories that your publications do not
Not all data comes from other researchers
Discipline-bounded data discovery doesn’t give us all we need or want
One person’s noise is another person’s signal
12. Data reuse from Hubble
13. G8UK - Endorses OA
Open Data Charter
Policy Paper, 18 June 2013
14. Why does this matter?
• Research quality
– How close can we get to the truth?
• Research speed
– How quickly can we get to the truth?
• Research finance
– How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society
15. Open scholarship – the wider picture
• Not just about open papers, open data
• Software, methods, workflows are all important
• Data need not be open – but its existence must be
• Data: DISCOVERABLE & REUSABLE
17. Funder requirements
• UK
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
• USA – NSF, NEH, NIH
• Europe
• Denmark – in development
• Most place the burden on the researcher – some on the institution
18. RCUK policy - The 1-minute version
• Research data are a public good – make openly available in timely & responsible way
• Have policies & plans. Data with long-term value should be preserved & usable
• Metadata for discovery & reuse. Link publications & data
• Sometimes law, ethics get in the way. We understand.
• Limited embargos OK. Recognition is important – always cite data sources
• OK to use public money to do this. Do it efficiently.
21. Research data centres are good value!
• See Jisc reports on ADS, BADC, UKDA:
• Returns on investment between 400% and 1200%
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
22. Research Data Centres – the solution!
MANY AREAS OF RESEARCH HAVE NO DATA CENTRE TO SERVE THEM
28. Make data creation easier
29. Make data citable
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data)†
– Henneken, Accomazzi – 20% (astronomy) #
# Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
* Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1
30. Up-skilling for data
Choice of RDM training materials for librarians
http://dataintelligence.3tu.nl/en/home/
http://datalib.edina.ac.uk/mantra/libtraining.html
31. Make data discoverable
• Data must be discoverable to be reused
• Alone, or in conjunction with publication
• Institutional catalogues, national data registries – JISC is piloting through DCC
• We are copying the Australian approach
32. Pimp your data – make it findable & reusable
Gking.harvard.edu/data
35. My message to you
• Help researchers understand the benefits to them of sharing their data
• Help them discover & reuse data
• Give them tools that help the process
• Work to ensure they get credit for data citation
36. My message to researchers
• The credit belongs to you
• The data belongs to all of us
• Share, and we all reap the benefits
Editor's notes
There are many such stories of unexpected data reuse; these are a few examples. The last, exemplified in the Old Weather project, sees the original data being reused for at least the third time, and in doing so helps both climatologists and family historians through a single piece of transcription work. An impressive result.
Many of you may be familiar with this graph from the Hubble Space Telescope data archive. It tells the same story in a different way, and also tells a story about the transformation of astronomy as a discipline. In the days of photographic plates, sharing (analogue) astronomical data was difficult. Digital instruments transformed this, and some time around 2000, more research was being done with old data than with new data. I could be more specific about this if the data behind this graph were made available, incidentally!
For an audience such as this, I shouldn’t have to explain why data reuse is important. But just in case, and to explain why some things have happened the way they have, I’ll describe some of the drivers.
Ensuring that all research data is discoverable and reusable increases the quality of the research that we do. It can add to the data we collect ourselves and can improve the statistical rigour of our results. Exposing data to scrutiny makes it more straightforward to validate or challenge the findings of others.
Making data available also improves the speed with which we can do research. If someone else has already gathered the data we need (perhaps for a different end use), we can move directly to the analysis stage of our work, saving both time and money.
And saving money increases the efficiency of research. We hope that the money saved lets us do more research, but even if it doesn’t, society as a whole will gain. There’s evidence behind this that I’ll come to later, but it is an effective counter to those in some universities who feel that increasing funder requirements for data management simply leads to additional costs with no gain. There is a gain in all these areas, and hence every one of the actors (researchers, their employers, their funders, and society) should be motivated to make this happen.