The document discusses several key points regarding climate change research data and integrity:
1) Climate change research faces intense scrutiny given its high stakes, making data integrity especially important.
2) While improved data management may not have changed criticisms of a controversial email leak, the climate science community should still improve data sharing and linking publications to underlying data and analyses.
3) Doing so supports reproducibility, research integrity, further exploration of findings, and data reuse, helping inform important decisions around climate change. However, challenges like data volumes, disclosure agreements, resources, and unclear benefits must also be considered.
Tim Osborn: Research Integrity: Integrity of the published record
1. Climate research data and research integrity Dr Tim Osborn Climatic Research Unit School of Environmental Sciences University of East Anglia JISC Research Integrity Conference: the Importance of Good Data Management 13 September 2011
26. Research Integrity Conference: The importance of good data management. Wellcome Collection Conference Centre, 13 September 2011 (14/09/11)
Editor's Notes
Although I am interested in the issues of research data management, it's not a primary area for me. Consequently my contribution is rather narrower and more discipline-specific than others, because that's what I'm familiar with. Nevertheless I hope it is still a useful "case study" that can inform the wider debate.
I was asked to talk about why the integrity of the published record has become important for climate research. This might imply that it hasn't always been important, or that it is more important for climate research than for some other disciplines, neither of which is true. Nevertheless, what are the features of the current global warming issue that might pick it out for special attention? It pushes scientific knowledge to its limits (and beyond current limits). The pace of scientific advancement needs to be maximised without harming integrity. It is inextricably linked to the development of poorer countries, economic and demographic growth, inequalities, etc. Say more about the high stakes next. Intense scrutiny is good: we need more of it, and it will improve scientific knowledge and understanding.
It's worth considering the controversy that followed the hacking of emails and other documents from the Climatic Research Unit, and its relevance to the relationship between research data management and the integrity of the published research record. Within the scientific community, the integrity of our research was questioned very little. Overwhelmingly the community was supportive. Nevertheless there was widespread questioning (at least initially) in the mainstream media, and this was influenced by relatively few but vociferous critics, and helped by various critics and/or vested interests willing to communicate their viewpoints. It's important to distinguish between perceptions of integrity and actual integrity. As confirmed by the Muir Russell review, for example, we did not destroy raw data; we did not manipulate data inappropriately or with the intention of obtaining a pre-determined outcome; we recorded, justified and published the few data adjustments that were made so that others could understand and test them; and third parties had access to the data and other materials necessary to reproduce our published research. Yet this isn't how things were portrayed. My view (and that of others I've asked) is that the management of our research data played only a small role. The hacked emails and their interpretation were much more significant.
But there isn't room for complacency. Although issues relating to research data may have played little role, there were areas of valid criticism, and this made it harder to defend our integrity, especially in the cross-over of criticism into the mainstream media. Rather than analysing this interaction further, it is better to identify the improvements that can (and should) be made and move on to how best this can be done. These ideas apply, of course, to the whole climate science community, not specifically to the Climatic Research Unit. Overall, various aspects of data management can be improved, supporting more transparency and openness, more re-use of data, and unambiguous links between published findings and the data on which they have been built. We have begun to investigate some of these issues, for example in our JISC-funded project "ACRID", though I can't go into the details of that today.
Here I've listed important outcomes of improved data management and sharing by the climate science community. First, supporting the reproducibility and thus integrity of published research. Second, exploring beyond published findings to assess their robustness to changes in data, methods, assumptions, and so on. Third, facilitating more re-use of existing data sets. Although we have an estimated 10,000 terabytes of data about the climate system, the system is so complex that we need to maximise our use of available datasets.
But there are some challenges to consider. The volume of research is considerable, with an estimated publication rate approaching 15,000 articles related to climate change each year. Publishing and sharing the data and scientific workflows associated with each one is a significant task. Is it necessary?
The volume of data is also very large, and projected to grow rapidly, with a big expansion of data simulated by climate models and, particularly, data associated with remote sensing instruments mounted on satellites and ground radar. The data behind the small "in situ/other" bars, although a much smaller volume, are very disparate and therefore present their own considerable challenge.
As well as challenges, there are also limitations. Some of these issues are necessary products of the adversarial system in which research (and publication and funding) is carried out. Some data cannot be published openly, and some data producers want open publication delayed so they can exploit their data further. Then there are the resources, time and money, needed to improve data sharing and publication.
Many datasets are truly open, or are open for non-commercial research. But some are subject to real non-disclosure agreements. For example, most data observed at UK weather stations is subject to this agreement, which is treated very seriously. I've used this data in many publications, even winning a medal from the Royal Meteorological Society for some of this work, but I can't share the raw data that I used. I can't even keep a copy myself to ensure that I can prove my work is repeatable. I could, I suppose, download a new copy in future years to demonstrate that it is repeatable, but can I guarantee that the data in the data centre won't have been changed (quality controlled, updated, etc.)? No!
Here is another example. The most widely used basis for analysing changes in global-scale patterns of precipitation is constructed from weather station data that cannot be disclosed. These non-disclosure agreements are real, but need to be phased out somehow.
Informal agreements between colleagues sharing data are also genuine, and so are the consequences of breaching the trust established between colleagues.
Traditionally, climate data aren't published in their own right, but rather as part of an article that analyses the data and reports the findings. The scientists gain citations and credit, but it can take years in some cases: first amassing sufficient results to represent publishable critical mass, and then negotiating the peer-review system.
The main co-benefits are probably not linked to the ability to demonstrate that published work can be replicated. Providing data (and other materials) with a publication, perhaps as supplementary online materials which many journals now allow, is certainly a useful option, but in many cases it isn't seen as a sufficient benefit in itself. For how many of the 13,000 climate change articles published per year would this supplementary material actually be looked at and used? Again, it must be "sold" to the scientist by other benefits. There are also limitations in how useful this route of supplementary material is, in relation to being able to cite specific datasets and to find them for re-use. Another concern is the proliferation of multiple copies of a dataset. Are they identical, or subtly different? How to tell? It is better to provide a unique identifier and address for the existing data that were used in the article, rather than a copy of the data.
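One simple way to answer the "identical, or subtly different?" question for two copies of a dataset is to compare cryptographic checksums. The following is only a minimal sketch, not a method described in the talk: the function names (`fingerprint`, `read_chunks`, `same_dataset`) are hypothetical, and SHA-256 is an assumed choice of hash. It detects byte-level differences only, so two files holding the same values in different formats would still be reported as different.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def fingerprint(chunks: Iterable[bytes]) -> str:
    """Return the SHA-256 hex digest of a byte stream supplied in chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: Path, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size blocks, so large files are not loaded whole."""
    with path.open("rb") as handle:
        while block := handle.read(size):
            yield block


def same_dataset(copy_a: Path, copy_b: Path) -> bool:
    """Return True only if the two files are byte-for-byte identical."""
    return fingerprint(read_chunks(copy_a)) == fingerprint(read_chunks(copy_b))
```

A checksum recorded at the time of analysis could also be compared against a later download, which would show whether the archived copy has changed in the meantime, though not what changed.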
There are new opportunities to publish datasets in their own right, rather than as part of scientific articles. Although meta-data and other accompanying information must also be provided, it is still a smaller task than completing a study and associated paper, and the peer review can also be much lighter touch. The lag from data collection to publication could be reduced. The datasets are citable, allowing due credit and encouraging more scientists to follow this route, and need to be uniquely identifiable over a long period of time.
In terms of the location of long-term data publishing and archiving, the publishers of scientific journals do not seem to be ideal. Yes, publishing data with the journal does help provide a very strong link between a published article and the data used. But publishers, especially commercial rather than academic ones, are not the place to guarantee long-term preservation, due to the commercial realities in which they operate. They cannot guarantee the longevity of a journal and its associated archive of data and materials. Unlike the journal article, are the supplementary materials archived (e.g. by the British Library) and easily located? Publishers, and also individual university archives, may also not provide the functionality that dedicated data centres can provide, related to tools and search engines. Similarly, they aren't ideal for supporting data re-use if the data are scattered across hundreds of institutions and/or journals. Much better are existing data centres, especially if they are dedicated to specific disciplines and have a mandate to support the scientific community by providing long-term archiving.
In my experience, the more specific (e.g. subdiscipline) the better. For example, the World Data Centre for Paleoclimatology does well by splitting its archives according to subdiscipline, e.g. the International Tree-Ring Data Bank. Why? Well, the more generalised these archives are, the more complex the data and meta-data model becomes, and formats tailored to the needs of specific cases are harder to cater for. There is a steeper barrier, some aspects may appear to be irrelevant, and the objective of encouraging greater data sharing is not met.
The stakes are particularly high, but the context in which decisions must be made is very difficult, without past precedents to show what the best route to follow is. If, as some suggest, it were easy and cheap to reduce greenhouse gas emissions and if the impacts of not doing so were very damaging, the best policy would be obvious. If, as others contend, the situation is reversed, the best policy ("do nothing") is also easily chosen. In reality, the context is much harder. Taking action is not easy or cheap. The net effects could be very serious. But they might not be. Or they might be serious for some but not for everyone. The net impact of climate change is very uncertain, and the uncertainty range includes some changes that are not just economically damaging but could be beyond what we can adapt to.
There is a significant cost involved, and increasing the value of the data for re-use means spending more time in publishing the data: meeting standards for data and meta-data, and providing other materials. But the "cost" is not simply a matter of funding. Though time = money, if you give an academic more funding to cover a task that doesn't have an obvious benefit, they will still be reluctant to use the funding for that task. The solution is to focus on the co-benefits of committing this time and these resources.