Is Open Science Better Science?
Ewout W. Steyerberg, PhD
Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
Abstract
The Open Science movement has many components, including Open Access to scientific publications, sharing of research data, and providing open source software. These components are expected to contribute to better science. In this seminar I aim to reflect on the strengths and limitations of Open Science in the context of epidemiological research.
First, I note that making research more open increases its scale, which may make it possible to address some research questions better. Openness also lets us recognize that different researchers use different scientific approaches; Open Science makes us increasingly aware of different styles in research.
Second, we may hope to learn more about the value of modern approaches to data analysis such as machine learning. Indeed, neutral comparison studies benefit from the open availability of multiple data sets that can be analyzed with standardized approaches to the analysis, adding realism compared to analytical and simulation studies.
Third, I note that more data sharing is a positive development, especially to highlight heterogeneity between settings. In sum, I remain optimistic that open science will lead to better science, with the caveat that we recognize complexities that limit the interpretation of increasing amounts of data, such as the medical context, study design, measurement and data analysis.
These slides were presented in a series of lectures organized by Prof Marianna Huebner, on June 2, 2022.
1. June, 2022
Is Open Science Better Science?
Ewout W. Steyerberg, PhD
Professor of Clinical Biostatistics and
Medical Decision Making
Thanks to many for assistance and inspiration,
including the GAP3 consortium, CENTER-TBI Study
Yes, but …
2. Open vs closed science
Long ago
- Performed by a few elite scientists
- Doing private experiments
- Discussion in small, closed communities
3. Probabilities to quantify uncertainty
• Christiaan Huygens 1657:
'Van rekeningh in spelen van geluck'
• Thomas Bayes 1763:
“An Essay towards solving a Problem in the Doctrine of Chances”
(read to the Royal Society by Richard Price)
• Pierre Laplace 1812:
Théorie analytique des probabilités
6-Jun-22
3 Insert > Header & footer
4. Open vs closed science
Long ago
- Performed by a few elite scientists
- Doing private experiments
- Discussion in small, closed communities
Recent
- Science as a profession
- Protect data + code as intellectual property
- Aim for shocking findings in high IF journals
https://www.sciencemag.org/news/2020/06/whos-blame-these-three-scientists-are-heart-surgisphere-covid-19-scandal
5. Overall claim
“Open Science will make research better”
Vote pro / neutral / con
“More data is better”
Vote pro / neutral / con
6. Today
Aims:
- Highlight some strong points in Open Science
- Hint at some challenges in Open Science
Reflections based on personal 30-yr research experience,
specific focus on prediction research / decision making
8. Open science research questions: case 1
Example 1: Red cards and dark-skinned soccer players
https://psyarxiv.com/qkwst/
9. Open science research questions: case 1
• 29 teams involving 61 analysts; same dataset; same research question:
whether soccer referees are more likely to give red cards to dark-skin-toned
players than to light-skin-toned players
• Estimated odds ratios 0.89 to 2.93 (median 1.3)
• 20 teams: statistically significant positive effect, 9: non-significant relation
12. Open science research questions: case 1
• 29 teams involving 61 analysts; same dataset; same research question:
whether soccer referees are more likely to give red cards to dark-skin-toned
players than to light-skin-toned players
• Estimated odds ratios 0.89 to 2.93 (median 1.3).
• 20 teams: statistically significant positive effect, 9: non-significant relation.
• 21 unique combinations of covariates
• “Variation in analysis of complex data may be difficult to
avoid, even by experts with honest intentions”
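One way to see how covariate choices alone can move an odds ratio is a minimal sketch with made-up counts (these are NOT the red card data; the strata, counts, and function names are assumptions for illustration only). It compares a crude odds ratio with a Mantel-Haenszel odds ratio stratified on one hypothetical covariate:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
    a = exposed with event, b = exposed without event,
    c = unexposed with event, d = unexposed without event."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of 2x2 tables (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts per stratum of one covariate (illustration only).
strata = [
    (40, 160, 10, 40),   # stratum 1: event risk 20% in both groups
    (5, 95, 20, 380),    # stratum 2: event risk 5% in both groups
]

# Analyst A ignores the covariate and pools the tables.
pooled = tuple(sum(col) for col in zip(*strata))  # (45, 255, 30, 420)
crude = odds_ratio(*pooled)

# Analyst B stratifies on the covariate.
adjusted = mantel_haenszel_or(strata)

print(f"crude OR    = {crude:.2f}")     # about 2.47
print(f"adjusted OR = {adjusted:.2f}")  # 1.00
```

Both analyses are defensible on the same data set, yet they give different answers, which is the kind of variation the 29 teams encountered with their 21 covariate combinations.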
13. Open science research questions: case 2
Example from Maarten van Smeden
@MaartenvSmeden
15. Findings not convincing
Cox, #4, 30 vars, max c = 0.793
RF, #7, 600 vars, c = 0.797
Elastic, #9, 600 vars, c = 0.801
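For readers less familiar with the c-statistic compared above, here is a minimal sketch of how it can be computed for a binary outcome. This is an assumption-laden illustration: the outcomes and predicted risks are hypothetical, the function name is invented, and the case study itself concerns survival models, where a censoring-aware version (such as Harrell's c) would be needed.

```python
from itertools import product


def c_statistic(outcomes, scores):
    """Concordance for a binary outcome: the share of (event, non-event)
    pairs in which the event has the higher predicted score.
    Tied scores count as half concordant."""
    events = [s for y, s in zip(outcomes, scores) if y == 1]
    non_events = [s for y, s in zip(outcomes, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical outcomes (1 = event) and predicted risks, illustration only.
outcomes = [1, 1, 1, 0, 0, 0, 0]
risks = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

c = c_statistic(outcomes, risks)
print(f"c = {c:.2f}")  # 9 of 12 pairs concordant, so c = 0.75
```

On this scale, differences in the third decimal, as between the three models listed, are very small.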
16. Machine learning vs conventional modeling
1. Findings convincing?
“We found that random forests did not outperform Cox models despite their
inherent ability to accommodate nonlinearities and interactions. …
Elastic nets achieved the highest discrimination performance …, demonstrating
the ability of regularisation to select relevant variables and optimise model
coefficients in an EHR context.”
17. Machine learning vs conventional modeling
1. Findings convincing? Not in case-study
2. Systematic / “it depends”?
20. Open science research questions: case 2
• 243 real datasets from “the OpenML database”
• RF performed better than LR:
mean difference between RF and LR was 0.041 (95% CI [0.031, 0.053]) for
the Area Under the ROC Curve
• Results were dependent on the inclusion criteria used to select the example
datasets
• ES: Results rely on 10 x 10-fold cross-validation
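A summary of this kind (a mean paired difference in AUC across data sets, with an interval) can be sketched as follows. The AUC values are hypothetical, the function name is invented, and the normal-approximation interval is an assumption; the slide does not state how the published interval was computed.

```python
from math import sqrt
from statistics import mean, stdev


def mean_difference_ci(auc_a, auc_b, z=1.96):
    """Mean paired difference (a - b) across data sets with a
    normal-approximation 95% interval (assumed method, see lead-in)."""
    if len(auc_a) != len(auc_b):
        raise ValueError("need one AUC per data set for each method")
    diffs = [a - b for a, b in zip(auc_a, auc_b)]
    m = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    return m, m - z * se, m + z * se


# Hypothetical cross-validated AUCs on five data sets (illustration only).
auc_rf = [0.80, 0.75, 0.90, 0.70, 0.85]
auc_lr = [0.78, 0.70, 0.88, 0.72, 0.80]

diff, low, high = mean_difference_ci(auc_rf, auc_lr)
print(f"mean difference = {diff:.3f} (95% CI [{low:.3f}, {high:.3f}])")
```

With only a few data sets the interval is wide; which data sets are included drives the result, in line with the dependence on inclusion criteria noted above.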
21. Open science research questions: case 2
• More clarification is needed on when ML / RF works best; at the least, a large N is needed
22. Systematic review on ML vs classic modeling
25. Summary on examples of Open Science
to better address Big research questions
• 1 data set
• multiple modelers
• multiple modeling options
• 1 neutral comparison; 243 OpenML datasets
• Review of 282 comparative studies: meta-research
38. Open Science challenge:
dealing with heterogeneity for prediction research
Heterogeneity
• Study design
• Selection of subjects
• Measurement of covariates
• Measurement of outcomes
• Associations of covariates with outcome
• Overall outcome rates
• Performance of prediction models