Philip Bourne discusses the opportunities for data science in addressing diabetes. Data science involves using diverse digital data to ask and answer relevant questions, arriving at statistically significant conclusions not otherwise possible. It also involves sharing findings in a way that can improve lives. Diabetes is well-suited for data science approaches due to increasing data from genomics, wearables, electronic health records, and predictive modeling successes. However, data science must be done carefully with input from experts to account for confounders and ensure accurate outcomes for complex health issues like diabetes.
1. Diabetes Data Science
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
@pebourne
American Diabetes Association, June 23, 2018, Orlando
2. I declare no conflicts of interest …
I am an open science advocate and you can take
all the photos you want (being courteous to others)
...
The slides are all on slideshare in any case
3. I am not a diabetes researcher...
I am a computational biologist cum data
scientist interested in helping address diabetes
where I see lots of opportunities
4. So What is Data Science?
http://vadlo.com/cartoons.php?id=357
Data science is like the Internet…
If I asked you to define it, you would all say
something different, yet you use it every day…
5. So What do I Mean by Data Science?
• Use of the ever increasing amount of open, complex, diverse digital
data
• Finding ways to ask and then answer relevant questions by
combining such diverse data sets
• Arriving at statistically significant conclusions not otherwise
obtainable
• Sharing such findings in a useful way
• Translating such findings into actions that improve the human
condition
6. If you don’t listen to me listen to:
The NIH Strategic Plan for Data
• Support a Highly Efficient and Effective Biomedical Research Data
Infrastructure
• Promote Modernization of the Data-Resources Ecosystem
• Support the Development and Dissemination of Advanced Data
Management, Analytics, and Visualization Tools
• Enhance Workforce Development for Biomedical Data Science
• Enact Appropriate Policies to Promote Stewardship and
Sustainability
https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
7. Why Now? Drivers of Change
• Generic
• There are ~2.7 zettabytes (2.7 × 10^6 PB) of digital data
• Training data is doubling every two years
• Robust and reusable tools in Python and R
• More advanced tools e.g., Deep Artificial Neural Networks (DNNs)
• New computing power e.g., GPUs, the cloud
• Advances coming from the private sector NOT academia
• Successful integration into workflows & lifestyles – analytics companies
• Diabetes specific
• $1000 genome
• Wearable sensors
• Mandatory EHRs
• “Success” in predictive modelling
Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
8. Mapping Diabetes to the 5 Pillars of Data Science
• Data Acquisition
• Data Integration & Engineering
• Machine Learning & Analytics
• Visualization & Dissemination
• Ethics, Law, Policy, Social Implications
10. Global Treatment Ecosystem
[Diagram, adapted from Boris Kovatchev. Labels: Virtual Image of the Patient (VIP); Patient Profile; Analytics; Treatment & Control; Predictive Analysis; Database; Add Genotype, Medical Record]
Local Treatment Ecosystem: Real-time data; Predictive analytics; Artificial Pancreas
• Screening
• Hypoglycemia
• Insulin-associated weight gain
• Retinopathy
• Neuropathy
• Nephropathy
• Heart disease
Cichosz et al. 2016, J Diabetes Sci & Tech 10(1):27-34
11. Prediction – Image Recognition
• Google Diabetic Retinopathy: prediction based on training with 120,000
images classified by 54 ophthalmologists
• Prediction maps inputs (image of the retina) to outputs (a diagnosis of
retinopathy) in a closed system; it does not consider confounders, e.g., if the
retina had been operated on
• All the required information is in the data
• Researchers concluded that the algorithm’s performance was in line with
board-certified ophthalmologists and retinal specialists
Krause et al. https://doi.org/10.1016/j.ophtha.2018.01.034
12. Image Recognition - Convolutional Neural Networks
• Convolutional layers: “convolve” discovers the feature wherever it may reside in the image
• Max pooling layers: downsampling while maintaining key features
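The two layer types can be sketched in a few lines of plain Python. This is a minimal illustration, not the network from the retinopathy study: the 5×5 image, the 2×2 kernel, and the helper names `place_feature`, `convolve2d`, and `max_pool2d` are all hypothetical. The "convolution" here is the sliding-window cross-correlation that deep learning libraries conventionally call convolution.

```python
def place_feature(top, left, size=5):
    """Hypothetical test image: zeros with a 2x2 bright patch at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for i in range(2):
        for j in range(2):
            image[top + i][left + j] = 1
    return image


def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Illustrative kernel that responds most strongly to a 2x2 bright patch.
KERNEL = [[1, 1], [1, 1]]

# The same feature placed in two different corners of the image.
pooled_top_left = max_pool2d(convolve2d(place_feature(0, 0), KERNEL))
pooled_bottom_right = max_pool2d(convolve2d(place_feature(3, 3), KERNEL))
```

In both cases the 5×5 image is reduced to a 2×2 map whose peak value is 4: the kernel finds the patch wherever it sits, and pooling shrinks the map while keeping that peak response.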
14. A Note of Caution
Predictive ability overemphasizes what is possible in
healthcare …
There are many confounders …
Does building enough expert knowledge (itself biased) about a complex
system into the algorithm provide accurate outcomes?
15. The Birthweight Paradox
• What is the causal effect of smoking during pregnancy?
• Confounders – alcohol consumption, diet, prenatal care
• Need to adjust for confounders, e.g., birth weight
• BUT birth weight is associated with infant mortality and
maternal smoking – introduces bias
• Among low birth weight babies, those whose mothers smoked
during pregnancy appear to have lower mortality
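The paradox can be reproduced with a small exact calculation. The sketch below is a toy model under stated assumptions, not an analysis from the talk: every probability is hypothetical, and it assumes an unmeasured risk factor (called `defect` here) that causes both low birth weight and death, one commonly discussed explanation of the paradox. For simplicity, death depends only on smoking and the defect, and smoking raises mortality in every stratum.

```python
# All probabilities are hypothetical, chosen only to illustrate the paradox.
P_DEFECT = 0.05  # P(defect): unmeasured risk factor, independent of smoking

# P(low birth weight | smoker, defect)
P_LBW = {(0, 0): 0.05, (1, 0): 0.30, (0, 1): 0.80, (1, 1): 0.85}

# P(death | smoker, defect): smoking increases risk in both defect strata
P_DEATH = {(0, 0): 0.010, (1, 0): 0.015, (0, 1): 0.200, (1, 1): 0.210}


def mortality(smoker, low_birth_weight_only=False):
    """Exact P(death | smoker), optionally restricted to low birth weight babies."""
    numerator = 0.0
    denominator = 0.0
    for defect in (0, 1):
        weight = P_DEFECT if defect else 1.0 - P_DEFECT
        if low_birth_weight_only:
            # Conditioning on low birth weight reweights the defect strata.
            weight *= P_LBW[(smoker, defect)]
        numerator += weight * P_DEATH[(smoker, defect)]
        denominator += weight
    return numerator / denominator


overall_nonsmoker = mortality(0)
overall_smoker = mortality(1)
lbw_nonsmoker = mortality(0, low_birth_weight_only=True)
lbw_smoker = mortality(1, low_birth_weight_only=True)
```

In this toy model smoking raises mortality overall (about 2.5% versus 2.0%), yet among low birth weight babies the smokers' group shows lower mortality (about 4% versus 10%). A low birth weight baby of a non-smoker is more likely to carry the defect, so stratifying on birth weight compares unlike groups.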
18. In Summary
• Data science will have an increasing impact on diabetes
research
• Data scientists & experts need to work together
• Acceptance begins with getting clinicians on-board at the
start of the study
• Education in these new approaches is desperately needed
• Bioethical data science training is part of that education even
though policy and law are not keeping pace
Editor's Notes
CNN - takes small regions and condenses into one value
16 million hospital inpatient events (24.5% of total), 35 million outpatient clinic events (53.6% of total) and 14 million emergency
department events (21.9% of total