Thinking about Data Strategy: for Ophthalmologists

Petteri Teikari, PhD
http://petteri-teikari.com/
Version “Wed 18 April 2018”
Singapore Eye Research Institute (SERI)
Visual Neurosciences group
The transformation of healthcare via more efficient use of data

Think of Data
Something worth remembering. Despite the hype, deep learning algorithms are commodities. It's the data that's the real value.
Not Artificial Intelligence per se
It is great to throw the buzzword around, but think of AI (narrow AI, i.e. deep learning) as a more powerful algorithm that will need to ingest data

Value of Data
Source of the quote: Trask (@iamtrask, iamtrask.github.io)
Well, actually, good-quality structured data that can be refined to information and knowledge with suitable models

Structuring Data
Common ways
Excel sheets may not be the way to go for good-quality databases
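A minimal sketch of what a structured alternative can look like, using Python's built-in `sqlite3` module. The `visits` table, its columns and the plausibility range for IOP are hypothetical choices for illustration; the point is that the schema rejects malformed rows at entry time, which a free-form spreadsheet does not.

```python
import sqlite3

# Hypothetical schema for illustration only; column names and ranges are assumptions.
SCHEMA = """
CREATE TABLE visits (
    patient_id TEXT NOT NULL,
    visit_date TEXT NOT NULL,                         -- ISO 8601, e.g. '2018-04-18'
    eye        TEXT NOT NULL CHECK (eye IN ('OD', 'OS')),
    iop_mmhg   REAL CHECK (iop_mmhg BETWEEN 0 AND 80), -- hypothetical plausibility range
    PRIMARY KEY (patient_id, visit_date, eye)
)
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database containing the constrained visits table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one visit; return False if the schema rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO visits VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(try_insert(conn, ("P001", "2018-04-18", "OD", 16.0)))    # True
    print(try_insert(conn, ("P001", "2018-04-18", "left", 16.0)))  # False: invalid eye code
    print(try_insert(conn, ("P002", "2018-04-18", "OS", 250.0)))   # False: implausible IOP
    print(try_insert(conn, ("P001", "2018-04-18", "OD", 17.0)))    # False: duplicate visit
```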
A Typical Data Science Department
Most companies structure their data science departments into 3 groups:
Data scientists: the folks who are “better engineers than statisticians and better statisticians than engineers”. Aka, “the thinkers”.
Data engineers: these are the folks who build pipelines that feed data scientists with data and take the ideas from the data scientists and implement them. Aka, “the doers”.
Infrastructure engineers: these are the folks who maintain the Hadoop cluster / big data infrastructure. Aka, “the plumbers”.
http://www.kdnuggets.com/2016/03/engineers-shouldnt-write-etl.html

Data vs Models vs Hardware
Sun et al. (2017)
https://arxiv.org/abs/1707.02968
“Analogous to going higher in polynomial order”
“Better models cannot reach their full capacity as datasets stay small”
“Heavier models processed in reasonable time, or old ones faster”

Preprocessing vs Data Engineering vs “The AI” part vs Deployment
Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.
Google (Sculley et al., 2015) at NIPS: “Hidden Technical Debt in Machine Learning Systems”
Successful hospitals and labs are those that are able and willing to redesign their processes for “data-driven medicine”

Data, and then there is Structured Data
For most applications, it is not enough to have a bunch of images on a hard drive with no labels (pathology, outlines of structures of interest, etc.)
Glaucomatous fundus image from the RIM-ONE r3 database (S-17-L)
http://medimrg.webs.ull.es/research/downloads/
Semantic (image) segmentation
- Optic disc
- Optic cup
Image classification
- Healthy vs. glaucomatous
- Glaucoma severity
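Segmentation labels like these feed directly into clinically familiar measurements. As an illustration, here is a minimal sketch computing the vertical cup-to-disc ratio (CDR) from optic disc and optic cup masks. The mask representation (nested lists of 0/1) and the toy masks are assumptions; real pipelines work on image arrays and may define the ratio differently (e.g. by area rather than vertical extent).

```python
from typing import List

# A binary mask as nested lists: 1 = structure (disc or cup), 0 = background.
Mask = List[List[int]]


def vertical_extent(mask: Mask) -> int:
    """Number of rows from the top-most to the bottom-most labeled row."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vertical_cdr(cup: Mask, disc: Mask) -> float:
    """Vertical cup-to-disc ratio from cup and disc masks of the same image."""
    disc_height = vertical_extent(disc)
    if disc_height == 0:
        raise ValueError("optic disc mask is empty")
    return vertical_extent(cup) / disc_height


# Toy 12 x 5 masks: the disc spans rows 0-9, the cup spans rows 2-7.
disc = [[1] * 5 if r < 10 else [0] * 5 for r in range(12)]
cup = [[0, 1, 1, 1, 0] if 2 <= r < 8 else [0] * 5 for r in range(12)]
print(vertical_cdr(cup, disc))  # 0.6
```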

Labeling Data
Expensive, as clinical domain knowledge is typically needed
Image segmentation is more time-consuming than pathology labeling, but could be “Amazon Turkified” (crowdsourced), for example
Having efficient labeling tools is more important than the “AI modeling” of your data.
Voxeleron allows interactive correction of retinal layer segmentation (i.e. add/remove knots from the spline fitting?)
https://twitter.com/voxeleron/status/806172454657794048
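Because labels are expensive, it helps to measure how much a manual correction (or a second annotator) actually changes them. A minimal sketch of the Dice similarity coefficient, a common overlap measure for segmentation masks; the toy masks are hypothetical, and this does not describe how Voxeleron's tool works internally.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = labeled pixel, 0 = background


def dice(a: Mask, b: Mask) -> float:
    """Dice similarity coefficient: 2 * overlap / (|A| + |B|) for two binary masks."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    overlap = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * overlap / total  # two empty masks agree fully


# Toy example: an automatic outline vs. the same outline after manual correction.
automatic = [[1, 1, 0],
             [0, 1, 0]]
corrected = [[1, 0, 0],
             [0, 1, 1]]
print(round(dice(automatic, corrected), 3))  # 0.667
```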

Structuring Data
Common ways
ETL (Extract, Transform and Load)
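The ETL idea can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming a hypothetical CSV export with `patient`, `date`, `eye` and `iop` columns; a production pipeline would add logging, validation reports and a properly constrained target schema.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export (e.g. a spreadsheet saved as CSV); columns are assumptions.
RAW_CSV = """patient,date,eye,iop
P001,18/04/2018,right,16
P001,18/04/2018,Left,15
P002,19/04/2018,OD,n/a
"""

# Map the free-text eye labels seen in the export to standard OD/OS codes.
EYE_CODES = {"right": "OD", "od": "OD", "left": "OS", "os": "OS"}


def extract(text: str) -> list:
    """Extract: read the raw CSV rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Transform: normalize dates and eye codes; skip rows that cannot be parsed."""
    clean = []
    for r in rows:
        eye = EYE_CODES.get(r["eye"].strip().lower())
        try:
            iop = float(r["iop"])
            date = datetime.strptime(r["date"], "%d/%m/%Y").date().isoformat()
        except ValueError:
            continue  # a real pipeline would log or quarantine the row
        if eye is None:
            continue  # unknown eye label
        clean.append((r["patient"], date, eye, iop))
    return clean


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a structured table."""
    conn.execute("CREATE TABLE IF NOT EXISTS visits "
                 "(patient_id TEXT, visit_date TEXT, eye TEXT, iop_mmhg REAL)")
    with conn:
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM visits ORDER BY rowid").fetchall())
```

The row with `n/a` as its IOP value is dropped during the transform step rather than silently loaded, which is the kind of decision a spreadsheet workflow tends to leave implicit.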

File format interoperability
Manufacturers of imaging devices and EHR providers, of course, try to resist this change, as it is a threat to their business model
https://youtu.be/0E121gukglE?t=26m36s

Not just AI
Gradual destruction of old processes
Why even talk about AI if, only some years ago, even e-mail was an issue in the USA?
Published on Mar 20, 2012
https://youtu.be/hF2QHevDHSw?t=50m

Changes in Publishing
Most AI papers are published on arXiv without peer review → acceleration of science
“We (Science) do allow posting of research papers on not-for-profit preprint servers such as arxiv.org and bioRxiv”
“Presentation of data on a pre-print server does not conflict with submission to The Lancet”
“The ArXiv preprint server is the medium of choice for (mainly) physicists and astronomers who wish to share drafts of their papers with their colleagues, and with anyone else with sufficient time and knowledge to navigate it. [...] If scientists wish to display drafts of their research papers on an established preprint server before or during submission to Nature or any Nature journal, that's fine by us.”

Changes in Reproducibility
Share code as a GitHub repository or Docker image
http://dx.doi.org/10.1038/nj7622-703a
https://github.com/ozan-oktay/Attention-Gated-Networks

Back to the topic...
Tele-ophthalmology and Self-monitoring
Reducing the healthcare burden with an improved user experience for the patient
Fritz Kahn, 1939

Self-monitoring
More data points from continuous monitoring at home (“Dataist” approach)
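A minimal sketch of why extra data points matter: with ordinary least squares, the standard error of an estimated trend shrinks as measurements accumulate. The signal, its units, the alternating noise pattern and both measurement schedules below are hypothetical; real home-monitoring data would need proper handling of noise, outliers and missed days.

```python
import math


def ols_trend(days, values):
    """Ordinary least-squares slope (units/day) and its standard error."""
    n = len(days)
    if n != len(values) or n < 3:
        raise ValueError("need at least three paired measurements")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(days, values))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def synthetic(day):
    """Hypothetical signal: true trend of -0.01 units/day plus +/-0.5 measurement noise."""
    return 20.0 - 0.01 * day + 0.5 * (-1) ** day


clinic_days = [0, 91, 182, 273, 364]  # roughly quarterly clinic visits
home_days = list(range(365))          # daily self-measurement at home
clinic = ols_trend(clinic_days, [synthetic(d) for d in clinic_days])
home = ols_trend(home_days, [synthetic(d) for d in home_days])
print(f"clinic: slope={clinic[0]:.4f}, SE={clinic[1]:.5f}")
print(f"home:   slope={home[0]:.4f}, SE={home[1]:.5f}")
```

Both schedules recover the same trend here, but the daily series pins it down with a standard error several times smaller.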

Multimodal Data Acquisition
You have structural and functional retinal biomarkers. What about the emerging omics and all the data from the health records?

Multimodal Diagnostics Power
Mining hospital databases for disease predictors (“without a hypothesis”)
In 2015, a research group at Mount Sinai Hospital in New York was inspired to apply deep learning to the hospital’s vast database of patient records. This data set features hundreds of variables on patients, drawn from their test results, doctor visits, and so on. The resulting program, which the researchers named Deep Patient, was trained using data from about 700,000 individuals, and when tested on new records, it proved incredibly good at predicting disease.
https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/

Deep Patient
“We performed evaluation using 76,214 test patients comprising 78 diseases from diverse clinical domains and temporal windows. Prediction performance for severe diabetes, schizophrenia, and various cancers were among the top performing.”

Does not stop at diagnostics
Stratify your population, and go deeper into personalized medicine
Throw away your heuristic decision trees
Wolfs et al. (2000), IOVS, for primary open-angle glaucoma (POAG) classification
https://doi.org/10.1016/j.preteyeres.2015.07.007

Monitor disease progression
And try to predict what the “intelligent guesses” for treatment would be
Is it even possible, in theory, to get an easy index?

Monitor disease progression
The predictive model is surprisingly good already with unimodal data!
Forecasting Future Humphrey Visual Fields Using Deep Learning
Joanne C Wen, Cecilia S Lee, Pearse A Keane, Sa Xiao, Yue Wu, Ariel Rokem, Philip P Chen, Aaron Y Lee
doi: https://doi.org/10.1101/293621
“More than 1.7 million perimetry points were extracted to the hundredth decibel from 32,443 24-2 HVFs. The model were able to successfully predict progressive field loss in glaucomatous eyes up to 5.5 years in the future with a correlation of 0.92 between the MD of predicted and actual future HVF and an average difference of 0.41 dB.”
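The two summary statistics in the quote can be made concrete. A minimal sketch computing the Pearson correlation and the mean absolute difference between predicted and actual mean deviation (MD) values; the MD numbers are invented, and treating the “average difference” as a mean absolute difference is an assumption, since the quote does not say whether it is signed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_abs_diff(x, y):
    """Mean absolute difference, in the same units as the inputs (here dB)."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty equal-length sequences")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Made-up MD values (dB) for illustration only; not data from the paper.
predicted_md = [-1.2, -3.5, -7.8, -12.1]
actual_md = [-1.0, -4.1, -7.2, -12.9]
print(f"r = {pearson_r(predicted_md, actual_md):.3f}")
print(f"mean |difference| = {mean_abs_diff(predicted_md, actual_md):.2f} dB")
```

Note that a high correlation alone does not guarantee small errors (a constant offset leaves r unchanged), which is why reporting both numbers together is informative.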

From Mass Medicine to Personalized Precision Medicine
Stratify your patients more intelligently
Published on Mar 20, 2012
https://youtu.be/hF2QHevDHSw?t=56m31s

Multimodal portable device or various chunky tabletop devices?

Visualization
Clinicians are more ready to believe the “black box results” if there is some explainability
“Clinicians need the data-driven model predictions to align with their domain knowledge” (Dr. Jenna Wiens @ NIPS 2016, “NIPS 2016 Workshop on Machine Learning for Health”)
http://www.nipsml4hc.ws/jenna-wiens
https://arxiv.org/abs/1602.04938
Examples of identification of age-related macular degeneration (AMD) by a deep learning algorithm.
Lee et al. (2016)
https://arxiv.org/abs/1612.04891
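The paper linked above (arXiv 1602.04938, LIME) explains predictions by perturbing the input. A simpler perturbation-based technique, occlusion sensitivity, is sketched here instead of LIME: mask one patch at a time and record how much the model's score drops. The `toy_score` function is a hypothetical stand-in for a trained classifier (e.g. an AMD detector), and the image is a toy grid.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def occlusion_map(image: Image, score: Callable[[Image], float],
                  patch: int = 2, fill: float = 0.0) -> Image:
    """Drop in `score` when each patch x patch block is set to `fill`.

    Larger values mark regions the model relied on more.
    """
    h, w = len(image), len(image[0])
    baseline = score(image)
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            occluded = [row[:] for row in image]  # copy, then blank one block
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - score(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_score(img: Image) -> float:
    """Hypothetical 'model': mean intensity of the top-left 2 x 2 region."""
    return sum(img[r][c] for r in range(2) for c in range(2)) / 4


image = [[1.0] * 4 for _ in range(4)]
for row in occlusion_map(image, toy_score):
    print(row)
```

Only the top-left block lights up in the resulting map, because that is the only region the toy model looks at; with a real classifier the same map can be overlaid on the fundus image for clinicians.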

Next-Generation Medical Education
https://doi.org/10.1016/S0933-3657(97)00054-7
Panel discussion: brainstorming on next steps and global collaborations
Panelists: Anne L. Coleman, MD, PhD, FARVO, Joshua D. Stein, MS, MD, Paul P. Lee, JD, MD, FARVO, Aaron Y. Lee, MD, Adnan Tufail, FRCOphth, MD, Michael B. Gorin, MD, PhD, FARVO, Seth Blackshaw, PhD, David Cobrinik, MD, PhD, Salil Anil Lachke, PhD, Jiang Qian, PhD, Heng Zhu, PhD
Moderator: Michael F. Chiang, MD
https://www.arvo.org/contentassets/5dfd4266ae0c49d49549227208326aba/2017-big-data--course-agenda-for-online-course-updated.pdf
