Scaling is caring
Building scalable feature engineering pipelines for
machine learning in healthcare
April 3 2019
Amsterdam 2019
Introductions
• Michele Tonutti
• Data Scientist at Pacmed
• Intensive Care Team
• Background in Biomedical Engineering and Robotics

Introductions
• Developing machine-learning-driven decision support tools to make healthcare more personal, personalised and precise.
• Patients only get care that has the highest probability of success for them.
• Focus on oncology, emergency care, chronic diseases, and intensive care.
Pacmed focuses on four applications
Emergency care:
What is the urgency level of a patient (how quickly should someone see a doctor)?
Intensive care:
Predicting risk of ICU and post-ICU complications to support decision-making
Chronic diseases:
What is the best treatment (combination) for patients with hypertension, diabetes and/or chronic kidney failure?
Oncology:
What are the optimal treatments for the individual patient with colon, prostate or breast cancer?
Intensive care is most promising and furthest developed
The Intensive Care Unit (ICU)
Pacmed is currently working on four prediction problems in the intensive care unit
[Figure: four timelines, each using data from t-3 up to today]
• Vital signs: readmission/mortality, up to t+7
• Respiratory parameters: re-intubation, up to t+2
• Patient inflow & outflow: bed capacity, t+1 and t+2
• Creatinine: kidney function, t+1
Discharge decision
Predicting the readmission and mortality risk of patients on discharge
Extubation decision
Predicting the risk of re-intubation of patients if they are extubated
Capacity management
Predicting the number of full/available beds
Predicting complications
E.g. predicting kidney function
Machine-learning based decision support software
Explainable prediction of eligibility for discharge from the ICU
Feature | Value | Interpretation of value
SATURATION (max value of the admission) | 98% | A max value of 98% is lower than 95% of all discharged patients
SERUM CREATININE (trend in last 24 hours) | Increase of 20 ml (from 100 to 120) | The average patient had a stable serum creatinine during the last 24 hours. The increase of +20 is higher than 99% of discharged patients
ALAT (variation in values last 24 hours) | Variation of 7 ml (between 5 and 12) | The average patient had a variation of ALAT of 2 in the last 24 hours. A variation of 7 is higher than 76% of all patients.
URINE OUTPUT (average last 24 hours) | 240 ml | An average value of last 24 hours. The average discharged patient has a urine output of 250.
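The interpretation column compares one patient's feature value with the distribution of that feature among discharged patients. A minimal sketch of such a comparison follows; the function name and the reference values are made up for illustration and are not taken from the talk.

```python
import numpy as np

def share_of_reference_above(value, reference):
    """Return the fraction of reference values strictly greater than `value`."""
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(reference > value))

# Hypothetical max-saturation values (%) of previously discharged patients.
discharged_max_saturation = [100, 100, 99, 100, 99, 100, 100, 99, 100, 97]

patient_value = 98
share = share_of_reference_above(patient_value, discharged_max_saturation)
print(f"A max value of {patient_value}% is lower than {share:.0%} of the reference patients")
```

Expressing a feature as a position in a reference population is one way to make it readable for a clinician without knowing the model.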
A pipeline for ICUs that works for both development and production
[Diagram: data from Hospital 1, Hospital 2 and Hospital 3 pass through a shared feature engineering step that serves both development and production]
Feature engineering for medical data is an iterative process
Medical knowledge
Feature engineering
Modelling
Validation
The issue of variety in medical data
1. High number of unique parameters
2. Differing feature structure for different problems
3. Different parameter distributions between populations
4. Variability of measurements over time
Patient and admission characteristics
Clinical observations
Vital signs & device data
Lab values
High number of parameters measured in the ICU
Respiration
• Respiratory rate
• Mechanical ventilation
• Tidal volume
• Expiratory minute volume
• Respiration mode
• PEEP
• Peak pressure
• Supplemental O2
• Fraction of inspired O2
• Type of O2 administration
• Peripheral O2 saturation
Circulation
• Blood pressure (diastolic and systolic, arterial and non-invasive)
• Pulmonary artery pressure (diastolic and systolic)
• CVP
• PCWP (wedge)
• Heart rate
• Cardiac output
• Tidal volume (inspiratory and expiratory)
• Heart rhythm & ectopic
• Shock index

• Peripheral temperature
• CAM, DOS, RASS, NAS
• GCS
• Pupil size and reaction
• Cough stimulant
• Urine output
• Number of bronchial toilets
• Age, sex
• Height and weight at admission
• Department of origin
• Length of stay
• Number of prior admissions
• Time in the hospital before admission
• CPR code
Blood gas analysis
• Base excess
• O2 content in blood
• Arterial O2 saturation
• pH
• Partial pressure (O2 & CO2)
• Actual bicarbonate
Haematology
• Hb, Ht
• White blood cell count
• MCH, MCV
• Erythrocytes
• Thrombocytes
• Lymphocytes
• Leucocytes
• Basophils, eosinophils and neutrophils
• Reticulocytes
• PT, APTT
Cardiac enzymes
• CK-MB
• Troponin-T
Chemistry
• Sodium, potassium
• Chloride
• Calcium, ionised calcium
• Magnesium
• Phosphate
• Creatinine
• CK
• EST and CRP
• Blood glucose
• Blood lactate
• Amylase
• Serum albumin
• BUN_creatinine
• NT-ProBNP
Liver tests
• ALAT and ASAT
• GGT, AF
• LDH
• Bilirubin
Urinalysis
• Sodium, potassium
• Urea
Medication categories
• Alimentary tract and metabolism
• Antibiotics
• Blood and blood-forming organs
• Cardiovascular
• Musculoskeletal system
• Nervous system
• General (tube feeding)
Other
• CVVH
• Lines and drains
Measurements can vary widely between hospitals
[Figure: number of measurements and mean value of activated partial thromboplastin time (aPTT), Hospital 1 compared with Hospital 2]
Parameters are measured at different time scales, with highly varying
values and measurement frequencies
What do we need?
• A feature engineering pipeline that:

1. is scalable
2. can be used efficiently for both development and production
3. can be used for multiple outcome measures
4. produces features that are interpretable and useful for both machine
learning models and doctors
Challenge: how to turn time series into information relevant for a
model (and doctors)?
๏ Recurrent Neural Networks, e.g. (Phased) LSTMs
๏ Frequency domain transforms, e.g. Fourier transform
๏ Embedded representations, e.g. patient2vec
• Scalable?
• Reusable across models?
• Interpretable?
Extracting interpretable aggregated values from vital parameters
Aggregates extracted from a vital parameter, e.g. heart rate (bpm): first, last, minimum, maximum, average, standard deviation, slope, counts, {…}
Time windows:
• First 24h
• First 48h
• First 72h
• {…}
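The aggregates named on this slide can be sketched with pandas and numpy as follows. This is an illustration under assumptions, not the pipeline from the talk: the function names, the hourly sampling and the heart-rate values are hypothetical, and the slope is taken as a least-squares fit against time in hours.

```python
import numpy as np
import pandas as pd

def slope_per_hour(series):
    """Least-squares slope of the values against time, in units per hour."""
    if len(series) < 2:
        return np.nan
    elapsed = (series.index - series.index[0]).total_seconds()
    hours = np.asarray(elapsed, dtype=float) / 3600.0
    return float(np.polyfit(hours, series.to_numpy(dtype=float), 1)[0])

def aggregate(series):
    """Summarise one signal with interpretable statistics."""
    return {
        "first": series.iloc[0],
        "last": series.iloc[-1],
        "minimum": series.min(),
        "maximum": series.max(),
        "average": series.mean(),
        "standard_deviation": series.std(),  # sample standard deviation
        "slope_per_hour": slope_per_hour(series),
        "count": int(series.count()),
    }

# Hypothetical heart-rate measurements (bpm), one per hour.
index = pd.date_range("2019-04-03 08:00", periods=5, freq=pd.Timedelta(hours=1))
heart_rate = pd.Series([80.0, 82.0, 84.0, 86.0, 88.0], index=index)

features = aggregate(heart_rate)
print(features)
```

Each value in the resulting dictionary can be read directly by a clinician, which is the property the slide asks for.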
We use these aggregated features to capture short-term effects as well as
longer-term trends
Time windows: whole stay, day averages, first and last day, {…}
Multiple patients, multiple parameters, continuous time scale
Split - apply - combine
1) Splitting the data into groups based on some criteria.
2) Applying a function to each group independently.
3) Combining the results into a data structure.
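The three steps above map onto a pandas `groupby` chain. The sketch below assumes a hypothetical long-format table with one row per measurement; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format measurements: one row per patient, parameter and value.
measurements = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 2],
    "parameter": ["heart_rate", "heart_rate", "creatinine",
                  "heart_rate", "heart_rate", "creatinine"],
    "value": [80.0, 90.0, 100.0, 60.0, 70.0, 120.0],
})

features = (
    measurements
    .groupby(["patient_id", "parameter"])["value"]  # 1) split
    .agg(["min", "max", "mean"])                    # 2) apply
    .unstack("parameter")                           # 3) combine: one row per patient
)

# Flatten the (aggregate, parameter) column pairs into names like "heart_rate_mean".
features.columns = [f"{parameter}_{aggregate}" for aggregate, parameter in features.columns]
print(features)
```

The result has one row per patient and one column per parameter and aggregate, which is the shape most models expect.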
Creating features grouped in custom time windows
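One way to build such windowed features with plain pandas is to filter on the time since admission and aggregate per patient, once per window. The sketch below is an assumption about how this could look, not the code shown in the talk: the window lengths, column names and data are hypothetical, and both patients share one admission time to keep the example short.

```python
import pandas as pd

# Hypothetical data: heart rate of two patients, both admitted at the same moment.
admission = pd.Timestamp("2019-04-03 00:00")
measurements = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 2, 2, 2],
    "time": admission + pd.to_timedelta([2, 20, 30, 60, 5, 40, 70], unit="h"),
    "heart_rate": [80.0, 90.0, 100.0, 110.0, 60.0, 70.0, 80.0],
})
measurements["hours_since_admission"] = (
    (measurements["time"] - admission).dt.total_seconds() / 3600.0
)

# Assumed window lengths in hours.
windows = {"first_24h": 24, "first_48h": 48, "first_72h": 72}

per_window = []
for name, hours in windows.items():
    in_window = measurements[measurements["hours_since_admission"] <= hours]
    aggregates = in_window.groupby("patient_id")["heart_rate"].agg(["mean", "max", "count"])
    per_window.append(aggregates.add_prefix(f"heart_rate_{name}_"))

# One row per patient, one column per window and aggregate.
features = pd.concat(per_window, axis=1)
print(features)
```

Short windows pick up recent changes while longer ones describe the overall course of the stay.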
Why not stick to Pandas then?
• Interpretable, easy, reliable
• Works very well with datetime
formats
• Most simple aggregations available
• No out-of-the-box parallelisation
• Everything in memory
• Custom aggregations can be
extremely computationally heavy
Heavy computational load for custom functions
Dask: scalable Pandas
• Abstraction over numpy, pandas and scikit-learn allowing you to run
operations on them in parallel, using multicore processing
• Manipulating large datasets, even when those datasets don’t fit in memory
• Distributed computing on large datasets with standard Pandas operations
like groupby, join, and time series computations
• Scales up to multiple machines auto-magically.

Scales down: low-memory and fast even on local machines.
Reminder: our goal of scalability
๏ Develop and test on any machine
๏ Re-use the same pipeline for production
๏ For both large and small datasets
Problems with Dask
• Not all pandas aggregations available (e.g. applying custom functions on expanding windows)
• Complex to optimise on each machine
• Need to manually select the number of workers, partitions, etc.
• Performance highly dependent on settings
• Slower for small datasets and certain transformations
Can we do better?
TSFRESH
• “Time Series Feature extraction based on scalable hypothesis tests”.
• Same split-apply-combine concept, but feature calculations are done on
numpy arrays (vectorized), in parallel
Dealing with time-varying signals
[Diagram: pandas Series → numpy array → calculate aggregates in parallel (min(), max(), std(), …) → pandas DataFrame]
Huge list of aggregates available out of the box
Result: clean, interpretable dataframe ready for modelling
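The idea of running aggregates on numpy arrays and collecting them in a dataframe can be sketched without TSFRESH itself. The code below is not TSFRESH's API: the `AGGREGATES` table, the `extract` function and the column naming are hypothetical, and the parallel execution that TSFRESH adds is left out.

```python
import numpy as np
import pandas as pd

# Aggregates implemented as plain functions on numpy arrays.
AGGREGATES = {
    "minimum": np.min,
    "maximum": np.max,
    "mean": np.mean,
    "standard_deviation": np.std,  # population standard deviation (ddof=0)
}

def extract(measurements, id_column, value_column):
    """Return one row per id with every aggregate applied to that id's values."""
    rows = {}
    for group_id, group in measurements.groupby(id_column):
        values = group[value_column].to_numpy(dtype=float)  # pandas Series -> numpy array
        rows[group_id] = {
            f"{value_column}__{name}": float(func(values))
            for name, func in AGGREGATES.items()
        }
    return pd.DataFrame.from_dict(rows, orient="index")  # back to a pandas DataFrame

# Hypothetical measurements for two patients.
measurements = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "heart_rate": [80.0, 90.0, 100.0, 60.0, 70.0],
})
features = extract(measurements, "patient_id", "heart_rate")
print(features)
```

Adding an aggregate then means adding one entry to the table, while the splitting and combining logic stays untouched.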
Scaling up and down
• (Local) multiprocessing
• Cluster with Dask
Dealing with time-varying signals
• Problem: using numpy arrays means losing the datetime dimension
• Solution: custom fork of TSFRESH
• The DatetimeIndex of the input pandas dataframe is used only when
calculating time-dependent aggregations
• Medication data can also be taken into account by exploiting multi-indices (e.g. indexing by medication)
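The split between value-only and time-dependent aggregations can be illustrated as follows. This is a sketch of the idea, not the code of the custom fork: all function names are hypothetical. Value-only aggregates receive a bare numpy array, and only the time-dependent ones receive the Series with its DatetimeIndex.

```python
import numpy as np
import pandas as pd

def duration_hours(series):
    """Time-dependent: hours between the first and the last measurement."""
    return (series.index[-1] - series.index[0]).total_seconds() / 3600.0

def mean_interval_hours(series):
    """Time-dependent: average gap between consecutive measurements, in hours."""
    gaps = series.index.to_series().diff().dt.total_seconds()
    return float(gaps.mean()) / 3600.0

VALUE_ONLY = {"maximum": np.max, "mean": np.mean}
TIME_DEPENDENT = {
    "duration_hours": duration_hours,
    "mean_interval_hours": mean_interval_hours,
}

def extract(series):
    """Apply value-only aggregates to the array and time-aware ones to the Series."""
    values = series.to_numpy(dtype=float)
    features = {name: float(func(values)) for name, func in VALUE_ONLY.items()}
    features.update({name: func(series) for name, func in TIME_DEPENDENT.items()})
    return features

# Hypothetical, irregularly sampled lab value.
index = pd.to_datetime(["2019-04-03 08:00", "2019-04-03 08:30", "2019-04-03 12:00"])
lab_value = pd.Series([1.0, 3.0, 2.0], index=index)

features = extract(lab_value)
print(features)
```

Keeping the timestamps out of the value-only path preserves the speed of array operations for the majority of aggregates.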
Dealing with medications
Aggregates:
- Total amount
- Time since last dose
- Time under treatment
- Time without treatment
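The four medication aggregates can be sketched for a single drug as below. The definitions are assumptions made for illustration: administrations do not overlap, "time since last dose" is measured from the end of the last administration, and "time without treatment" is the observation window minus the time under treatment. Column names and doses are hypothetical.

```python
import pandas as pd

def medication_aggregates(doses, window_start, window_end):
    """Summarise non-overlapping administrations of one drug within a window."""
    hours_treated = (doses["stop"] - doses["start"]).dt.total_seconds().sum() / 3600.0
    hours_observed = (window_end - window_start).total_seconds() / 3600.0
    hours_since_last = (window_end - doses["stop"].max()).total_seconds() / 3600.0
    return {
        "total_amount": float(doses["dose_mg"].sum()),
        "hours_since_last_dose": hours_since_last,
        "hours_under_treatment": hours_treated,
        "hours_without_treatment": hours_observed - hours_treated,
    }

# Hypothetical administrations of one drug during the first day of admission.
doses = pd.DataFrame({
    "start": pd.to_datetime(["2019-04-03 08:00", "2019-04-03 16:00"]),
    "stop": pd.to_datetime(["2019-04-03 10:00", "2019-04-03 19:00"]),
    "dose_mg": [500.0, 250.0],
})

features = medication_aggregates(
    doses,
    window_start=pd.Timestamp("2019-04-03 00:00"),
    window_end=pd.Timestamp("2019-04-04 00:00"),
)
print(features)
```

With several drugs, the same function could be applied per patient and medication group, in the split-apply-combine style used for the other parameters.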
Summary
• Creating features for medical data entails dealing with variety and
variability
• Quick processing and interpretable features are top priorities
• No single tool offers a one-size-fits-all solution
• Pandas works well for quick processing of relatively small datasets
• Split-apply-combine
• Parallelizing (e.g. through Dask) allows quick computation of aggregates
both locally and distributed
• Vectorizing the split-apply-combine approach (e.g. with TSFRESH) speeds
up computation both for small and large datasets.
• Native support for Dask and custom distributors enables scaling
Conclusions
• Approach not limited to Python or specific packages
• Can be extended to any application that involves time series
• Scaling horizontally: we adapted the ICU pipeline for various other
projects (e.g. treatment decision based on patients’ clinical history)
• No need to re-invent the wheel every time
Key takeaway
“FEATURE ENGINEERING”
PANDAS
DATA SCIENTIST
Questions or feedback?
Michele Tonutti
michele.tonutti@pacmed.nl

Corporate and higher education May webinar.pptxRustici Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Recently uploaded (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019

  • 1. Scaling is caring Building scalable feature engineering pipelines for machine learning in healthcare April 3 2019 Amsterdam 2019
  • 2. Introductions • Michele Tonutti • Data Scientist at Pacmed • Intensive Care Team • Background in Biomedical Engineering and Robotics

  • 3. Introductions • Developing machine-learning-driven decision support tools to make healthcare more personal, personalised and precise. • Patients only get care that has the highest probability of success for them. • Focus on oncology, emergency care, chronic diseases, and intensive care.
  • 4. Pacmed focuses on four applications • Emergency care: What is the urgency level of a patient (how quickly should someone see a doctor)? • Intensive care: Predicting risk of ICU and post-ICU complications to support decision-making • Chronic diseases: What is the best treatment (combination) for patients with hypertension, diabetes and/or chronic kidney failure? • Oncology: What are the optimal treatments for the individual patient with colon, prostate or breast cancer?
  • 5. Intensive care is most promising and furthest developed • Emergency care: What is the urgency level of a patient (how quickly should someone see a doctor)? • Intensive care: Predicting risk of ICU and post-ICU complications to support decision-making • Chronic diseases: What is the best treatment (combination) for patients with hypertension, diabetes and/or chronic kidney failure? • Oncology: What are the optimal treatments for the individual patient with colon, prostate or breast cancer?
  • 6. The Intensive Care Unit (ICU)
  • 7. Pacmed is currently working on four prediction problems on the intensive care
    • Discharge decision: predicting the readmission and mortality risk of patients on discharge (input: vital signs, t-3 to today; outcome: readmission/mortality up to t+7)
    • Extubation decision: predicting the risk of re-intubation of patients if they are extubated (input: respiratory parameters, t-3 to today; outcome: re-intubation up to t+2)
    • Capacity management: predicting the number of full/available beds (input: patient inflow & outflow, t-3 to today; outcome: bed capacity at t+1 and t+2)
    • Predicting complications, e.g. predicting kidney function (input: creatinine, t-3 to today; outcome: kidney function at t+1)
  • 9. Explainable prediction of eligibility for discharge from the ICU
  • 10. Explainable prediction of eligibility for discharge from the ICU
    Feature | Value | Interpretation of value
    SATURATION, max value of the admission | 98% | A max value of 98% is lower than 95% of all discharged patients.
    SERUM CREATININE, trend in last 24 hours | Increase of 20 ml (from 100 to 120) | The average patient had a stable serum creatinine during the last 24 hours. The increase of +20 is higher than 99% of discharged patients.
    ALAT, variation in values last 24 hours | Variation of 7 ml (between 5 and 12) | The average patient had a variation of ALAT of 2 in the last 24 hours. A variation of 7 is higher than 76% of all patients.
    URINE OUTPUT, average last 24 hours | 240 ml | An average value of last 24 hours. The average discharged patient has a urine output of 250.
  • 11. A pipeline for ICUs that works for both development and production Hospital 1 Hospital 2 Hospital 3
  • 12. Development Production Hospital 1 Hospital 2 Hospital 3 A pipeline for ICUs that works for both development and production
  • 13. Development Production Feature Engineering Hospital 1 Hospital 2 Hospital 3 A pipeline for ICUs that works for both development and production
  • 14. Feature engineering for medical data is an iterative process Medical knowledge Feature engineering Modelling Validation
  • 15. Feature engineering for medical data is an iterative process Medical knowledge Feature engineering Modelling Validation
  • 16. The issue of variety in medical data 1. High number of unique parameters 2. Differing feature structure for different problems 3. Different parameter distributions between populations 4. Variability of measurements over time
  • 17. High number of parameters measured in the ICU
    Patient and admission characteristics: Age, sex • Height and weight at admission • Department of origin • Length of stay • Number of prior admissions • Time in the hospital before admission • CPR code
    Clinical observations: CAM, DOS, RASS, NAS • GCS • Pupil size and reaction • Cough stimulant • Urine output • Number of bronchial toilets
    Vital signs & device data, respiration: Respiratory rate • Mechanical ventilation • Tidal volume • Expiratory minute volume • Respiration mode • PEEP • Peak pressure • Supplemental O2 • Fraction of inspired O2 • Type of O2 administration • Peripheral O2 saturation
    Vital signs & device data, circulation: Blood pressure (diastolic and systolic, arterial and non-invasive) • Pulmonary artery pressure (diastolic and systolic) • CVP • PCWP wedge • Heart rate • Cardiac output • Tidal volume (inspiratory and expiratory) • Heart rhythm & ectopic • Shock index • Peripheral temperature
    Lab values, blood gas analysis: Base excess • O2 content in blood • Arterial O2 saturation • pH • Partial pressure (O2 & CO2) • Actual bicarbonate
    Lab values, haematology: Hb, Ht • White blood cell count • MCH, MCV • Erythrocytes • Thrombocytes • Lymphocytes • Leucocytes • Baso, eo and neutro • Reticulocytes • PT, APTT
    Lab values, cardiac enzymes: CK-MB • Troponin-T
    Lab values, chemistry: Sodium, potassium • Chloride • Calcium, ion. calcium • Magnesium • Phosphate • Creatinine • CK • EST and CRP • Blood glucose • Blood lactate • Amylase • Serum albumin • BUN_creatinine • NT-ProBNP
    Lab values, liver tests: ALAT and ASAT • GGT, AF • LDH • Bilirubin
    Lab values, urinalysis: Sodium, potassium • Urea
    Medication categories: Alimentary tract and metabolism • Antibiotics • Blood and blood-forming organs • Cardiovascular • Musculoskeletal system • Nervous system • General (tube feeding)
    Other: CVVH • Lines and drains
  • 18. Measurements can vary widely between hospitals [Figure: number of measurements and mean value of activated partial thromboplastin time (aPTT), Hospital 1 vs Hospital 2]
  • 19. Parameters are measured at different time scales, with highly varying values and measurement frequencies
  • 20. What do we need? • A feature engineering pipeline that: 1. is scalable 2. can be used efficiently for both development and production 3. can be used for multiple outcome measures 4. produces features that are interpretable and useful for both machine learning models and doctors
  • 21. Challenge: how to turn time series into information relevant for a model (and doctors)?
  • 22. Challenge: how to turn time series into information relevant for a model (and doctors)? • Recurrent Neural Networks, e.g. (Phased) LSTMs • Frequency domain transforms, e.g. Fourier transform • Embedded representations, e.g. patient2vec
  • 23. Challenge: how to turn time series into information relevant for a model (and doctors)? • Recurrent Neural Networks, e.g. (Phased) LSTMs • Frequency domain transforms, e.g. Fourier transform • Embedded representations, e.g. patient2vec • Scalable? • Reusable across models? • Interpretable?
  • 24. Challenge: how to turn time series into information relevant for a model (and doctors)? • Recurrent Neural Networks, e.g. (Phased) LSTMs • Frequency domain transforms, e.g. Fourier transform • Embedded representations, e.g. patient2vec • Scalable? • Reusable across models? • Interpretable?
  • 25. Extracting interpretable aggregated values from vital parameters [Figure: heart rate (bpm) over time, annotated with the aggregates first, last, minimum, maximum, average, slope, standard deviation, counts, …]
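The aggregates named on this slide can be illustrated with a minimal sketch for a single signal. The function name `aggregate_signal`, the example heart-rate values, and the choice of a least-squares line for the slope are assumptions made for illustration; the slides do not show how the talk's pipeline computes them.

```python
import numpy as np
import pandas as pd


def aggregate_signal(series: pd.Series) -> dict:
    """Summarise one time-indexed signal with interpretable aggregates."""
    values = series.to_numpy(dtype=float)
    # Elapsed time in hours since the first measurement (x-axis for the slope).
    hours = np.asarray((series.index - series.index[0]).total_seconds()) / 3600.0
    # Slope of a least-squares line through the points, in units per hour.
    slope = float(np.polyfit(hours, values, 1)[0]) if len(values) > 1 else 0.0
    return {
        "first": float(values[0]),
        "last": float(values[-1]),
        "minimum": float(values.min()),
        "maximum": float(values.max()),
        "average": float(values.mean()),
        "standard_deviation": float(values.std()),
        "slope": slope,
        "count": int(len(values)),
    }


# Hypothetical hourly heart-rate measurements (bpm).
heart_rate = pd.Series(
    [80.0, 90.0, 100.0, 110.0],
    index=pd.date_range("2019-04-03 08:00", periods=4, freq=pd.Timedelta(hours=1)),
    name="heart_rate",
)
features = aggregate_signal(heart_rate)
```

Each value in `features` maps directly onto something a clinician can read off the chart, which is the point of preferring these aggregates over learned representations.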
  • 26. We use these aggregated features to capture short-term effects as well as longer-term trends [Figure: aggregation windows shown: first 24h, first 48h, first 72h]
  • 27. We use these aggregated features to capture short-term effects as well as longer-term trends [Figure: aggregation windows shown: whole stay, day averages, first and last day]
  • 28. Multiple patients, multiple parameters, continuous time scale
  • 29. Multiple patients, multiple parameters, continuous time scale
  • 30. Split - apply - combine 1) Splitting the data into groups based on some criteria. 2) Applying a function to each group independently. 3) Combining the results into a data structure.
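The three steps above map onto a pandas `groupby` chain. The following is a small sketch with made-up data; the column names `admission_id`, `parameter` and `value` are assumptions, not the schema used in the talk.

```python
import pandas as pd

# Hypothetical long-format measurements: one row per measurement.
measurements = pd.DataFrame({
    "admission_id": [1, 1, 1, 2, 2],
    "parameter": ["heart_rate"] * 5,
    "value": [80.0, 95.0, 110.0, 60.0, 70.0],
})

features = (
    measurements
    .groupby(["admission_id", "parameter"])["value"]  # 1) split into groups
    .agg(["min", "max", "mean", "count"])             # 2) apply functions per group
    .unstack("parameter")                             # 3) combine: one row per admission
)
# Flatten the (aggregate, parameter) column pairs into names like "heart_rate_mean".
features.columns = [f"{param}_{agg}" for agg, param in features.columns]
```

The result has one row per admission and one column per parameter-aggregate pair, which is the shape most modelling libraries expect.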
  • 31. Creating features grouped in custom time windows
  • 32. Creating features grouped in custom time windows
  • 33. Creating features grouped in custom time windows
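The code on these slides is not preserved in the transcript. As a rough sketch of the idea, features can be computed over growing windows (first 24h, 48h, 72h) by filtering on elapsed time before grouping. The function `window_features`, the column names, and the simplification that an admission starts at its first measurement are all assumptions for illustration.

```python
import pandas as pd


def window_features(df: pd.DataFrame, windows_hours=(24, 48, 72)) -> pd.DataFrame:
    """Aggregate `value` per admission over the first N hours of the stay."""
    # Assumption: the admission starts at its first recorded measurement.
    start = df.groupby("admission_id")["time"].transform("min")
    hours = (df["time"] - start).dt.total_seconds() / 3600.0
    frames = []
    for window in windows_hours:
        in_window = df[hours < window]
        agg = in_window.groupby("admission_id")["value"].agg(["mean", "max"])
        frames.append(agg.add_suffix(f"_first_{window}h"))
    # One row per admission, one column per (aggregate, window) pair.
    return pd.concat(frames, axis=1)


t0 = pd.Timestamp("2019-04-03 00:00")
measurements = pd.DataFrame({
    "admission_id": [1, 1, 1],
    "time": [t0, t0 + pd.Timedelta(hours=30), t0 + pd.Timedelta(hours=60)],
    "value": [10.0, 20.0, 60.0],
})
window_feats = window_features(measurements)
```

Comparing `mean_first_24h` with `mean_first_72h` for the same admission gives a simple, readable indication of how a parameter has developed during the stay.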
  • 34. Why not stick to Pandas then? • Interpretable, easy, reliable • Works very well with datetime formats • Most simple aggregations available
  • 35. Why not stick to Pandas then? • Interpretable, easy, reliable • Works very well with datetime formats • Most simple aggregations available • No out-of-the-box parallelisation • Everything in memory • Custom aggregations can be extremely computationally heavy
  • 36. Heavy computational load for custom functions
  • 37. Dask: scalable Pandas • Abstraction over numpy, pandas and scikit-learn allowing you to run operations on them in parallel, using multicore processing
  • 40. Dask: scalable Pandas • Manipulating large datasets, even when those datasets don’t fit in memory • Distributed computing on large datasets with standard Pandas operations like groupby, join, and time series computations • Scales up to multiple machines auto-magically. • Scales down: low-memory and fast even on local machines.
  • 41. Reminder: our goal of scalability ๏ Develop and test on any machine ๏ Re-use the same pipeline for production ๏ For both large and small datasets
  • 42. Problems with Dask • Not all pandas aggregations available (e.g. applying custom functions on expanding windows) • Complex to optimise on each machine • Need to manually select the number of workers, partitions, etc. • Performance highly dependent on settings • Slower for small datasets and certain transformations
  • 43. Can we do better?
  • 44. TSFRESH • "Time Series Feature extraction based on scalable hypothesis tests".
  • 45. TSFRESH • "Time Series Feature extraction based on scalable hypothesis tests".
  • 46. TSFRESH • Same split-apply-combine concept, but feature calculations are done on numpy arrays (vectorized), in parallel
  • 47. Dealing with time-varying signals [Diagram: pandas Series → numpy array → calculate aggregates in parallel (min(), max(), std(), …) → pandas DataFrame]
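The flow on this slide can be sketched without TSFRESH itself: split the data into one numpy array per group, compute vectorized aggregates on each array in a worker pool, and combine the results into a dataframe. This is an illustration of the concept only, not TSFRESH's implementation; the names `AGGREGATES`, `summarise` and `extract` are hypothetical, and threads are used here only to keep the example simple (a process pool or a cluster could take their place).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

# Vectorized numpy aggregates applied to every chunk.
AGGREGATES = {"min": np.min, "max": np.max, "std": np.std}


def summarise(item):
    """Apply every aggregate to one (key, numpy array) chunk."""
    key, values = item
    return key, {name: float(func(values)) for name, func in AGGREGATES.items()}


def extract(df: pd.DataFrame, n_workers: int = 2) -> pd.DataFrame:
    # Split: one numpy array per (admission, parameter) pair.
    chunks = [
        (key, group["value"].to_numpy())
        for key, group in df.groupby(["admission_id", "parameter"])
    ]
    # Apply: chunks are independent, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(summarise, chunks))
    # Combine: back into a pandas DataFrame indexed by admission and parameter.
    index = pd.MultiIndex.from_tuples(
        [key for key, _ in results], names=["admission_id", "parameter"]
    )
    return pd.DataFrame([row for _, row in results], index=index)


measurements = pd.DataFrame({
    "admission_id": [1, 1, 1, 2, 2],
    "parameter": ["heart_rate"] * 5,
    "value": [80.0, 95.0, 110.0, 60.0, 70.0],
})
parallel_features = extract(measurements)
```

Working on plain arrays avoids pandas overhead inside each aggregate, which is where the speed-up for small and large datasets comes from; the trade-off, as the later slides note, is that the datetime index is no longer available to the aggregate functions.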
  • 48. Huge list of aggregates available out of the box
  • 49. Result: clean, interpretable dataframe ready for modelling
  • 50. Scaling up and down • (Local) multiprocessing • Cluster with Dask
  • 51. Dealing with time-varying signals • Problem: using numpy arrays means losing the datetime dimension • Solution: custom fork of TSFRESH • The DatetimeIndex of the input pandas dataframe is used only when calculating time-dependent aggregations • Medication data can also be taken into account by exploiting multi-indices
  • 52. Dealing with medications Aggregates: - Total amount - Time since last dose - Time under treatment - Time without treatment
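The four medication aggregates could be computed roughly as follows. The slides do not define them precisely, so this sketch makes several assumptions: each row is one administration with a `start`, `stop` and `amount`, administrations do not overlap, and the observation window is passed in explicitly. The function name `medication_features` is hypothetical.

```python
import pandas as pd

ONE_HOUR = pd.Timedelta(hours=1)


def medication_features(doses: pd.DataFrame, window_start, window_end) -> dict:
    """Aggregate the administrations of one medication for one patient."""
    # Assumption: administrations do not overlap, so durations can simply be summed.
    under_treatment = (doses["stop"] - doses["start"]).sum()
    window = window_end - window_start
    return {
        "total_amount": float(doses["amount"].sum()),
        "hours_since_last_dose": (window_end - doses["stop"].max()) / ONE_HOUR,
        "hours_under_treatment": under_treatment / ONE_HOUR,
        "hours_without_treatment": (window - under_treatment) / ONE_HOUR,
    }


day = pd.Timestamp("2019-04-03")
doses = pd.DataFrame({
    "start": [day + pd.Timedelta(hours=2), day + pd.Timedelta(hours=10)],
    "stop": [day + pd.Timedelta(hours=4), day + pd.Timedelta(hours=14)],
    "amount": [5.0, 10.0],
})
med_features = medication_features(doses, day, day + pd.Timedelta(hours=24))
```

These aggregates need the timestamps themselves, not just the values, which is why the datetime information has to be carried through to the aggregation step.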
  • 53. Summary • Creating features for medical data entails dealing with variety and variability • Quick processing and interpretable features are top priorities • No single tool offers a unique solution
  • 54. Summary • Pandas works well for quick processing of relatively small datasets • Split-apply-combine • Parallelizing (e.g. through Dask) allows quick computation of aggregates both locally and distributed • Vectorizing the split-apply-combine approach (e.g. with TSFRESH) speeds up computation both for small and large datasets. • Native support for Dask and custom distributors enables scaling
  • 55. Conclusions • Approach not limited to Python or specific packages • Can be extended to any application that involves time series • Scaling horizontally: we adapted the ICU pipeline for various other projects (e.g. treatment decision based on patients’ clinical history) • No need to re-invent the wheel every time
  • 57. Questions or feedback? Michele Tonutti michele.tonutti@pacmed.nl