Digital biomarkers for preventive personalised healthcare

A talk given to the Alan Turing Institute, UK, Oct 2021, reporting on the preliminary results and ongoing research in our lab, on self-monitoring using accelerometers for healthcare applications

  1. Digital biomarkers for preventive personalised healthcare. Oct 18th, 2021. Paolo Missier, Newcastle University, School of Computing
  2. The Team
     - Prof. Paolo Missier, Newcastle University (PI)
     - Prof. Michael Catt, Newcastle University and Closed Loop Medicine, Cambridge (Co-I)
     - Dr. Jaume Bacardit, Newcastle University (Co-I)
     Key contributors:
     - Dr. Ossama Alshabrawy (PhD student, now Lecturer at Northumbria University)
     - Ben Lam, PhD student
     - Dr. Jacek Cala, Sr. Research Associate
     In collaboration with the IMI DIRECT Consortium (diabetes research on patient stratification): https://www.imi.europa.eu/projects-results/project-factsheets/direct
  3. Data-Driven, Personalised, Predictive, Preventive, Participatory Medicine (D2P4)
     Part I: The role of physical activity monitoring to support Type II Diabetes studies. Can we learn useful representations for a person’s daily activities from accelerometry?
     Part II: Generating synthetic physical activity data. How do we simulate plausible physical activity patterns, and why?
  4. Part I: The role of physical activity monitoring to support Type II Diabetes studies. Can we learn useful representations for a person’s daily activities from accelerometry?
     Main contributors: Dr. Ossama Alshabrawy (PhD student, now Lecturer at Northumbria University); Benjamin Lam, PhD student
  5. Activity traces archive from the UK Biobank
     - All UK Biobank participants: 502,664
     - Filter: accelerometry study? 103,712
     - Split criterion: Type 2 Diabetes? At baseline: 2,755; through EHR analysis: 1,321; total: 4,076. Non-diabetes: 99,636
     - Filter: EHR data available? 19,852
     - Filter: QC on activity traces. Positives: 3,103
     - Physical impairment analysis: severe impairment 1,666; no impairment 8,463
     - Comparisons: T2D vs Norm-0, T2D vs Norm-2
     Is there enough signal in the traces to segregate T2D from Norm?
  6. Extracting High Level Activity Features (HLAF)
     Feature extraction: 60 features / day → aggregated to week (*)
     (*) Doherty A, Jackson D, et al. (2017), Large scale population assessment of physical activity using wrist worn accelerometers: the UK Biobank study. PLOS ONE 12(2):e0169649. https://github.com/activityMonitoring/biobankAccelerometerAnalysis
  7. Selected results: clustering
     - Cluster 2 contains almost entirely T2D positives (99.8%)
     - Phenotypes associated with increased risk of T2D (increasing age, high body fat percentage and a sedentary lifestyle) are also highly expressed in Cluster 2
  8. Selected results: classification
     Features:    HLAF              SDL               HLAF+SDL
     Negatives:   Norm-0   Norm-2   Norm-0   Norm-2   Norm-0   Norm-2
     RF           .80      .68      .83      .78      .86      .77
     LR           .79      .70      .83      .78      .86      .78
     XGB          .78      .66      .80      .74      .85      .75
     SDL: Socio-Demographic and Lifestyle variables
     Lam B, Catt M, Cassidy S, Bacardit J, Darke P, Butterfield S, Alshabrawy O, Trenell M, Missier P, Using Wearable Activity Trackers to Predict Type 2 Diabetes: Machine Learning–Based Cross-sectional Study of the UK Biobank Accelerometer Cohort. JMIR Diabetes, Vol 6 no 1, 19/3/2021:23364
  9. Lessons learnt
     • Signal is weak and noisy when used in the context of a complex metabolic disease
     • “Controls” may actually be physically impaired, and this is hard to determine
     • UK Biobank had no QC protocol; “a random week in life” provides poor indicators
     Are we mapping raw traces to the best possible feature space?
  10. Learning embedded representation spaces
     DIRECT DB
     • ~3,000 individuals total
     • Follow-ups at 18, 36, 48 months
     Pipeline (from the diagram): DIRECT DB → representation learning (LSTM autoencoder) → embedded feature space → clustering and cluster interpretation; classification using covariates and outcomes (e.g. insulin sensitivity)
  11. Autoencoder architecture: LSTM autoencoder. Final reconstruction loss: 0.46 (early termination at 9 of 150 epochs, to prevent overfitting)
  12. Clustering in the high-level and embedded spaces
     [Figure: K-means, hierarchical, affinity propagation and spectral clustering, each shown for embedded features and for high-level features]
  13. Cluster quality
                            Silhouette   Calinski-Harabasz   Davies-Bouldin
     Affinity Propagation   0.634        2220.021            0.895
     Spectral               0.677        2600.836            0.839
     DBSCAN                 0.274        73.642              1.808
     Hierarchical           0.466        2292.27             0.879
     K-means                0.482        2617.19             0.839
     - Silhouette: bounded between -1 and 1 (the closer to 1, the better)
     - Calinski-Harabasz: unbounded (the higher the score, the better)
     - Davies-Bouldin (not well suited to density methods): non-negative with no upper bound (the closer to 0, the better)
     Many other cluster validity indices require knowledge of the ground-truth labels, so they are not suitable for this study
  14. Cluster interpretation: clinical and activity variables
     2 clusters → binary classification: are the clinical variables good predictors for the clusters?
     Classifiers: logistic regression, AdaBoost classifier, random forest classifier, XGBoost
     Distribution of physical variables (significant p-values from t-tests):
     - percent time light-tasks daily: 0.009
     - percent time sedentary daily: 0.3
     - avg num hrs asleep daily: 0.005
     - avg daily MET level: 0.01
  15. Part II: Generating synthetic physical activity data. How do we simulate plausible physical activity patterns, and why?
     Main contributor: Dr. Jacek Cala, Sr. Research Associate
  16. Motivation
     From the EPSRC Healthcare Technologies Grand Challenges (*): “[Design] An intelligent 'companion' that is fully aware of an individual's healthcare history and experience, empowering them to self-manage their health and care by providing directly relevant feedback, information and advice.”
     (*) https://epsrc.ukri.org/research/ourportfolio/themes/healthcaretechnologies/strategy/grandchallenges/
     Scoping this down: how do we design an AI agent that
     - Knows our (wellness, fitness, health) goals
     - Understands our current state through physical activity monitoring
     - Can suggest personalised interventions to achieve our goals
     Idea: Reinforcement learning
  17. Longitudinal and profile-specific data scarcity
     The Good: Annotated sensor data are widely available and useful to train an AI agent
     The Bad: Difficult to find / create protocols where:
     • Participants are followed for any length of time → no longitudinal dimension (months, years)
     • Responses to interventions can be observed
     • Activity traces are available for specific conditions, pathologies, patient groups...
  18. A little puppetry
     Approach:
     1. Use 24x7 traces to:
        • Learn to generate new synthetic traces for a catalogue A1… An of activities
        • Model unfolding daily activity patterns
     2. Simulate: generate synthetic traces (syntraces) and combine them into controlled, plausible daily patterns
     Limited to basic activity types:
     - Sedentary, Light tasks, Moderate, Vigorous
     - Sleep
     Goal: to simulate a variety of physical activity patterns that unfold in time
     - Realistic
     - Useful in practice to boost existing training sets
  19. Learn: generating synthetic activity traces
     Training:
     - Traces: UK Biobank / 24x7 / 27 individuals
     - HAR: Oxford accelerometry analysis tool (*)
     - Traces broken down by (predicted) activity type
     - A separate model trained for each activity type
     - Notes: sleep excluded, traces trimmed to limit training times
     Pipeline (from the diagram, repeated per subject): raw data → preprocess → 126-dimensional feature vectors → classify → classified activity trace → split vectors by activity (walking, sleep, moderate, ...) → train one low-level model per activity (walking model, sleep model, moderate model, ...)
     Approach: Generative Neural Networks. BasicGAN from Synthetic Data Vault: https://sdv.dev/
     (*) https://github.com/activityMonitoring/biobankAccelerometerAnalysis
  20. Preliminary results
     Validation: Oxford activity classifier used as discriminator
     186 synthetic traces
     • Walking activity easiest to simulate: 120 correctly classified
     • Moderate activity hardest (196) – only 4 correctly classified
     Problem: some of the correctly classified traces look unrealistic
  21. Model and simulate: whole-day activity profiles
     Goal: to realistically combine bouts of single activities into “virtual days”
     Approach: parametric multi-state modelling
     • Transition probability si → sj increases as more time is spent in si
     Objective: use real 24x7 sequences to learn:
     - Realistic lengths of each activity bout
     - Activity transitions, e.g. walk → sit
     [Figure: selected traces of 24-hour synthetic activity profiles generated by the semi-Markov generalised gamma model; (a), (b) show plausible traces; (c), (d) less realistic]
  22. Summary and open research
     Part I: The role of physical activity monitoring to support Type II Diabetes studies
     - Single sensor, free-living, poor QC → weak and noisy signal
     - Good clustering of patients, but signal inadequate for specific outcomes, e.g. insulin sensitivity
     - Signal either stable over time or too noisy to track disease progression
     Next: multi-sensor monitoring
     Part II: Generating synthetic physical activity data
     - Plausible activity patterns
     Next: use synthetic data for training using reinforcement learning?
  23. Leveraged resources and future plans
     New collaboration:
     - Physical activity monitoring to support a study on “long covid”-induced frailty. Consortium of 5 hospitals (Italy + Israel), about 300 patients. Funded by Gilead
     Potential collaborations:
     - Closed Loop Medicine, Digital Healthcare, Cambridge (through Prof. Catt)
     - Fully-funded CDT PhD studentship aligned with the project since its inception (Ben Lam)
     - New PhD student started October 2021 (Naif Alzahrani)
  24. Key outputs
     Publications:
     • Lam B, Catt M, Cassidy S, Bacardit J, Darke P, Butterfield S, Alshabrawy O, Trenell M, Missier P, Using Wearable Activity Trackers to Predict Type 2 Diabetes: Machine Learning–Based Cross-sectional Study of the UK Biobank Accelerometer Cohort. JMIR Diabetes, Vol 6 no 1, 19/3/2021:23364
     • Ferrari D, Milic J, Tonelli R, Ghinelli F, Meschiari M, et al. (2020) Machine learning in predicting respiratory failure in patients with COVID-19 pneumonia—Challenges, strengths, and opportunities in a global health emergency. PLOS ONE 15(11): e0239172. https://doi.org/10.1371/journal.pone.0239172
     Invited presentations:
     - Data Science for (Health) Science: tales from a challenging front line. Talk given to the School of Information Sciences, Center for Informatics Research in Science and Scholarship, University of Illinois Urbana-Champaign, USA (March 2021)
     - Digital markers from physical activity traces to support research into type 2 diabetes. Talk given to the IMI DIRECT consortium (April 2020)
     - Prediction & prevention of age-related diseases through Machine Learning. Talk given to Newcastle BRC/NIHR group (Jan 2020)
     - Exploring the role of digital and genetic biomarkers to learn personalized predictive models of metabolic diseases. Talk given at the Turing Health Programme workshop, Manchester (March 2019)
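
The whole-day model on slide 21 (a semi-Markov process with generalised gamma bout lengths) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the state names follow slide 18, but every number in `SOJOURN` and `TRANSITIONS` is a made-up placeholder, whereas the talk learns these quantities from real 24x7 traces. The shape parameters are chosen with d ≥ 1 and p > 1, a regime in which the chance of leaving a state grows with the time already spent in it.

```python
import random

# Hypothetical generalised gamma parameters per state: (scale a in minutes, shape d, shape p).
# Placeholder values for illustration only; they are not fitted to any data.
SOJOURN = {
    "sleep":     (480.0, 3.0, 2.0),
    "sedentary": (60.0, 2.0, 1.5),
    "light":     (30.0, 2.0, 1.5),
    "moderate":  (20.0, 2.0, 1.5),
    "vigorous":  (10.0, 2.0, 1.5),
}

# Hypothetical next-state probabilities (no self-transitions), also placeholders.
TRANSITIONS = {
    "sleep":     {"sedentary": 0.7, "light": 0.3},
    "sedentary": {"light": 0.6, "moderate": 0.2, "sleep": 0.2},
    "light":     {"sedentary": 0.5, "moderate": 0.4, "vigorous": 0.1},
    "moderate":  {"light": 0.5, "sedentary": 0.3, "vigorous": 0.2},
    "vigorous":  {"moderate": 0.6, "light": 0.4},
}


def sample_gen_gamma(rng, a, d, p):
    """Draw from a generalised gamma: if G ~ Gamma(d/p, 1), then a * G**(1/p) ~ GenGamma(a, d, p)."""
    return a * rng.gammavariate(d / p, 1.0) ** (1.0 / p)


def simulate_day(rng, start="sleep", minutes=1440.0):
    """Return a list of (state, start_minute, duration) bouts that covers one day."""
    t, state, bouts = 0.0, start, []
    while t < minutes:
        a, d, p = SOJOURN[state]
        # Bouts last at least one minute and are truncated at the end of the day.
        duration = min(max(1.0, sample_gen_gamma(rng, a, d, p)), minutes - t)
        bouts.append((state, t, duration))
        t += duration
        nxt = TRANSITIONS[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return bouts


for state, start, duration in simulate_day(random.Random(0))[:5]:
    print(f"{start:7.1f} min  {state:<10} {duration:6.1f} min")
```

Each simulated bout could then be filled with a synthetic trace from the per-activity generator of slide 19; that step is left out here.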

Editor's notes

  • Aims:
    1. To understand the role {potential, limitations} of physical activity monitoring to support the detection of complex metabolic diseases (Type II Diabetes)
    2. To investigate the potential of synthetic activity traces to address the scarcity of longitudinal activity datasets for research

    Clustering within the space
    Learning classifiers for specific clinical outcomes

  • Spectral clustering is a technique with roots in graph theory, where it is used to identify communities of nodes in a graph based on the edges connecting them. The method is flexible and can also cluster non-graph data, once a similarity graph has been built over the data points.
    Spectral clustering uses the eigenvalues (spectrum) and the corresponding eigenvectors of the Laplacian built from a graph representation of the data set.
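
As a minimal sketch of the idea in the note above, the code below splits a toy point set into two clusters using the sign of the eigenvector for the second-smallest eigenvalue of the unnormalised Laplacian L = D - W, then scores the split with the silhouette coefficient listed on slide 13. Everything here is an assumption made for illustration: the Gaussian similarity graph, the `gamma` value, the power-iteration solver and the toy points are not taken from the talk, which does not say which implementation was used.

```python
import math


def rbf_affinity(points, gamma=0.5):
    """Dense similarity graph: w[i][j] = exp(-gamma * squared Euclidean distance)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = math.exp(-gamma * math.dist(points[i], points[j]) ** 2)
    return w


def fiedler_vector(w, iters=500):
    """Eigenvector for the second-smallest eigenvalue of L = D - W.

    Power iteration on (shift*I - L); re-centring the vector at every step
    projects out the trivial constant eigenvector (eigenvalue 0).
    """
    n = len(w)
    deg = [sum(row) for row in w]
    shift = 2.0 * max(deg)  # upper bound on the eigenvalues of L (Gershgorin)
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]
        lv = [deg[i] * v[i] - sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [shift * v[i] - lv[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def spectral_bipartition(points, gamma=0.5):
    """Two-way split by the sign of the Fiedler vector."""
    return [1 if x > 0 else 0 for x in fiedler_vector(rbf_affinity(points, gamma))]


def silhouette(points, labels):
    """Mean silhouette coefficient, in [-1, 1]; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i])
        if not own:  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]  # toy data, two obvious groups
labels = spectral_bipartition(points)
print(labels, round(silhouette(points, labels), 3))
```

For more than two clusters, the usual approach is to take several of the lowest eigenvectors and run k-means on the rows of the resulting matrix; a library implementation would normally be preferred over this hand-rolled solver.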
