Health data science
Why study data science?
What is health data science?
• Data-driven solutions to complex real-world health problems
• Or deriving knowledge from unstructured and messy data
• It is an interdisciplinary field: biostatistics, computer science,
epidemiology, public health, mathematics, etc.
But basically…
Real life health data science example
• HIV:
• Visualising the pattern of early HIV transmission within the mucosal barrier
• COVID-19:
• What can predict covid-19 neutralisation activity?
• Can we predict covid-19 vaccine efficacy?
Early HIV transmission
dynamics
Background
• Early HIV transmission events can occur during vaginal or anal sex
• Want to investigate whether the mucosal barrier (within the vaginal
tissue) is effective in blocking HIV transmission
If the mucosal barrier is good at preventing viral
transmission, this is what we expect to see
If the mucosal barrier is not good at preventing
transmission, multiple viruses can be found
(random infection)
If the mucosal barrier is not good at preventing
transmission, multiple viruses can be found
(clustered infection)
Animal experiment
Data
Data Visualisation
Can still see many viral variants:
no evidence that the vaginal tissue
is effective in blocking viral entry
Need a formal method
• How can we say (formally) whether infection is spatially clustered?
• Mantel test (or Mantel and Valand) -> relates a matrix of
“geographical” distance to a matrix of “biological” distance
• So, we need to define the “geographical” matrix and the “biological”
matrix first
“Geographical” distance
• Euclidean distance
$$d_{i,j} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
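As a minimal sketch of the “geographical” matrix, the pairwise Euclidean distances can be computed as below. The function name and the three (x, y) sample positions are hypothetical and only serve to illustrate the formula; they are not data from the study.

```python
import math

def euclidean_distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = coords[i], coords[j]
            # sqrt((xi - xj)^2 + (yi - yj)^2)
            d = math.hypot(xi - xj, yi - yj)
            dist[i][j] = dist[j][i] = d
    return dist

# Hypothetical (x, y) positions of three tissue samples
sites = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
geo = euclidean_distance_matrix(sites)
```

The diagonal is zero and the matrix is symmetric, which is the form the Mantel test expects.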
“Biological” distance
• Morisita – Horn index of overlap
$$MH = \frac{2 \sum_i \frac{n_{1i} n_{2i}}{N_1 N_2}}{\sum_i \left( \frac{n_{1i}^2}{N_1^2} + \frac{n_{2i}^2}{N_2^2} \right)}$$
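A small sketch of the Morisita-Horn index, assuming the standard form of the formula in which n1i and n2i are the counts of variant i in samples 1 and 2, and N1 and N2 are the total counts in each sample. The function name and the example counts are hypothetical.

```python
def morisita_horn(counts1, counts2):
    """Morisita-Horn overlap between two samples of variant counts.

    counts1[i] and counts2[i] are the counts of variant i in each sample.
    Returns 1.0 for identical composition and 0.0 for no shared variants.
    """
    n1_total = sum(counts1)
    n2_total = sum(counts2)
    numerator = 2 * sum(a * b for a, b in zip(counts1, counts2)) / (n1_total * n2_total)
    denominator = (sum(a * a for a in counts1) / n1_total ** 2
                   + sum(b * b for b in counts2) / n2_total ** 2)
    return numerator / denominator

# Hypothetical variant counts
same_mix = morisita_horn([10, 5, 1], [20, 10, 2])  # same proportions
no_overlap = morisita_horn([5, 0], [0, 7])         # no shared variants
```

Because the index depends only on proportions, two samples with the same variant mix score 1 even when their total counts differ.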
“Biological” distance
• Similarity between 1 and 2 =
0.98
• Similarity between 1 and 3 =
0.46
Mantel Test (or Mantel and Valand)
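• Testing the association between two matrices
• Mantel quantity (Zm) is given by:
$$Z_m = \sum_i \sum_j g_{ij} b_{ij}$$
where g_ij and b_ij are the entries of the “geographical” and
“biological” matrices
• Basic idea -> permutation test
• Randomly permute the rows and columns of one of the two matrices
(the same permutation for rows and columns)
• Store the value of Zm for each permutation, then compare the observed
Zm against this permutation distribution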
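The permutation idea can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the slides: it permutes one matrix only, uses a one-sided p-value (large Zm means positive association), and the toy matrices are made up. If the “biological” matrix holds Morisita-Horn similarities, one would first convert it to a distance, for example 1 - MH (an assumption, since the slides do not say).

```python
import random

def mantel_statistic(g, b):
    """Zm = sum over i and j of g[i][j] * b[i][j]."""
    n = len(g)
    return sum(g[i][j] * b[i][j] for i in range(n) for j in range(n))

def mantel_test(g, b, n_perm=999, seed=0):
    """One-sided Mantel permutation test; returns (observed Zm, p-value)."""
    rng = random.Random(seed)
    n = len(g)
    observed = mantel_statistic(g, b)
    idx = list(range(n))
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Apply the same permutation to the rows and columns of b
        permuted = [[b[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if mantel_statistic(g, permuted) >= observed:
            at_least_as_large += 1
    # Count the observed arrangement itself as one permutation
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value

# Toy example: "biological" distance grows with "geographical" distance
positions = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
geo = [[abs(p - q) for q in positions] for p in positions]
bio = [[2.0 * abs(p - q) for q in positions] for p in positions]
zm, p = mantel_test(geo, bio)
```

A low p-value here means the observed Zm is larger than almost all permuted values, i.e. the two matrices are associated.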
Low p-values: infection is clustered locally
within the vaginal tissue
What can predict covid-19
viral neutralisation activity?
Background
• Neutralising antibody (NAb): an antibody that defends the host by
neutralising a specific pathogen
• Data: 41 convalescent adults; 13 immunological parameters measured
in total
• Goal: find out which immunological parameters can be used to predict
NAb in those 41 recovered patients
Methods
• Data visualisation is very important in data science
• First step: plot the correlation matrix for the whole dataset
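A minimal sketch of the first step, computing a correlation matrix with NumPy. The data are synthetic and the three column names are placeholders loosely borrowed from the slides; only the sample size of 41 comes from the study description.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 41  # same number of subjects as in the study; values are synthetic

rbd = rng.normal(size=n)
microneut = rbd + 0.5 * rng.normal(size=n)       # built to track rbd
ccr6_cxcr3neg = -rbd + 0.5 * rng.normal(size=n)  # built to oppose rbd

names = ["microneut", "rbd", "ccr6_cxcr3neg"]
data = np.column_stack([microneut, rbd, ccr6_cxcr3neg])

# Pearson correlation between columns (variables)
corr = np.corrcoef(data, rowvar=False)

for name, row in zip(names, corr):
    print(name, np.round(row, 2))
```

In practice the matrix would be drawn as a heatmap; the printout is enough to read off the sign of each pairwise correlation.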
Microneutralization is positively correlated
with SARS-CoV-2 RBD
Microneutralization is negatively correlated
with CCR6+CXCR3-
OK, not very informative...
Too many things are correlated with microneutralization
Methods
• Correlation matrix shows that NAb is correlated with so many things
• Next step: Can I find some hidden features in this dataset?
• Method: principal component analysis (PCA)
The main focus is microneutralization
If the angle between microneut and another variable is less
than 90°, then it’s a positive association
If the angle between microneut and another variable is greater
than 90°, then it’s a negative association
For instance, higher ELISA S trimer gives higher
microneutralization level (less than 90°)
For instance, higher CCR6+CXCR3- gives lower
microneutralization level (more than 90°)
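The angle rule can be illustrated with a small PCA sketch. The data are synthetic, the variable names are placeholders, and PCA is done through an SVD of the standardised data; the angle in a two-component biplot only approximates the correlation, to the extent that the first two components capture most of the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 41
base = rng.normal(size=n)
data = np.column_stack([
    base + 0.3 * rng.normal(size=n),   # microneut (synthetic)
    base + 0.3 * rng.normal(size=n),   # elisa_s_trimer (synthetic)
    -base + 0.3 * rng.normal(size=n),  # ccr6_cxcr3neg (synthetic)
])

# Standardise each variable, then PCA via SVD
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
u, s, vt = np.linalg.svd(z, full_matrices=False)

# Variable arrows in the plane of the first two principal components
loadings = vt[:2].T * s[:2] / np.sqrt(n - 1)

def angle_deg(a, b):
    """Angle in degrees between two biplot arrows."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

angle_trimer = angle_deg(loadings[0], loadings[1])  # expected below 90
angle_ccr6 = angle_deg(loadings[0], loadings[2])    # expected above 90
```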
Methods
• PCA visualisation is better than the correlation matrix
• But we still cannot pick out one thing that can be used to predict NAb
• Next step: I want to pick as few predictors of NAb as possible
• Method: multiple linear regression with a backward model selection
strategy
• The idea is to run a linear regression with all the variables, and iteratively
remove the least significant predictor until all remaining predictors are significant
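Backward selection can be sketched as below, assuming ordinary least squares with an intercept, two-sided t-tests, and a 0.05 significance cut-off (the slides do not state the cut-off). The predictor names and the data are synthetic; this is an illustration, not the analysis behind the slides.

```python
import numpy as np
from scipy import stats

def ols_p_values(X, y):
    """Fit OLS with an intercept; return t-test p-values for the columns of X."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - k - 1
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stat = beta / np.sqrt(np.diag(cov))
    p = 2 * stats.t.sf(np.abs(t_stat), dof)
    return p[1:]  # drop the intercept

def backward_select(X, y, names, alpha=0.05):
    """Repeatedly drop the least significant predictor until all have p <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_p_values(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] <= alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic data: only x0 and x1 truly drive the outcome
rng = np.random.default_rng(3)
X = rng.normal(size=(41, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.normal(size=41)
names = ["x0", "x1", "x2", "x3", "x4"]
selected = backward_select(X, y, names)
```

Stepwise selection of this kind is known to overstate significance in small samples, so the selected set is best treated as a shortlist rather than a final answer.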
Two main things are highly predictive of NAb
Predicting covid-19
vaccine efficacy
Background
• At the end of the phase 2 trial, we get the immunogenicity data
(measuring the amount of antibody)
• Given the data from phase 2 trial (antibody data), can we predict
what the efficacy of the vaccine will be?
• Training dataset: efficacy and antibody data from all available vaccines
Methods
• The first step is always to visualise your data, so why don’t we plot
efficacy against antibody first?
High antibody = high efficacy
Low antibody = low efficacy
Can we simply do a classification method based on the
level of antibody?
Methods
• The model is a distribution-free binary classification model, based on
the threshold level of antibody
• The lower your antibody level, the higher your chance of being infected,
so the vaccine efficacy will be lower
• The higher your antibody level, the lower your chance of being infected,
so the vaccine efficacy will be higher
• We want to know what this antibody threshold is
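One way to picture a threshold-based classifier is the sketch below. It assumes that a person counts as protected exactly when their antibody level exceeds the threshold, so predicted efficacy is the fraction of vaccinees above it, and it fits the threshold by grid search on squared error. The vaccine names, antibody distributions, and the value 0.2 are all made up for illustration; this is not the model fitted in the talk.

```python
import numpy as np

def predicted_efficacy(levels, threshold):
    """Fraction of individuals whose antibody level exceeds the threshold."""
    return float(np.mean(np.asarray(levels) > threshold))

def fit_threshold(levels_by_vaccine, observed_efficacy, grid):
    """Pick the grid threshold minimising squared error in predicted efficacy."""
    def sse(t):
        return sum(
            (predicted_efficacy(levels, t) - observed_efficacy[name]) ** 2
            for name, levels in levels_by_vaccine.items()
        )
    return min(grid, key=sse)

# Synthetic antibody levels, in units of the convalescent mean
rng = np.random.default_rng(2)
true_threshold = 0.2  # arbitrary value used only to generate this example
levels_by_vaccine = {
    "vaccine_A": rng.lognormal(mean=np.log(4.0), sigma=1.0, size=2000),
    "vaccine_B": rng.lognormal(mean=np.log(1.0), sigma=1.0, size=2000),
    "vaccine_C": rng.lognormal(mean=np.log(0.3), sigma=1.0, size=2000),
}
observed_efficacy = {
    name: predicted_efficacy(levels, true_threshold)
    for name, levels in levels_by_vaccine.items()
}

grid = np.geomspace(0.01, 10.0, 200)
best = fit_threshold(levels_by_vaccine, observed_efficacy, grid)
```

Once a threshold is fixed, predicting the efficacy of a new vaccine only needs its antibody distribution, which is the point made on the slides.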
We normalised the antibody level to that of convalescent patients
(the mean for convalescent patients is one)
Covaxin data came out a bit later, so we used Covaxin to
validate our ‘classifier’ model
Using our classifier, as long as we have antibody data (from a
phase 2 trial), we can predict the efficacy of any vaccine
CureVac mRNA vaccine failure – why???
Simple data visualisation can help to answer
Because it used a lower dose than Pfizer and Moderna

Predicting Health Outcomes from Data
