Call Girls Amritsar Just Call 8250077686 Top Class Call Girl Service Available
NSF Northeast Hub Big Data Workshop
1. Building a search engine to find
environmental and phenotypic factors
associated with disease and health
Chirag J Patel
Northeast Big Data Innovation Hub Workshop
02/24/17
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
2. P = G + EType 2 Diabetes
Cancer
Alzheimer’s
Gene expression
Phenotype Genome
Variants
Environment
Infectious agents
Nutrients
Pollutants
Drugs
3. We are great at G investigation!
over 2400
Genome-wide Association Studies (GWAS)
https://www.ebi.ac.uk/gwas/
G
4. Nothing comparable to elucidate E influence!
E: ???
We lack high-throughput methods
and data to discover new E in P…
until now!
5. σ2
G
σ2P
H2 =
Heritability (H2) is the range of phenotypic
variability attributed to genetic variability in a
population
Indicator of the proportion of phenotypic
differences attributed to G.
6. Eye color
Hair curliness
Type-1 diabetes
Height
Schizophrenia
Epilepsy
Graves' disease
Celiac disease
Polycystic ovary syndrome
Attention deficit hyperactivity disorder
Bipolar disorder
Obesity
Alzheimer's disease
Anorexia nervosa
Psoriasis
Bone mineral density
Menarche, age at
Nicotine dependence
Sexual orientation
Alcoholism
Lupus
Rheumatoid arthritis
Crohn's disease
Migraine
Thyroid cancer
Autism
Blood pressure, diastolic
Body mass index
Depression
Coronary artery disease
Insomnia
Menopause, age at
Heart disease
Prostate cancer
QT interval
Breast cancer
Ovarian cancer
Hangover
Stroke
Asthma
Blood pressure, systolic
Hypertension
Osteoarthritis
Parkinson's disease
Longevity
Type-2 diabetes
Gallstone disease
Testicular cancer
Cervical cancer
Sciatica
Bladder cancer
Colon cancer
Lung cancer
Leukemia
Stomach cancer
0 25 50 75 100
Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com
G estimates for burdensome diseases are low and variable:
massive opportunity for E discovery
Type 2 Diabetes
Heart Disease
Asthma
7. Eye color
Hair curliness
Type-1 diabetes
Height
Schizophrenia
Epilepsy
Graves' disease
Celiac disease
Polycystic ovary syndrome
Attention deficit hyperactivity disorder
Bipolar disorder
Obesity
Alzheimer's disease
Anorexia nervosa
Psoriasis
Bone mineral density
Menarche, age at
Nicotine dependence
Sexual orientation
Alcoholism
Lupus
Rheumatoid arthritis
Crohn's disease
Migraine
Thyroid cancer
Autism
Blood pressure, diastolic
Body mass index
Depression
Coronary artery disease
Insomnia
Menopause, age at
Heart disease
Prostate cancer
QT interval
Breast cancer
Ovarian cancer
Hangover
Stroke
Asthma
Blood pressure, systolic
Hypertension
Osteoarthritis
Parkinson's disease
Longevity
Type-2 diabetes
Gallstone disease
Testicular cancer
Cervical cancer
Sciatica
Bladder cancer
Colon cancer
Lung cancer
Leukemia
Stomach cancer
0 25 50 75 100
Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com
G estimates for complex traits are low and variable:
massive opportunity for high-throughput E discovery
σ2
E : Exposome!
8. Enhance accessibility of clinical and environmental data,
and analytic artificial intelligence tools!
How can we drive discovery of
environmental factors (E) in disease phenotypes (P)?
9. Enhance accessibility of large open data and tools to
drive discovery of
environmental factors (E) in disease phenotypes (P)
Greg Cooper, MD, PhD
Pittsburgh
Vasant Honavar, PhD
Penn State
Noémie Elhadad, PhD
Columbia
Chirag Patel, PhD
Harvard
12. Where do we get disease P data?
Health record data from your doctor!
• Longitudinal data on millions of patients
• diagnoses, prescriptions, lab reports, notes
• Sitting there in institutional IT infrastructure
• OHDSI provides a unified model to access data across
institutions, enhancing the scientific process!
16. Examples of sources of disparate external exposome datasets
available in the Exposome Data Warehouse
• Geological
• NASA - Cloud and Atmosphere Profiles
• NOAA Climate Data
• Pollution
• EPA Air Quality Surveillance Data Mart, or AirData,
• Socio-Economic
• US Census American Community Survey (ACS)
• Epidemiological
• CDC Wonder, USDA Food Atlas
17. A key challenge:
mashing up Exposome Data Warehouse with patient
data from OHDSI
f(location, time)
PM2.5
income
Pollen count
EPA AirData
American Community
Survey
NOAA Climate
home zipcode
encounter time
22. Temperature(F)
20
40
60
80
Lag
0
1
2
3
4
5
6
Relativerisk
0.95
1.00
1.05
20 40 60 80
0.91.11.3
Overall temperature effect
Mean temperature (F)
Relativerisk
●
●
●
●
● ● ●
0 1 2 3 4 5 6
0.901.001.10
Lag effect at T=60F
Lag(day)
RelativeRisk
20 40 60 80
0.901.001.10
Temperature effect at Lag=0
Mean temperature (F)
RelativeRisk
Does temperature influence asthma ER visits?: yes!
Relative risk of asthma attack by mean temperature
23. Rates of asthma attacks depend on season?: yes!
0.9
1.0
1.1
20 40 60 80
season Fall Spring Summer Winter
26. Does temperature (and weather) influence asthma-
related ER visits in kids?: the tip of the iceberg!
Lag
0
1
2
20 40 60 80
0.91.11.3
Overall temperature effect
Mean temperature (F)
Relativerisk
● ●
5 6
0F
20 40 60 80
0.901.001.10
Temperature effect at Lag=0
Mean temperature (F)
RelativeRisk
What other scientific questions?
•what is influence of pollen?
•what is the influence of air pollution?
•what about adults?
Can we replicate the analysis?
•different populations
•using different data
•with different analysts
27. ExposomeDB
Environmental Protection Agency
AirData
US Census
Socioeconomic data
National Oceanic and Atmospheric Administration (NOAA)
Climate Data
Observational Health Data
Sciences and Informatics
(OHDSI)
Harvard Medical School Columbia University P&S
Geotemporal database Network of observational data
Causal Analytics
A2
A3
A1
AX Aim X
Analytics Workshop
Hands-on workshops
for leveraging work in
Aims 1-3
Training Resources
Jupyter notebooks
Student internships at Pitt,
Harvard, PSU, and
Columbia P&S
OHDSI Node
A4
University of Pittsburgh Penn State University
Integrating the ExposomeData Warehouse with OHDSI
and causal modeling tools to drive and demonstrate
discovery.
28. Many hypotheses are possible to address: useful for
public health & planning!
What is the effect of air pollution levels in disease?
Do adverse weather conditions influence hospital use?
What pharmaceutical drugs lead to adverse health
outcomes?
How does socioeconomic context influence hospital use,
disease rates, and recovery?
29. We will harness tools in machine learning
extract signal from noise!
Greg Cooper, MD, PhD
Pittsburgh
Vasant Honavar, PhD
Penn State
SES
PM2.5
asthma
oF
bayesian networks
socio-economic
case-crossover
weather
air pollution
Sci Rep 2016
31. Integrating the ExposomeDB with OHDSI and analytics
to drive and demonstrate discovery by the community!
(trainees especially welcome)
• 2-day hands-on workshop in New York or Boston
• remote “exchange” internship program and 2-week
immersion
• dissemination of electronic training resources
32. Many hypotheses are possible to address: useful for can
we build a machine learning predictor to estimate E?
Eye color
Hair curliness
Type-1 diabetes
Height
Schizophrenia
Epilepsy
Graves' disease
Celiac disease
Polycystic ovary syndrome
Attention deficit hyperactivity disorder
Bipolar disorder
Obesity
Alzheimer's disease
Anorexia nervosa
Psoriasis
Bone mineral density
Menarche, age at
Nicotine dependence
Sexual orientation
Alcoholism
Lupus
Rheumatoid arthritis
Crohn's disease
Migraine
Thyroid cancer
Autism
Blood pressure, diastolic
Body mass index
Depression
Coronary artery disease
Insomnia
Menopause, age at
Heart disease
Prostate cancer
QT interval
Breast cancer
Ovarian cancer
Hangover
Stroke
Asthma
Blood pressure, systolic
Hypertension
Osteoarthritis
Parkinson's disease
Longevity
Type-2 diabetes
Gallstone disease
Testicular cancer
Cervical cancer
Sciatica
Bladder cancer
Colon cancer
Lung cancer
Leukemia
Stomach cancer
0 25 50 75 100
Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com
σ2
E : Exposome!
33. Harvard DBMI
Isaac Kohane
Susanne Churchill
Nathan Palmer
Sunny Alvear
Michal Preminger
Chirag J Patel
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
Thanks
RagGroup
Chirag Lakhani
Yeran Li
Shreyas Bhave
Rolando Acosta
Noémie Elhadad (Columbia)
Vasant Honavar (PSU)
Greg Cooper (Pitt)
George Hripcsak (Columbia)
René Baston
Katie Naum
Kathleen McKeown
NIH Common Fund
Big Data to Knowledge