Towards an understanding of diversity in biological and biomedical systems
1. Data analysis workshop for massive
sequencing data
Towards an understanding of diversity in
biological and biomedical systems
Igor Zwir
Department of Computer Science and Artificial Intelligence,
University of Granada, Granada, Spain
Howard Hughes Medical Institute
Yale School of Medicine, New Haven, CT, US
Department of Psychiatry Washington University School of
Medicine, St. Louis, MO, US
e-mail: zwiri@psychiatry.wustl.edu
2. “Some people enjoy reading papers, juggling possibilities and formulating ideas, even if they can’t work a pipette”
(“Reasoning for results”, Nature, Bray, D., 2001)
“Some people enjoy reading papers, juggling possibilities and formulating ideas, even if they can’t write a line of a computer program”
(“Reasoning for results”, Groisman Lab, 2007)
3. “…organisms of the most different sorts are
constructed from the very same battery of
genes. The diversity of life forms results from
small changes in the regulatory systems that
govern expression of these genes.”
François Jacob
In Of flies, mice and men
4. Salmonella : A Gram-negative
pathogen with a varied lifestyle
5. Signal transduction cascade by
two-component regulatory systems
Signal: low Mg2+
Sensor: PhoQ
Regulator: PhoP-PO3
Effectors: mgtA, mgtB
Response: Mg2+ transport
6. Two-component systems regulate physiological
and virulence functions
System       Signal               Function
ArcA/ArcB    Quinones             Anaerobic respiration
OmpR/EnvZ    Osmolarity changes   Osmoadaptation
NtrB/NtrC    Low nitrogen levels  Nitrogen metabolism
PhoP/PhoQ    Low Mg2+             Virulence, growth in low Mg2+
PmrA/PmrB    Fe3+ and Al3+        Resistance to polymyxin B
SsrA/SpiR    Unknown              Virulence
TtrR/TtrS    Tetrathionate        Anaerobic respiration
7. The Salmonella PmrA/PmrB system
responds to Fe3+ and low Mg2+
low Mg2+ → PhoQ → PhoP-PO3 → pmrD → PmrD → PmrA-PO3
high Fe3+ → PmrB → PmrA-PO3 → pbgP → LPS modification
8. The E. coli PmrA/PmrB system
responds to Fe3+ but not to low Mg2+
low Mg2+ high Fe3+
PhoQ PmrB
PhoP -PO3 PmrA -PO3
pmrD PmrD pbgP
85.4% 93.3% LPS modification
9. The Salmonella but not the E. coli ugd gene is
regulated by the PhoP protein
PhoQ PhoQ
PhoP -PO3 PhoP -PO3
ugd ugd
85.4% 93.3% 85.5%
(the median amino acid identity between Salmonella and E. coli proteins is 90%)
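The percentages on this slide are amino acid identities between orthologous proteins. As a minimal sketch of how such a figure can be computed from a pairwise alignment, assuming the two sequences are already aligned and use `-` for gaps (the function name `percent_identity` and the toy sequences are illustrative and do not come from the slides):

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy, made-up aligned fragments (NOT real Salmonella or E. coli sequences).
seq_a = "MKLV-TGEAR"
seq_b = "MKLVQTGDAR"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

Conventions differ on whether gapped columns count in the denominator; this sketch excludes them, which is only one of several common choices.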
10. The PhoP-PhoQ two-component system
regulates 5% of Salmonella genes
Consensus Motif
Salmonella LT2 & E. coli K12
11. Single motif vs. a family of PhoP
submotifs
+Sensitivity
+Specificity +Specificity
Harari et al., PLoS Computational Biology, 2010
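The sensitivity and specificity trade-off between one consensus motif and a family of submotifs can be illustrated with position weight matrices (PWMs). The sketch below is a generic PWM scorer, not the method of Harari et al.; the function names, the uniform background, the pseudocount, the threshold and all sequences are assumptions made for illustration only.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Log-odds PWM from equal-length sites, assuming a uniform background."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background)
                    for b in alphabet})
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, seq))


def family_score(pwms, seq):
    """Score against a family of submotifs: best match over all PWMs."""
    return max(score(p, seq) for p in pwms)


def sensitivity_specificity(pwm, positives, negatives, threshold):
    """Fraction of true sites called, and fraction of non-sites rejected."""
    true_pos = sum(score(pwm, s) >= threshold for s in positives)
    true_neg = sum(score(pwm, s) < threshold for s in negatives)
    return true_pos / len(positives), true_neg / len(negatives)


# Toy sequences invented for this example (NOT real PhoP binding sites).
sites = ["TGTTTA", "TGTTTA", "TATTTA", "TGTTGA"]
non_sites = ["CCGGCC", "GGCCGG", "ACGCGC"]
pwm = build_pwm(sites)
sens, spec = sensitivity_specificity(pwm, sites, non_sites, threshold=0.0)
print(sens, spec)
```

Raising the threshold on a single PWM trades sensitivity for specificity. Scoring with `family_score` over several narrower PWMs is one way to keep weak but genuine sites without lowering the threshold for every candidate.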
22. Predicting gene binding and transcription of
PhoP regulated targets
ancestral
horizontally-acquired
23. Summary
TF affinity for its binding sites determines promoter timing and expression levels on naked DNA
Binding and transcription in vivo depend on where the binding sites sit (promoter architecture)
Cis-acting features in the PhoP-activated promoters determine non-arbitrarily organized architectures
Differences in the regulon across species depend on the evolution of the binding sites and promoter architectures
24. Two paradigms: multiple genes with small
effect, or few genes with large effect
London Metro Boston Metro
de Vries, Nature Medicine, 2009
25. Phenotypic-genotypic relations describe a risk
surface of Schizophrenia
R19: 6 affected, 1 relative
R10: 11 affected, 6 relatives
Gottesman II, Gould TD. Am J Psychiatry, 2003
0.1% of the population affected
Multigenic disease
Non-genetic contributions
Risk: Monozygotic twins 50% - Dizygotic twins 15%.
26. Uncovering genotype-phenotype relations by
independently clustering both domains
Subjects: trios (affected, relatives and controls)
Phenotype clusters: subjects × 70 clinical attributes (cognitive, motor, behavioral, structural)
Genotype clusters: subjects × SNP chips
27. Identifying significant genotype-phenotype
relations among inter-domain clusters
0.01
1E-10
Romero-Zaliz et al., Nucleic Acids Research, 2008; Romero-Zaliz et al., IEEE Trans. on Evol. Computation, 2008; de Erausquin et al., Mol. Psych., in press
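The slides do not state which statistic scores a relation between a phenotype cluster and a genotype cluster. One common choice for testing whether two independently derived clusters share more subjects than expected by chance is a one-sided hypergeometric test, sketched here under that assumption. The function name `overlap_p_value` and the subject IDs are hypothetical.

```python
from math import comb


def overlap_p_value(population: int, size_a: int, size_b: int, overlap: int) -> float:
    """P(X >= overlap) when size_b subjects are drawn at random from
    `population`, of which size_a belong to the other cluster."""
    max_overlap = min(size_a, size_b)
    total = comb(population, size_b)
    # math.comb returns 0 when k > n, so impossible terms vanish on their own.
    tail = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical subject IDs: 20 subjects, one cluster from each domain.
all_subjects = set(range(1, 21))
phenotype_cluster = {1, 2, 3, 4, 5, 6, 7, 8}
genotype_cluster = {1, 2, 3, 4, 5, 6, 9}
shared = phenotype_cluster & genotype_cluster

p_value = overlap_p_value(
    len(all_subjects), len(phenotype_cluster), len(genotype_cluster), len(shared)
)
print(p_value)
```

When many cluster pairs are tested, the resulting p-values would normally need a multiple-testing correction before any pair is called significant.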
36. Summary
We proposed the first data-driven definition of the Schizophrenia risk function
Concurrent CGWAS provides a panoramic vision of phenotype-genotype associations, each of which can be used by traditional GWAS analysis
Four signaling pathways associated with risk of schizophrenia were identified
Phenotype-genotype relations were sufficient to reliably predict subject status
This finding opens the door for early detection and preventative intervention prior to the onset of psychotic symptoms in high/intermediate risk populations
37. Acknowledgements
Eduardo Groisman Lab
Howard Hughes Medical Institute
Dongwoo Shin
Christian Perez
Henry Huang Lab
Dept. of Molecular Microbiology
Washington U. School of Medicine, USA
Gabriel de Erausquin Lab
Departments of Psychiatry and Neurology
Harvard Med. School
Dept. of Computer Science and Artificial Intelligence
University of Granada, Spain
Coral del Val
Pat Anders
Javier Arnedo
Luis Miguel Merino
Rocio Romero-Zaliz (U. de Granada)
Cristina Rubio-Escudero (U. Seville)
Christopher Previti (U. Bergen)
Oscar Harari (Washington U.)
38. Acknowledgments
Francisco Herrera, DECSAI, University of Granada
Mining for Modeling Lab, DECSAI, University of Granada
Coral del Val, DECSAI, University of Granada
Gabriel de Eraúsquin, Department of Psychiatry, Washington University in St. Louis
Igor Zwir, DECSAI, University of Granada
Eduardo Groisman, HHMI, Department of Molecular Biology, Washington University in St. Louis
Kathleen Marchal, Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven
Henry Huang, Department of Molecular Biology, Washington University in St. Louis