The document discusses Wright's F-statistics and Cockerham's θ-statistics, which are methods used to calculate genetic differentiation between populations. It also discusses methods to detect signatures of positive selection, including Extended Haplotype Homozygosity (EHH), integrated Haplotype Score (iHS), and cross population Extended Haplotype Homozygosity (xpEHH). EHH detects when a particular haplotype is over-represented in a population by measuring how quickly homozygosity declines with genetic distance from the core haplotype. iHS and xpEHH are derived from EHH scores to identify haplotypes that have increased in frequency due to positive selection.
3. Fst Wright’s F-statistics
3 types of Heterozygosity[4]
Individual, Subpopulation, Total Population
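1 $H_I = \frac{1}{n}\sum_{i=1}^{n} \hat{H}_i$
2 $H_S = \frac{1}{n}\sum_{i=1}^{n} 2 p_i q_i$
3 $H_T = 2\bar{p}\bar{q}$
($\hat{H}_i$: observed heterozygosity in the ith subpopulation, $2 p_i q_i$: expected heterozygosity in the ith subpopulation, $2\bar{p}\bar{q}$: expected heterozygosity of the total population)
These values are computed separately for each locus.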
김진섭 (GSPH, SNU) FST & Some Selection Index November 22, 2017 3 / 65
4. Fst Wright’s F-statistics
Wright’s F-statistics[4]
1 $F_{IS} = \frac{H_S - H_I}{H_S}$
2 $F_{ST} = \frac{H_T - H_S}{H_T}$
3 $F_{IT} = \frac{H_T - H_I}{H_T}$
Example
$F_{ST} = 0$ → no subpopulation effect: the subpopulations do not differ.
$F_{ST} = 1$ → the subpopulations differ greatly.
Simple relation
$1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$
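The definitions on these two slides can be checked numerically. The following is a minimal sketch for a single biallelic locus, assuming equally weighted subpopulations as in the formulas above. The function name `f_statistics` and the input values are hypothetical and chosen only for illustration.

```python
def f_statistics(obs_het, allele_freqs):
    """Wright's F-statistics for one biallelic locus.

    obs_het      : observed heterozygosity in each subpopulation (H-hat_i)
    allele_freqs : frequency p_i of one allele in each subpopulation
    Subpopulations are weighted equally, as in the slide formulas.
    """
    n = len(allele_freqs)
    h_i = sum(obs_het) / n                                # H_I
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / n  # H_S
    p_bar = sum(allele_freqs) / n
    h_t = 2 * p_bar * (1 - p_bar)                         # H_T
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Hypothetical data: two subpopulations with p = 0.2 and p = 0.8.
f_is, f_st, f_it = f_statistics(obs_het=[0.30, 0.30], allele_freqs=[0.2, 0.8])
print(f_is, f_st, f_it)
# The simple relation 1 - F_IT = (1 - F_IS)(1 - F_ST) holds up to rounding.
print(abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12)
```

With these inputs H_I = 0.30, H_S = 0.32 and H_T = 0.50, so F_IS = 0.0625, F_ST = 0.36 and F_IT = 0.40.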
7. Fst Wright’s F-statistics
FST inference[5]
A convenient measure of genetic differentiation.
Among the most widely used descriptive statistics in population and evolutionary genetics.
Can point to natural selection acting in a particular subpopulation.
8. Fst Wright’s F-statistics
Problem in estimation
$H_T = 2\bar{p}\bar{q}$
1 What if the sample size differs between subpopulations?
2 Ex: SASIA 1000 individuals, Oceania 100 individuals.
3 Then $\bar{p}$ is not estimated properly.
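To see why unequal sample sizes matter, the sketch below compares an unweighted mean of subpopulation allele frequencies with a sample-size-weighted (pooled) mean. The sample sizes 1000 and 100 come from the slide. The allele frequencies 0.2 and 0.8 and the function name `mean_allele_freq` are assumptions made for illustration.

```python
def mean_allele_freq(freqs, sample_sizes=None):
    """Mean allele frequency across subpopulations.

    Without sample_sizes every subpopulation counts equally;
    with sample_sizes each subpopulation is weighted by its size.
    """
    if sample_sizes is None:
        return sum(freqs) / len(freqs)
    total = sum(sample_sizes)
    return sum(p * n for p, n in zip(freqs, sample_sizes)) / total


freqs = [0.2, 0.8]   # hypothetical allele frequencies (SASIA, Oceania)
sizes = [1000, 100]  # sample sizes from the slide

for label, p_bar in [("unweighted", mean_allele_freq(freqs)),
                     ("weighted", mean_allele_freq(freqs, sizes))]:
    h_t = 2 * p_bar * (1 - p_bar)
    print(label, round(p_bar, 4), round(h_t, 4))
```

The two averages give different values of H_T (0.5 versus roughly 0.38 here), so the resulting F_ST depends on how the mean frequency is formed.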
9. Fst Cockerham’s θ-statistics
ANOVA approach[1, 5]
$\theta = \frac{\sigma_P}{\sigma_T}$
($\sigma_P$: variance due to population, $\sigma_T$: total variance)
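The variance-ratio idea can be illustrated with population allele frequencies that are treated as known. This is only a simplified sketch and not the full ANOVA estimator of [1, 5], which also corrects for sampling within populations. It assumes a biallelic locus, equal weighting of populations, and a total variance of $\bar{p}(1-\bar{p})$. The function name `theta_variance_ratio` is hypothetical.

```python
from statistics import mean, pvariance


def theta_variance_ratio(allele_freqs):
    """Ratio of between-population variance to total variance.

    allele_freqs : allele frequency in each population (treated as known).
    The total variance of a biallelic indicator variable is p_bar * (1 - p_bar).
    """
    p_bar = mean(allele_freqs)
    var_between = pvariance(allele_freqs)  # variance due to population
    var_total = p_bar * (1 - p_bar)        # total variance
    return var_between / var_total


print(theta_variance_ratio([0.2, 0.8]))  # about 0.36
```

Under these assumptions the ratio equals (H_T - H_S) / H_T, because H_T - H_S is twice the variance of the subpopulation frequencies.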
11. Fst Cockerham’s θ-statistics
θ inference
More than 2 populations
A high value means that some population deviates from the rest.
It does not tell which population that is.
Pairwise FST
Computed from 2 populations only.
Gives a relative comparison.
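A pairwise comparison can be sketched by applying the heterozygosity-based definition of F_ST to every pair of populations. The population labels and allele frequencies below are hypothetical, and the two populations in each pair are weighted equally.

```python
from itertools import combinations


def pairwise_fst(freq_by_pop):
    """F_ST = (H_T - H_S) / H_T for every pair of populations at one locus."""
    result = {}
    for a, b in combinations(sorted(freq_by_pop), 2):
        p_a, p_b = freq_by_pop[a], freq_by_pop[b]
        p_bar = (p_a + p_b) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        # If the allele is fixed or absent in both populations, H_T is 0.
        result[(a, b)] = (h_t - h_s) / h_t if h_t > 0 else 0.0
    return result


# Hypothetical frequencies: pop_C differs from the other two.
for pair, fst in pairwise_fst({"pop_A": 0.1, "pop_B": 0.2, "pop_C": 0.9}).items():
    print(pair, round(fst, 3))
```

In this example the pairs that include pop_C have much larger values than the pop_A/pop_B pair, which is the kind of information a single multi-population statistic does not give.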
13. Fst Cockerham’s θ-statistics
Figure. FST calculated for each SNP between Tibetan and Han populations[6]
14. Fst Cockerham’s θ-statistics
Figure. Inter-population pairwise comparisons of FST statistics
http://academic.reed.edu/biology/professors/srenn/pages/research/2011_students/sean/SM_thesis.html
15. Selection Index
Contents
1 Fst
Wright’s F-statistics
Cockerham’s θ-statistics
2 Selection Index
EHH
iHS
xp-EHH
3 Practice
16. Selection Index
Is a particular haplotype over-represented in a particular population?
Example: Erik Corona's slides (following slides)
24. Selection Index EHH
EHH: Sabeti, Reich et al. (2002)[7]
Extended Haplotype Homozygosity
The probability that two randomly chosen haplotypes are identical.
0 → all haplotypes are different.
1 → all haplotypes are identical.
The haplotype of interest is called the core.
$EHH_t = \frac{\sum_{i=1}^{s} \binom{e_{ti}}{2}}{\binom{c_t}{2}}$
($t$: core haplotype, $c_t$: the number of samples carrying core haplotype $t$, $e_{ti}$: the number of samples carrying the ith extended haplotype of core $t$, $s$: the number of unique extended haplotypes)
31. Selection Index EHH
EHH: What It Is & What It Isn't
Detects over-representation of a haplotype
This raises P(two randomly chosen haplotypes are identical)
Does NOT detect whether a haplotype spread quickly
Low recombination != spread quickly
First set of haplotype counts (core haplotype, extended haplotype, count):
AATTACAGATTACA AACACGC 10
AATTACAGATTACA ATGATAG 8
AATTACAGATTACA AACCCAG 7
AATTACAGATTACA CTGACAG 5
AATTACAGATTACA CAGACAG 3
AATTACAGATTACA AACACAG 6
AATTACAGATTACA CACACAG 4
AATTACAGATTACA CACCCAG 7
GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26
Second set of haplotype counts (core haplotype, extended haplotype, count):
AATTACAGATTACA AACACGC 22
AATTACAGATTACA ATGATAG 28
GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26
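The EHH formula can be applied directly to the haplotype counts listed on this slide. The sketch below assumes that the first column is the core haplotype, the second column is the extended haplotype, and the number is the sample count. The function name `ehh` is hypothetical.

```python
from math import comb


def ehh(extended_counts):
    """EHH for one core haplotype.

    extended_counts : sample count of each unique extended haplotype
                      that carries the core haplotype.
    Returns sum_i C(e_i, 2) / C(c, 2), where c is the total count.
    """
    c = sum(extended_counts)
    return sum(comb(e, 2) for e in extended_counts) / comb(c, 2)


# Counts taken from the slide.
first_a_core = [10, 8, 7, 5, 3, 6, 4, 7]  # core starting with A, first set
g_core = [24, 26]                         # core starting with G, both sets
second_a_core = [22, 28]                  # core starting with A, second set

print(round(ehh(first_a_core), 3))   # 149 / 1225, about 0.122
print(round(ehh(g_core), 3))         # 601 / 1225, about 0.491
print(round(ehh(second_a_core), 3))  # 609 / 1225, about 0.497
```

A core with many distinct extended haplotypes gets a low EHH, while a core with only two extended haplotypes gets an EHH near 0.5. The value reflects over-representation only and says nothing about how quickly the haplotype spread.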
53. Selection Index iHS
iHS Characteristics
When both alleles have the same AUC (area under the EHH curve), iHS is zero
Large negative values indicate selection of the allele in the denominator
Large positive values indicate selection of the allele in the numerator
Still heavily biased by allele frequency!
This is addressed by Z-score normalization
54. Selection Index iHS
Integrated Haplotype Score (iHS)
$\text{iHS} = \frac{\text{unstandardized iHS} - E(\text{iHS} \mid \text{allele frequency})}{SD(\text{iHS} \mid \text{allele frequency})}$
E(iHS | allele frequency): estimated from the empirical distribution
SD(iHS | allele frequency): estimated from the empirical distribution
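The standardization can be sketched as a Z-score computed within allele-frequency bins, so that each score is compared only with scores of SNPs at a similar frequency. The number of bins (20), the function name `standardize_ihs`, and the input values are assumptions for illustration and are not taken from the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_ihs(raw_scores, allele_freqs, n_bins=20):
    """Z-score each unstandardized iHS within its allele-frequency bin."""
    def bin_of(freq):
        # Map a frequency in [0, 1] to a bin index in 0 .. n_bins - 1.
        return min(int(freq * n_bins), n_bins - 1)

    by_bin = defaultdict(list)
    for score, freq in zip(raw_scores, allele_freqs):
        by_bin[bin_of(freq)].append(score)

    # Empirical mean and SD of the raw scores in each bin.
    stats = {b: (mean(v), pstdev(v)) for b, v in by_bin.items()}

    standardized = []
    for score, freq in zip(raw_scores, allele_freqs):
        mu, sd = stats[bin_of(freq)]
        standardized.append((score - mu) / sd if sd > 0 else float("nan"))
    return standardized


# Hypothetical raw scores for SNPs at low and intermediate frequency.
raw = [1.0, 2.0, 3.0, -1.0, 0.0, 1.0]
freq = [0.10, 0.12, 0.11, 0.52, 0.53, 0.51]
print([round(z, 3) for z in standardize_ihs(raw, freq)])
```

In this example the low-frequency SNPs have a higher raw mean than the intermediate-frequency SNPs, yet after standardization both groups are centered on zero.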
55. Selection Index iHS
iHS Overview
iHS and REHH are EHH-based methods to detect positive selection
iHS outperforms REHH at specific allele frequencies
Neither method outperforms the other at all frequencies
57. Selection Index xp-EHH
XP-EHH: Sabeti et al. (2007)[8]
Compares the integrated EHH of the same allele between populations.
58. Selection Index xp-EHH
Cross Population EHH (XP-EHH)
Integrate EHH over distance from the allele
Calculated for forward/reverse sides independently
Integrate until EHH = 0.04 in each population
Haplotype counts in one population:
AATTACAGATTACA AACACGC 10
AATTACAGATTACA ATGATAG 8
AATTACAGATTACA AACCCAG 7
AATTACAGATTACA CTGACAG 5
AATTACAGATTACA CAGACAG 3
AATTACAGATTACA AACACAG 6
AATTACAGATTACA CACACAG 4
AATTACAGATTACA CACCCAG 7
Same allele but different population:
AATTACAGATTACA CACATAG 20
AATTACAGATTACA CACACAG 30
Integrated EHH values for the two populations: 3.3 and 0.5
XP-EHH = ln(3.3/0.5) = 1.89, followed by Z-score normalization
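The two steps on this slide, integrating EHH over distance and taking the log ratio between populations, can be sketched as follows. The trapezoidal rule, the function names, and the EHH decay values are assumptions made for illustration. The areas 3.3 and 0.5 are the values shown on the slide.

```python
import math


def integrate_ehh(distances, ehh_values, cutoff=0.04):
    """Trapezoidal area under the EHH curve on one side of the allele.

    Integration stops at the first marker where EHH drops below cutoff.
    """
    area = 0.0
    for k in range(1, len(distances)):
        if ehh_values[k] < cutoff:
            break
        width = distances[k] - distances[k - 1]
        area += 0.5 * (ehh_values[k] + ehh_values[k - 1]) * width
    return area


def xp_ehh(area_pop_1, area_pop_2):
    """Unstandardized XP-EHH: log ratio of the integrated EHH values."""
    return math.log(area_pop_1 / area_pop_2)


# Hypothetical EHH decay on one side of the allele.
print(integrate_ehh([0, 1, 2, 3], [1.0, 0.5, 0.1, 0.02]))

# Integrated EHH values from the slide.
print(round(xp_ehh(3.3, 0.5), 2))  # 1.89
```

In a full calculation the forward and reverse sides would be integrated separately and combined, and the log ratio would then be Z-score normalized across the genome.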
59. Selection Index xp-EHH
Overview
REHH and iHS are more or less complementary
Each is better at detecting positive selection at different allele frequencies
XP-EHH
Can detect positive selection in high-frequency alleles
Susceptible to between-population variation in recombination rate
61. Selection Index xp-EHH
Rsb[9]
Another index for comparing populations.
Compares between populations only.
iES: for each locus, the average of the integrated EHH of the two alleles.
Compares the approximate degree of selection at a locus between populations.
63. Practice
FST
hierfstat[3]
PER3 gene in HGDP (Human Genome Diversity Panel): 289 SNPs & 7 populations
EHH, iHS
rehh[2]
Examples bundled with the package
64. Practice
Reference I
[1] Cockerham, C. C. (1969). Variance of gene frequencies. Evolution, pages 72–84.
[2] Gautier, M. and Vitalis, R. (2012). rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics, 28(8):1176–1177.
[3] Goudet, J. (2005). hierfstat, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes, 5(1):184–186.
[4] Hamilton, M. (2011). Population Genetics. John Wiley & Sons.
[5] Holsinger, K. E. and Weir, B. S. (2009). Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10(9):639–650.
[6] Huerta-Sánchez, E., Jin, X., Bianba, Z., Peter, B. M., Vinckenbosch, N., Liang, Y., Yi, X., He, M., Somel, M., Ni, P., et al. (2014). Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature, 512(7513):194–197.
[7] Sabeti, P. C., Reich, D. E., Higgins, J. M., Levine, H. Z., Richter, D. J., Schaffner, S. F., Gabriel, S. B., Platko, J. V., Patterson, N. J., McDonald, G. J., et al. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature, 419(6909):832–837.
[8] Sabeti, P. C., Varilly, P., Fry, B., Lohmueller, J., Hostetter, E., Cotsapas, C., Xie, X., Byrne, E. H., McCarroll, S. A., Gaudet, R., et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449(7164):913–918.
[9] Tang, K., Thornton, K. R., and Stoneking, M. (2007). A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biology, 5(7):e171.