THE GENETIC ARCHITECTURES OF PSYCHOLOGICAL TRAITS

THE GENETIC
ARCHITECTURES OF
PSYCHOLOGICAL TRAITS
James J. Lee
University of Minnesota Twin Cities

THREE LAWS OF BEHAVIOR
GENETICS
• First Law. All behavioral traits
are heritable.
• Second Law. The effect of being
raised in the same family is
smaller than the effect of
genes.
• Third Law. A substantial portion
of the variance in behavioral
traits is not accounted for by
genes or families.
Eric Turkheimer, the coiner of the Three
Laws of Behavior Genetics.

EVIDENCE FROM CLASSICAL
QUANTITATIVE GENETICS
[Figure: The Minnesota Adolescent Adoption Study (Scarr & Weinberg, 1978; Scarr, 1997). Offspring IQ plotted against midparent IQ.]
BIOLOGICAL FAMILIES: β = 0.61 ± 0.07
ADOPTIVE FAMILIES: β = 0.13 ± 0.08

EVIDENCE FROM CLASSICAL
QUANTITATIVE GENETICS
[Figure: The Sibling Interaction and Behavior Study (McGue et al., 2007). Panels: BIOLOGICAL FAMILIES, ADOPTIVE FAMILIES.]

THE SEARCH FOR CAUSAL
VARIANTS AT THE DNA LEVEL
• If studies of twins and other
kinships support the Three
Laws, it seems justified to
search for the causal loci at
the DNA level.
• This is the aim of genome-wide
association studies
(GWAS).
A research subject provides DNA by
spitting into a tube with a preservative.

BACKLASH AGAINST GWAS
OF PSYCHOLOGICAL TRAITS
• Correlations between
common variants and
phenotypes such as general
cognitive ability (g) and
schizophrenia have turned out
to be very small.
• We have just reported three
common SNPs that each
account for ~0.02% of IQ
variance (Rietveld et al., 2014).

BACKLASH AGAINST GWAS
OF PSYCHOLOGICAL TRAITS
• In response, a fellow at the
Center for Genetics and
Society wrote a blog post
called “The Stupidity of Smart
Genes.”
• Some academics are scarcely
more charitable. Kevin Mitchell
of Trinity College Dublin: “The
idea that this trait is
determined by common
variants … is really unproven.”

MY RESPONSE TO THE
BACKLASH
• We seem to have a paradox:
if traits are as heritable as
implied by classical studies,
then where are the genes?
• I argue that the heritability is
hiding in plain sight: there are
thousands of causal variants,
each of which exerts a small
effect—which means that it is
difficult to find any single one.

MY RESPONSE TO THE
BACKLASH
• I provide an estimate of the
total GWAS sample size
required to capture the entire
heritability (due to common
variants) of a phenotype like g.
• Most importantly, I argue that
chasing down thousands of
DNA variants with small
effects is a worthy scientific
enterprise.

HERITABILITY ESTIMATED
DIRECTLY FROM DNA DATA
• A critic might claim that
GWAS of psychological
traits cannot be guaranteed
to produce more “hits” as
sample size grows.
• Perhaps “indirect” heritability
estimates from studies of
twins, adoptees, etc. are
flawed and there are not
that many causal loci after all.
Richard Nixon and Forrest Gump may be
slightly less similar at the DNA level than
most other random pairs of people.

• In recent years a new
method, often called GCTA
(after the software package
Genome-wide Complex
Trait Analysis), obtains
“direct” estimates of
heritability from DNA data.
• Think of two people who
are not related to you.

• If we genotype/sequence all
three individuals, you will
turn out by chance to be
slightly more similar at the
genetic level to person A
than to person B.
• Are you also
phenotypically more
similar to A than to B?

• In a large sample of
unrelated people, we look
at all pairs of people and
calculate their genetic and
phenotypic similarities.
• Higher heritability means
that genetically similar
people will tend to be
phenotypically similar.

\[
E(\mathbf{y}\mathbf{y}') = \mathbf{A}\sigma_A^2 + \mathbf{I}\sigma_E^2,
\quad \text{where } A_{ij} = \frac{1}{p}\sum_{k=1}^{p} z_{ik} z_{jk}
\]
• y: the vector of phenotypic values
• σ_A^2: additive genetic variance
• σ_E^2: residual variance
• A: the matrix of “relatedness” coefficients
• I: the identity matrix
• z_ik: the standardized gene count of person i at locus k
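
The pairwise logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genotypes are simulated, every SNP is causal, and the estimator is a simple Haseman-Elston-style regression of phenotypic cross-products on A_ij. GCTA itself fits the variance components by REML; the regression is swapped in here because it is short. The function names and parameter values are hypothetical.

```python
import numpy as np

def simulate(n=800, p=500, h2=0.5, seed=0):
    """Simulate standardized genotypes z (n x p) and phenotypes y with heritability h2."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=p)            # assumed allele frequencies
    counts = rng.binomial(2, freqs, size=(n, p)).astype(float)
    z = (counts - counts.mean(axis=0)) / counts.std(axis=0)
    effects = rng.normal(0.0, np.sqrt(h2 / p), size=p)
    y = z @ effects + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return z, y

def relatedness_matrix(z):
    """A_ij = (1/p) * sum_k z_ik * z_jk."""
    return z @ z.T / z.shape[1]

def pairwise_h2(z, y):
    """Regress y_i * y_j on A_ij over all pairs i < j; the slope estimates h2."""
    y_std = (y - y.mean()) / y.std()
    a = relatedness_matrix(z)
    i, j = np.triu_indices(len(y), k=1)
    slope = np.polyfit(a[i, j], y_std[i] * y_std[j], 1)[0]
    return float(slope)

z, y = simulate()
print(round(pairwise_h2(z, y), 2))  # should land near the simulated h2 of 0.5
```

With nominally unrelated people the off-diagonal entries of A vary only slightly around zero, which is why a large sample is needed before the slope becomes precise.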

• According to GCTA, the
heritability of g is roughly
0.45 (Davies et al., 2011;
Chabris et al., 2012).
• This is actually a lower
bound on h2 because many
causal variants (particularly
those where one allele is
rare) are probably not
captured by SNP chips.
Peter Visscher, quantitative geneticist and a
developer of GCTA.

• We have studied the
conditions under which GCTA
provides a valid estimate of
SNP-based heritability (Lee &
Chow, 2014).
• If the causal variants tend to
be less well tagged (a realistic
case), then GCTA will be
biased downward.
• Thus, h2GCTA < h2SNP < h2.
My postdoctoral supervisor, Carson Chow,
goes to the supermarket.

[Figure: GREML HERITABILITY ESTIMATE (axis from 0.0 to 1.0) for causal variants that are VERY WEAKLY TAGGED, WEAKLY TAGGED, MODERATELY TAGGED, STRONGLY TAGGED, and VERY STRONGLY TAGGED.]
The purple horizontal line corresponds to the true h2SNP in our simulations.

• “Direct” estimates based on
DNA data and “indirect”
estimates based on the
correlations between relatives
are thus fully consistent.
• “Our results unequivocally
confirm that a substantial
proportion of individual
differences in human
intelligence is due to genetic
variation” (Davies et al., 2011).

• Some trait-associated SNPs
might only be correlated (in
linkage disequilibrium) with
untyped causal variants.
• How can we be sure that
GCTA-estimated heritability
reflects common variants?
• Basic principle of psychometrics.
Two dichotomously scored items
can show a strong correlation
only if their pass rates are similar.
[Figure: A color-coded correlation matrix of SNPs on chromosome 22. Axes: SNP index i, SNP index j; color scale from 0 to 1; side panel: frequency.]

HERITABILITY ESTIMATED
DIRECTLY FROM DNA DATA
• This same principle also
applies to genetics!
• Two SNPs can show strong
linkage disequilibrium (LD)
only if their allele
frequencies are similar.
• Therefore, a substantial
h2GCTA implies that common
variants play a large role.

… T A …
… T A …
… T A …
… C G …
… C G …
… C G …
… C G …
Locus 1: MAF = 3/7. Locus 2: MAF = 3/7.

… T A …
… T A …
… T A …
… T G …
… T G …
… T G …
… C G …
Locus 1: MAF = 1/7. Locus 2: MAF = 3/7.
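
The two seven-haplotype panels can be checked numerically. The sketch below codes each haplotype as 1 if it carries the minor allele at a locus and 0 otherwise, then computes the squared correlation r². The helper `max_r_squared` is an addition that is not on the slides: it uses the standard bound on r² for two biallelic loci with minor allele frequencies p_a and p_b, and its name is hypothetical.

```python
import numpy as np

def r_squared(locus_a, locus_b):
    """Squared correlation between two loci coded 0/1 per haplotype."""
    return float(np.corrcoef(locus_a, locus_b)[0, 1] ** 2)

def max_r_squared(maf_a, maf_b):
    """Largest r^2 attainable for two minor allele frequencies (each <= 0.5)."""
    odds_a = maf_a / (1.0 - maf_a)
    odds_b = maf_b / (1.0 - maf_b)
    return min(odds_a / odds_b, odds_b / odds_a)

# First panel: locus 1 is T T T C C C C, locus 2 is A A A G G G G.
panel1_locus1 = np.array([1, 1, 1, 0, 0, 0, 0])  # 1 = minor allele T
panel1_locus2 = np.array([1, 1, 1, 0, 0, 0, 0])  # 1 = minor allele A

# Second panel: locus 1 is T T T T T T C, locus 2 is A A A G G G G.
panel2_locus1 = np.array([0, 0, 0, 0, 0, 0, 1])  # 1 = minor allele C
panel2_locus2 = np.array([1, 1, 1, 0, 0, 0, 0])  # 1 = minor allele A

print(r_squared(panel1_locus1, panel1_locus2))   # matched frequencies: perfect LD
print(r_squared(panel2_locus1, panel2_locus2))   # mismatched frequencies: weak LD
print(max_r_squared(1 / 7, 3 / 7))               # ceiling for MAFs of 1/7 and 3/7
```

In the first panel the frequencies match and r² reaches 1. In the second panel r² is 0.125, and no arrangement of those allele frequencies could push it above 2/9.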

THE NUMBER OF CAUSAL VARIANTS:
THE “POLY” IN POLYGENIC
• The simulations and
mathematical arguments by
Lee and Chow (2014) show
that GCTA can be valid even
if there is just one trait-associated
SNP.
• Can we find other evidence
supporting the notion that
missing heritability is
distributed among many
variants of very small effect?

• GCTA has an advantage
over classical pedigree-based
methods. It can
partition h2 among
different parts of the
genome.
• E.g., we can determine how
much heritability is
contributed by each
chromosome.

• Basic idea. Calculate
separate realized genetic
similarities for different parts
of the genome.
• Suppose that there are many
causal loci on chr1, but none
on chr2. Then chr1 genetic
similarity will predict
phenotypic similarity, whereas
chr2 genetic similarity will not.
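
A minimal simulation of the chr1/chr2 thought experiment follows. It assumes simulated genotypes, places all causal loci on a hypothetical "chr1", and regresses phenotypic cross-products on the two chromosome-specific similarity measures at once. This regression is a simple stand-in for the REML fit that GCTA performs; the function name and parameter values are illustrative only.

```python
import numpy as np

def partition_two_chromosomes(n=800, p_per_chr=400, h2=0.5, seed=1):
    """Return estimated heritability contributions of chr1 (causal) and chr2 (null)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=2 * p_per_chr)
    counts = rng.binomial(2, freqs, size=(n, 2 * p_per_chr)).astype(float)
    z = (counts - counts.mean(axis=0)) / counts.std(axis=0)
    z1, z2 = z[:, :p_per_chr], z[:, p_per_chr:]

    # All causal loci sit on chr1; chr2 contributes nothing to the phenotype.
    effects = rng.normal(0.0, np.sqrt(h2 / p_per_chr), size=p_per_chr)
    y = z1 @ effects + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    y = (y - y.mean()) / y.std()

    # Separate realized genetic similarities for each chromosome, over pairs i < j.
    i, j = np.triu_indices(n, k=1)
    a1 = (z1 @ z1.T / p_per_chr)[i, j]
    a2 = (z2 @ z2.T / p_per_chr)[i, j]

    design = np.column_stack([np.ones(i.size), a1, a2])
    coef, *_ = np.linalg.lstsq(design, y[i] * y[j], rcond=None)
    return float(coef[1]), float(coef[2])

h2_chr1, h2_chr2 = partition_two_chromosomes()
print(round(h2_chr1, 2), round(h2_chr2, 2))
```

The chr1 coefficient should come out near the simulated heritability, while the chr2 coefficient should hover around zero.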

PARTITIONING SCHIZOPHRENIA
HERITABILITY AMONG CHROMOSOMES
Lee et al. (2012)

PARTITIONING SCHIZOPHRENIA
HERITABILITY AMONG CHROMOSOMES
• The remarkable correlation
between chromosome length
and heritability contribution
suggests that many loci
contribute to SCZ liability
(Gottesman & Shields, 1967).
• E.g., if there were only ten
loci, each on a different
chromosome, we would not
see such a relationship.
Prof. Emeritus Irving Gottesman, a pioneer
in the genetic study of mental illness.

• We know that there are
many causal variants. But
can we get more precise?
• Even if a GWAS dataset has
too little power to yield
many “hits,” it still contains
substantial information
about the trait’s genetic
architecture.
Naomi Wray and Peter Visscher
introduced a method to estimate
parameters of genetic architectures in
their 2009 study of schizophrenia.

• We have seen how GCTA
exploits this information in the
estimation of heritability.
• It is possible to get out more
than just h2.
• Approximate Bayesian
polygenic analysis (ABPA)
estimates the total number of
genotyped SNPs that are
associated with the trait (Stahl
et al., 2012).

• Suppose that we estimate
SNP regression coefficients
in a GWAS and use them
to predict the phenotypes
of individuals in a new
sample.
• The cross-validation R2 is
the predictive power of the
estimated coefficients in the
new sample.
Eli Stahl introduced ABPA in 2012, extending a
method devised by Visscher and colleagues.

• Suppose that we bin the SNP
effects estimated in the
GWAS (“training sample”) by
p-value.
• If the GWAS results in every
p-value bin—even in the bins
corresponding to large p-values—
show at least a small
cross-validation R2, then the
trait must be highly polygenic.

• What if the heritability were
due to just a few variants of
large effect? These variants
would be in a bin with low
p-values, and all other bins
would show no cross-validation.
• A failure to observe this
pattern implies polygenicity.
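
The binning argument can be illustrated with a toy simulation. It is a sketch under strong assumptions, not ABPA itself: the "SNPs" are independent standard-normal columns standing in for standardized genotypes, all causal effects have equal magnitude, and only two bins (p < .05 and p ≥ .05) are used. All names and parameter values are hypothetical.

```python
import math
import numpy as np

def make_effects(p, n_causal, h2, rng):
    """Equal-magnitude effects with random signs at n_causal randomly chosen SNPs."""
    effects = np.zeros(p)
    causal = rng.choice(p, size=n_causal, replace=False)
    effects[causal] = rng.choice([-1.0, 1.0], size=n_causal) * math.sqrt(h2 / n_causal)
    return effects

def make_sample(n, effects, h2, rng):
    z = rng.standard_normal((n, effects.size))   # stand-in for standardized genotypes
    y = z @ effects + rng.normal(0.0, math.sqrt(1.0 - h2), size=n)
    return z, y

def cross_validation_r2_by_bin(n_causal, n=2000, p=1000, h2=0.5, seed=0):
    rng = np.random.default_rng(seed)
    effects = make_effects(p, n_causal, h2, rng)
    z_train, y_train = make_sample(n, effects, h2, rng)
    z_test, y_test = make_sample(n, effects, h2, rng)

    # "GWAS" in the training sample: one marginal regression per SNP.
    beta_hat = z_train.T @ (y_train - y_train.mean()) / n
    z_scores = beta_hat * math.sqrt(n) / y_train.std()
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z_scores])

    results = {}
    for label, mask in (("p < .05", p_values < 0.05), ("p >= .05", p_values >= 0.05)):
        score = z_test[:, mask] @ beta_hat[mask]  # polygenic score from this bin only
        results[label] = float(np.corrcoef(score, y_test)[0, 1] ** 2)
    return results

print(cross_validation_r2_by_bin(n_causal=1000))  # highly polygenic architecture
print(cross_validation_r2_by_bin(n_causal=5))     # a few variants of large effect
```

Under the polygenic setting the bin with large p-values still predicts the new sample, whereas under the five-variant setting its cross-validation R2 is essentially zero.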

• This logic extends to larger
sample sizes.
• What if the bins corresponding
to p ≥ .05 no longer cross-validate?
Then all trait-associated
SNPs must have p < .05!
• The number of SNPs meeting
the cutoff p < .05 is then an
upper bound on the total
number of SNPs with nonzero
regression coefficients.

• Simulations can be used to
determine what values of
summary statistics (e.g.,
cross-validation R2 values of
different p-value bins) are likely
given the parameters (e.g.,
number of trait-associated
SNPs).
• Working backward from the
simulation results leads to
Bayesian posterior distributions.

THE POLYGENIC ARCHITECTURE OF
SCHIZOPHRENIA
Application of ABPA to schizophrenia GWAS data has yielded
an estimate of 8,300 common variants (Ripke et al., 2013).

A FOURTH LAW OF
BEHAVIOR GENETICS
• Results from GWAS of mental
illness, education, and
intelligence justify an additional
“law.”
• Fourth Law. Genetic variation is
caused by thousands of sites
across the genome, each of which
is individually responsible for
a minuscule fraction of the
variance (Chabris, Lee, Cesarini,
Benjamin, & Laibson, in press).
My colleague Christopher Chabris, the coiner
of the Fourth Law.

A FOURTH LAW OF
BEHAVIOR GENETICS
• The coiner of the original
Three Laws has already
commented on some of the
evidence supporting our
proposed Fourth Law
(Turkheimer, 2012).
• Turkheimer suggests that
this evidence points toward
deemphasizing GWAS.

A FOURTH LAW OF
BEHAVIOR GENETICS
• Turkheimer’s arguments are
important. They are related
to recently expressed
concerns regarding the
trustworthiness of the
scientific enterprise (Pashler &
Wagenmakers, 2012).
• Close scrutiny, however,
shows that these arguments
do not apply to GWAS.

ISSUE #1: REPLICABILITY OF
GWAS FINDINGS
• Some have argued that
GWAS findings show a poor
track record of replication.
• Kernel of truth. The small
effects described by the
Fourth Law are difficult to
distinguish from noise in
poorly powered studies and
require large samples to be
replicated.

ISSUE #1: REPLICABILITY
Given adequate sample sizes, however, the degree of
quantitative replication in GWAS is nothing short of astounding.

ISSUE #1: REPLICABILITY
The best-fitting straight line is close to the line of zero
intercept and unit slope (Marigorta & Navarro, 2013).

WILL REPLICABILITY EXTEND
TO PSYCHOLOGICAL TRAITS?
• There have been few
GWAS of behavioral traits
in distinct populations.
• It is possible, however, to
use GCTA to estimate the
genetic correlation
between populations with
respect to a certain
phenotype.

\[
Y_{\mathrm{EUR}} = \alpha_0 + \underbrace{X_1\alpha_1 + \cdots + X_L\alpha_L}_{\text{European breeding value}} + E
\]
\[
Y_{\mathrm{AFR}} = \beta_0 + \underbrace{W_1\beta_1 + \cdots + W_K\beta_K}_{\text{African breeding value}} + E
\]
• Y_EUR: European individual’s SCZ liability
• X_j: number of SCZ “+” genes (0, 1, or 2) at the jth locus
• α_j: average effect of gene substitution on SCZ liability at
the jth locus
• E: individual’s “residual” with respect to SCZ liability, a
composite of environmental effects, nonlinear (non-additive)
interactions, etc.

• Y_AFR: African individual’s SCZ liability
• W_j: number of SCZ “+” genes (0, 1, or 2) at the jth locus
• β_j: average effect of gene substitution on SCZ liability at
the jth locus
• E: individual’s “residual” with respect to SCZ liability, a
composite of environmental effects, nonlinear (non-additive)
interactions, etc.

The genetic correlation between two phenotypes is
simply the correlation between their respective
breeding values.

de Candia et al. (2013) used GCTA to estimate that
the correlation between European and African
breeding values with respect to schizophrenia is
greater than 0.60.

• The latest GWAS meta-analysis
of schizophrenia
included a number of East
Asian samples (Ripke et al.,
2014).
• The concordance between
Europeans and East Asians
is strong.

ISSUE #2: CORRELATION VS.
CAUSATION
• GWAS of unrelated
individuals can only tell us
that a given SNP is
correlated with the
phenotype.
• But we want to know
whether variation at the
genomic site causes
variation in the phenotype.
Sir Ronald Fisher, the founder of both
population genetics and modern statistics.

ISSUE #2: CORRELATION VS.
CAUSATION
• Since a given SNP is
correlated with many other
variants in its genomic
region, picking out the causal
variant (if any) is a challenge.
• Here I address the problem
of whether a GWAS signal
might be attributable to
confounding with an
environmental variable.

ISSUE #2: CORRELATION VS.
CAUSATION
• The simplest means of
addressing confounding is
the family-based design.
• By Mendel’s Law of
Segregation, a parent passes
on a random gene from
each homologous pair to a
given offspring.
[Figure: the father’s genome and the mother’s genome, each contributing to the offspring’s genome.]

ISSUE #2: CORRELATION VS.
CAUSATION
• Whether a heterozygous parent
(“+−”) passes on the “+” or “−”
gene to its offspring is equivalent
to randomized treatment
status in experimental design.
• If there is no selection bias, a
within-family correlation
between “+” transmission and
the phenotype means that the
marker must be linked and
associated with a causal variant.
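
The randomization argument can be made concrete with a small simulation. The setup is an assumption made purely for illustration: two subpopulations differ both in allele frequency and in an environmental shift of the phenotype, and the SNP itself has no causal effect by default. The population-level slope is then confounded, while the within-family slope, which uses only the offspring's deviation from the midparent genotype, is not. All names and values are hypothetical.

```python
import numpy as np

def simulate_families(n_families=20000, causal_effect=0.0, seed=0):
    rng = np.random.default_rng(seed)
    subpop = rng.integers(0, 2, size=n_families)      # environmental confounder
    freq = np.where(subpop == 1, 0.8, 0.2)            # "+" allele frequency per subpopulation
    father = rng.binomial(2, freq)
    mother = rng.binomial(2, freq)
    # Mendelian segregation: each parent transmits one of its two alleles at random.
    child = rng.binomial(1, father / 2.0) + rng.binomial(1, mother / 2.0)
    phenotype = causal_effect * child + 1.0 * subpop + rng.standard_normal(n_families)
    return father, mother, child, phenotype

def slope(x, y):
    """Simple regression slope of y on x."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

father, mother, child, phenotype = simulate_families()
population_slope = slope(child, phenotype)
within_family_slope = slope(child - (father + mother) / 2.0, phenotype)
print(round(population_slope, 2), round(within_family_slope, 2))
```

The population slope is clearly positive even though the SNP does nothing, whereas the within-family slope stays near zero. Setting `causal_effect` to a nonzero value should move the within-family slope to roughly that value.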

ISSUE #2: CORRELATION VS.
CAUSATION
• Within-family designs are not
statistically powerful, but they
can be used to check that
studies of unrelated
individuals are not unduly
contaminated by confounding.
• So far, family-based studies
have affirmed the results of
standard GWAS (Rietveld et
al., 2013).

BUT WHY IS CAUSAL
INFERENCE SO SIMPLE HERE?
[Figure: SNP 1 through SNP 9, each linked to the phenotype.]
This is the simplest possible causal system
(directed acyclic graph). If there is no confounding,
every partial regression coefficient is equal to its
corresponding average effect.

BUT WHY IS CAUSAL
INFERENCE SO SIMPLE HERE?
• Why are genetic and
environmental causes not
confounded more severely?
• Anthropomorphic answer.
When Nature pushes up the
frequencies of some alleles
and pushes down others, she
can only tell which alleles are
correlated with fitness. She
cannot tell which alleles cause
higher fitness.
The Papilio caterpillar, which has evolved to
look like a snake.

BUT WHY IS CAUSAL
INFERENCE SO SIMPLE HERE?
• Nevertheless, Nature seems
to adjust allele frequencies
in the correct way more
often than not.
• She can only do this if gene-trait
correlation is a robust
guide to gene-trait
causation. Be thankful that
we live in such a universe!

ISSUE #3: THE SCIENTIFIC
WORTH OF SMALL EFFECTS
• One might object that only
large effect sizes are
scientifically significant (as
opposed to statistically
significant in a large enough
sample).
• On this view the Fourth Law
automatically discredits
further inquiry into the
genetic causes of behavior.
The clinical psychologist Paul Meehl, a
vocal critic of significance testing.

• This critique draws on the
penetrating writings of
Meehl (1978, 1990).
• Meehl thought that the null
hypothesis is often a
strawman because of
ubiquitous biases and an
abundance of alternative
explanations.

• In such cases the rejection
of the null hypothesis is not
scientifically valuable.
• In GWAS, however, we
have every reason to
believe that the null
hypothesis is true more
often than not.

THE POLYGENIC ARCHITECTURE OF
SCHIZOPHRENIA
~8,300 common variants seems to be a lot—but there are
~8 million common variants in the entire genome!

• Against a large background
of null effects, accepting the
alternative hypothesis of a
small effect is an inherently
meaningful step toward the
underlying biology.
• Perhaps to the surprise of
some, the latest GWAS meta-analysis
of schizophrenia
implicates acquired
immunity (Ripke et al., 2014).

WHAT KINDS OF ENHANCERS
HARBOR SCHIZOPHRENIA VARIANTS?

COMPRESSED SENSING:
ADDRESSING THE N ≪ P PROBLEM
• Point 1. Heritability is not
missing; it is hiding in plain sight
among thousands of variants
(many of them common).
• Point 2. Replicability crisis?
Distinguishing causation from
correlation? The Lykken-Meehl
crud factor? Unlike much of
behavioral science, GWAS is
remarkably free from these
problems.
Over a million people attend the
Minnesota State Fair each year.

COMPRESSED SENSING:
ADDRESSING THE N ≪ P PROBLEM
• But it is one thing to say
that there is scientific gold
buried somewhere. It is
quite another to dig it up!
• Can we identify enough
variants to make meaningful
scientific inferences without
n greater than the number
of protons in the Universe?

COMPRESSED SENSING:
ADDRESSING THE N ≪ P PROBLEM
• In Statistics 101, many of us
learned that the sample size (n)
must exceed the number of
RHS variables (p) for the partial
regression coefficients to be
identified.
• Recent work in the theory of
compressed sensing (CS) has
shown that coefficient recovery
is possible in the n ≪ p case
(Candes, Romberg, & Tao, 2006).
Terence Tao is the most distinguished
SMPY participant and perhaps the most
famous mathematician in the world.

COMPRESSED SENSING:
ADDRESSING THE N ≪ P PROBLEM
Consider the noisy linear system y = Ax + e, where A ∈ R^{n×p} is the design
matrix and x ∈ R^p has s nonzero elements. If n > C s log p for some constant
C, then the solution of the LASSO problem
\[
\min_{\hat{x}} \left( \lVert y - A\hat{x} \rVert_{L_2}^2 + \lambda \lVert \hat{x} \rVert_{L_1} \right)
\]
with a suitable choice of λ obeys
\[
\lVert \hat{x} - x \rVert_{L_2}^2 < \frac{\sigma_E^2}{n}\, s \,\operatorname{polylog} p,
\]
where σ_E^2 is the variance of the residuals in e.
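
To make the LASSO statement concrete, here is a small sketch of sparse recovery with n ≪ p. It assumes a random Gaussian design matrix with uncorrelated columns (the idealized setting of the theorem, not real genotype data) and solves the LASSO objective written above by iterative soft thresholding (ISTA), chosen here because it fits in a few lines. The function names and all parameter values, including the penalty `lam`, are illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t, setting small entries to exactly zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=3000):
    """Minimize ||y - A x||_2^2 + lam * ||x||_1 by iterative soft thresholding."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        gradient = 2.0 * A.T @ (A @ x - y)
        x = soft_threshold(x - step * gradient, step * lam)
    return x

def demo(n=200, p=1000, s=10, noise_sd=0.01, lam=0.2, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, p)) / np.sqrt(n)    # columns have roughly unit norm
    x_true = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)  # positions of the s nonzero effects
    x_true[support] = rng.choice([-1.0, 1.0], size=s)
    y = A @ x_true + noise_sd * rng.standard_normal(n)
    x_hat = lasso_ista(A, y, lam)
    return x_true, x_hat, support

x_true, x_hat, support = demo()
largest = np.argsort(-np.abs(x_hat))[:len(support)]
print(sorted(largest.tolist()) == sorted(support.tolist()))
```

Here 200 observations are enough to locate 10 nonzero coefficients among 1,000 candidates, even though ordinary least squares would be unidentified.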

COMPRESSED SENSING:
ADDRESSING THE N ≪ P PROBLEM
• Simply statable CS theorems
assume that the RHS variables (e.g.,
genetic variants) are uncorrelated.
But in reality a genetic variant is in
LD with nearby genetic variants. So
do CS ideas apply here?
• If you squint at the GWAS
covariance matrix from a distance,
it looks diagonal. So it might be
reasonable to expect that LASSO
will still perform well (up to
GWAS precision).
[Figure: the color-coded correlation matrix of SNPs on chromosome 22.]

COMPRESSED SENSING:
ADDRESSING THE N ≪ P PROBLEM
[Figure from Vattikuti, Lee, Chang, Hsu, & Chow (2014). Legend: GIANT SNP; L1 SNP, proxy; L1 SNP, not proxy; MR SNP.]

COMPRESSED SENSING:
ADDRESSING THE N ≪ P PROBLEM
• Can we tell when a GWAS has
crossed n > C s log p?
• Yes! Certain observable
quantities (e.g., the typical p-value
of called nonzeros) begin
to decline sharply.
• Applying this method to real
GWAS data indicates that for a
trait with h2≈0.50, n > 30s
triggers the phase transition to
good performance.
The theoretical physicist Stephen Hsu
entertains a visitor to Michigan State
University.

IMPORTANT SCIENTIFIC QUESTIONS:
WHY TAKE THE ROAD TO 30S?
“Man may be excused for
feeling some pride at having
risen … to the very summit
of the organic scale; and the
fact of his having thus risen,
instead of having been
aboriginally placed there,
may give him hope for a still
higher destiny in the distant
future.

“[But] man with all his noble
qualities, with sympathy
which feels for the most
debased, with benevolence,
which extends not only to
other men but to the
humblest living creature, with
his god-like intellect which
has penetrated into the
movements and constitution
of the solar system …

“… with all these exalted
powers, Man still bears in
his bodily frame the
indelible stamp of his lowly
origin.”—CHARLES DARWIN,
THE DESCENT OF MAN

• Darwin knew no genes; we
do. Can we trace the
genetic basis of the
evolutionary change that
Darwin described?
• Recent spectacular
advances in the sequencing
of ancient hominin DNA
suggest that the answer
may be yes.

THE HUMAN FAMILY TREE
[Figure: hominin family tree with splits labeled 1.8 mya?, 500 kya, and 380 kya (Prüfer et al., 2014).]

THE GENETICS OF ANCIENT
HOMININS
• Usable DNA was recently
recovered from a
Denisovan-like hominin
who died more than 300
kya (Meyer et al., 2014).
• I will now show you a
comparison of sequences
from Neanderthals and
modern humans.
An artist’s reconstruction of a
human-Neanderthal hybrid child.

THE GENETICS OF ANCIENT
HOMININS
This is the modern human sequence encompassing rs1487441, one of the “IQ hits”
identified by Rietveld et al. (2014). A is the “plus” allele; G is the “minus” allele.
TTCTTCCACTCACTCATCACCATAAA
The ancestors of Neanderthals and Denisovans split from our lineage ~500 kya.
Neanderthals probably did a lot of evolving since then … but it is still fun to ask:
What allele did Neanderthals carry at this site?
TTCTTCCACTCACTCG TCACCATAAA

PLEASE CITE THESE PAPERS!
• Vattikuti S, Lee JJ, Chang CC, Hsu SDH, Chow CC (2014). Applying compressed
sensing to genome-wide association studies. GigaScience, 3, 10.
• Lee JJ, Chow CC (2014). Conditions for the validity of SNP-based heritability
estimation. Human Genetics, 133, 1011-1022.
• Rietveld CA, Esko T, Davies G, Pers TH, Benyamin B, Chabris CF, Emilsson V,
Johnson AD, Lee JJ, de Leeuw C, et al. (2014). Common genetic variants
associated with cognitive performance identified using the proxy-phenotype
method. Proceedings of the National Academy of Sciences USA, 111, 13790-13794.
• Chabris CF, Lee JJ, Cesarini D, Benjamin DJ, Laibson DI (in press). The fourth law
of behavior genetics. Current Directions in Psychological Science.
