Deplancke Lab
Monica Albarca
Jean-Daniel Feuz
Carine Gubelmann
Korneel Hens
Alina Isakova
Irina Krier
Andreas Massouras
Sunil Raghav
Jovan Simicevic
Sebastian Waszak
Wiebke Westhall
You?
Laboratory of Systems Biology and Genetics (deplanckelab.epfl.ch)
  Bart Deplancke (bart.deplancke@epfl.ch)
Human genetic variation and its contribution to complex traits
The human genome
 First announcement
26 June 2000: first announcement of a working draft (a haplotype!), followed by the
Nature and Science papers in February 2001
[Figure: James Kent (UCSC) and Eugene Myers (Celera)]
International Human Genome Sequencing Consortium (2001) Nature 409:860-921; Venter et al. (2001) Science 291:1304-1351.

In June 2001, chromosome 20 was finished, with the others following until
chromosome 1 was finished in May 2006

                             Gregory et al. (2006), Nature, 441, 315-321
Why are we so phenotypically different?
Classes of human genetic variation
Common versus rare
Refers to the frequency of the minor allele in the human population:
   • Common variants = minor allele frequency (MAF) >1% in the
   population. Also described as polymorphisms.
   • Rare variants = MAF < 1%
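To make the common/rare definition concrete, here is a minimal sketch that computes the MAF at one biallelic site from diploid genotype counts and applies the 1% cut-off. The function names and the genotype-count input are illustrative assumptions, not part of any particular tool.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF at a biallelic site from counts of AA, AB and BB genotypes (diploid)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is simply whichever of the two alleles is less frequent.
    return min(freq_a, 1.0 - freq_a)


def classify_variant(maf: float, threshold: float = 0.01) -> str:
    """'common' if MAF > threshold (1% by default), otherwise 'rare'."""
    return "common" if maf > threshold else "rare"


maf = minor_allele_frequency(n_aa=810, n_ab=180, n_bb=10)
print(round(maf, 3), classify_variant(maf))  # prints: 0.1 common
```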

Neutrality:
   • The vast majority of genetic variants are likely neutral, i.e. they make no
   contribution to phenotypic variation.
   • Some neutral variants may nonetheless reach appreciable frequencies, but only by chance.
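One way to see how a neutral variant can drift to a high frequency purely by chance is a Wright-Fisher simulation. This model is swapped in for illustration (it is not on the slide), and the population size, starting frequency and seeds below are hypothetical.

```python
import random


def wright_fisher(p0: float, pop_size: int, generations: int, seed: int = 1) -> list[float]:
    """Track a neutral allele's frequency under Wright-Fisher drift.

    Each generation the 2N gene copies are drawn with replacement from the
    previous generation, so the frequency changes only through sampling chance.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # diploid population
    freqs = [p0]
    p = p0
    for _ in range(generations):
        count = sum(rng.random() < p for _ in range(n_copies))
        p = count / n_copies
        freqs.append(p)
        if p in (0.0, 1.0):  # allele lost or fixed: no further change is possible
            break
    return freqs


# A new mutation starts as a single copy in a population of 100 diploids.
for seed in range(5):
    traj = wright_fisher(p0=1 / 200, pop_size=100, generations=1000, seed=seed)
    print(f"seed {seed}: max freq {max(traj):.3f}, final {traj[-1]:.3f}")
```

Most runs lose the new allele within a few generations, but occasionally a run carries it to a high frequency with no selective advantage at all.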

Two different nucleotide composition classes:
   • Single nucleotide variants
   • Structural variants
Single nucleotide variants
      T/G        T/G               A/C
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
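The alignment above can be scanned column by column: any column where the sequences disagree is a candidate SNV. A minimal sketch, assuming the sequences are already aligned and gap-free (real SNP discovery must also separate true variants from sequencing errors). Only the first segment of each sequence is used, so nothing elided by "..." is filled in.

```python
def variable_sites(aligned: list[str]) -> list[tuple[int, set[str]]]:
    """Return (0-based position, observed bases) for every non-identical column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        bases = {seq[pos] for seq in aligned}
        if len(bases) > 1:  # more than one base observed: a variable site
            sites.append((pos, bases))
    return sites


# First segment of each of the five sequences shown above.
segments = [
    "ATTGCAATCCGTGG",
    "ATTGCAAGCCGTGG",
    "ATTGCAAGCCGTGG",
    "ATTGCAATCCGTGG",
    "ATTGCAAGCCGTGG",
]
for pos, bases in variable_sites(segments):
    print(pos + 1, "/".join(sorted(bases)))  # prints: 8 G/T
```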
How are SNPs detected?
High-density oligonucleotide arrays
         Chee et al., Science, 1996
   • Simple 5’ to 3’ read-out
   • Flanking issues
   • Unique oligonucleotide primers generate minimally overlapping long-range PCR
   products of 10-kb average length
How are SNPs detected?
Other strategies
   • Reduced representation shotgun sequencing followed by genomic alignment
   • Clustered alignment
   • Gene-centric studies
[Figure: each strategy compared against a reference sequence]
From Rothberg et al. Nature Biotech, 2001
The SNP database - dbSNP
                        http://www.ncbi.nlm.nih.gov/projects/SNP/
Three “out of Africa” genomes:
• 1.2 million (67%) in all three, 1.7 million (52%) in any two, 1.0 million (30%) unique
• Overall, 5.2 million SNPs in the three genomes, the majority of them already present in dbSNP
• The data indicate that most SNVs are common rather than rare
Single nucleotide variants
• It is estimated that the human genome contains > 11 million SNPs
(~7 million with MAF > 5%, the rest with MAF between 1 and 5%).
• It is unknown how many rare or even novel (“de novo”) SNVs exist.

• SNP alleles in the same genomic interval are often correlated with
one another → “Linkage disequilibrium (LD)” = nonrandom
association of alleles; LD varies in a complex and unpredictable manner
across the genome and between populations.

• International HapMap Project → can we divide the genome into
groups of highly correlated SNPs that are generally inherited
together (“LD bins”)?
         Number of tag SNPs required to capture common Phase II SNPs
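The standard pairwise LD measures can be computed directly from phased haplotypes: D = p(AB) - p(A)p(B) and r² = D² / (p(A)(1 - p(A))p(B)(1 - p(B))). A minimal sketch, assuming each haplotype is given as a pair of alleles at two biallelic SNPs; the example haplotypes are made up.

```python
def ld_stats(haplotypes: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (D, r^2) between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (allele at SNP 1, allele at SNP 2).
    """
    n = len(haplotypes)
    # Reference alleles chosen arbitrarily (alphabetically first); this flips
    # the sign of D but leaves r^2 unchanged.
    allele1 = sorted({h[0] for h in haplotypes})[0]
    allele2 = sorted({h[1] for h in haplotypes})[0]
    p_a = sum(h[0] == allele1 for h in haplotypes) / n
    p_b = sum(h[1] == allele2 for h in haplotypes) / n
    p_ab = sum(h == (allele1, allele2) for h in haplotypes) / n
    d = p_ab - p_a * p_b  # deviation from random association of alleles
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a SNP is monomorphic
    return d, r2


perfect = [("A", "C")] * 5 + [("G", "T")] * 5   # alleles always travel together
independent = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]
print(ld_stats(perfect), ld_stats(independent))
```

An r² of 1 means the two SNPs are statistically indistinguishable, which is what makes one of them usable as a tag for the other.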
Single nucleotide variants
                                      Recap
Pairwise linkage disequilibrium (LD) r² (if r² = 1 → the SNPs are statistically
indistinguishable), based on genotyping over 3.1 million SNPs in 270 individuals
from 4 geographically diverse populations (Frazer et al., Nature, 2007)

     By genotyping an individual's DNA sample at one “tagging” SNP from each LD bin,
     information on 80% of the SNPs with MAF > 5% across the genome is captured.
                (Frazer et al., Nature Rev. Genet., 2010)
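Choosing one tag SNP per LD bin can be sketched as a greedy set-cover: repeatedly pick the SNP that captures (r² ≥ threshold) the most SNPs not yet tagged. This is a simple heuristic for illustration, not necessarily the procedure used by HapMap; the SNP names, r² values and the 0.8 threshold are hypothetical.

```python
def greedy_tag_snps(snps: list[str], r2: dict[frozenset, float], threshold: float = 0.8) -> list[str]:
    """Pick tag SNPs greedily so every SNP has r^2 >= threshold with at least one tag."""
    def r2_between(a: str, b: str) -> float:
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    untagged = set(snps)
    tags = []
    while untagged:
        # The SNP capturing the most still-untagged SNPs (ties broken by name).
        best = max(
            snps,
            key=lambda s: (sum(r2_between(s, t) >= threshold for t in untagged), s),
        )
        covered = {t for t in untagged if r2_between(best, t) >= threshold}
        tags.append(best)
        untagged -= covered  # always shrinks: an untagged SNP tags itself
    return tags


pairwise_r2 = {
    frozenset(("rs1", "rs2")): 0.95,
    frozenset(("rs2", "rs3")): 0.90,
    frozenset(("rs1", "rs3")): 0.70,
    frozenset(("rs3", "rs4")): 0.10,
}
print(greedy_tag_snps(["rs1", "rs2", "rs3", "rs4"], pairwise_r2))  # prints: ['rs2', 'rs4']
```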
Querying human genetic variation
   Scan the entire genome: 500,000 SNPs
Population Stratification
   Subdivision of a population into different ethnic groups with potentially different
   marker allele frequencies and thus different disease prevalence
   (From Sven Bergmann, UNIL)
   Principal Component Analysis reveals the SNP vectors explaining the largest
   variation in the data
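As a sketch of how PCA exposes stratification, the code below finds the first principal component of a small genotype matrix (rows = individuals, columns = SNPs coded 0/1/2). It uses power iteration instead of a full eigendecomposition so that it needs only the standard library; the two groups and their genotypes are hypothetical.

```python
import math
import random


def first_principal_component(genotypes: list[list[float]], iterations: int = 200,
                              seed: int = 0) -> tuple[list[float], list[float]]:
    """Return (PC1 score per individual, PC1 loading per SNP).

    Power iteration on X^T X, where X is the column-centred genotype matrix.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    rng = random.Random(seed)
    v = [rng.random() for _ in range(m)]
    for _ in range(iterations):
        xv = [sum(xi[j] * v[j] for j in range(m)) for xi in x]          # X v
        v = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0:
            raise ValueError("no variation in the genotype matrix")
        v = [c / norm for c in v]
    scores = [sum(xi[j] * v[j] for j in range(m)) for xi in x]
    return scores, v


# Two hypothetical groups differing in allele frequency at the first three SNPs.
group_1 = [[2, 2, 1, 0, 1], [2, 1, 2, 1, 0], [1, 2, 2, 0, 1]]
group_2 = [[0, 0, 1, 1, 0], [0, 1, 0, 0, 1], [1, 0, 0, 1, 1]]
scores, loadings = first_principal_component(group_1 + group_2)
print([round(s, 2) for s in scores])
```

The sign of a principal component is arbitrary, so what matters is that the two groups fall on opposite sides of zero on PC1; the loading vector is the "SNP vector" that separates them.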
Population Stratification
Ethnic groups cluster according to geographic distances
[Figure: individuals plotted on PC1 vs PC2; from Sven Bergmann, UNIL]
Population Stratification
PCA of POPRES cohort
(From Sven Bergmann, UNIL)
Structural variants
   (Frazer et al., Nature Rev. Genet., 2010)

  A classic that opened the door to structural variant research:
Sebat et al. Large-Scale Copy Number Polymorphism in the Human Genome. Science, 2004.
   Used the ROMA technique to detect copy number variants
Representational Oligonucleotide Microarray Analysis (ROMA)
   1) Genome digestion with a restriction enzyme
   2) Adapters are ligated to the sticky ends, followed by PCR amplification
   3) After PCR, representations of the entire genome (amplified restriction
      fragments) accentuate relative increases or decreases in copy number between
      the two genomes, while equal copy number is preserved.
   4) The representations of the two genomes are labeled with different
      fluorophores and co-hybridized to a microarray with probes specific to
      restriction-site locations across the entire human genome.
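After two-colour co-hybridization, each probe yields a test/reference intensity ratio. A minimal sketch of turning those ratios into gain/loss calls, assuming median-centred log2 ratios and hypothetical ±0.3 thresholds; it calls probes one at a time, whereas real analyses segment runs of consecutive probes, and it is not the exact procedure of Sebat et al.

```python
import math
import statistics


def call_copy_number(test: list[float], reference: list[float],
                     gain: float = 0.3, loss: float = -0.3) -> list[str]:
    """Label each probe 'gain', 'loss' or 'equal' from two-colour intensities.

    log2(test/reference) ratios are centred on their median, on the assumption
    that most probes lie in regions of equal copy number.
    """
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    centre = statistics.median(ratios)
    calls = []
    for ratio in ratios:
        shifted = ratio - centre
        if shifted >= gain:
            calls.append("gain")
        elif shifted <= loss:
            calls.append("loss")
        else:
            calls.append("equal")
    return calls


test_intensity = [100, 105, 98, 210, 48, 102]   # hypothetical probe intensities
reference_intensity = [100] * 6
print(call_copy_number(test_intensity, reference_intensity))
```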
Representational Oligonucleotide Microarray Analysis (ROMA)
   On average, individuals (20 tested) differed by 11 CNPs
   (average length = 465 kb) affecting 70 genes.
Structural variants (SVs)
   (Frazer et al., Nature Rev. Genet., 2010)
Our ability to detect SVs is still very poor (see later)
Structural variants (SVs)
Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African)
(Kidd et al., Nature, 2008)
   • 1 million fosmid clones per individual
   • Both ends of each clone insert sequenced (~450 bp per sequence) → a pair of
   high-quality end sequences, termed an end-sequence pair (ESP)
   • Only SVs over 8 kb can be detected
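An ESP signals a structural variant when its two ends map to the reference at an unexpected distance or orientation. A minimal classifier, assuming fosmid inserts of ~40 kb with hypothetical concordance cut-offs of 32-48 kb and ends that should map to opposite strands with the '+' end on the left; the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    """Mapping of both end sequences of one clone (positions = outer ends)."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: EndPair, min_span: int = 32_000, max_span: int = 48_000) -> str:
    """Classify one end-sequence pair against the reference genome."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if pair.strand1 == pair.strand2:
        return "inversion"  # one end flipped relative to the other
    # Order the two ends along the reference coordinate.
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand != "+":
        return "unexpected orientation"
    span = right_pos - left_pos
    if span > max_span:
        return "deletion"   # the reference has sequence that the clone lacks
    if span < min_span:
        return "insertion"  # the clone carries sequence absent from the reference
    return "concordant"


pairs = [
    EndPair("chr1", 1_000, "+", "chr1", 41_000, "-"),
    EndPair("chr1", 1_000, "+", "chr1", 61_000, "-"),
    EndPair("chr1", 1_000, "+", "chr1", 21_000, "-"),
    EndPair("chr1", 1_000, "+", "chr1", 41_000, "+"),
    EndPair("chr1", 1_000, "+", "chr7", 5_000, "-"),
]
for p in pairs:
    print(classify_pair(p))
```

With these hypothetical cut-offs, an insertion or deletion has to shift the apparent span by roughly 8 kb before it is flagged. This is one way to read the 8 kb detection limit above.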
Structural variants (SVs)
Fosmid-based library sequencing of 8 humans (4 Yoruba
     and 4 non-African) (Kidd et al., Nature, 2008)
[Figure: ~2,000 experimentally verified SVs; novel sequence either in gaps (black) or not (orange)]
Structural variants (SVs)
     Fosmid-based library sequencing of 8 humans (4 Yoruba
          and 4 non-African) (Kidd et al., Nature, 2008)
• 50% of SVs seen in >1 individual
• Nearly half lay outside regions of the genome previously described as structurally variant
• 525 new insertion sequences
• SVs make up 20% of all genetic variants, but cover >70% of nucleotide variation
• SVs span between 9 and 25 Mb (~0.5-1% of the genome)
• The majority of SVs are yet to be discovered
Structural variants (SVs)
Fosmid-based library sequencing of 8 humans (4 Yoruba
     and 4 non-African) (Kidd et al., Nature, 2008)
[Figure: regions of increased SNV density]
Structural variants and linkage disequilibrium
                    McCarroll et al., Nature Genet., 2008
 • Most common, diallelic CNPs (MAF > 5%) were perfectly
 captured (r² = 1.0) by at least one SNP tag from HapMap Phase II
 • Mean r² as a function of distance from a polymorphism is indistinguishable for
 SNPs and diallelic CNPs → common, diallelic CNPs are ancestral mutations

                     Common SVs are in LD with tagging SNPs
Contribution of variants to phenotypes?
Common versus rare
               “Common disease – common variant hypothesis”
                                    versus
Common complex traits are the summation of low-frequency, high-penetrance variants

       OR = odds ratio
       PAR = population attributable risk = measure of the multifactorial inherited component
                                            of a disease
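Both quantities can be computed from case/control counts. A minimal sketch: the OR comes from a 2x2 table of risk-allele counts, and the PAR uses Levin's standard epidemiological formula, PAR = p(RR - 1) / (1 + p(RR - 1)), with the OR standing in for the relative risk (reasonable only for a rare disease). Levin's formula is swapped in here as a standard definition; the counts are hypothetical.

```python
def odds_ratio(case_risk: int, case_other: int, control_risk: int, control_other: int) -> float:
    """Odds ratio from a 2x2 table of risk-allele counts in cases and controls."""
    return (case_risk * control_other) / (case_other * control_risk)


def population_attributable_risk(exposure_freq: float, relative_risk: float) -> float:
    """Levin's formula: fraction of cases attributable to the risk factor."""
    excess = exposure_freq * (relative_risk - 1)
    return excess / (1 + excess)


or_ = odds_ratio(case_risk=400, case_other=600, control_risk=300, control_other=700)
# For a rare disease the OR approximates the relative risk; the exposure
# frequency is taken from the controls (300 / 1000 alleles).
par = population_attributable_risk(exposure_freq=0.3, relative_risk=or_)
print(f"OR = {or_:.2f}, PAR = {par:.1%}")
```

A modest per-allele OR can still correspond to a sizeable PAR when the risk allele is common, which is one reason the two hypotheses make different predictions.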
Whole Genome Association studies




                     How significant is this?
Whole genome association studies
                           P-value
Note: “Genome-wide” is a misnomer
       • 20% of common SNPs are not tagged, or only partially tagged
       • Rare variants are not tagged at all
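"How significant is this?" depends on how many tests were run. A minimal sketch of the Bonferroni correction, which divides the desired family-wise error rate by the number of tests; the SNP counts are illustrative.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


for n_snps in (500_000, 1_000_000):
    t = bonferroni_threshold(0.05, n_snps)
    print(f"{n_snps:>9,} tests: p < {t:.0e}  (-log10 = {-math.log10(t):.1f})")
```

Roughly one million independent common-variant tests give the widely used 5 × 10⁻⁸ threshold. Because tagged SNPs are correlated, the tests are not independent, so Bonferroni is somewhat conservative.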
Whole Genome Association studies
                                        Concept
1) Scan the entire genome (500,000 SNPs)
   [Figure: -log10(p) along the genome, with peaks marked *]
2) Identify local regions of interest; examine genes, SNP density,
   regulatory regions, etc.
   [Figure: -log10(p) within a zoomed-in region]
3) Replicate the finding
                                                             From Sven
                                                           Bergmann, UNIL
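Each point in such a scan is one per-SNP association test. A minimal sketch using a Pearson chi-square test (1 degree of freedom) on allele counts in cases and controls, reporting -log10(p) as plotted on the y-axis. This basic allelic test is one common choice, not necessarily the one used in the studies shown; the counts are hypothetical.

```python
import math


def allelic_test(case_alt: int, case_ref: int, control_alt: int, control_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


hypothetical_counts = {
    "snp_1": (400, 600, 300, 700),
    "snp_2": (510, 490, 500, 500),
}
for name, counts in hypothetical_counts.items():
    chi2, p = allelic_test(*counts)
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.2e}, -log10(p) = {-math.log10(p):.2f}")
```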
Whole Genome Association studies
                        Visualization




Wellcome Trust Case Control Consortium. Genome-wide association
study of 14,000 cases of seven common diseases and 3,000 shared
controls. Nature 447, 661–678 (2007).




                               McCarthy et al., Nature Rev. Genet., 2008
Whole genome association studies
          An avalanche of GWA studies
• Since 2006: >220 studies reported to date
• For over 80 phenotypes, 300 loci have been implicated
• Most implicated loci were identified for the first time (no prior knowledge)
Whole genome association studies
                     Type 2 diabetes: an example




                                              Frazer et al., Nat. Rev. Genet., 2010


• 18 genomic intervals with 4 containing previously implicated genes
• Major message: the molecular diversity of T2D genes was not anticipated, thus:
    (patients with the same disease) ≠ (patients with the same underlying biological disorder)
Whole genome association studies
Overlap of genetic risk factor loci for common diseases




                                               Frazer et al., Nat. Rev. Genet., 2010


• 15 loci are associated with two or more diseases (8 are shown)
• Not necessarily the same direction of effect (PTPN22: + for Crohn’s disease, - for other autoimmune diseases)
• Different diseases may have similar molecular underpinnings
      • Expected: autoimmune diseases (same clinical features)
      • Unexpected: e.g. GCKR in both TGC levels and autoimmune disease
Whole genome association studies
                From association to molecular mechanism
• Very difficult:
      • What are the precise variants associated with a trait?
      • If they are located in exons, interpretation is easy; but if they lie outside exons, then what?
      • Most are located outside exons!
      (e.g. the 9p21 locus associated with myocardial infarction is located 150 kb from the nearest gene!)
      • They may have a regulatory function, i.e. control gene expression

• Humans are heterozygous at more functional cis-regulatory sites than at amino acid positions, with
10,700 functional biallelic cis-regulatory polymorphisms in a typical human (Rockman and Wray. Mol.
Biol. Evol., 2002: 19, 1991).

• 34% of promoter polymorphisms (170 tested) significantly modulated reporter gene expression
(>1.5-fold) (Hoogendoorn et al., Hum. Mol. Genet., 2003: 12, 2249).

• Case study with CC chemokine receptor 5 (CCR5), a major chemokine coreceptor of HIV-1 necessary for
viral entry into cells
       • G to A SNP of CCR5 at –2459 nt
       • CCR5 density: low (homozygous –2459GG), intermediate (GA), and highest (homozygous –2459AA)
       (Salkowitz et al., Clin. Immunol., 2003: 108, 234).
Whole genome association studies
                                             Mapping eQTLs
• Transcript abundance is a quantitative trait that can be mapped with considerable power; the mapped loci
are expression QTLs (eQTLs)
[Figure: environmental vs genetic contributions to trait variance]
                       Heritability (H²) = genetic variance / total trait variance (V_G / V_P), where
                       0 = no genetic effects and 1 = all variance is under genetic control

                       Classic paper: Schadt et al., Nature, 2003
           Genetics of gene expression surveyed in maize, mouse and man

 • Liver tissue from 111 F2 mice (an intercross of C57BL/6J and DBA/2J)
 • Microarray analysis of 23,574 genes: 7,861 significantly differentially expressed (either between the
 parental strains or in at least 10% of the F2 mice)
 • eQTLs identified (logarithm of the odds (LOD) score > 4.3, P < 0.00005) for 2,123 genes
 • These eQTLs explained 25% of the transcription variation of the corresponding genes
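The LOD threshold above can be illustrated with single-marker regression: regress a transcript's abundance on genotype (0/1/2) and compare the residual sum of squares with an intercept-only model, LOD = (n/2) log10(RSS_null / RSS_marker). This is a simplified stand-in for the interval mapping used in an F2 cross; the genotypes and expression values are hypothetical.

```python
import math


def single_marker_lod(genotypes: list[float], expression: list[float]) -> float:
    """LOD score for a linear effect of genotype (0/1/2) on expression level."""
    n = len(expression)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    rss_null = sum((e - mean_e) ** 2 for e in expression)
    if sxx == 0 or rss_null == 0:
        return 0.0  # monomorphic marker or constant expression: no information
    rss_marker = rss_null - sxy * sxy / sxx  # residual SS after fitting the genotype
    if rss_marker <= 0:
        return math.inf  # genotype explains expression perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)


expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1]   # hypothetical transcript levels
markers = {
    "marker_near_gene": [0, 0, 1, 1, 2, 2],
    "unlinked_marker": [0, 1, 2, 0, 1, 2],
}
for name, geno in markers.items():
    lod = single_marker_lod(geno, expression)
    print(f"{name}: LOD = {lod:.2f}, eQTL at LOD > 4.3: {lod > 4.3}")
```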
Whole genome association studies
                               Mapping eQTLs
                             Schadt et al., Nature, 2003

[Figure: % of eQTLs across 920 evenly spaced bins, each 2 cM wide]
   • Several hotspots (>1% of detected eQTLs are located within a 4 cM interval)
   • 40% of genes with ≥ 1 eQTL (LOD > 3.0) had more than one eQTL, and close to 4%
   of such genes had more than three eQTLs
   → Gene expression = complex trait
Whole genome association studies
                                 Mapping eQTLs
                               Schadt et al., Nature, 2003
[Figure: known polymorphisms between the two parental strains]
   • Overlap between a polymorphism and an eQTL = cis-acting transcriptional regulation
   For example:
   • The C5 gene carries a 2 bp deletion in its coding region in DBA mice, resulting in
   rapid transcript decay compared with B6. A LOD of 27.4 centred over the C5 gene
   on chromosome 2 is readily detected (black curve).
   • The Alad gene is present in 2 copies in DBA
Whole genome association studies
                                   Mapping eQTLs
                                Schadt et al., Nature, 2003
Combining clinical, gene expression and genetic factors
   • Classical QTLs for fat pad mass (FPM): 4 significant loci
   • Further analyses with subgroups: additional loci identified
   • Some QTLs only affect a subset of the F2 population, demonstrating the complexity
   underlying traits such as obesity
Whole genome association studies
                                    Mapping eQTLs
Dixon et al., Nature Genet., 2007: A genome-wide association study of global gene expression

          • 206 families of British descent; immortalized lymphoblastoid cell lines
         (LCLs) from 400 children (Affymetrix microarrays; 54,675 transcripts ~ 20,599 genes)
                               ~15,000 H² > 0.3
                                         Gene Ontology descriptors for:
                                         • Response to unfolded protein (HSFs, chaperones)
                                         • Immune responses and apoptosis
                                         • Regulation of progression through the cell cycle
                                         • RNA processing and DNA repair
Whole genome association studies
                                    Mapping eQTLs
   • Trans effects are weaker than cis effects
   • Nevertheless, significant trans associations were detected, e.g.:
      1) for ~700 transcripts, the peak of association was on the same chromosome but
      >100 kb from the nearest transcribed gene
      2) for 10,382 transcripts, the peak of association was on a different chromosome
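The cis/trans distinction above can be expressed as a simple rule on the positions of the gene and of the peak SNP. A minimal sketch, using the 100 kb window from the slide as the cis cut-off (other studies use different windows); the coordinates are hypothetical.

```python
def classify_eqtl(gene_chrom: str, gene_start: int, gene_end: int,
                  snp_chrom: str, snp_pos: int, window: int = 100_000) -> str:
    """Classify an association peak relative to the gene whose expression it affects."""
    if snp_chrom != gene_chrom:
        return "trans (other chromosome)"
    if gene_start <= snp_pos <= gene_end:
        distance = 0  # peak inside the gene body
    else:
        distance = min(abs(snp_pos - gene_start), abs(snp_pos - gene_end))
    return "cis" if distance <= window else "trans (same chromosome)"


gene = ("chr1", 1_000_000, 1_050_000)
for snp in [("chr1", 1_020_000), ("chr1", 1_120_000), ("chr1", 1_300_000), ("chr2", 500_000)]:
    print(snp, classify_eqtl(*gene, *snp))
```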
Whole genome association studies
                                Mapping eQTLs
                   Using eQTLs to better understand GWAS results
                          Libioulle et al., PLOS Genet., 2007
GWAS for Crohn’s disease
   • The associated region lies in a 1.25 Mb gene desert
   • One of the neighboring genes, PTGER4, may be involved
   • Trace eQTLs in LCL data
   • Disease-associated polymorphisms may regulate PTGER4 expression in cis, but from
   >250 kb away → more research is needed, but this is likely a regulatory polymorphism
Whole genome association studies
                                      Mapping eQTLs
                 We looked at SNPs but what about other structural variants?
Stranger et al., Science, 2007: Relative Impact of Nucleotide and Copy Number Variation on
                                 Gene Expression Phenotypes
  • LCLs of 210 unrelated HapMap individuals from four populations
  • Copy number variants were identified via CGH against a common reference individual
[Figure: SNP and CNV associations, from probes associated with the linked gene]
               • SNPs and CNVs captured 83.6% and 17.7%, respectively, of the total
               detected genetic variation in gene expression
               • Associated SNPs lie close to their respective genes; less so for CNVs
               • Little overlap between SNP and CNV associations (only 20%)
               • Not “mere” gene dosage effects
Whole genome association studies
                                How universal are GWAS findings?
Frazer et al., Nat. Rev. Genet., 2010
[Figure: LD around a locus associated with myocardial infarction in several populations;
red = high pairwise SNP correlation; SNPs that efficiently (r² > 0.8) tag one another are
connected; LD is less strong in the African population (bottleneck principle)]
   • Allele frequencies are different in different populations
   • LD patterns across loci that co-segregate with a causally associated variant may be
   different from population to population
   • Control for population differences is essential in large studies
Whole genome association studies
                                 Impact so far
• No complex trait has more than 10% of its genetic variance explained
    e.g. T2D: 18 genetic variants together explain < 4% of the total trait liability

• Larger sample sizes may compensate (increased statistical power)
    But… studies of lipid phenotypes involving >40,000 people still explain <10%
       … and some diseases have only a small number of affected individuals

• Does the answer lie in structural variants? Most are still unmapped
    But… they are likely in LD with common SNPs

• Does the answer lie in rare variants?
    Possibly…
         • Rare variants are not in LD with tagging SNPs and thus so far undetected
           (Amish study)
         • They can have very high penetrance
         • However, how can they be detected on a population-wide basis?
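The effect of sample size on power can be estimated with a normal approximation to the two-sample test of allele frequencies. A minimal sketch, assuming the causal variant itself is genotyped (imperfect tagging would lower power further), a genome-wide threshold of 5 × 10⁻⁸, and hypothetical allele frequency, OR and sample sizes.

```python
from statistics import NormalDist


def allelic_test_power(p_control: float, odds_ratio: float, n_cases: int, n_controls: int,
                       alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic case/control test (normal approximation)."""
    odds = odds_ratio * p_control / (1 - p_control)
    p_case = odds / (1 + odds)  # risk-allele frequency implied in cases
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_control * (1 - p_control) / (2 * n_controls)) ** 0.5
    z_effect = abs(p_case - p_control) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible probability of rejecting in the wrong direction.
    return NormalDist().cdf(z_effect - z_crit)


for n in (2_000, 5_000, 20_000):
    power = allelic_test_power(p_control=0.3, odds_ratio=1.2, n_cases=n, n_controls=n)
    print(f"{n:>6} cases + {n} controls: power = {power:.2f}")
```

For a typical GWAS effect size (OR ≈ 1.2), power rises steeply with sample size. That is why larger studies keep finding new loci, even though each locus explains little variance.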
Whole genome association studies
              The power of whole-genome sequencing
Miller syndrome: autosomal recessive genetic trait (Roach et al., Science, 2010)

• Sequenced the genomes of 2 parents and their 2 children, both affected by Miller syndrome

• Identified 3.7 million SNPs that varied within the family

• Resequenced 34,000 candidate mutations → 28 de novo mutations

• Narrowed down the candidates using the “rare” assumption and knowledge of recessive inheritance

• Found one gene known to be involved: dihydroorotate dehydrogenase (DHODH)
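The "rare + recessive" narrowing step can be sketched as a filter over family genotypes. The code keeps genes in which both affected children carry two rare alleles, either homozygous (each parent heterozygous) or compound heterozygous (one rare allele from each parent). This is a simplified illustration, not the published pipeline; the record fields, the 1% frequency cut-off and the gene names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    population_freq: float  # alt-allele frequency in a (hypothetical) reference panel
    mother: int             # genotype = number of alt alleles (0, 1, 2)
    father: int
    child1: int
    child2: int


def recessive_candidates(variants: list[Variant], max_freq: float = 0.01) -> set[str]:
    """Genes consistent with autosomal recessive inheritance in two affected siblings."""
    rare = [v for v in variants if v.population_freq < max_freq]
    genes = set()
    # Homozygous model: both parents are carriers, both children homozygous.
    for v in rare:
        if v.mother == 1 and v.father == 1 and v.child1 == 2 and v.child2 == 2:
            genes.add(v.gene)
    # Compound-heterozygous model: within one gene, both children inherit one
    # rare allele from the mother and a different one from the father.
    maternal, paternal = defaultdict(list), defaultdict(list)
    for v in rare:
        if v.child1 == 1 and v.child2 == 1:
            if v.mother == 1 and v.father == 0:
                maternal[v.gene].append(v)
            elif v.father == 1 and v.mother == 0:
                paternal[v.gene].append(v)
    genes |= set(maternal) & set(paternal)
    return genes


family = [
    Variant("GENE_A", 0.001, 1, 0, 1, 1),  # maternal rare allele
    Variant("GENE_A", 0.002, 0, 1, 1, 1),  # paternal rare allele in the same gene
    Variant("GENE_B", 0.005, 1, 1, 2, 2),  # rare, homozygous in both children
    Variant("GENE_C", 0.200, 1, 1, 2, 2),  # homozygous but common: filtered out
    Variant("GENE_D", 0.001, 1, 0, 1, 1),  # only one rare allele: not recessive
]
print(sorted(recessive_candidates(family)))  # prints: ['GENE_A', 'GENE_B']
```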
Entering the age of personalized medicine
Toward the elucidation of each person’s genetic make-up
Necessary for:
   1) DNA-based risk assessment for common complex disease
   2) Drug discovery (new implicated genes can be identified)

But also to:
    3) Identify molecular signatures for disease diagnosis and prognosis

And for:
    4) DNA-guided therapy and dose selection

A person’s genetic make-up significantly affects the efficacy of a drug
    • Polymorphisms in the VKORC1 and CYP2C9 genes dictate the effective dose levels of the
    anti-coagulant Warfarin
    • Polymorphisms in the UGT1A1 gene correlate with increased toxicity of the anti-colon
    cancer drug Irinotecan
    • Polymorphisms in the MTHFR gene are associated with increased toxicity of Methotrexate
    used to treat Crohn’s disease
    • Polymorphisms in the CYP2D6 gene dictate the probability of relapse in women with
    breast cancer treated with Tamoxifen
Entering the age of personalized medicine
    The revolution of high-throughput sequencing: Illumina
                       Metzker, Nat. Rev. Genet., 2010
Solid-phase amplification: 1) initial priming and extension of the single-stranded,
single-molecule template, and 2) bridge amplification of the immobilized template with
immediately adjacent primers to form clusters.
Entering the age of personalized medicine
                From sequence to genome: mapping reads
                               Trapnell and Salzberg, Nat. Biotech., 2009
Seed-based (hash) indexing:
   • Each read is split into four sequences of equal length = seeds
   • If 1 SNP: the other 3 seeds are intact
   • If 2 SNPs: the other 2 seeds are intact
   • Thus, max 2 SNPs per read
   • Limitation: indexing takes up a huge amount of memory
Burrows-Wheeler (BW) indexing:
   • Using BW, the index for the entire human genome fits into < 2 Gb of memory
   • It is 30 times faster than seed (hash) indexing
   • It is also limited to 2 SNPs within one read
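The seed argument is the pigeonhole principle: k mismatches can fall in at most k of the four seeds, so at least 4 - k seeds still match exactly. A minimal sketch that checks this on a hypothetical 32-bp reference window; it assumes the index needs two exact seeds to report a hit, which is how the "max 2 SNPs per read" limit follows.

```python
def split_into_seeds(read: str, n_seeds: int = 4) -> list[tuple[int, str]]:
    """Split a read into n_seeds contiguous pieces of (near-)equal length."""
    size, extra = divmod(len(read), n_seeds)
    seeds, start = [], 0
    for i in range(n_seeds):
        length = size + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


def exact_seeds(read: str, reference_window: str, n_seeds: int = 4) -> int:
    """Count seeds of the read that match the aligned reference window exactly."""
    return sum(
        reference_window[start:start + len(seed)] == seed
        for start, seed in split_into_seeds(read, n_seeds)
    )


def substitute(seq: str, positions: list[int]) -> str:
    """Return seq with a different base at each given position (simulated SNPs)."""
    bases = list(seq)
    for pos in positions:
        bases[pos] = "C" if bases[pos] != "C" else "G"
    return "".join(bases)


reference = "ACGTTGCA" * 4  # hypothetical 32-bp reference window
for snps in ([5], [5, 20], [1, 2]):
    print(snps, exact_seeds(substitute(reference, snps), reference))
```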
Entering the age of personalized medicine
                 Burrows-Wheeler transform
     (Wikipedia)
         Strings with runs of repeated characters are easier to compress
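A small worked example of the transform and of why it supports fast read lookup. The sketch builds the BWT naively from sorted rotations, inverts it, and counts pattern occurrences with FM-index backward search, the idea behind BWT-based aligners such as Bowtie. It assumes a unique `$` sentinel that sorts before all other characters and uses the classic `banana$` example. Real tools build the transform via suffix arrays and use sampled occurrence tables rather than these quadratic steps.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform: last column of the sorted rotations of text."""
    assert text.endswith("$") and text.count("$") == 1, "need one terminal '$'"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Rebuild the text by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    return next(row for row in table if row.endswith("$"))


def count_occurrences(last: str, pattern: str) -> int:
    """FM-index backward search over the BWT string `last`."""
    first = sorted(last)
    c_table = {}  # character -> number of characters in the text smaller than it
    for i, ch in enumerate(first):
        c_table.setdefault(ch, i)
    lo, hi = 0, len(last)  # current range of matching rows in the sorted rotations
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + last[:lo].count(ch)
        hi = c_table[ch] + last[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana$")
print(transformed)                            # prints: annb$aa
print(inverse_bwt(transformed))               # prints: banana$
print(count_occurrences(transformed, "ana"))  # prints: 2
```

Note the runs ("nn", "aa") in the transformed string: the BWT groups identical characters together, which is what makes it compress well.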
Entering the age of personalized medicine
    A first human genome project using HTS
               Bentley et al., Nature, 2008
               • Solexa technology
               • First: X chromosome
                    • 204 million reads
                    • Sampling of sequence fragments is close to random
                    (slight effect of GC content)
Entering the age of personalized medicine
       A first human genome project using HTS
                    Bentley et al., Nature, 2008
 • 135 Gb of sequence (~4 billion paired 35-base reads) in 8 weeks
 • Approximate consumables cost = $250,000
 • 97% of the reads were aligned using MAQ
 • 99.9% of the human reference covered by ≥ 1 read, at an average depth of 40.6X
                      99% agreement with HapMap results!
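The numbers on this slide can be cross-checked with simple coverage arithmetic. The sketch assumes a human genome size of ~3.1 Gb and uses the idealized Poisson (Lander-Waterman) model, in which the fraction of bases with zero reads at mean depth c is e^(-c).

```python
import math


def mean_depth(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return total_bases / genome_size


def poisson_fraction_uncovered(depth: float) -> float:
    """Poisson expectation for the fraction of bases covered by zero reads."""
    return math.exp(-depth)


reads, read_length = 4e9, 35
print(f"{reads * read_length / 1e9:.0f} Gb from ~4 billion 35-base reads")
print(f"raw depth over an assumed 3.1 Gb genome: {mean_depth(135e9, 3.1e9):.1f}x")
print(f"Poisson fraction uncovered at 40.6x: {poisson_fraction_uncovered(40.6):.1e}")
```

The raw depth is an upper bound, since not every read aligns. Under the idealized model, essentially no base would remain uncovered at 40.6x. The residual 0.1% therefore reflects factors the model ignores, such as sequence to which reads cannot be mapped.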
Entering the age of personalized medicine
         More human genome projects
            Snyder et al., G&D, 2010
Entering the age of personalized medicine
                Tackling the SV problem using HTS
• Really difficult, and progress is limited.
• Existing methods are based on two approaches:
    • Paired-end mapping (PEM)
    • Depth-of-coverage (DOC)

• The ends of each fragment are tagged with a biotinylated (B) nucleotide
• Circularization forms a junction between the two ends
• The circularized DNA is randomly fragmented and the biotinylated junction fragments are
recovered
• Standard sequencing procedure thereafter
Entering the age of personalized medicine
Tackling the SV problem using HTS: paired-end mapping
           Medvedev et al., Nature Meth., 2009
Entering the age of personalized medicine
          Tackling the SV problem using HTS: DOC
Snyder et al., G&D, 2010      Campbell et al., Nature Genet., 2008
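The DOC idea is that read counts in a window scale with copy number. A minimal sketch: count read starts in fixed windows and estimate copy number as ploidy × count / median count, calling windows more than half a copy away from diploid. The window size and thresholds are hypothetical. Real DOC methods also correct for biases such as the GC-content effect noted by Bentley et al., and for mappability.

```python
import statistics


def window_counts(read_starts: list[int], chrom_length: int, window: int) -> list[int]:
    """Number of read start positions falling in each fixed-size window."""
    counts = [0] * ((chrom_length + window - 1) // window)
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def depth_of_coverage_calls(counts: list[int], ploidy: int = 2) -> list[tuple[int, float, str]]:
    """Estimate copy number per window as ploidy * count / median count."""
    median = statistics.median(counts)
    if median == 0:
        raise ValueError("median window count is zero; use larger windows")
    calls = []
    for i, c in enumerate(counts):
        cn = ploidy * c / median
        if cn < ploidy - 0.5:
            label = "loss"
        elif cn > ploidy + 0.5:
            label = "gain"
        else:
            label = "normal"
        calls.append((i, round(cn, 2), label))
    return calls


hypothetical_counts = [100, 98, 102, 50, 150, 101, 99]  # reads per window
for window_index, copy_number, label in depth_of_coverage_calls(hypothetical_counts):
    print(window_index, copy_number, label)
```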
Entering the age of personalized medicine
Tackling the SV problem using HTS: state-of-the-art
                 Snyder et al., G&D, 2010

Human genetic variation and its contribution to complex traits

  • 1. Deplancke Lab Monica Albarca Jean-Daniel Feuz Carine Gubelmann Korneel Hens Alina Isakova Irina Krier Andreas Massouras Sunil Raghav Jovan Simicevic deplanckelab.epfl.ch Sebastian Waszak Wiebke Westhall You?
  • 2. Laboratory of Systems Biology and Genetics Bart Deplancke (bart.deplancke@epfl.ch) Human genetic variation and its contribution to complex traits 26 June 2000
  • 3. The human genome First announcement In June 2000: first announcement of a working draft (haplotype!), followed by the Nature and Science papers in February 2001 James Kent (UCSC) Eugene Myers (Celera) International Human Genome Sequencing Consortium (2001) Nature 409:860-921; Venter et al. (2001) Science 291:1304-1351. In June 2001: chromosome 20 finished, with the others following until chromosome 1 was finished in May 2006 Gregory et al. (2006), Nature, 441, 315-321
  • 4. Why are we so phenotypically different?
  • 5. Classes of human genetic variation Common versus rare: refers to the frequency of the minor allele in the human population: • Common variants = minor allele frequency (MAF) > 1% in the population; also described as polymorphisms. • Rare variants = MAF < 1% Neutrality: • The vast majority of genetic variants are likely neutral, i.e. they make no contribution to phenotypic variation. • Some neutral variants may nonetheless reach appreciable frequencies, but only by chance (genetic drift). Two classes by nucleotide composition: • Single nucleotide variants • Structural variants
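The common/rare distinction is a simple calculation on genotype counts. This sketch assumes a biallelic site in diploid individuals; the function names are hypothetical, and a MAF of exactly 1% (which the slide leaves unassigned) is treated as rare here.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """MAF at a biallelic site from diploid genotype counts (AA, Aa, aa)."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    alt_alleles = n_het + 2 * n_hom_alt           # a heterozygote carries one alt allele
    alt_freq = alt_alleles / (2 * n_individuals)   # two alleles per diploid individual
    return min(alt_freq, 1.0 - alt_freq)          # the minor allele is the rarer one


def frequency_class(maf, threshold=0.01):
    """'common' if MAF > threshold (1% on the slide), otherwise 'rare'."""
    return "common" if maf > threshold else "rare"
```

For instance, 90 AA, 10 Aa and 0 aa individuals give a MAF of 10/200 = 0.05, a common variant, whereas a single heterozygote among 1,000 individuals gives 0.0005, a rare one.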
  • 6. Single nucleotide variants T/G T/G A/C ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
  • 7. How are SNPs detected? High-density oligonucleotide arrays (Chee et al., Science, 1996) Simple 5’ to 3’ read-out Flanking issues Unique oligonucleotide primers used to generate minimally overlapping long-range PCR products of 10-kb average length
  • 8. How are SNPs detected? Other strategies: • Clustered alignment • Reduced representation shotgun sequencing followed by genomic alignment • Gene-centric studies (reference sequence) From Rothberg et al. Nature Biotech, 2001
  • 9. The SNP database - dbSNP http://www.ncbi.nlm.nih.gov/projects/SNP/ Three “out of Africa” genomes: • 1.2 million (67%) (all three), 1.7 million (52%) (any two), 1.0 million (30%) unique • Overall, 5.2 million SNPs in the three genomes, the majority already present in dbSNP • The data indicate that most SNVs are common rather than rare
  • 10. Single nucleotide variants • The human genome is estimated to contain > 11 million SNPs (~7 million with MAF > 5%, the rest with MAF between 1 and 5%). • It is unknown how many rare or even novel (“de novo”) SNVs exist. • SNP alleles in the same genomic interval are often correlated with one another → “Linkage disequilibrium (LD)” = nonrandom association of alleles, which varies in a complex and unpredictable manner across the genome and between populations. • International HapMap Project → can we divide the genome into groups of highly correlated SNPs that are generally inherited together = “LD bins”? Number of tag SNPs required to capture common Phase II SNPs
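Linkage disequilibrium between two biallelic SNPs is usually summarized by r². With allele frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). This sketch assumes phased haplotypes coded 0/1 at each SNP; with unphased genotypes the haplotype frequencies would first have to be estimated (e.g. by EM), which is not shown. The function name is hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes [(allele1, allele2), ...]."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                 # freq of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n                 # freq of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                    # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return d * d / denom
```

Two SNPs whose alleles always travel together give r² = 1 (statistically indistinguishable, one tags the other perfectly); independent SNPs give r² = 0.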
  • 11. Single nucleotide variants Recap • International HapMap Project → can we divide the genome into groups of highly correlated SNPs that are generally inherited together = “LD bins”? Number of tag SNPs required to capture common Phase II SNPs, based on genotyping over 3.1 million SNPs in 270 individuals from 4 geographically diverse populations (Frazer et al., Nature, 2007). Pairwise linkage disequilibrium (LD) r² (if r² = 1 → SNPs statistically indistinguishable). By genotyping the DNA sample of an individual with a “tagging” SNP from each LD bin, information on 80% of SNPs with a MAF > 5% across the genome is captured. (Frazer et al., Nature Rev. Genet., 2010)
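Choosing one tag SNP per LD bin can be framed as a covering problem. The sketch below is a simple greedy heuristic, not necessarily the procedure used by the HapMap Project: repeatedly pick the SNP that captures the most not-yet-tagged SNPs at r² ≥ 0.8 (a commonly used threshold, assumed here). The input is a precomputed symmetric r² matrix; the function name is hypothetical.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedily pick tag SNPs so every SNP has r^2 >= threshold with some tag."""
    n = len(r2)
    untagged = set(range(n))
    tags = []

    def captured(i):
        # A SNP always tags itself; otherwise it needs r^2 >= threshold.
        return {j for j in untagged if i == j or r2[i][j] >= threshold}

    while untagged:
        # Most newly captured SNPs wins; ties go to the lowest index.
        best = max(range(n), key=lambda i: (len(captured(i)), -i))
        untagged -= captured(best)
        tags.append(best)
    return tags
```

With three SNPs in strong mutual LD (r² = 0.9) and a fourth independent one, two tags suffice: one for the LD bin and one for the singleton.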
  • 12. Querying human genetic variation Scan Entire Genome - 500,000 SNPs
  • 13. Population Stratification Subdivision of a population into different ethnic groups with potentially different marker allele frequencies and thus different disease prevalence From Sven Bergmann, UNIL Principal Component Analysis reveals SNP-vectors explaining the largest variation in the data
  • 14. Population Stratification Ethnic groups cluster according to geographic distances (PCA plots, PC1 vs PC2) From Sven Bergmann, UNIL
  • 15. Population Stratification PCA of POPRES cohort From Sven Bergmann, UNIL
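The principal component analysis behind these plots can be sketched in a few lines of NumPy. This assumes a genotype matrix of individuals × SNPs coded as allele dosages 0/1/2; each SNP is centred and scaled by its binomial standard deviation before an SVD. The function name is hypothetical, and real analyses also prune SNPs in LD and remove outliers before computing the components.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Top principal-component scores per individual from 0/1/2 genotype dosages."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                     # allele frequency of each SNP
    keep = (p > 0) & (p < 1)                     # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))   # standardize each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

As a usage check, two simulated groups with very different allele frequencies (for example 0.1 versus 0.9 across 200 SNPs) separate cleanly along PC1, mirroring how populations with different ancestry cluster apart.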
  • 16. Structural variants (Frazer et al., Nature Rev. Genet., 2010) A classic that opened the door to structural variant research: Sebat et al. Large-Scale Copy Number Polymorphism in the Human Genome. Science, 2004. Used the ROMA technique to detect copy number variants
  • 17. Representational Oligonucleotide Microarray Analysis (ROMA) 1) Genome digestion with a restriction enzyme 2) Ligation of adapters to the sticky ends and PCR amplification 3) After PCR, the representations of each genome (restriction fragments) are amplified such that relative increases or decreases in copy number between the two genomes are accentuated, while equal copy number is preserved. 4) Representations of the two genomes are labeled with different fluorophores and co-hybridized to a microarray with probes specific to restriction site locations across the entire human genome.
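Two-colour co-hybridization, as in step 4, is typically read out as a per-probe log2 ratio of the two genomes' signals. The sketch below is a simplification under stated assumptions: probes are already ordered along the genome, a running median over neighbouring probes stands in for proper segmentation (such as circular binary segmentation), and the ±0.3 thresholds and function names are hypothetical.

```python
import math
from statistics import median


def normalized_log2_ratios(test, reference):
    """Per-probe log2(test/reference), median-centred so a typical probe is copy-neutral."""
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    centre = median(ratios)
    return [x - centre for x in ratios]


def call_probes(ratios, window=3, gain=0.3, loss=-0.3):
    """Smooth with a running median over neighbouring probes, then threshold."""
    half = window // 2
    calls = []
    for i in range(len(ratios)):
        smoothed = median(ratios[max(0, i - half): i + half + 1])
        if smoothed > gain:
            calls.append("gain")
        elif smoothed < loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls
```

A run of probes with twice the reference signal gives log2 ratios near +1 and is called a gain; a run at half the signal gives ratios near -1 and is called a loss.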
  • 18. Representational Oligonucleotide Microarray Analysis (ROMA) On average, individuals (20 tested) differed by 11 CNPs (average length = 465 kb) affecting 70 genes.
  • 19. Structural variants (SVs) (Frazer et al., Nature Rev. Genet., 2010) Our ability to detect SVs is still very poor (see later)
  • 20. Structural variants (SVs) Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008) • 1 million fosmid clones/individual • Both ends of each clone insert are sequenced → a pair of high-quality end sequences, termed an end-sequence pair (ESP) (~450 bp/sequence) • Only SVs over 8 kb can be detected
  • 21. Structural variants (SVs) Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008) ~2,000 SVs that were experimentally verified Novel sequence (either in gaps (black) or not (orange))
  • 22. Structural variants (SVs) Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008) ~2,000 SVs that were experimentally verified; novel sequence either in gaps (black) or not (orange) • 50% of SVs seen in >1 individual • ~50% (nearly half) lay outside regions of the genome previously described as structurally variant • 525 new insertion sequences • 20% of all genetic variants = SVs, but they cover >70% of nucleotide variation • SVs → between 9 and 25 Mb (~0.5-1% of the genome) • The majority of SVs are yet to be discovered
  • 23. Structural variants (SVs) Fosmid-based library sequencing of 8 humans (4 Yoruba and 4 non-African) (Kidd et al., Nature, 2008) Regions of increased SNV density
  • 24. Structural variants and linkage disequilibrium McCarroll et al., Nature Genet., 2008 • Most common, diallelic CNPs (with MAF greater than 5%) were perfectly captured (r² = 1.0) by at least one SNP tag from HapMap Phase II • Mean r² as a function of distance from a polymorphism is indistinguishable for SNPs and diallelic CNPs → common, diallelic CNPs are ancestral mutations Common SVs are in LD with tagging SNPs
  • 25. Contribution of variants to phenotypes?
  • 26. Common versus rare “Common disease – common variant hypothesis” versus: common complex traits are the summation of low-frequency, high-penetrance variants OR = odds ratio; PAR = population attributable risk = measure of the multifactorial inherited component of a disease
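The two effect measures on this slide can be computed directly. The odds ratio comes from a 2×2 table of cases and controls by carrier status; the population attributable risk is sketched here with Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the carrier frequency in the population and RR the relative risk (approximated by the OR when the disease is rare). The function names and example numbers are hypothetical.

```python
def odds_ratio(case_carrier, case_noncarrier, control_carrier, control_noncarrier):
    """Odds ratio from a 2x2 case/control table of risk-allele carriers."""
    if case_noncarrier == 0 or control_carrier == 0:
        raise ValueError("zero cell: odds ratio is undefined")
    return (case_carrier * control_noncarrier) / (case_noncarrier * control_carrier)


def population_attributable_risk(p_carrier, relative_risk):
    """Levin's formula: fraction of cases attributable to the risk factor."""
    excess = p_carrier * (relative_risk - 1)
    return excess / (1 + excess)
```

The formula illustrates the trade-off on the slide: a common allele with a modest effect (p = 0.3, RR = 1.2) gives a PAR of about 5.7%, more than a rare allele with a large effect (p = 0.01, RR = 5), about 3.8%.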
  • 27. Whole Genome Association studies How significant is this?
  • 28. Whole genome association studies P-value Note: “Genome-wide” is a misnomer • 20% of common SNPs are not tagged or only partially tagged • Rare variants are not tagged at all
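"How significant is this?" depends on how many SNPs are tested. This sketch shows one common per-SNP test, the 1-df allelic chi-square on case/control allele counts, and a Bonferroni correction; it is an illustration, not necessarily the test used in any study cited here. The p-value uses the identity that the upper tail of a chi-square with 1 df equals erfc(√(x/2)). Function names are hypothetical.

```python
import math


def allelic_test_p(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """P-value of the 1-df allelic chi-square test on a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def bonferroni_threshold(alpha, n_tests):
    """Per-SNP significance level controlling the family-wise error rate at alpha."""
    return alpha / n_tests
```

With 500,000 SNPs and α = 0.05 the Bonferroni threshold is 1 × 10⁻⁷; the widely used genome-wide threshold of 5 × 10⁻⁸ corresponds to roughly one million independent tests.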
  • 29. Whole Genome Association studies Concept: scan the entire genome (500,000 SNPs) and plot -log10(p) against SNP position • Identify local regions of interest and examine genes, SNP density, regulatory regions, etc. • Replicate the finding From Sven Bergmann, UNIL
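The "identify local regions of interest" step can be sketched as transforming p-values to -log10(p) and merging significant SNPs that lie close together. Assumptions: positions are on a single chromosome, the threshold is the conventional 5 × 10⁻⁸, and hits within 250 kb of each other are merged; the gap size and function names are hypothetical.

```python
import math


def neg_log10(pvalues):
    """Transform p-values to -log10(p), the y-axis of a genome-wide scan plot."""
    return [-math.log10(p) for p in pvalues]


def regions_of_interest(positions, pvalues, threshold=5e-8, max_gap=250_000):
    """Merge significant SNPs lying within max_gap bp of each other into regions."""
    hits = sorted(pos for pos, p in zip(positions, pvalues) if p < threshold)
    regions = []
    for pos in hits:
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos          # extend the current region
        else:
            regions.append([pos, pos])    # start a new region
    return [tuple(region) for region in regions]
```

Each resulting region is then the starting point for examining nearby genes and regulatory elements, and for replication.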
  • 30. Whole Genome Association studies Visualization Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007). McCarthy et al., Nature Rev. Genet., 2008
  • 31. Whole genome association studies Concept: scan the entire genome (500,000 SNPs) and plot -log10(p) against SNP position • Identify local regions of interest and examine genes, SNP density, regulatory regions, etc. • Replicate the finding From Sven Bergmann (UNIL)
  • 32. Whole genome association studies An avalanche of GWA studies • Since 2006 → >220 studies reported to date • For over 80 phenotypes → 300 loci have been implicated • Most implicated loci were identified for the first time (no prior knowledge)
  • 33. Whole genome association studies Type 2 diabetes: an example Frazer et al., Nat. Rev. Genet., 2010 • 18 genomic intervals, 4 of which contain previously implicated genes • Major message: the molecular diversity of T2D genes was not anticipated, thus: (patients with the same disease) ≠ (patients with the same underlying biological disorder)
  • 34. Whole genome association studies Overlap of genetic risk factor loci for common diseases Frazer et al., Nat. Rev. Genet., 2010 • 15 loci are associated with two or more diseases (8 are shown) • Not necessarily the same impact (PTPN22: + for Crohn’s disease, - for other autoimmune (ai) diseases) • Different diseases may have similar molecular underpinnings • Expected: ai diseases (same clinical features) • Unexpected: e.g. GCKR in both TGC levels and ai disease
  • 35. Whole genome association studies From association to molecular mechanism • Very difficult: • What are the precise variants associated with a trait? • If located in exons: easy, but if outside, then what? • Most are located outside exons! (e.g. the 9p21 locus associated with myocardial infarction is located 150 kb from the nearest gene!) • They may have a regulatory function, i.e. control gene expression • Humans are heterozygous at more functional cis-regulatory sites than at amino acid positions, with 10,700 functional biallelic cis-regulatory polymorphisms in a typical human (Rockman and Wray, Mol. Biol. Evol., 2002: 19, 1991). • 34% of promoter polymorphisms (170 tested) significantly modulated reporter gene expression (>1.5-fold) (Hoogendoorn et al., Hum. Mol. Genet., 2003: 12, 2249). • Case study with the CC chemokine receptor 5 (CCR5), a major chemokine coreceptor of HIV-1 necessary for viral entry into cells • G to A SNP of CCR5 at –2459 nt • CCR5 density: low (homozygous –2459GG), intermediate (GA), and highest (homozygous –2459AA) (Salkowitz et al., Clin. Immunol., 2003: 108, 234).
  • 36. Whole genome association studies Mapping eQTLs • Transcript abundance is a quantitative trait that can be mapped with considerable power → expression QTLs (eQTLs) • Traits are shaped by both environment and genetics: heritability (H2) = genetic variance / total trait variance, with 0 = no genetic effects and 1 = all variance under genetic control • Classic paper: Schadt et al., Nature, 2003, Genetics of gene expression surveyed in maize, mouse and man • Liver tissue from 111 F2 mice (an intercross of C57BL/6J and DBA/2J) • Microarray analysis of 23,574 genes: 7,861 significantly differentially expressed (either between the parental strains or in at least 10% of the F2 mice) • eQTLs identified (logarithm of the odds (LOD) score > 4.3, P < 0.00005) for 2,123 genes • These eQTLs explained 25% of the transcriptional variation of the corresponding genes
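The two quantities on this slide can be made concrete with a minimal Python sketch. The heritability function is simply the ratio defined above, assuming the total variance is the sum of a genetic and an environmental component (V_G + V_E, no interaction term). The LOD-to-P conversion assumes the LOD score comes from a likelihood-ratio test, so that chi-square = 2·ln(10)·LOD, and that the test has 2 degrees of freedom (our assumption: additive plus dominance effects in an F2 cross). Under that assumption LOD 4.3 gives P ≈ 5 × 10⁻⁵, in line with the threshold quoted above. The function names are illustrative only.

```python
import math

def broad_sense_heritability(var_genetic, var_environment):
    """H2 = genetic variance / total trait variance (assumes V_P = V_G + V_E)."""
    total = var_genetic + var_environment
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total

def lod_to_pvalue(lod, df=2):
    """Convert a LOD score to a P-value via the likelihood-ratio chi-square.

    chi2 = 2 * ln(10) * LOD. Only df = 1 or 2 are handled, because both
    have closed-form chi-square tail probabilities.
    """
    chi2 = 2.0 * math.log(10.0) * lod
    if df == 1:
        # P(chi2_1 > x) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        # P(chi2_2 > x) = exp(-x / 2), which simplifies to 10 ** -LOD
        return math.exp(-chi2 / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")

print(broad_sense_heritability(3.0, 1.0))   # 0.75
print(f"{lod_to_pvalue(4.3, df=2):.2e}")    # 5.01e-05
```

With 1 degree of freedom instead, the same LOD would correspond to a smaller P-value, so the degrees of freedom assumed matter when comparing LOD thresholds across studies.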
  • 37. Whole genome association studies Mapping eQTLs Schadt et al., Nature, 2003 Figure: % of eQTLs across 920 evenly spaced bins, each 2 cM wide • Several hotspots (>1% of detected eQTLs located within a 4 cM interval) • 40% of genes with ≥1 eQTL (LOD > 3.0) had more than one eQTL, and close to 4% of such genes had more than three eQTLs → gene expression = complex trait
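The hotspot definition above (more than 1% of all detected eQTLs inside a 4 cM interval, using 2 cM bins) can be illustrated with a simple sliding-window count. This is a minimal sketch, not Schadt et al.'s actual procedure: it assumes all eQTL peak positions have been placed on one concatenated genome-wide cM axis, and the function name and example data are hypothetical.

```python
from collections import Counter

def find_hotspots(positions_cm, bin_width=2.0, window_bins=2, min_fraction=0.01):
    """Return start positions (cM) of windows holding > min_fraction of all eQTLs.

    positions_cm: eQTL peak positions on a single genome-wide cM axis
    (an assumption of this sketch). With bin_width=2 and window_bins=2,
    each window spans 4 cM; overlapping windows are reported separately.
    """
    total = len(positions_cm)
    if total == 0:
        return []
    # Assign each eQTL to a 2 cM bin
    counts = Counter(int(p // bin_width) for p in positions_cm)
    hotspots = []
    for start in range(max(counts) + 1):
        # Sum eQTLs over window_bins adjacent bins starting at this bin
        in_window = sum(counts.get(start + j, 0) for j in range(window_bins))
        if in_window > min_fraction * total:
            hotspots.append(start * bin_width)
    return hotspots

background = [i * 1.8 for i in range(1000)]   # roughly uniform eQTLs
cluster = [51.0] * 30                          # a hypothetical pile-up near 51 cM
print(find_hotspots(background + cluster))     # [48.0, 50.0]
```

Both reported windows overlap the 50-52 cM bin that holds the cluster; a real analysis would merge overlapping windows and assess significance against a null distribution of eQTL placement.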
  • 38. Whole genome association studies Mapping eQTLs Schadt et al., Nature, 2003 Known polymorphisms between the two parental strains • Overlap between a polymorphism and an eQTL (the eQTL maps to the gene's own location) = cis-acting transcriptional regulation. For example: • The C5 gene: a 2 bp deletion in the coding region in DBA mice results in rapid transcript decay compared with B6. A LOD of 27.4 centred over the C5 gene on chromosome 2 is readily detected (black curve). • The Alad gene: present in 2 copies in DBA
  • 39. Whole genome association studies Mapping eQTLs Schadt et al., Nature, 2003 Combining clinical, gene expression and genetic factors • Classical QTLs for FPM (fat pad mass): 4 significant loci • Further analyses with subgroups identified additional loci • Some QTLs affect only a subset of the F2 population, demonstrating the complexity underlying traits such as obesity
  • 40. Whole genome association studies Mapping eQTLs Dixon et al., Nature Genet., 2007: A genome-wide association study of global gene expression • 206 families of British descent, using immortalized lymphoblastoid cell lines (LCLs) from 400 children (Affymetrix microarrays; 54,675 transcripts ≈ 20,599 genes) • ~15,000 H2 > 0.3 • Gene Ontology descriptors for: • Response to unfolded protein (HSFs, chaperones) • Immune responses and apoptosis • Regulation of progression through the cell cycle • RNA processing and DNA repair
  • 41. Whole genome association studies Mapping eQTLs Dixon et al., Nature Genet., 2007: A genome-wide association study of global gene expression • 206 families of British descent, using immortalized lymphoblastoid cell lines (LCLs) from 400 children (Affymetrix microarrays; 54,675 transcripts ≈ 20,599 genes) • Trans effects are weaker than cis effects • Nevertheless, significant trans associations were detected, e.g.: 1) for ~700 transcripts, the peak of association was on the same chromosome but >100 kb from the nearest transcribed gene; 2) for 10,382 transcripts, the peak of association was on a different chromosome
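The cis/trans distinction used above can be written as a small classification rule. This is a minimal sketch assuming a 100 kb cis window measured from the boundaries of the gene whose transcript was measured; the function name and coordinates are hypothetical, and real studies differ in how they define the window.

```python
def classify_eqtl(gene_chrom, gene_start, gene_end, peak_chrom, peak_pos,
                  cis_window=100_000):
    """Label an association peak relative to the gene whose transcript was measured."""
    if peak_chrom != gene_chrom:
        return "trans (different chromosome)"
    # Distance from the peak to the nearest gene boundary (0 if inside the gene)
    if gene_start <= peak_pos <= gene_end:
        distance = 0
    else:
        distance = min(abs(peak_pos - gene_start), abs(peak_pos - gene_end))
    if distance <= cis_window:
        return "cis"
    return "trans (same chromosome)"

# Hypothetical gene spanning 1.00-1.02 Mb on chr5
print(classify_eqtl("chr5", 1_000_000, 1_020_000, "chr5", 1_050_000))  # cis
print(classify_eqtl("chr5", 1_000_000, 1_020_000, "chr5", 1_500_000))  # trans (same chromosome)
print(classify_eqtl("chr5", 1_000_000, 1_020_000, "chr2", 1_010_000))  # trans (different chromosome)
```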
  • 42. Whole genome association studies Mapping eQTLs Using eQTLs to better understand GWAS results Libioulle et al., PLOS Genet., 2007 • GWAS for Crohn’s disease: the associated polymorphisms lie in a 1.25 Mb gene desert • One of the neighboring genes, PTGER4, may be involved • Trace eQTLs in LCL data • Disease-associated polymorphisms may regulate PTGER4 expression in cis, despite lying >250 kb away → more research needed, but likely a regulatory polymorphism
  • 43. Whole genome association studies Mapping eQTLs We looked at SNPs, but what about other structural variants? Stranger et al., Science, 2007: Relative Impact of Nucleotide and Copy Number Variation on Gene Expression Phenotypes • LCLs of 210 unrelated HapMap individuals from four populations • Copy number variants (CNVs) were identified via comparative genomic hybridization (CGH) against a common reference individual • SNPs and CNVs captured 83.6% and 17.7%, respectively, of the total detected genetic variation in gene expression • Associated SNPs lie close to their respective genes; CNVs less so • Little overlap between SNP and CNV associations (only 20%) • Not “mere” gene dosage effects
  • 44. Whole genome association studies How universal are GWAS findings? Frazer et al., Nat. Rev. Genet., 2010 • Allele frequencies are different in different populations • LD patterns across loci that co-segregate with a causally associated variant may differ from population to population (LD is less strong in African populations → bottleneck principle) • Control for population differences is essential in large studies • Figure: LD around a locus associated with myocardial infarction; red = high pairwise SNP correlation; SNPs that efficiently tag one another (r2 > 0.8) are connected
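The r2 threshold in the figure refers to pairwise LD between two SNPs. As a minimal sketch, assuming phased haplotypes with alleles coded 0/1, r2 can be computed from allele and haplotype frequencies as D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. Because these frequencies are estimated per population, the same SNP pair can give different r2 values in different populations, which is why a tag SNP chosen in one population may not tag well in another. The function names are illustrative.

```python
def ld_r2(haplotypes):
    """Pairwise LD r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, alleles
    coded 0/1, one pair per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # freq of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # freq of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom

def tags(haplotypes, threshold=0.8):
    """True if the two SNPs tag one another at the given r^2 threshold."""
    return ld_r2(haplotypes) > threshold

perfect = [(0, 0), (1, 1)] * 50                    # alleles always co-occur
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
print(ld_r2(perfect), ld_r2(independent))          # 1.0 0.0
```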
  • 45. Whole genome association studies Impact so far • No complex trait for which > 10% of the genetic variance is explained, e.g. T2D: 18 genetic variants together explain < 4% of the total trait liability • Sample size may compensate (increased statistical power), but… studies of lipid phenotypes involving >40,000 people still explain <10%, and some diseases have only a small number of affected individuals • Does the answer lie in structural variants? Most are still unmapped, but… they are likely in LD with common SNPs • Does the answer lie in rare variants? Possibly… • Rare variants are not in LD with tagging SNPs and have thus gone undetected so far (Amish study) • They can have very high penetrance • However, how can they be detected on a population-wide basis?
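Why sample size matters can be sketched with a standard power approximation for a quantitative trait such as a lipid level. This is a minimal sketch, assuming a 1-df association test, a non-centrality parameter λ = N·q² / (1 − q²) where q² is the fraction of trait variance explained by the SNP, and a genome-wide significance level of 5 × 10⁻⁸ (a common convention, not a value from the slide). Since a 1-df non-central chi-square is the square of a normal variable with mean √λ, the power is Φ(√λ − z) + Φ(−√λ − z), with z the two-sided critical value.

```python
from statistics import NormalDist

def association_power(n, variance_explained, alpha=5e-8):
    """Approximate power of a 1-df association test for a quantitative trait.

    Uses lambda = n * q2 / (1 - q2) as the non-centrality parameter, where
    q2 is the fraction of trait variance explained by the SNP.
    """
    z = NormalDist()
    ncp = n * variance_explained / (1.0 - variance_explained)
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)      # two-sided critical value
    shift = ncp ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# A hypothetical SNP explaining 0.1% of trait variance
for n in (5_000, 20_000, 40_000):
    print(n, round(association_power(n, 0.001), 3))
```

For such a small effect, power is negligible at a few thousand individuals and only becomes substantial in the tens of thousands, which is consistent with the slide's point that larger cohorts help but cannot fully compensate for very small per-variant effects.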
  • 46. Whole genome association studies The power of whole-genome sequencing Miller syndrome: autosomal recessive genetic trait (Roach et al., Science, 2010) • Sequenced the genomes of 2 parents and their 2 children, both affected by Miller syndrome • Identified 3.7 million SNPs that varied within the family • Resequenced 34,000 candidate mutations → 28 de novo mutations • Narrowed down using the “rare” assumption and knowledge of recessive inheritance • Found one gene, dihydroorotate dehydrogenase (DHODH), known to be involved
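The narrowing-down step above (rare variants plus recessive inheritance) can be sketched as a family-based filter. This is a minimal illustration under stated assumptions, not Roach et al.'s pipeline: a variant counts as rare below a hypothetical population frequency of 1%, parental origin is inferred from which parent carries the allele, de novo variants are ignored, and a gene qualifies only if every affected child carries a rare allele from each parent (homozygous or compound heterozygous). All gene names, variant IDs and genotypes are hypothetical.

```python
def candidate_recessive_genes(variants, genotypes, mother, father, affected,
                              max_freq=0.01):
    """Genes in which every affected child carries rare alleles from BOTH parents.

    variants:  {variant_id: (gene, population_allele_frequency)}
    genotypes: {sample_id: {variant_id: alt-allele count 0, 1 or 2}}; missing = 0
    De novo variants (absent from both parents) are ignored in this sketch.
    """
    shared = None
    for child in affected:
        maternal, paternal = set(), set()   # genes with a rare allele traced to each parent
        for vid, (gene, freq) in variants.items():
            if freq >= max_freq:
                continue                    # "rare" assumption: skip common variants
            kid = genotypes[child].get(vid, 0)
            mom = genotypes[mother].get(vid, 0)
            dad = genotypes[father].get(vid, 0)
            if kid == 2 and mom >= 1 and dad >= 1:   # homozygous: one copy from each parent
                maternal.add(gene)
                paternal.add(gene)
            elif kid == 1 and mom >= 1 and dad == 0:
                maternal.add(gene)
            elif kid == 1 and dad >= 1 and mom == 0:
                paternal.add(gene)
        genes = maternal & paternal         # homozygous or compound heterozygous
        shared = genes if shared is None else shared & genes
    return sorted(shared or set())

variants = {"v1": ("GENE_A", 0.001), "v2": ("GENE_A", 0.002),
            "v3": ("GENE_B", 0.30), "v4": ("GENE_B", 0.25), "v5": ("GENE_C", 0.001)}
genotypes = {"mom": {"v1": 1, "v3": 1, "v5": 1}, "dad": {"v2": 1, "v4": 1},
             "kid1": {"v1": 1, "v2": 1, "v3": 1, "v4": 1, "v5": 1},
             "kid2": {"v1": 1, "v2": 1, "v3": 1, "v4": 1}}
print(candidate_recessive_genes(variants, genotypes, "mom", "dad", ["kid1", "kid2"]))
```

Here GENE_B is removed by the rarity filter and GENE_C because its only rare allele comes from one parent, leaving GENE_A.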
  • 47. Entering the age of personalized medicine Toward the elucidation of each person’s genetic make-up Necessary for: 1) DNA-based risk assessment for common complex disease 2) Drug discovery (new implicated genes can be identified) But also to: 3) Identify molecular signatures for disease diagnosis and prognosis And for: 4) DNA-guided therapy and dose selection A person’s genetic make-up significantly affects the efficacy of a drug • Polymorphisms in the VKORC1 and CYP2C9 genes dictate the effective dose levels of the anti-coagulant Warfarin • Polymorphisms in the UGT1A1 gene correlate with increased toxicity of the anti-colon cancer drug Irinotecan • Polymorphisms in the MTHFR gene are associated with increased toxicity of Methotrexate, used to treat Crohn’s disease • Polymorphisms in the CYP2D6 gene dictate the probability of relapse in women with breast cancer treated with Tamoxifen
  • 48. Entering the age of personalized medicine The revolution of high-throughput sequencing: Illumina Metzker et al., Nat. Rev. Genet., 2010 Solid-phase amplification: 1) initial priming and extension of the single-stranded, single-molecule template, and 2) bridge amplification of the immobilized template with immediately adjacent primers to form clusters
  • 49. Entering the age of personalized medicine From sequence to genome: mapping reads Trapnell and Salzberg, Nat. Biotech., 2009 • Seed-based (hash) indexing: the read is split into four sequences of equal length = seeds • If 1 SNP, the other 3 seeds remain intact; if 2 SNPs, the other 2 seeds remain intact • Thus, max 2 SNPs/read • Limitation: indexing takes up huge memory • Burrows-Wheeler (BW) indexing: the index for the entire human genome fits into < 2 Gb of memory • Is 30 times faster than hash-based indexing • Also limited to 2 SNPs within one read
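The seed idea can be sketched as "seed and verify" with a hash index. This is a minimal sketch, not the indexing scheme of any specific aligner: every substring of seed length in the reference goes into a Python dict, each of the read's four seeds is looked up exactly, and every proposed start position is verified by counting mismatches over the whole read. Because 2 mismatches can disrupt at most 2 of the 4 seeds, at least one exact seed hit always points to the true location. The function names, the random reference and the read are hypothetical; the dict of all k-mers also shows why this kind of index is memory hungry.

```python
import random
from collections import defaultdict

def build_seed_index(reference, seed_len):
    """Hash table: every substring of length seed_len -> list of start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def map_read(read, reference, index, n_seeds=4, max_mismatches=2):
    """Exact seed lookups propose alignment starts, which are then checked
    for <= max_mismatches mismatches over the whole read."""
    seed_len = len(read) // n_seeds
    candidates = set()
    for i in range(n_seeds):
        offset = i * seed_len
        seed = read[offset:offset + seed_len]
        for pos in index.get(seed, []):
            start = pos - offset            # where the read would begin
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    return sorted(s for s in candidates
                  if hamming(read, reference[s:s + len(read)]) <= max_mismatches)

rng = random.Random(1)
reference = "".join(rng.choice("ACGT") for _ in range(500))
read = list(reference[100:136])             # a 36-base read taken from position 100
for i in (5, 20):                           # introduce 2 SNPs in different seeds
    read[i] = {"A": "C", "C": "G", "G": "T", "T": "A"}[read[i]]
read = "".join(read)
index = build_seed_index(reference, len(read) // 4)
print(map_read(read, reference, index))     # 100 should be among the hits
```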
  • 50. Entering the age of personalized medicine Burrows-Wheeler transform (source: Wikipedia) • The transformed string tends to contain runs of repeated characters, which makes it easier to compress
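A naive Python sketch helps show what the transform does and why it is useful for read mapping. It builds the BWT by sorting all rotations, inverts it, and counts pattern occurrences by backward search (the FM-index principle that BW-based aligners rely on). This is for illustration only: real aligners build the transform from a suffix array and use sampled occurrence tables rather than re-counting characters, and the sentinel '$' convention is an assumption of the sketch.

```python
def bwt(text):
    """BWT by sorting all rotations (fine for a demo, far too slow for a genome).
    text must end with a unique sentinel '$' that sorts before A, C, G, T."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)

def inverse_bwt(last):
    """Rebuild the text by repeatedly prepending the BWT column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original text is the only rotation that ends with the sentinel
    return next(row for row in table if row.endswith("$"))

def count_matches(last, pattern):
    """Backward search: number of occurrences of pattern in the original text."""
    first = sorted(last)
    smaller = {}                        # smaller[c] = number of characters < c
    for i, ch in enumerate(first):
        smaller.setdefault(ch, i)
    lo, hi = 0, len(last)               # current range of sorted rotations
    for ch in reversed(pattern):
        if ch not in smaller:
            return 0
        lo = smaller[ch] + last[:lo].count(ch)   # occurrences of ch before row lo
        hi = smaller[ch] + last[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo

print(bwt("banana$"))                         # annb$aa
print(inverse_bwt(bwt("banana$")))            # banana$
print(count_matches(bwt("banana$"), "ana"))   # 2
```

Note the runs "nn" and "aa" in the output, the property the slide points to for compression.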
  • 51. Entering the age of personalized medicine A first human genome project using HTS Bentley et al., Nature, 2008 • Solexa technology • First: X chromosome • 204 million reads • Sampling of sequence fragments is close to random (slight GC-content effect)
  • 52. Entering the age of personalized medicine A first human genome project using HTS Bentley et al., Nature, 2008 • 135 Gb of sequence (~4 billion paired 35-base reads) generated in 8 weeks • Approximate consumables cost: $250,000 • 97% of the reads were aligned using MAQ • 99.9% of the human reference covered by ≥1 read at 40.6X average depth • 99% agreement with HapMap results!
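The coverage figure is essentially total sequence divided by genome size. As a rough check, assuming a genome size of about 3.3 Gb (our assumption; the slide gives none), 135 Gb corresponds to roughly 41X, close to the reported 40.6X. The sketch also gives the idealised Poisson (Lander-Waterman style) expectation for the fraction of bases covered by at least a given number of reads; this model ignores repeats and sampling biases such as the GC effect, so it overestimates real-world coverage.

```python
import math

def mean_coverage(total_bases, genome_size):
    """Average sequencing depth: bases sequenced / genome length."""
    return total_bases / genome_size

def expected_fraction_covered(coverage, min_reads=1):
    """Poisson expectation of the fraction of bases covered by >= min_reads reads."""
    below = sum(math.exp(-coverage) * coverage ** k / math.factorial(k)
                for k in range(min_reads))
    return 1.0 - below

print(round(mean_coverage(135e9, 3.3e9), 1))       # 40.9
print(round(expected_fraction_covered(1.0), 3))    # 0.632 at only 1X
```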
  • 53. Entering the age of personalized medicine More human genome projects Snyder et al., G&D, 2010
  • 54. Entering the age of personalized medicine More human genome projects Snyder et al., G&D, 2010
  • 55. Entering the age of personalized medicine More human genome projects Snyder et al., G&D, 2010
  • 56. Entering the age of personalized medicine Tackling the SV problem using HTS • Really difficult, and progress is limited • Existing methods are based on two approaches: • Paired-end mapping (PEM) • Depth-of-coverage (DOC) • PEM library preparation: • The ends of each fragment are tagged with a biotinylated (B) nucleotide • Circularization forms a junction between the two ends • The circularized DNA is randomly fragmented and the biotinylated junction fragments are recovered • Standard sequencing procedure thereafter
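The logic of paired-end mapping can be sketched as a per-pair classification against the library's expected insert size. This is a minimal sketch under stated assumptions: it assumes a forward/reverse library (leftmost read on '+', mate on '-'; circularized mate-pair libraries can have a different orientation, so the rule would need adjusting), a 3-standard-deviation cutoff chosen for illustration, and hypothetical coordinates. Real SV callers cluster many discordant pairs before calling an event; a single pair is not evidence.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  mean_insert, sd_insert, n_sd=3.0):
    """Classify one mapped read pair against the expected insert size.

    Assumes a forward/reverse library: in a concordant pair the leftmost
    read maps to '+' and its mate to '-'. pos values are mapped coordinates.
    """
    if chrom1 != chrom2:
        return "translocation candidate"
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos1, strand1), (pos2, strand2)])
    if left_strand == right_strand:
        return "inversion candidate"
    if left_strand == "-":
        return "orientation anomaly"
    span = right_pos - left_pos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion candidate"     # mates map farther apart than expected
    if span < mean_insert - n_sd * sd_insert:
        return "insertion candidate"    # mates map closer together than expected
    return "concordant"

# Hypothetical library: mean insert 400 bp, SD 30 bp
print(classify_pair("chr1", 1000, "+", "chr1", 1400, "-", 400, 30))  # concordant
print(classify_pair("chr1", 1000, "+", "chr1", 6000, "-", 400, 30))  # deletion candidate
```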
  • 57. Entering the age of personalized medicine Tackling the SV problem using HTS: paired-end mapping Medvedev et al., Nature Meth., 2009
  • 58. Entering the age of personalized medicine Tackling the SV problem using HTS: DOC Snyder et al., G&D, 2010 Campbell et al., Nature Genet., 2008
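The depth-of-coverage approach rests on the idea that read counts in a region scale with its copy number. This is a minimal sketch assuming fixed-size windows, a diploid genome, and that most windows are at normal copy number so the median window count represents 2 copies; the window size, tolerance and simulated reads are hypothetical. A real pipeline would at least correct depth for GC content and mappability and segment adjacent windows rather than calling each window independently.

```python
from statistics import median

def depth_copy_number(read_starts, window, genome_length, ploidy=2):
    """Count reads per fixed-size window and scale by the median window count.

    Returns a copy-number estimate per window, assuming the median window
    is at the normal ploidy.
    """
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for start in read_starts:
        counts[start // window] += 1
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median window depth is zero")
    return [ploidy * c / baseline for c in counts]

def call_cnvs(copy_numbers, ploidy=2, tolerance=0.5):
    """Flag windows whose estimated copy number departs from ploidy by > tolerance."""
    calls = []
    for i, cn in enumerate(copy_numbers):
        if cn > ploidy + tolerance:
            calls.append((i, "gain"))
        elif cn < ploidy - tolerance:
            calls.append((i, "loss"))
    return calls

# Simulated 10 kb region: 100 reads per 1 kb window, half in window 3, 1.5x in window 7
reads = []
for w in range(10):
    n = {3: 50, 7: 150}.get(w, 100)
    reads.extend(w * 1000 + i * (1000 // n) for i in range(n))
print(call_cnvs(depth_copy_number(reads, 1000, 10_000)))  # [(3, 'loss'), (7, 'gain')]
```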
  • 59. Entering the age of personalized medicine Tackling the SV problem using HTS: state-of-the-art Snyder et al., G&D, 2010