Introduction to Bioinformatics
LECTURE 5: Variation within and between species

*   Chapter 5: Are Neanderthals among us?

Neandertal, Germany, 1856
              Initial interpretations:

                    * bear skull
                    * pathological idiot
                    * Old Dutchman ...




LECTURE 5: INTER- AND INTRASPECIES VARIATION







 5.1 Variation in DNA sequences

 * Even closely related individuals differ in genetic sequences

 * (Point) mutations: copying errors at a specific location

 * Sexual reproduction – diploid genome





Diploid chromosomes





Mitosis: diploid reproduction





Meiosis: diploid (=double) → haploid (=single)








  * Typing error rate of a very good typist: 1 error per 1K typed letters

  * All our diploid cells constantly copy some 7 billion letters

  * Typical copying error rate of a cell: ~1 error per 1 Gbp
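
As a rough back-of-the-envelope check of these figures, the expected number of copying errors is simply the number of letters times the error rate. The sketch below only restates the numbers from the slide; the constant and function names are illustrative assumptions.

```python
# Figures taken from the slide; the names are illustrative only.
GENOME_LETTERS = 7e9          # letters copied per diploid cell division
CELL_ERROR_RATE = 1 / 1e9     # ~1 error per Gbp
TYPIST_ERROR_RATE = 1 / 1e3   # 1 error per 1K typed letters


def expected_errors(letters: float, error_rate: float) -> float:
    """Expected number of errors when copying `letters` symbols."""
    return letters * error_rate


cell_errors = expected_errors(GENOME_LETTERS, CELL_ERROR_RATE)
typist_errors = expected_errors(GENOME_LETTERS, TYPIST_ERROR_RATE)

print(f"cell:   about {cell_errors:.0f} errors per division")
print(f"typist: about {typist_errors:.0f} errors for the same text")
```

Under these numbers a cell makes a handful of errors per division, where a very good typist copying the same text would make millions.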






 GERM LINE
 Reverse time and follow your cells:

 • Now you count ~10^13 cells
 • One generation ago you had 2 cells ‘somewhere’ in your parents' bodies
 • A small number T of generations ago you had (2^T – ancestors counted more than once) cells
 • A large number T of generations ago you counted #(fertile ancestors) cells
 • Congratulations: you are 3.4 billion years old!


 Fast-forward time and follow your cells:
 • Only a few cells in your reproductive organs have a chance to live on
 in the next generations

 • The rest (including you) will die …



   GERM LINE MUTATIONS
   This potentially immortal lineage of (germ) cells is
   called the GERM LINE

   All mutations that we have accumulated were passed on along
   the germ line






  * Polymorphism: multiple possibilities (alleles) for a nucleotide

  * Single Nucleotide Polymorphism – SNP (“snip”): a point mutation
    example: AAATAAA vs AAACAAA

  * Humans: about 1 SNP per 1500 bases ≈ 0.067%

  * STR = Short Tandem Repeats (microsatellites)
    example: CACACACACACACACACA …

  * Transition versus transversion




Purines – Pyrimidines






 Transitions – Transversions
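
Purines are A and G, pyrimidines are C and T. A transition replaces a nucleotide by the other one of the same class (A↔G, C↔T); a transversion switches class. A minimal sketch of that classification, applied to the SNP example from the slides (the function names are illustrative assumptions):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a: str, b: str) -> str:
    """Return 'match', 'transition' or 'transversion' for two nucleotides."""
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq1: str, seq2: str) -> dict:
    """Count matches, transitions and transversions between aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    counts = {"match": 0, "transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        counts[classify(a, b)] += 1
    return counts


snp_counts = count_substitutions("AAATAAA", "AAACAAA")
print(snp_counts)
```

In this example the single difference is T versus C, both pyrimidines, so it counts as a transition.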






 5.2 Mitochondrial DNA

 * Mitochondria are inherited only via the maternal line!

 * mtDNA is not reshuffled, which makes it very suitable for comparing evolution








H. sapiens mitochondrion




           EM photograph of H. sapiens mtDNA






 5.3 Variation between species

 * Genetic variation accounts for morphological, physiological,
 and behavioral variation

 * Genetic variation (i.e. genetic distance) relates to phylogenetic
 relationship

 * This requires a way to measure distances between sequences: a
 metric



Substitution rate
* Mutations originate in single individuals

* Mutations can become fixed in a population

* Mutation rate: rate at which new mutations arise

* Substitution rate: rate at which a species fixes new mutations

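
    Substitution rate and mutation rate
    * For neutral mutations in a diploid population of N individuals, 2Nμ new
      mutations arise per generation, and each becomes fixed with probability 1/(2N):

          ρ = 2Nμ · 1/(2N) = μ

    * If two species that diverged a time T ago differ by K substitutions per site:

          ρ = K/(2T)

      (the factor 2 accounts for the two lineages, each evolving for a time T)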
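
The two relations can be sketched in a few lines. The numerical values below are hypothetical and chosen only for illustration; the point is that the population size N cancels out for neutral mutations.

```python
def substitution_rate_neutral(N: int, mu: float) -> float:
    """rho = (new mutations per generation) * (fixation probability)."""
    new_mutations = 2 * N * mu          # 2N gene copies, each mutating at rate mu
    fixation_probability = 1 / (2 * N)  # neutral mutation: one copy among 2N
    return new_mutations * fixation_probability


def substitution_rate_from_divergence(K: float, T: float) -> float:
    """rho = K / (2T): K substitutions per site accumulated over two lineages."""
    return K / (2 * T)


mu = 1e-8  # hypothetical mutation rate per site per generation
for N in (100, 10_000, 1_000_000):
    print(N, substitution_rate_neutral(N, mu))

# Hypothetical divergence: K = 0.02 substitutions per site, T = 1,000,000 generations
print(substitution_rate_from_divergence(K=0.02, T=1_000_000))
```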






   5.4 Estimating genetic distance

   * Substitutions are independent (?)
   * Substitutions are random
   * Multiple substitutions may occur
   * Back-mutations mutate a nucleotide back to an earlier value







     Multiple substitutions and Back-mutations
     conceal the real genetic distance



      GACTGATCCACCTCTGATCCTTTGGAACTGATCGT
      TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT
      TTCTGATCCACCTCTGATCCATCGGAACTGATCGT
      GTCTGATCCACCTCTGATCCATTGGAACTGATCGT
                                           observed : 2 (= d)
                                           actual :   4 (= K)
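
The observed count can be checked by comparing only the first and the last sequence of the example, which is all an observer would have. A minimal sketch (the function name is an illustrative assumption):

```python
first = "GACTGATCCACCTCTGATCCTTTGGAACTGATCGT"
last = "GTCTGATCCACCTCTGATCCATTGGAACTGATCGT"


def observed_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


d_count = observed_differences(first, last)
print(d_count, "differences over", len(first), "sites")
print("observed proportion:", d_count / len(first))
```

The intermediate sequences, and therefore any substitutions that were later reversed, are invisible in this comparison.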





    * Saturation: on average one substitution per site

    * Two random sequences of equal length will match
    for approximately ¼ of their sites

    * In saturation therefore the proportional genetic
    distance (the fraction of differing sites) is ¾
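
The ¼ figure is easy to check empirically: two independent, uniformly random sequences agree at about a quarter of their sites, so about ¾ of the sites differ. A small simulation sketch, assuming equal nucleotide frequencies (the seed and length are arbitrary choices):

```python
import random


def random_sequence(n: int, rng: random.Random) -> str:
    """Uniformly random nucleotide sequence of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def match_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100_000
frac = match_fraction(random_sequence(n, rng), random_sequence(n, rng))
print(f"matching sites:  {frac:.3f}")
print(f"differing sites: {1 - frac:.3f}")
```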







      * True genetic distance (proportion): K

      * Observed proportion of differences: d

      * Due to multiple substitutions and back-mutations: K ≥ d







 SEQUENCE EVOLUTION is a Markov process: a
 sequence at generation (= time) t depends only on the
 sequence at generation t-1





 The Jukes-Cantor model
 Correction for multiple substitutions

 Substitution probability per site per generation is α

 Substitution means there are 3 possible replacements
 (e.g. C → {A,G,T})

 Non-substitution means there is 1 possibility
 (e.g. C → C)




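 Therefore, the one-step Markov process has the following
 transition matrix (rows and columns ordered A, C, G, T):

 $$
 M_{JC} =
 \begin{pmatrix}
 1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
 \alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
 \alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
 \alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
 \end{pmatrix}
 $$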




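  After t generations the substitution probabilities are given by:

  $$M(t) = M_{JC}^{\,t}$$

  Eigenvalues and (normalized) eigenvectors of $M_{JC}$:

  $$\lambda_1 = 1 \ \text{(multiplicity 1)}: \quad v_1 = \tfrac{1}{2}\,(1,\ 1,\ 1,\ 1)^T$$

  $$\lambda_{2..4} = 1 - \tfrac{4\alpha}{3} \ \text{(multiplicity 3)}: \quad
  v_2 = \tfrac{1}{2}\,(-1,\ -1,\ 1,\ 1)^T, \quad
  v_3 = \tfrac{1}{2}\,(-1,\ 1,\ 1,\ -1)^T, \quad
  v_4 = \tfrac{1}{2}\,(1,\ -1,\ 1,\ -1)^T$$

  $M(t)$ has the same eigenvectors, with eigenvalues $\lambda_i^t$.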


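 Spectral decomposition of M(t):

 $$M_{JC}^{\,t} = \sum_i \lambda_i^t \, v_i v_i^T$$

 Write M(t) as:

 $$
 M_{JC}^{\,t} =
 \begin{pmatrix}
 r(t) & s(t) & s(t) & s(t) \\
 s(t) & r(t) & s(t) & s(t) \\
 s(t) & s(t) & r(t) & s(t) \\
 s(t) & s(t) & s(t) & r(t)
 \end{pmatrix}
 $$

 Therefore, the probability s(t) of one specific substitution (e.g. A → C)
 per site after t generations is:

 $$s(t) = \tfrac{1}{4} - \tfrac{1}{4}\left(1 - \tfrac{4\alpha}{3}\right)^t$$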
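
The closed form for s(t) can be checked numerically by raising the one-step Jukes-Cantor matrix to the power t and comparing an off-diagonal entry with the formula. The following is a sketch in plain Python; the values of α and t are arbitrary illustrations.

```python
def jc_matrix(alpha: float) -> list:
    """One-step Jukes-Cantor matrix, rows and columns ordered A, C, G, T."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)]
            for i in range(4)]


def matmul(A: list, B: list) -> list:
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(M: list, t: int) -> list:
    """M raised to the integer power t by repeated multiplication."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, M)
    return result


def s_closed_form(alpha: float, t: int) -> float:
    """Probability of one specific substitution after t generations."""
    return 0.25 - 0.25 * (1 - 4 * alpha / 3) ** t


alpha, t = 0.01, 50  # arbitrary illustrative values
Mt = matrix_power(jc_matrix(alpha), t)
print("off-diagonal entry:", Mt[0][1], " closed form s(t):", s_closed_form(alpha, t))
print("diagonal entry:    ", Mt[0][0], " 1 - 3 s(t):      ", 1 - 3 * s_closed_form(alpha, t))
```

Because each row sums to 1, the diagonal entry is r(t) = 1 − 3 s(t), so the probability that a site differs from its starting value after t generations is 3 s(t).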

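   Probability s(t) of one specific substitution per site after t generations:

   $$s(t) = \tfrac{1}{4} - \tfrac{1}{4}\left(1 - \tfrac{4\alpha}{3}\right)^t$$

   A site can change into any of the 3 other nucleotides, so the observed
   genetic distance d after t generations is ≈ 3 s(t):

   $$d = \tfrac{3}{4} - \tfrac{3}{4}\left(1 - \tfrac{4\alpha}{3}\right)^t$$

   For small α, using ln(1 − 4α/3) ≈ −4α/3:

   $$t \approx -\frac{3}{4\alpha}\,\ln\left(1 - \tfrac{4}{3}\,d\right)$$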



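 For small α, the number of generations t follows from the observed genetic distance d:

 $$t \approx -\frac{3}{4\alpha}\,\ln\left(1 - \tfrac{4}{3}\,d\right)$$

 The actual genetic distance is (of course):

 $$K = \alpha t$$

 So:

 $$K \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}\,d\right)$$

 This is the Jukes-Cantor formula: independent of α and t.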
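
As a small sketch, the Jukes-Cantor correction is a one-line function of the observed proportion d. The function name and the sample values of d are illustrative assumptions.

```python
import math


def jc_distance(d: float) -> float:
    """Jukes-Cantor corrected distance K for an observed proportion d."""
    if not 0 <= d < 0.75:
        raise ValueError("d must satisfy 0 <= d < 3/4")
    return -0.75 * math.log(1 - 4 * d / 3)


for d in (0.01, 0.1, 0.3, 0.6, 0.74):
    print(f"d = {d:.2f}  ->  K = {jc_distance(d):.4f}")
```

For small d the correction is negligible, while the estimate grows without bound as d approaches ¾.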



   The Jukes-Cantor formula:

   $$K \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}\,d\right)$$

   For small d, using ln(1+x) ≈ x:   K ≈ d
   So: actual distance ≈ observed distance


   For saturation, d ↑ ¾:   K → ∞
   So: if the observed distance corresponds to the distance between random
   sequences, then the actual distance becomes indeterminate


Jukes-Cantor





 Variance in K

 If K = f(d) then:

 $$\delta K = \frac{\partial K}{\partial d}\,\delta d \;\Rightarrow\; \delta K^2 = \left(\frac{\partial K}{\partial d}\right)^2 \delta d^2$$

 So:

 $$\mathrm{Var}(K) = \left(\frac{\partial K}{\partial d}\right)^2 \mathrm{Var}(d)$$

 Generation of a sequence of length n with substitution rate
 d is a binomial process:

 $$\mathrm{Prob}(k) = \binom{n}{k}\, d^k (1-d)^{n-k}$$

 and therefore with variance: Var(d) = d(1-d)/n

 Because of the Jukes-Cantor formula:

 $$\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac{4}{3}\,d}$$


 Variance in K

 Variance: Var(d) = d(1-d)/n

 Jukes-Cantor:

 $$\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac{4}{3}\,d}$$

 So:

 $$\mathrm{Var}(K) \approx \frac{d\,(1-d)}{n\left(1 - \tfrac{4}{3}\,d\right)^2}$$
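
A short sketch of this variance estimate, evaluated for a few observed distances at n = 1000 (the function name and the sample values are illustrative assumptions):

```python
import math


def jc_variance(d: float, n: int) -> float:
    """Approximate variance of the Jukes-Cantor estimate K for sequence length n."""
    if not 0 <= d < 0.75:
        raise ValueError("d must satisfy 0 <= d < 3/4")
    return d * (1 - d) / (n * (1 - 4 * d / 3) ** 2)


for d in (0.1, 0.3, 0.5, 0.7):
    var = jc_variance(d, 1000)
    print(f"d = {d:.1f}  Var(K) = {var:.6f}  sd = {math.sqrt(var):.4f}")
```

The variance grows rapidly as d approaches ¾, which is why corrected distances near saturation are unreliable.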




Var(K)






  EXAMPLE 5.4 on page 90

  * Create artificial data with n = 1000: generate K* mutations
  * Count the observed proportion of differences d
  * Reconstruct the estimate K(d) with the Jukes-Cantor relation
  * Plot K(d) – K*
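
The experiment can be sketched as a small simulation. This is not the book's code: it assumes that K* is a number of substitutions per site, that the n·K* substitutions hit uniformly random sites, and that each one replaces the current nucleotide by one of the three others with equal probability. It prints K(d) − K* instead of plotting it.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def simulate_observed_distance(n: int, k_star: float, rng: random.Random) -> float:
    """Apply round(k_star * n) random substitutions; return the observed proportion d."""
    original = [rng.choice(NUCLEOTIDES) for _ in range(n)]
    mutated = list(original)
    for _ in range(round(k_star * n)):
        site = rng.randrange(n)  # the same site may be hit more than once
        mutated[site] = rng.choice([x for x in NUCLEOTIDES if x != mutated[site]])
    return sum(a != b for a, b in zip(original, mutated)) / n


def jc_distance(d: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion d < 3/4."""
    return -0.75 * math.log(1 - 4 * d / 3)


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 1000
for k_star in (0.1, 0.5, 1.0):
    d = simulate_observed_distance(n, k_star, rng)
    k_est = jc_distance(d)
    print(f"K* = {k_star:.2f}  d = {d:.3f}  K(d) = {k_est:.3f}  K(d) - K* = {k_est - k_star:+.3f}")
```

The observed d never exceeds K*, because repeated hits and back-mutations can only hide substitutions; the Jukes-Cantor estimate compensates for this on average.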




 5.4 EXAMPLE 5.4 on page 90 (= FIG 5.3)





 The Kimura 2-parameter model
 Include substitution bias in the correction factor

 Transition probability (G↔A and T↔C) per site per generation
 is α

 Probability of each transversion (G↔T, G↔C, A↔T, and A↔C)
 per site per generation is β





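 The one-step Markov process substitution matrix
 now becomes (rows and columns ordered A, C, G, T; each nucleotide has one
 possible transition and two possible transversions, so each row sums to 1):

 $$
 M_{K2P} =
 \begin{pmatrix}
 1-\alpha-2\beta & \beta & \alpha & \beta \\
 \beta & 1-\alpha-2\beta & \beta & \alpha \\
 \alpha & \beta & 1-\alpha-2\beta & \beta \\
 \beta & \alpha & \beta & 1-\alpha-2\beta
 \end{pmatrix}
 $$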




  After t generations the substitution probabilities are given by:

  $$M(t) = M_{K2P}^{\,t}$$

  Determine for $M_{K2P}$:

         eigenvalues {λi}

         and eigenvectors {vi}



  Spectral decomposition of M(t):

  $$M_{K2P}^{\,t} = \sum_i \lambda_i^t \, v_i v_i^T$$

  Determine the fraction of transitions per site after t
  generations: P(t)

  Determine the fraction of transversions per site after t
  generations: Q(t)

  Genetic distance: K ≈ −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q)

  Fraction of substitutions d = P + Q → Jukes-Cantor
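
A minimal sketch of the Kimura 2-parameter distance, with a check of how it relates to Jukes-Cantor: when the observed differences split as under equal substitution probabilities (one third transitions, P = d/3, and two thirds transversions, Q = 2d/3), both formulas give the same K. The function names and sample values are illustrative assumptions.

```python
import math


def k2p_distance(P: float, Q: float) -> float:
    """Kimura 2-parameter distance from transition (P) and transversion (Q) fractions."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("P and Q are too large for the correction to be defined")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def jc_distance(d: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion d < 3/4."""
    return -0.75 * math.log(1 - 4 * d / 3)


d = 0.1
print("K2P with P = d/3, Q = 2d/3:", k2p_distance(d / 3, 2 * d / 3))
print("Jukes-Cantor with d:       ", jc_distance(d))

# With a transition bias (hypothetical values, same total d = 0.1) the estimates differ
print("K2P with P = 0.08, Q = 0.02:", k2p_distance(0.08, 0.02))
```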

Other models for nucleotide evolution
* Different types of transitions/transversions

* Pairwise substitutions GTR (= General Time Reversible) model

* Amino-acid substitution matrices

*…




LIMITATION:

All of the above models assume symmetric substitution probabilities:

      prob(A→T) = prob(T→A)

There is now strong evidence that this assumption does not hold

Challenge: incorporate this in a self-consistent model




  5.5 CASE STUDY: Neanderthals

  * mtDNA of 206 H. sapiens from different regions

  * Fragments of mtDNA of 2 H. neanderthalensis, including
  the original 1856 specimen

  * All 208 samples were taken from GenBank

  * A homologous sequence of 800 bp of the HVR (hypervariable region) could be
  found in all 208 specimens



  * Pairwise genetic difference, corrected with the Jukes-Cantor
  formula

  * d(i,j) is the JC-corrected genetic difference between pair (i,j)

  * The distance table is symmetric: d^T = d

  * MDS (Multi-Dimensional Scaling): translate the distance table
  d to an nD map X, here a 2D map
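
The first step of this pipeline, the symmetric table of JC-corrected pairwise distances, can be sketched as follows. The three short sequences are hypothetical toy data, not the GenBank samples, and the MDS step itself is left out; the resulting table is what an MDS routine would take as input.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Observed proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc_distance(d: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion d < 3/4."""
    return -0.75 * math.log(1 - 4 * d / 3)


def distance_matrix(seqs: list) -> list:
    """Symmetric matrix of JC-corrected pairwise distances with a zero diagonal."""
    m = len(seqs)
    D = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            D[i][j] = D[j][i] = jc_distance(p_distance(seqs[i], seqs[j]))
    return D


# Hypothetical toy alignment, for illustration only
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAA"]
D = distance_matrix(toy)
for row in D:
    print("  ".join(f"{x:.4f}" for x in row))
```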




                     distance map d(i,j)




   MDS
   [Figure: 2D MDS map in which H. sapiens and H. neanderthalensis are well separated]




Phylogenetic tree




END of LECTURE 5



             56
Introduction to Bioinformatics
LECTURE 5: INTER- AND INTRASPECIES VARIATION




                                               57
58

Más contenido relacionado

La actualidad más candente

Gutell 077.mbe.2001.18.1654
Gutell 077.mbe.2001.18.1654Gutell 077.mbe.2001.18.1654
Gutell 077.mbe.2001.18.1654
Robin Gutell
 
1-s2.0-037811199390549I-main
1-s2.0-037811199390549I-main1-s2.0-037811199390549I-main
1-s2.0-037811199390549I-main
Teresa Zimny
 
PNAS-2013-Arambula-8212-7
PNAS-2013-Arambula-8212-7PNAS-2013-Arambula-8212-7
PNAS-2013-Arambula-8212-7
Wenge Wong
 

La actualidad más candente (20)

1041
10411041
1041
 
IGEM poster
IGEM posterIGEM poster
IGEM poster
 
Ssr assignment
Ssr assignmentSsr assignment
Ssr assignment
 
Transcriptional factors in stress tolerance
Transcriptional factors in stress toleranceTranscriptional factors in stress tolerance
Transcriptional factors in stress tolerance
 
Gutell 077.mbe.2001.18.1654
Gutell 077.mbe.2001.18.1654Gutell 077.mbe.2001.18.1654
Gutell 077.mbe.2001.18.1654
 
1-s2.0-037811199390549I-main
1-s2.0-037811199390549I-main1-s2.0-037811199390549I-main
1-s2.0-037811199390549I-main
 
Biological method of transformation
Biological method of transformation Biological method of transformation
Biological method of transformation
 
Dominant and codominant markers30nov
Dominant and codominant markers30novDominant and codominant markers30nov
Dominant and codominant markers30nov
 
Molecular markers application in fisheries
Molecular markers application in fisheriesMolecular markers application in fisheries
Molecular markers application in fisheries
 
Tissue Culture and Cloning Work
Tissue Culture and Cloning WorkTissue Culture and Cloning Work
Tissue Culture and Cloning Work
 
DNA sequencing
DNA sequencing  DNA sequencing
DNA sequencing
 
PNAS-2013-Arambula-8212-7
PNAS-2013-Arambula-8212-7PNAS-2013-Arambula-8212-7
PNAS-2013-Arambula-8212-7
 
Chloroplast transformation
Chloroplast transformationChloroplast transformation
Chloroplast transformation
 
pc DNA3
pc DNA3pc DNA3
pc DNA3
 
Molecular markers in legumes
Molecular markers in legumesMolecular markers in legumes
Molecular markers in legumes
 
Biochemical and molecular markers for characterization
Biochemical and molecular markers for characterizationBiochemical and molecular markers for characterization
Biochemical and molecular markers for characterization
 
Biotechnology final
Biotechnology finalBiotechnology final
Biotechnology final
 
Oryza sativa
Oryza sativaOryza sativa
Oryza sativa
 
Structure of a carotenoid gene cluster from Pantoea sp. strain C1B1Y
Structure of a carotenoid gene  cluster from Pantoea sp. strain C1B1YStructure of a carotenoid gene  cluster from Pantoea sp. strain C1B1Y
Structure of a carotenoid gene cluster from Pantoea sp. strain C1B1Y
 
Fine structureof gene,allelic complementation,and split gene
Fine structureof gene,allelic complementation,and split gene Fine structureof gene,allelic complementation,and split gene
Fine structureof gene,allelic complementation,and split gene
 

Similar a Varriation Within and Between Species

Tryptophan Scanning Reveals Dense Packing of Connexin Transmembrane Domains i...
Tryptophan Scanning Reveals Dense Packing of Connexin Transmembrane Domains i...Tryptophan Scanning Reveals Dense Packing of Connexin Transmembrane Domains i...
Tryptophan Scanning Reveals Dense Packing of Connexin Transmembrane Domains i...
Nicholas Vaughn
 
Reverse genetics Approaches in Crop.pptx
Reverse genetics Approaches in Crop.pptxReverse genetics Approaches in Crop.pptx
Reverse genetics Approaches in Crop.pptx
ManjeetKhokhar
 
119. Gene Pyramiding A Strategy for Durable Crop Protection in Vegetable Crop...
119. Gene Pyramiding A Strategy for Durable Crop Protection in Vegetable Crop...119. Gene Pyramiding A Strategy for Durable Crop Protection in Vegetable Crop...
119. Gene Pyramiding A Strategy for Durable Crop Protection in Vegetable Crop...
DeepikaSood21
 
transposons complete ppt
transposons complete ppttransposons complete ppt
transposons complete ppt
tauseefsko
 
Thermostable polymerases areisolated from microorganisms that inha.pdf
Thermostable polymerases areisolated from microorganisms that inha.pdfThermostable polymerases areisolated from microorganisms that inha.pdf
Thermostable polymerases areisolated from microorganisms that inha.pdf
rushabhshah600
 

Similar a Varriation Within and Between Species (20)

Collegepart B.Burgering Deel 2
Collegepart B.Burgering Deel 2Collegepart B.Burgering Deel 2
Collegepart B.Burgering Deel 2
 
Markers
MarkersMarkers
Markers
 
Genetic diversity clustering and AMOVA
Genetic diversityclustering and AMOVAGenetic diversityclustering and AMOVA
Genetic diversity clustering and AMOVA
 
QTL lecture for Bio4025
QTL lecture for Bio4025QTL lecture for Bio4025
QTL lecture for Bio4025
 
Tryptophan Scanning Reveals Dense Packing of Connexin Transmembrane Domains i...
Tryptophan Scanning Reveals Dense Packing of Connexin Transmembrane Domains i...Tryptophan Scanning Reveals Dense Packing of Connexin Transmembrane Domains i...
Tryptophan Scanning Reveals Dense Packing of Connexin Transmembrane Domains i...
 
Igor Segota: PhD thesis presentation
Igor Segota: PhD thesis presentationIgor Segota: PhD thesis presentation
Igor Segota: PhD thesis presentation
 
Clinical molecular diagnostics for drug guidance
Clinical molecular diagnostics for drug guidanceClinical molecular diagnostics for drug guidance
Clinical molecular diagnostics for drug guidance
 
Tetrad analysis by rk
Tetrad analysis by rkTetrad analysis by rk
Tetrad analysis by rk
 
Reverse genetics Approaches in Crop.pptx
Reverse genetics Approaches in Crop.pptxReverse genetics Approaches in Crop.pptx
Reverse genetics Approaches in Crop.pptx
 
Vivo vitrothingamajig
Vivo vitrothingamajigVivo vitrothingamajig
Vivo vitrothingamajig
 
Bioinformatica 20-10-2011-t3-scoring matrices
Bioinformatica 20-10-2011-t3-scoring matricesBioinformatica 20-10-2011-t3-scoring matrices
Bioinformatica 20-10-2011-t3-scoring matrices
 
119. Gene Pyramiding A Strategy for Durable Crop Protection in Vegetable Crop...
119. Gene Pyramiding A Strategy for Durable Crop Protection in Vegetable Crop...119. Gene Pyramiding A Strategy for Durable Crop Protection in Vegetable Crop...
119. Gene Pyramiding A Strategy for Durable Crop Protection in Vegetable Crop...
 
M Sc Molecular Biology Final- project SV.pptx
M Sc Molecular Biology Final-  project SV.pptxM Sc Molecular Biology Final-  project SV.pptx
M Sc Molecular Biology Final- project SV.pptx
 
transposons complete ppt
transposons complete ppttransposons complete ppt
transposons complete ppt
 
2014 davis-talk
2014 davis-talk2014 davis-talk
2014 davis-talk
 
DNA Chip
DNA ChipDNA Chip
DNA Chip
 
Thermostable polymerases areisolated from microorganisms that inha.pdf
Thermostable polymerases areisolated from microorganisms that inha.pdfThermostable polymerases areisolated from microorganisms that inha.pdf
Thermostable polymerases areisolated from microorganisms that inha.pdf
 
Dynamics, control and synchronization of some models of neuronal oscillators
Dynamics, control and synchronization of some models of neuronal oscillatorsDynamics, control and synchronization of some models of neuronal oscillators
Dynamics, control and synchronization of some models of neuronal oscillators
 
Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...
Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...
Cancer, Quantum Computing and TP53 Tumor Suppressor Gene Mutations Prediction...
 
Introduction-to-Bioinformatics-1.ppt
Introduction-to-Bioinformatics-1.pptIntroduction-to-Bioinformatics-1.ppt
Introduction-to-Bioinformatics-1.ppt
 

Último

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Último (20)

Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 

Varriation Within and Between Species

  • 2. Introduction to Bioinformatics. LECTURE 5: Variation within and between species * Chapter 5: Are Neanderthals among us? 2
  • 3. Neandertal, Germany, 1856 Initial interpretations: * bear skull * pathological idiot * Old Dutchman ... 3
  • 4. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION 4
  • 5. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION 5
  • 6. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION 6
  • 7. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION 5.1 Variation in DNA sequences * Even closely related individuals differ in genetic sequences * (point) mutations : copy error at certain location * Sexual reproduction – diploid genome 7
  • 8. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES Diploid chromosomes 8
  • 9. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES Mitosis: diploid reproduction 9
  • 10. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES Meiosis: diploid (=double) → haploid (=single) 10
  • 11. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES * typing error rate very good typist: 1 error / 1K typed letters * all our diploid cells constantly reproduce 7 billion letters * typical cell copying error rate is ~ 1 error /1 Gbp 11
  • 12. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES GERM LINE Reverse time and follow your cells: • Now you count ~ 1013 cells • One generation ago you had 2 cells ‘somewhere’ in your parents body • Small T generations ago you had (2T – multiple ancestors) cells • Large T generations ago you counted #(fertile ancestors) cells • Congratulations: you are 3.4 billion years old !!! Fast-forward time and follow your cells: • Only a few cells in your reproductive organs have a chance to live on in the next generations • The rest (including you) will die … 12
  • 13. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES GERM LINE MUTATIONS This potentially immortal lineage of (germ) cells is called the GERM LINE All mutations that we have accumulated are en route on the germ line 13
  • 14. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES * Polymorphism : multiple possibilities for a nucleotide: allelle * Single Nucleotide Polymorphism – SNP (“snip”) point mutation example: AAATAAA vs AAACAAA * Humans: SNP = 1/1500 bases = 0.067% * STR = Short Tandem Repeats (microsatelites) example: CACACACACACACACACA … * Transition - transversion 14
  • 15. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES Purines – Pyrimidines 15
  • 16. Introduction to Bioinformatics 5.1 VARIATION IN DNA SEQUENCES Transitions – Transversions 16
  • 17. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION 5.2 Mitochondrial DNA * mitochondriae are inherited only via the maternal line!!! * Very suitable for comparing evolution, not reshuffled 17
  • 18. Introduction to Bioinformatics 5.2 MITOCHONDRIAL DNA H.sapiens mitochondrion 18
  • 19. Introduction to Bioinformatics 5.2 MITOCHONDRIAL DNA EM photograph of H. Sapiens mtDNA 19
  • 20. Introduction to Bioinformatics 5.2 MITOCHONDRIAL DNA 20
  • 21. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION 5.3 Variation between species * genetic variation accounts for morphological- physiological-behavioral variation * Genetic variation (c.q. distance) relates to phylogenetic relation (=relationship) * Necessity to measure distances between sequences: a metric 21
  • 22. Introduction to Bioinformatics 5.3 VARIATION BETWEEN SPECIES Substitution rate * Mutations originate in single individuals * Mutations can become fixed in a population * Mutation rate: rate at which new mutations arise * Substitution rate: rate at which a species fixes new mutations * For neutral mutations 22
  • 23. Introduction to Bioinformatics 5.3 VARIATION BETWEEN SPECIES Substitution rate and mutation rate * For neutral mutations: $\rho = 2N\mu \cdot \frac{1}{2N} = \mu$ * $\rho = K/(2T)$
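The slide does not define its symbols, so the following worked reading rests on assumptions: N is taken to be the size of a diploid population (2N gene copies), μ the mutation rate per site per generation, 1/(2N) the probability that a new neutral mutation becomes fixed, K the number of substitutions per site separating two species, and T the time since their divergence. Under that reading, the two relations can be written out as:

```latex
\begin{align}
\rho &= \underbrace{2N\mu}_{\text{new mutations per generation}}
        \times
        \underbrace{\frac{1}{2N}}_{\text{fixation probability}}
      = \mu \\
K &= 2T\rho
   \quad\Longrightarrow\quad
   \rho = \frac{K}{2T}
\end{align}
```

The factor 2 in the second line reflects that both lineages accumulate substitutions independently over the time T since they split.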
  • 24. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION 5.4 Estimating genetic distance * Substitutions are independent (?) * Substitutions are random * Multiple substitutions may occur * Back-mutations mutate a nucleotide back to an earlier value
  • 25. Introduction to Bioinformatics 5.4 ESTIMATING GENETIC DISTANCE Multiple substitutions and back-mutations conceal the real genetic distance:
      GACTGATCCACCTCTGATCCTTTGGAACTGATCGT
      TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT
      TTCTGATCCACCTCTGATCCATCGGAACTGATCGT
      GTCTGATCCACCTCTGATCCATTGGAACTGATCGT
    observed: 2 (= d), actual: 4 (= K)
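The observed count on this slide compares only the first and the last sequence and ignores the intermediate steps. A minimal sketch of that count is given below; the function name is illustrative, and the two strings are the first and last sequences from the slide.

```python
def observed_distance(seq_a: str, seq_b: str):
    """Return (number of differing sites, proportion of differing sites).

    Illustrative helper: the sequences are assumed to be aligned and
    of equal, non-zero length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences, differences / len(seq_a)


first = "GACTGATCCACCTCTGATCCTTTGGAACTGATCGT"
last = "GTCTGATCCACCTCTGATCCATTGGAACTGATCGT"

count, proportion = observed_distance(first, last)
print(count, round(proportion, 3))  # 2 differing sites out of 35
```

Sites that mutated and then mutated back (such as the first site here) do not show up in this count, which is why a correction such as Jukes-Cantor is needed.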
  • 26. Introduction to Bioinformatics 5.4 ESTIMATING GENETIC DISTANCE * Saturation: on average one substitution per site * Two random sequences of equal length will match at approximately ¼ of their sites * At saturation the observed proportional genetic distance is therefore ¾
  • 27. Introduction to Bioinformatics 5.4 ESTIMATING GENETIC DISTANCE * True genetic distance (proportion): K * Observed proportion of differences: d * Due to back-mutations K ≥ d
  • 28. Introduction to Bioinformatics 5.4 ESTIMATING GENETIC DISTANCE SEQUENCE EVOLUTION is a Markov process: a sequence at generation (= time) t depends only on the sequence at generation t-1
  • 29. Introduction to Bioinformatics 5.4 ESTIMATING GENETIC DISTANCE The Jukes-Cantor model: correction for multiple substitutions. Substitution probability per site per second is α. Substitution means there are 3 possible replacements (e.g. C → {A,G,T}). Non-substitution means there is 1 possibility (e.g. C → C).
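  • 30. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL Therefore, the one-step Markov process has the following transition matrix (rows and columns in the order A, C, G, T):
    $$
    M_{JC} =
    \begin{pmatrix}
    1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
    \alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
    \alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
    \alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
    \end{pmatrix}
    $$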
  • 31. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL After t generations the substitution probabilities are given by $M(t) = M_{JC}^{\,t}$. Eigenvalues and (orthonormal) eigenvectors of $M_{JC}$:
    $\lambda_1 = 1$ (multiplicity 1): $v_1 = \tfrac{1}{2}(1, 1, 1, 1)^T$
    $\lambda_{2..4} = 1 - 4\alpha/3$ (multiplicity 3): $v_2 = \tfrac{1}{2}(-1, -1, 1, 1)^T$, $v_3 = \tfrac{1}{2}(1, -1, -1, 1)^T$, $v_4 = \tfrac{1}{2}(1, -1, 1, -1)^T$
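  • 32. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL Spectral decomposition of M(t):
    $$ M_{JC}^{\,t} = \sum_i \lambda_i^{\,t}\, v_i v_i^T $$
    Write M(t) as:
    $$
    M_{JC}^{\,t} =
    \begin{pmatrix}
    r(t) & s(t) & s(t) & s(t) \\
    s(t) & r(t) & s(t) & s(t) \\
    s(t) & s(t) & r(t) & s(t) \\
    s(t) & s(t) & s(t) & r(t)
    \end{pmatrix}
    $$
    Therefore, the substitution probability s(t) per site (to one particular other nucleotide) after t generations is:
    $$ s(t) = \tfrac{1}{4} - \tfrac{1}{4}\left(1 - \tfrac{4\alpha}{3}\right)^{t} $$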
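The closed form for s(t) can be checked numerically by raising the Jukes-Cantor matrix to the t-th power and reading off an off-diagonal entry. The sketch below uses plain Python lists; the helper names and the values α = 0.01 and t = 50 are illustrative choices, not taken from the lecture.

```python
def jukes_cantor_matrix(alpha: float):
    """One-step Jukes-Cantor transition matrix (order A, C, G, T)."""
    off = alpha / 3.0
    return [[1.0 - alpha if i == j else off for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, t: int):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, m)
    return result


def s_closed_form(alpha: float, t: int) -> float:
    """Off-diagonal entry of M_JC^t according to the spectral decomposition."""
    return 0.25 - 0.25 * (1.0 - 4.0 * alpha / 3.0) ** t


alpha, t = 0.01, 50  # illustrative values
m_t = mat_pow(jukes_cantor_matrix(alpha), t)

# The two numbers should agree up to floating-point rounding.
print(m_t[0][1], s_closed_form(alpha, t))
```

Because each row of M(t) sums to 1, the diagonal entry follows as r(t) = 1 − 3 s(t).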
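  • 33. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL Substitution probability s(t) per site (to one particular other nucleotide) after t generations:
    $$ s(t) = \tfrac{1}{4} - \tfrac{1}{4}\left(1 - \tfrac{4\alpha}{3}\right)^{t} $$
    Observed genetic distance d after t generations: any of the 3 possible substitutions counts as a difference, so d ≈ 3 s(t):
    $$ d = \tfrac{3}{4} - \tfrac{3}{4}\left(1 - \tfrac{4\alpha}{3}\right)^{t} $$
    For small α:
    $$ t \approx -\frac{3}{4\alpha} \ln\left(1 - \tfrac{4}{3} d\right) $$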
  • 34. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL For small α, the number of generations follows from the observed genetic distance d:
    $$ t \approx -\frac{3}{4\alpha} \ln\left(1 - \tfrac{4}{3} d\right) $$
    The actual genetic distance is (of course): K = αt. So:
    $$ K \approx -\tfrac{3}{4} \ln\left(1 - \tfrac{4}{3} d\right) $$
    This is the Jukes-Cantor formula: independent of α and t.
  • 35. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL The Jukes-Cantor formula:
    $$ K \approx -\tfrac{3}{4} \ln\left(1 - \tfrac{4}{3} d\right) $$
    For small d, using ln(1+x) ≈ x: K ≈ d. So: actual distance ≈ observed distance. For saturation: d ↑ ¾: K → ∞. So: if the observed distance corresponds to the distance between random sequences, then the actual distance becomes indeterminate.
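The Jukes-Cantor correction and its two limits (K ≈ d for small d, K → ∞ as d approaches ¾) can be sketched in a few lines. The function name is illustrative, and returning infinity at or beyond saturation is a choice made for this sketch rather than something the slides prescribe.

```python
import math


def jukes_cantor_distance(d: float) -> float:
    """Jukes-Cantor corrected distance K for an observed proportion d.

    Returns math.inf when d >= 3/4, where the formula is undefined
    (a convention chosen for this sketch).
    """
    if d < 0:
        raise ValueError("d must be non-negative")
    if d >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


# The correction is negligible for small d and grows quickly near saturation.
for d in (0.01, 0.1, 0.3, 0.6, 0.74):
    print(d, jukes_cantor_distance(d))
```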
  • 37. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL Variance in K. If K = f(d), then:
    $$ \delta K = \frac{\partial K}{\partial d}\,\delta d \;\Rightarrow\; \delta K^2 = \left(\frac{\partial K}{\partial d}\right)^{2} \delta d^2 $$
    So:
    $$ \mathrm{Var}(K) = \left(\frac{\partial K}{\partial d}\right)^{2} \mathrm{Var}(d) $$
    Generation of a sequence of length n with substitution rate d is a binomial process:
    $$ \mathrm{Prob}(k) = \binom{n}{k}\, d^{k} (1-d)^{n-k} $$
    and therefore with variance Var(d) = d(1-d)/n. Because of the Jukes-Cantor formula:
    $$ \frac{\partial K}{\partial d} = \frac{1}{1 - \frac{4}{3} d} $$
  • 38. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL Variance in K. Variance: Var(d) = d(1-d)/n. Jukes-Cantor:
    $$ \frac{\partial K}{\partial d} = \frac{1}{1 - \frac{4}{3} d} $$
    So:
    $$ \mathrm{Var}(K) \approx \frac{d(1-d)}{n\left(1 - \frac{4}{3} d\right)^{2}} $$
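The variance formula shows how the uncertainty in K grows as d approaches saturation. A minimal sketch follows; the function name is illustrative, and n = 1000 is used only because that is the sequence length of the example that follows in the lecture.

```python
def jukes_cantor_variance(d: float, n: int) -> float:
    """Approximate variance of the Jukes-Cantor distance K.

    d: observed proportion of differing sites (0 <= d < 3/4)
    n: sequence length
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= d < 0.75:
        raise ValueError("d must satisfy 0 <= d < 3/4")
    return d * (1.0 - d) / (n * (1.0 - 4.0 * d / 3.0) ** 2)


# The variance increases sharply as d moves towards 3/4.
for d in (0.1, 0.3, 0.6, 0.7):
    print(d, jukes_cantor_variance(d, 1000))
```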
  • 39. Var(K)
  • 40. Introduction to Bioinformatics 5.4 THE JUKES-CANTOR MODEL EXAMPLE 5.4 on page 90 * Create artificial data with n = 1000: generate K* mutations * Count d * With the Jukes-Cantor relation, reconstruct the estimate K(d) * Plot K(d) – K*
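The steps of this example can be sketched as a small simulation. This is not the textbook's code: the mutation procedure (pick a random site, replace it with a different random nucleotide), the seed, and the values of K* are assumptions made for illustration, and the comparison is printed rather than plotted.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def simulate_observed_distance(n: int, k_star: int, rng: random.Random) -> float:
    """Apply k_star random substitutions to a random sequence of length n
    and return the observed proportion d of differing sites."""
    original = [rng.choice(NUCLEOTIDES) for _ in range(n)]
    mutated = list(original)
    for _ in range(k_star):
        site = rng.randrange(n)
        # A substitution always changes the nucleotide at the chosen site.
        choices = [x for x in NUCLEOTIDES if x != mutated[site]]
        mutated[site] = rng.choice(choices)
    return sum(a != b for a, b in zip(original, mutated)) / n


def jukes_cantor_distance(d: float) -> float:
    """Jukes-Cantor corrected distance; infinite at or beyond saturation."""
    if d >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000
for k_star in (50, 200, 500, 1000):  # illustrative numbers of mutations
    d = simulate_observed_distance(n, k_star, rng)
    k_est = jukes_cantor_distance(d)
    # Columns: true K* per site, observed d, JC estimate, estimation error
    print(k_star / n, d, k_est, k_est - k_star / n)
```

The observed d falls increasingly short of K*/n as more mutations hit sites that were already changed, while the corrected estimate stays closer to the true value.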
  • 41. Introduction to Bioinformatics 5.4 EXAMPLE 5.4 on page 90
  • 42. Introduction to Bioinformatics 5.4 EXAMPLE 5.4 on page 90
  • 43. Introduction to Bioinformatics 5.4 EXAMPLE 5.4 on page 90
  • 44. Introduction to Bioinformatics 5.4 EXAMPLE 5.4 on page 90 (= FIG 5.3)
  • 45. Introduction to Bioinformatics 5.4 ESTIMATING GENETIC DISTANCE The Kimura 2-parameter model: include substitution bias in the correction factor. Transition probability (G↔A and T↔C) per site per second is α. Transversion probability (G↔T, G↔C, A↔T, and A↔C) per site per second is β.
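  • 46. Introduction to Bioinformatics 5.4 THE KIMURA 2-PARAM MODEL The one-step Markov process substitution matrix now becomes (rows and columns in the order A, C, G, T; each row sums to 1):
    $$
    M_{K2P} =
    \begin{pmatrix}
    1-\alpha-2\beta & \beta & \alpha & \beta \\
    \beta & 1-\alpha-2\beta & \beta & \alpha \\
    \alpha & \beta & 1-\alpha-2\beta & \beta \\
    \beta & \alpha & \beta & 1-\alpha-2\beta
    \end{pmatrix}
    $$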
  • 47. Introduction to Bioinformatics 5.4 THE KIMURA 2-PARAM MODEL After t generations the substitution probabilities are given by $M(t) = M_{K2P}^{\,t}$. Determine the eigenvalues $\{\lambda_i\}$ and eigenvectors $\{v_i\}$ of $M_{K2P}$.
  • 48. Introduction to Bioinformatics 5.4 THE KIMURA 2-PARAM MODEL Spectral decomposition of M(t): $M_{K2P}^{\,t} = \sum_i \lambda_i^{\,t}\, v_i v_i^T$. Determine the fraction of transitions per site after t generations: P(t). Determine the fraction of transversions per site after t generations: Q(t). Genetic distance:
    $$ K \approx -\tfrac{1}{2} \ln(1 - 2P - Q) - \tfrac{1}{4} \ln(1 - 2Q) $$
    Fraction of substitutions d = P + Q → Jukes-Cantor
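The Kimura 2-parameter distance needs the observed fractions of transitions (P) and transversions (Q) separately. A minimal sketch that counts both from two aligned sequences and applies the formula is shown below; the function names and the toy values are illustrative, and the input is assumed to contain only A, C, G, and T.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_fractions(seq_a: str, seq_b: str):
    """Return (P, Q): fractions of sites differing by a transition / transversion."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq_a)
    return transitions / n, transversions / n


def kimura_2p_distance(p: float, q: float) -> float:
    """Kimura 2-parameter distance; infinite when a logarithm argument is <= 0."""
    if p < 0 or q < 0:
        raise ValueError("p and q must be non-negative")
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        return math.inf
    return -0.5 * math.log(a) - 0.25 * math.log(b)


# Illustrative values: 10% transitions, 5% transversions.
print(kimura_2p_distance(0.10, 0.05))
```

For small P and Q the result is close to P + Q = d, the same small-distance limit as the Jukes-Cantor formula.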
  • 49. Introduction to Bioinformatics 5.4 ESTIMATING GENETIC DISTANCE Other models for nucleotide evolution * Different types of transitions/transversions * Pairwise substitutions: GTR (= General Time Reversible) model * Amino-acid substitution matrices * …
  • 50. Introduction to Bioinformatics 5.4 ESTIMATING GENETIC DISTANCE Other models for nucleotide evolution DEFICIT: all of the above models assume symmetric substitution probabilities: prob(A→T) = prob(T→A). There is now strong evidence that this assumption is not true. Challenge: incorporate this in a self-consistent model.
  • 51. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION 5.5 CASE STUDY: Neanderthals * mtDNA of 206 H. sapiens from different regions * Fragments of mtDNA of 2 H. neanderthalensis, including the original 1856 specimen * All 208 samples from GenBank * A homologous sequence of 800 bp of the HVR could be found in all 208 specimens
  • 52. Introduction to Bioinformatics 5.5 CASE STUDY: Neanderthals * Pairwise genetic difference, corrected with the Jukes-Cantor formula * d(i,j) is the JC-corrected genetic difference between pair (i,j) * The distance table is symmetric: $d^T = d$ * MDS (Multi Dimensional Scaling): translate the distance table d to an n-dimensional map X, here a 2D map
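The first step of this analysis, building the symmetric table of JC-corrected pairwise distances, can be sketched as follows. The three short sequences are made-up toy data, not the GenBank samples from the case study, and the MDS step itself is not shown: the resulting table is what an MDS routine would take as input.

```python
import math


def jc_corrected_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    d = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if d >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


def distance_table(sequences):
    """Symmetric matrix with zero diagonal: table[i][j] == table[j][i]."""
    n = len(sequences)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            table[i][j] = table[j][i] = jc_corrected_distance(sequences[i], sequences[j])
    return table


# Hypothetical toy sequences, for illustration only.
toy_sequences = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAA"]
table = distance_table(toy_sequences)
for row in table:
    print([round(x, 4) for x in row])
```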
  • 53. Introduction to Bioinformatics 5.5 CASE STUDY: Neanderthals distance map d(i,j)
  • 54. Introduction to Bioinformatics 5.5 CASE STUDY: Neanderthals MDS map: H. neanderthalensis and H. sapiens are well-separated
  • 55. Introduction to Bioinformatics 5.5 CASE STUDY: Neanderthals phylogenetic tree
  • 57. Introduction to Bioinformatics LECTURE 5: INTER- AND INTRASPECIES VARIATION