FBW
              6-11-2012




Wim Van Criekinge
Course Contents: Bioinformatics




                                NO CLASS
Phylogenetics
                Introduction
                    Definitions
                    Species concept
                    Examples
                    The Tree-of-life
                Phylogenetics Methodologies
                    Algorithms
                      Distance Methods
                      Maximum Likelihood
                      Maximum Parsimony
                   Rooting
                   Statistical Validation
                Conclusions
                   Orthologous genes
                   Horizontal Gene Transfer
                   Phylogenomics
                Practical Approach: PHYLIP
                Weblems
What is phylogenetics ?

                Phylogeny (phylo = tribe + genesis)

                Phylogenetic trees are about visualising evolutionary
                  relationships. They reconstruct the pattern of events
                  that have led to the distribution and diversity of life.

                The purpose of a phylogenetic tree is to illustrate how a
                  group of objects (usually genes or organisms) are
                  related to one another

                Nothing in Biology Makes Sense Except in the Light of
                  Evolution. Theodosius Dobzhansky (1900-1975)
Trees

        • Diagram consisting of branches and nodes
        • Species tree (how are my species related?)
           – contains only one representative from each
             species.
           – all nodes indicate speciation events
        • Gene tree (how are my genes related?)
           – normally contains a number of genes from a
             single species
           – nodes relate either to speciation or gene
             duplication events
Clade: A set of species which includes all of the species
derived from a single common ancestor
Species Concepts from Various Authors
D.A. Baum and K.L. Shaw - Exclusive groups of organisms, where an exclusive group is one whose members are all more closely related to
       each other than to any organisms outside the group.
J. Cracraft - An irreducible cluster of organisms, diagnosably distinct from other such clusters, and within which there is a parental pattern of
       ancestry and descent.
Charles Darwin - "From these remarks it will be seen that I look at the term species, as one arbitrarily given for the sake of convenience to a set
      of individuals closely resembling each other, and that it does not essentially differ from the term variety, which is given to less distinct and
      more fluctuating forms. The term variety, again, in comparison with mere individual differences, is also applied arbitrarily, and for mere
      convenience sake" (Origin of Species, 1st ed., p. 108).
T. Dobzhansky - The largest and most inclusive reproductive community of sexual and cross-fertilizing individuals which share a common gene
       pool. And later...Systems of populations, the gene exchange between which is limited or prevented by reproductive isolating mechanisms.
M. Ghiselin - The most extensive units in the natural economy, such that reproductive competition occurs among their parts.
D.M. Lambert - Groups of individuals that define themselves by a specific mate recognition system.
J. Mallet - Identifiable genotypic clusters recognized by a deficit of intermediates, both at single loci and at multiple loci.
E. Mayr - Groups of actually or potentially interbreeding natural populations which are reproductively isolated from other such groups.
C.D. Michener - A group of organisms not itself divisible by phenetic gaps resulting from concordant differences in character states (except for
       morphs - such as sex, age, or caste), but separated by such phenetic gaps from other such units.
H.E.H. Patterson - That most inclusive population of individual biparental organisms which share a common fertilization system.
G.G. Simpson - A lineage of populations evolving with time, separately from others, with its own unique evolutionary role and tendencies.
P.H.A. Sneath and R.R. Sokal - The smallest (most homogeneous) cluster that can be recognized upon some given criterion as being distinct
       from other clusters.
A.R. Templeton - The most inclusive population of individuals having the potential for phenotypic cohesion through intrinsic cohesion
       mechanisms (genetic and/or demographic - i.e. ecological - exchangeability).
E.O. Wiley - A single lineage of ancestor-descendant populations which maintains its identity from other such lineages and which has its own
       evolutionary tendencies and historical fate.
S. Wright - A species in time and space is composed of numerous local populations, each one intercommunicating and intergrading with others.
Species

I. Definitions:

Species = the basic unit of classification

> Three different ways to recognize species:
Plant Species

Definitions:

> Three different ways to recognize species:

1) Morphological species = the smallest group that is
    consistently and persistently distinct (Clusters in
    morphospace)

    species are recognized initially on the basis of
     appearance; the individuals of one species look
         different from the individuals of another
Species

Definitions:

> Three different ways to recognize species:

2) Biological species = a set of interbreeding or
    potentially interbreeding individuals that are
    separated from other species by reproductive
    barriers

    members of different species are unable to interbreed
Species

Definitions:

> Three different ways to recognize species:

3) Phylogenetic species = the boundary between
   reticulate (among interbreeding individuals) and
   divergent relationships (between lineages with no
   gene exchange)
Phylogenetic species
[Figure: diagram with three labels: divergent, boundary, reticulate]
Recognized by the pattern of ancestor-descendant relationships
Species

Definitions:

> Three different ways to recognize species:

4) Phylogenomics species = ability to transmit (and
   maintain) a (stable) gene pool

    Addresses the topology variations in the Anopheles genome
Branching Order in a Phylogenetic Tree

                    • In the tree to the left, A and B share the most recent
                      common ancestry. Thus, of the species in the
                      tree, A and B are the most closely related.
                    • The next most recent common ancestry is C with
                      the group composed of A and B. Notice that the
                      relationship of C is with the group containing A
                      and B. In particular, C is not more closely related to
                      B than to A. This can be emphasized by the
                      following two trees, which are equivalent to each
                      other:
More definitions …

[Figure labels: edge (branch); leaves (tips, external nodes); branch node (internal node)]

• A common simplifying assumption is that the tree is bifurcating,
meaning that each branch node has exactly two descendants.
• The edges, taken together, are sometimes said to define the topology
of the tree
Outgroups, rooted versus unrooted


    An unrooted reptilian phylogeny with an avian outgroup and
    the corresponding rooted phylogeny. The Ri represent modern
    reptiles, the Ai inferred ancestors, and B a bird.
Some definitions …
Examples

  Phylogenetic methods may be used to
  solve crimes, test purity of products, and
  determine whether endangered species
  have been smuggled or mislabeled:
   – Vogel, G. 1998. HIV strain analysis debuts in
     murder trial. Science 282(5390): 851-853.
   – Lau, D. T.-W., et al. 2001. Authentication of
     medicinal Dendrobium species by the internal
     transcribed spacer of ribosomal DNA. Planta
     Med 67:456-460.
Examples

 – Epidemiologists use phylogenetic methods to
   understand the development of
   pandemics, patterns of disease transmission, and
   development of antimicrobial resistance or
   pathogenicity:
    • Basler, C.F., et al. 2001. Sequence of the 1918
      pandemic influenza virus nonstructural gene (NS)
      segment and characterization of recombinant viruses
      bearing the 1918 NS genes. PNAS, 98(5):2746-2751.
    • Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV
      transmission in a dental practice. Science
      256(5060):1165-1171.
    • Bacillus anthracis:
Examples
• Conservation biologists may use these techniques to
  determine which populations are in greatest need of
  protection, and other questions of population structure:
   – Trepanier, T.L., and R.W. Murphy. 2001. The Coachella Valley
     fringe-toed lizard (Uma inornata): genetic diversity and
     phylogenetic relationships of an endangered species. Mol
     Phylogenet Evol 18(3):327-334.
   – Alves, M.J., et al. 2001. Mitochondrial DNA variation in the
     highly endangered cyprinid fish Anaecypris hispanica:
     importance for conservation. Heredity 87(Pt 4):463-473.
• Pharmaceutical researchers may use phylogenetic
  methods to determine which species are most closely
  related to other medicinal species, thus perhaps sharing
  their medicinal qualities:
   – Komatsu, K., et al. 2001. Phylogenetic analysis based on 18S
     rRNA gene and matK gene sequences of Panax vietnamensis
     and five related species. Planta Med 67:461-465.
Tree-of-life
Some Important Dates in History

                      Event                                 Billion years ago
                      Origin of the Universe                15
                      Formation of the Solar System         4.6
                      First Self-replicating System         3.5
                      Prokaryotic-Eukaryotic Divergence     2.0
                      Plant-Animal Divergence               1.0
                      Invertebrate-Vertebrate Divergence    0.5
                      Mammalian Radiation Beginning         0.1
Tree Of Life
What Sequence to Use ?

                         • To infer relationships that span the
                           diversity of known life, it is
                           necessary to look at genes
                           conserved through the billions of
                           years of evolutionary divergence.
                         • The gene must display an
                           appropriate level of sequence
                           conservation for the divergences of
                           interest.
What Sequence to Use ?

                         • If there is too much change, then
                           the sequences become
                           randomized, and there is a limit to
                           the depth of the divergences that
                           can be accurately inferred.
                         • If there is too little change (if the
                           gene is too conserved), then there
                           may be little or no change between
                           the evolutionary branchings of
                           interest, and it will not be possible to
                           infer close (genus or species level)
                           relationships.
Ribosomal RNA Genes and Their Sequences

                    Carl Woese recognized the full potential of rRNA
                      sequences as a measure of phylogenetic
                      relatedness. He initially used an RNA
                      sequencing method that determined about
                      1/4 of the nucleotides in the 16S rRNA (the
                      best technology available at the time). This
                      amount of data greatly exceeded anything
                      else then available. Using newer methods,
                      it is now routine to determine the
                      sequence of the entire 16S rRNA
                      molecule. Today, the accumulated 16S
                      rRNA sequences (about 10,000) constitute
                      the largest body of data available for
                      inferring relationships among organisms.
What Sequence to Use ?


                         Examples of genes in this category are
                           those that encode the ribosomal RNAs
                           (rRNAs). Most prokaryotes have three
                           rRNAs, called the 5S, 16S and 23S
                           rRNA.


                          Name (a)   Size (nucleotides)   Location
                          5S         120                  Large subunit of ribosome
                          16S        1500                 Small subunit of ribosome
                          23S        2900                 Large subunit of ribosome
                          (a) The name is based on the rate at which the
                          molecule sediments (sinks) in water.
                          Bigger molecules sediment faster than small
                          ones.
Ribosomal RNA Genes and Their Sequences


The extraordinary conservation of rRNA genes can
  be seen in these fragments of the small subunit
  rRNA gene sequences from organisms spanning
  the known diversity of life:
human                  ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAAAAAG...

yeast                  ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAG...

Corn                   ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAG...

Escherichia coli        ...GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCG...

Anacystis nidulans      ...GTGCCAGCAGCCGCGGTAATACGGGAGAGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCG...

Thermotoga maritima     ...GTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTACCCGGATTTACTGGGCGTAAAGGG...

Methanococcus vannielii ...GTGCCAGCAGCCGCGGTAATACCGACGGCCCGAGTGGTAGCCACTCTTATTGGGCCTAAAGCG...

Thermococcus celer      ...GTGGCAGCCGCCGCGGTAATACCGGCGGCCCGAGTGGTGGCCGCTATTATTGGGCCTAAAGCG...

Sulfolobus solfataricus ...GTGTCAGCCGCCGCGGTAATACCAGCTCCGCGAGTGGTCGGGGTGATTACTGGGCCTAAAGCG...
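The degree of conservation in these fragments can be quantified by counting identical positions. The sketch below is only an illustration: it copies three of the fragments shown above, and the function name `fraction_identical` is a hypothetical helper, not part of any package mentioned in these slides.

```python
# Three fragments copied from the small subunit rRNA alignment above.
FRAGMENTS = {
    "human": "GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAAAAAG",
    "yeast": "GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAG",
    "E. coli": "GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCG",
}


def fraction_identical(seq_a, seq_b):
    """Fraction of aligned positions that carry the same base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


if __name__ == "__main__":
    names = list(FRAGMENTS)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            identity = fraction_identical(FRAGMENTS[first], FRAGMENTS[second])
            print(f"{first} vs {second}: {identity:.2f} identical")
```

On these short fragments the two eukaryotes are nearly identical to each other, while both share only the conserved stretches with the bacterium.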
Other genes …
Molecular Clock (MC)

                  • Rate of evolution = rate of mutation
                  • rate of evolution for any macromolecule is
                    approximately constant over time (Neutral
                    Theory of evolution)
                  • For a given protein the rate of sequence
                    evolution is approximately constant across
                    lineages. Zuckerkandl and Pauling (1965)
                  • This would allow speciation and duplication
                    events to be dated accurately based on
                    molecular data
Novel trees using Hox genes
• (a) A traditional phylogenetic tree and
• (b) the new phylogenetic tree, each showing the
  positions of selected phyla. B, bilateria;
  AC, acoelomates; PC, pseudocoelomates;
  C, coelomates; P, protostomes; L, lophotrochozoa;
  E, ecdysozoa; D, deuterostomes.
Molecular Clock (MC)


                  • Local and approximate molecular
                    clocks are more reasonable
                       – one amino acid substitution per 14.5 My
                       – 1.3 × 10^-9 substitutions/nucleotide site/year
                       – Relative rate test (see further)
                          • given ((A,B),C), measure the distances
                            (A,C) and (B,C) and compare them
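The idea behind the relative rate test can be sketched as follows. With C as the outgroup of ((A,B),C), a clock-like rate predicts that A and B are equally distant from C. The toy sequences and function names are assumptions made for illustration, and the sketch reports only the raw difference without any significance test.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def relative_rate_difference(seq_a, seq_b, outgroup):
    """d(A,C) - d(B,C); a value near 0 is what a molecular clock predicts."""
    return p_distance(seq_a, outgroup) - p_distance(seq_b, outgroup)


if __name__ == "__main__":
    # Hypothetical aligned toy sequences, not real data.
    a = "ACGTACGTAC"
    b = "ACGTACGTTT"
    c = "ACGAACGTAA"  # outgroup
    print(relative_rate_difference(a, b, c))
```

A negative value means the lineage leading to B has accumulated more change since the A-B split than the lineage leading to A.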
Proteins evolve at highly different rates

                       Protein                          Rate of change       Theoretical lookback time
                                                        (PAMs / 100 Myr)     (Myr)
                       Pseudogenes                      400                  45
                       Fibrinopeptides                  90                   200
                       Lactalbumins                     27                   670
                       Lysozymes                        24                   850
                       Ribonucleases                    21                   850
                       Haemoglobins                     12                   1500
                       Acid proteases                   8                    2300
                       Cytochrome c                     4                    5000
                       Glyceraldehyde-P dehydrogenase   2                    9000
                       Glutamate dehydrogenase          1                    18000


                       PAM = number of Accepted Point Mutations per 100 amino acids.
Multiple Alignment Method
4-steps


          • align
          • select method (evolutionary
            model)
            – Distance
            – ML
            – MP
          • generate tree
          • validate tree
Some definitions …
Distance matrix methods (upgma, nj, Fitch,...)

                       • Convert sequence data into a
                         set of discrete pairwise distance
                         values (n*(n-1)/2 for n sequences),
                         arranged into a matrix. Distance
                         methods fit a tree to this matrix.
                       • The tree topology is constructed
                         by using a cluster analysis
                         method (such as UPGMA or
                         neighbour joining).
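As a minimal sketch of the first step, the code below turns an alignment into one distance per unordered pair of sequences, giving n*(n-1)/2 values. It uses the simple proportion of differing sites as the distance, which is an assumption made here for illustration. The alignment and function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(alignment):
    """Map each unordered pair of names to its pairwise distance."""
    return {
        (first, second): p_distance(alignment[first], alignment[second])
        for first, second in combinations(alignment, 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment of n = 4 sequences.
    toy = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGA", "s4": "TTGA"}
    matrix = distance_matrix(toy)
    print(len(matrix))  # n*(n-1)/2 pairwise values
    for pair, distance in matrix.items():
        print(pair, distance)
```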
Distance matrix methods (upgma, nj, Fitch,...)




                          Since we start with A, p(A) = 1
Distance matrix methods (upgma, nj, Fitch,...)




                        D = evolutionary distance ~ time
                        F = dissimilarity ~ (1 – PX(t))



                                          F~1–            d
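The slide relates the observed dissimilarity F to the underlying evolutionary distance d. One standard way to make that link concrete is the Jukes-Cantor model, which is assumed here and is not necessarily the model used on the slide. Under it, d = -(3/4) ln(1 - (4/3) F) for nucleotide sequences.

```python
import math


def jukes_cantor_distance(dissimilarity):
    """Expected substitutions per site, given the observed fraction of differing sites.

    Assumes the Jukes-Cantor model: equal base frequencies and equal rates
    for all substitutions.
    """
    if not 0.0 <= dissimilarity < 0.75:
        raise ValueError("dissimilarity must be in [0, 0.75) under Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * dissimilarity / 3.0)


if __name__ == "__main__":
    for f in (0.05, 0.20, 0.50, 0.70):
        print(f, round(jukes_cantor_distance(f), 3))
```

The corrected distance is always at least as large as the observed dissimilarity, because multiple substitutions at the same site hide part of the change. As F approaches 0.75 the sequences are effectively randomized and the estimate diverges.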
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
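The figures for these slides did not survive, so here is a minimal sketch of UPGMA under the usual textbook description: repeatedly merge the two closest clusters and set the distance from the merged cluster to every other cluster to the arithmetic mean over all member pairs. The data layout (a dict keyed by `frozenset` pairs) and the toy distances are assumptions made for illustration. Branch lengths are omitted to keep the sketch short.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    # Each cluster is stored as label -> (subtree, number of taxa in it).
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Size-weighted mean of the two old distances equals the plain
        # arithmetic mean over all pairs of original taxa.
        for other in clusters:
            d[frozenset((merged, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    (tree, _size), = clusters.values()
    return tree


if __name__ == "__main__":
    # Hypothetical distances for four taxa.
    pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
             ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
    toy = {frozenset(pair): value for pair, value in pairs.items()}
    print(upgma(["A", "B", "C", "D"], toy))
```

UPGMA implicitly assumes a constant rate along all lineages, so it produces a rooted tree and can be misled when rates differ between branches.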
Distance matrix methods: Summary




      http://www.bioportal.bic.nus.edu.sg/phylip/neighbor.html
Distance matrix methods (upgma, nj, Fitch,...)

                      • The phylogeny estimates the distance
                        for each pair as the sum of branch
                        lengths along the path from one
                        sequence to the other through the tree.
                      • advantages :
                                  easy to perform ;
                                  quick calculation ;
                                  suited to sequences with high similarity scores ;
                      • drawbacks :
                                  the sequences are not considered as such (loss
                                  of information) ;
                                  all sites are generally treated equally (differences
                                  in substitution rates are not taken into
                                  account) ;
                                  not applicable to distantly divergent sequences.
Maximum likelihood

                     • In this method, the bases
                       (nucleotides or amino acids) of all
                       sequences at each site are
                       considered separately (as
                       independent), and the log-likelihood
                       of observing these bases is computed
                       for a given topology by using a
                       particular probability model.
                     • The log-likelihoods are summed over
                       all sites, and the sum is maximized
                       to estimate the branch lengths of
                       the tree.
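The simplest instance of this procedure is a "tree" with two sequences joined by a single branch. The sketch below assumes the Jukes-Cantor substitution model and a plain grid search, both chosen here for illustration. A real ML program handles full trees and uses proper numerical optimisation.

```python
import math


def site_log_likelihood(x, y, d):
    """Log-likelihood of seeing bases x and y at one site, branch length d.

    Jukes-Cantor model: base frequencies are 1/4 and all substitutions are
    equally likely; d is the expected number of substitutions per site.
    """
    decay = math.exp(-4.0 * d / 3.0)
    transition = 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay
    return math.log(0.25 * transition)


def log_likelihood(seq_a, seq_b, d):
    """Sites are treated as independent, so their log-likelihoods add up."""
    return sum(site_log_likelihood(x, y, d) for x, y in zip(seq_a, seq_b))


def ml_branch_length(seq_a, seq_b, step=0.001, upper=3.0):
    """Grid search for the branch length with the highest log-likelihood."""
    candidates = [step * k for k in range(1, int(upper / step) + 1)]
    return max(candidates, key=lambda d: log_likelihood(seq_a, seq_b, d))


if __name__ == "__main__":
    # Hypothetical aligned toy sequences.
    print(ml_branch_length("ACGTACGTAC", "ACGTACGTTT"))
```

For two sequences the maximum coincides with the closed-form Jukes-Cantor distance, which is a useful check on the search.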
Maximum likelihood

                     • This procedure is repeated for all
                       possible topologies, and the topology
                       that shows the highest likelihood is
                       chosen as the final tree.
                     • Notes :
                         ML estimates the branch lengths of the
                         final tree ;
                         ML methods are usually consistent ;
                         ML can be extended to allow different
                         rates for transitions and
                         transversions.
                     • Drawbacks
                         requires a long computation time to
                         construct a tree.
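The long computation time follows largely from the number of topologies to evaluate. For n taxa there are (2n-5)!! = 3 × 5 × ... × (2n-5) distinct unrooted bifurcating trees. The helper name below is hypothetical.

```python
def count_unrooted_topologies(n_taxa):
    """Number of distinct unrooted bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Product of the odd numbers 3, 5, ..., 2n-5.
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, count_unrooted_topologies(n))
```

Ten taxa already give about two million topologies, which is why practical programs search heuristically instead of evaluating every tree.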
Maximum Parsimony

              Parsimony criterion
              • It consists of determining the minimum
                number of changes (substitutions) required to
                transform a sequence into its nearest neighbor.
              Maximum Parsimony
              • The maximum parsimony algorithm searches
                for the minimum number of genetic events
                (nucleotide substitutions or amino-acid
                changes) to infer the most parsimonious tree
                from a set of sequences.
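Counting the minimum number of changes on one given tree can be done with Fitch's algorithm, which is named here as one common choice and is not stated on the slides. The sketch assumes a rooted binary tree written as nested tuples and a hypothetical toy alignment. Searching over topologies is not shown.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping each leaf name to its base at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_left, cost_left = fitch(left, states)
    set_right, cost_right = fitch(right, states)
    shared = set_left & set_right
    if shared:
        # The children can agree on a state: no extra change needed.
        return shared, cost_left + cost_right
    # Otherwise one substitution is required at this node.
    return set_left | set_right, cost_left + cost_right + 1


def parsimony_score(tree, alignment):
    """Minimum total number of substitutions, summed over all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    # Hypothetical toy alignment.
    toy = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
    for candidate in ((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))):
        print(candidate, parsimony_score(candidate, toy))
```

The tree with the lower score is the more parsimonious of the two candidates.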
Maximum Parsimony


                Occam’s Razor
Entia non sunt multiplicanda praeter necessitatem.
(Entities must not be multiplied beyond necessity.)
                                          William of Occam (1300-1349)




       The best tree is the one which requires the least number of
                               substitutions
Maximum Parsimony: Branch Node A or B ?
Maximum Parsimony: A requires 5 mutations
Maximum Parsimony: B (and propagating A->B) requires only 4 mutations
Maximum Parsimony

                    • The best tree is the one which
                      needs the fewest changes.
                    • Problems :
                           – If the evolutionary clock is not
                             constant, the procedure generates
                             results which can be misleading ;
                           – within practical computational
                             limits, this often leads to the
                             generation of tens or more "equally
                             most parsimonious trees" which make
                             it difficult to justify the choice of a
                             particular tree ;
                           – long computation time to construct a
                             tree.
Comparative evaluation of different methods


                       There are at present no statistical
                       methods that allow comparison
                       of trees obtained from different
                       phylogenetic methods. Nevertheless,
                       many studies have compared
                       the relative consistency
                       of the existing methods.
Comparative evaluation of different methods

                     The consistency depends on many
                     factors, including the topology
                     and branch lengths of the real
                     tree, the transition/transversion rate
                     and the variability of the
                     substitution rates.
                     One expects that if sequences have
                     a strong phylogenetic
                     relationship, different methods will
                     yield the same phylogenetic tree.
Comparison of methods

               • Inconsistency
               • Neighbour Joining (NJ) is very fast but depends on
                 accurate estimates of distance. This is more
                 difficult with very divergent data
               • Parsimony suffers from Long Branch Attraction.
                 This may be a particular problem for very divergent
                 data
               • NJ can suffer from Long Branch Attraction
               • Parsimony is also computationally intensive
               • Codon usage bias can be a problem for MP and NJ
               • Maximum Likelihood is the most reliable but
                 depends on the choice of model and is very slow
               • Methods may be combined
Rooting the Tree

                   • In an unrooted tree the direction of
                     evolution is unknown
                   • The root is the hypothesized ancestor
                     of the sequences in the tree
                   • The root can either be placed on a
                     branch or at a node
                   • You should start by viewing an
                     unrooted tree
Automatic rooting

                    • Many software packages will root
                      trees automatically (e.g. mid-point
                      rooting in NJPlot)
                    • Sometimes two trees may look very
                      different but, in fact, differ only in the
                      position of the root
                    • This normally involves assumptions…
                      BEWARE!
Rooting Using an Outgroup

                1. The outgroup should be a sequence (or set
                   of sequences) known to be less closely
                   related to the rest of the sequences than they
                   are to each other
                2. It should ideally be as closely related as
                   possible to the rest of the sequences while
                   still satisfying condition 1
                The root must be somewhere between the
                   outgroup and the rest (either on the node or
                   in a branch)
How confident am I that my tree is correct?

                Bootstrap values
                Bootstrapping is a statistical
                technique that can use random
                resampling of data to determine
                sampling error for tree topologies
Bootstrapping phylogenies


• Characters are resampled with replacement
  to create many bootstrap replicate data sets
• Each bootstrap replicate data set is analysed
  (e.g. with parsimony, distance, ML etc.)
• Agreement among the resulting trees is
  summarized with a majority-rule consensus
  tree
• Frequencies of occurrence of
  groups, bootstrap proportions (BPs), are a
  measure of support for those groups
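The resampling step can be sketched as below. Whole alignment columns are drawn with replacement so that each replicate has the same length as the original. Tree inference itself is not shown: `supports_group` is a hypothetical callback that would analyse a replicate and report whether the group of interest appears in the resulting tree.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw alignment columns with replacement; names and length are kept."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_proportion(alignment, supports_group, replicates=200, seed=1):
    """Fraction of replicates in which supports_group(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(
        bool(supports_group(bootstrap_alignment(alignment, rng)))
        for _ in range(replicates)
    )
    return hits / replicates


if __name__ == "__main__":
    # Hypothetical toy alignment.
    toy = {"a": "ACGTAC", "b": "ACGTTT", "c": "TCGAAC"}
    print(bootstrap_alignment(toy, random.Random(0)))
```

Because the same column indices are used for every sequence, each resampled column remains a genuine column of the original alignment.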
Bootstrapping - an example

[Figure: majority-rule consensus tree from a parsimony bootstrap of ciliate SSU rDNA.
 Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Euplotes (8), Tetrahymena (9),
 Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7).
 Bootstrap proportions shown on the internal branches: 100, 84, 96, 100, 100, 100.]
Bootstrap - interpretation

             • Bootstrapping is a very valuable and widely used
               technique (it is demanded by some journals)
             • BPs give an idea of how likely a given branch
               would be to be unaffected if additional data, with
               the same distribution, became available
             • BPs are not the same as confidence intervals.
               There is no simple mapping between bootstrap
               values and confidence intervals. There is no
               agreement about what constitutes a ‘good’
               bootstrap value (> 70%, > 80%, > 85% ????)
             • Some theoretical work indicates that BPs can be a
               conservative estimate of confidence intervals
             • If the estimated tree is inconsistent all the
               bootstraps in the world won’t help you…..
Jack-knifing

               • Jack-knifing is very similar to
                 bootstrapping and differs only in the
                 character resampling strategy
               • Jack-knifing is not as widely
                 available or widely used as
                 bootstrapping
               • Tends to produce broadly similar
                 results
Statistical evaluation of the obtained phylogenetic trees

                  • At present, only resampling techniques allow testing the
                    topology of a phylogenetic tree
                  • Bootstrapping
                     – Columns are drawn from the alignment of
                       sequences, with replacement, until the data set
                       is the same size as the original one
                       (usually some columns are sampled several times
                       and others are left out)
                  • Half-jackknife
                     – Half of the sequence sites are resampled and the
                       rest are eliminated. The final sample contains
                       half the initial number of sites,
                       without duplication.
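
The two resampling schemes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any specific program: the alignment is a plain list of equal-length strings, the function names `bootstrap_columns` and `half_jackknife_columns` are hypothetical, and the three toy sequences are invented.

```python
import random


def bootstrap_columns(alignment, rng):
    """Draw columns with replacement until the replicate has the original length."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in cols) for seq in alignment]


def half_jackknife_columns(alignment, rng):
    """Keep a random half of the columns, each at most once, and drop the rest."""
    n_sites = len(alignment[0])
    cols = sorted(rng.sample(range(n_sites), n_sites // 2))
    return ["".join(seq[i] for i in cols) for seq in alignment]


# Toy alignment (invented data): three sequences, ten aligned sites
alignment = ["ACGTACGTAC", "ACGTTCGTAA", "ACGAACGTAC"]
rng = random.Random(42)  # fixed seed so the run is repeatable

boot = bootstrap_columns(alignment, rng)
jack = half_jackknife_columns(alignment, rng)
print(boot)  # same length as the original; some columns repeated, others missing
print(jack)  # half the original length; no column used twice
```

In both functions the same column indices are applied to every sequence, so each resampled column is still a column of the original alignment. A tree would then be rebuilt from each replicate, and the replicate trees compared.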
Weblems
         W6.1: The growth hormones in most mammals have very similar amino acid
            sequences. (The growth hormones of the alpaca, dog, cat, horse, rabbit, and
            elephant each differ from that of the pig at no more than 3 positions out of 191.)
            Human growth hormone is very different, differing at 62 positions. The evolution of
            growth hormone accelerated sharply in the line leading to humans. By retrieving
            and aligning growth hormone sequences from species closely related to humans
            and our ancestors, determine where in the evolutionary tree leading to humans the
            accelerated evolution of growth hormone took place.
         W6.2: Humans are primates, an order that we, apes and monkeys share with lemurs
            and tarsiers. On the basis of the beta-globin gene cluster of human, a
            chimpanzee, an old-world monkey, a new-world monkey, a lemur, and a tarsier,
            derive a phylogenetic tree of these groups.
         W6.3: Primates are mammals, a class we share with marsupials and monotremes.
            Extant marsupials live primarily in Australia, except for the opossum, found also in
            North and South America. Extant monotremes are limited to two animals from
            Australia: the platypus and echidna. Using the complete mitochondrial genomes
            of human, horse (Equus caballus), wallaroo (Macropus robustus), American
            opossum (Didelphis virginiana), and platypus (Ornithorhynchus anatinus), draw
            an evolutionary tree, indicating branch lengths. Are monotremes more closely
            related to placental mammals or to marsupials?
         W6.4: Mammals are vertebrates, a subphylum that we share with fishes, sharks, birds
            and reptiles, amphibia, and primitive jawless fishes (example: lampreys). For the
            coelacanth (Latimeria chalumnae), the great white shark (Carcharodon
            carcharias), skipjack tuna (Katsuwonus pelamis), sea lamprey (Petromyzon
            marinus), frog (Rana pipiens), and Nile crocodile (Crocodylus niloticus), using
            sequences of cytochromes c and pancreatic ribonucleases, derive evolutionary
            trees of these species.

Bioinformatica t6-phylogenetics

  • 1.
  • 2. FBW 6-11-2012 Wim Van Criekinge
  • 4.
  • 5. Phylogenetics Introduction Definitions Species concept Examples The Tree-of-life Phylogenetics Methodologies Algorithms Distance Methods Maximum Likelihood Maximum Parsimony Rooting Statistical Validation Conclusions Orthologous genes Horizontal Gene Transfer Phylogenomics Practical Approach: PHYLIP Weblems
  • 6. What is phylogenetics ? Phylogeny (phylo =tribe + genesis) Phylogenetic trees are about visualising evolutionary relationships. They reconstruct the pattern of events that have led to the distribution and diversity of life. The purpose of a phylogenetic tree is to illustrate how a group of objects (usually genes or organisms) are related to one another Nothing in Biology Makes Sense Except in the Light of Evolution. Theodosius Dobzhansky (1900-1975)
  • 7. Trees • Diagram consisting of branches and nodes • Species tree (how are my species related?) – contains only one representative from each species. – all nodes indicate speciation events • Gene tree (how are my genes related?) – normally contains a number of genes from a single species – nodes relate either to speciation or gene duplication events
  • 8. Clade: A set of species which includes all of the species derived from a single common ancestor
  • 9. S p e c ie s C o n c e p ts from V a rio u s A u th o rs D .A . B a um a nd K .L . S ha w - E x c lu s iv e g rou p s o f org a n ism s, w h ere a n ex c lu s iv e g rou p is o ne w h ose m e m b ers are a ll m ore c lose ly re la ted to ea c h oth er th a n to a n y org a n is m s ou ts id e the g rou p . J . C ra cra ft - A n irred u c ib le c lu ster o f org a n ism s, d iag n osab ly d is tin ct fro m oth er su c h c lu sters, a nd w ith in w h ic h there is a p are n ta l p a ttern o f a nc estry a nd d esce n t. C ha rles D a rw in - "F rom these rem arks it w ill b e se e n th at I lo o k a t th e term sp e c ies, as o n e arb itrarily g iv e n for the sa k e o f c o n v e n ie nc e to a set o f in d iv id u a ls c lose ly rese m b lin g e ac h o ther, a nd th a t it d oes n ot essen tia lly d iffer from the term varie ty, w h ic h is g iv e n to l ess d istin ct a nd m ore flu c tu a ting form s. T he term varie ty, ag a in, in c o m p aris o n w ith m ere in d iv id u a l d iffere n ces, is a ls o a p p lied arb itrarily, a n d for m ere c o n ve n ie n ce sa k e " (O rig in o f S p ec ies, 1 st ed., p . 1 0 8 ). T . D o b zha nsk y - T h e larg est a nd m ost in c lu s ive rep rod u ctiv e c om m u n ity o f sex u a l a nd cross-fertiliz ing in d iv id u a ls w h ic h sh are a c o m m o n g e ne p o o l. A nd la ter...S ys te m s o f p op u la tio ns, th e g e n e ex c ha ng e b e tw ee n w h ic h is lim ited or p re v e nted b y rep rod u ctiv e is o la ting m e c h a n is m s. M . G hise lin - T h e m ost ex te ns ive u n its in the n atu ra l e c o n om y, su c h tha t rep r od u ctiv e c om p etitio n oc cu rs am o ng th e ir p arts. D .M . L a m b ert - G rou p s o f ind iv id u a ls th at d e fin e th em se lv es b y a sp e c ific m a te rec og n itio n s ystem . J . M a llet - Id e ntifia b le g e n o typ ic c lu sters re c og n iz e d b y a d e fic it o f in term ed iates, b o th a t s ing le lo c i a n d at m u ltip le lo c i. E . 
M a y r - G rou p s o f ac tu a lly or p o te n tia lly in terb ree d ing na tu ra l p op u lat io ns w h ic h are rep rod u ctiv e ly is o la ted fro m oth er su c h g rou p s. C .D . M ich en er - A g rou p o f org a n is m s n o t itse lf d iv is ib le b y p he n etic g ap s resu ltin g from c o nc ord a nt d iffere n ces in c harac ter states (ex c ep t for m orp hs - su ch as sex , ag e, or caste), b u t sep ara ted b y su ch p h e ne tic g ap s from o ther su c h u n its. H .E .H . P a tte rso n - T h at m ost in c lu s iv e p op u latio n o f in d iv id u a l b ip are n ta l org a n is m s w h ic h sh are a c o m m o n fertiliz atio n s ystem . G .G . S im p so n - A lin eag e o f p op u latio ns e v o lv in g w ith tim e, sep arate ly fro m ot h ers, w ith its ow n u n iq u e e v o lu tio n ary ro le a nd te nd e n c ies. P .H .A . S nea th a nd R .R . S o k a l - T he sm a llest (m ost h o m og e n e ou s) c lu ster th at ca n b e re c og n iz ed u p o n s o m e g iv e n criterio n as b e in g d istin c t fro m oth er c lu sters. A .R . T em p leto n - T h e m ost in c lu s ive p op u la tio n o f in d iv id u a ls h a v ing the p o te n ti a l for p h e n otyp ic c o he sio n throu g h in trins ic c o h es io n m e c ha n is m s (g e ne tic a nd /or d e m og rap h ic - i.e. ec o lo g ic a l -ex c h a ng e ab ility). E .O . W iley - A s ing le lin e ag e o f a nc estor -d esc e nd a n t p op u latio ns w h ic h m a in ta ins its id e ntity fro m oth er s u ch lin e ag es a nd w h ic h h as its ow n e v o lu tio n ary te nd e n c ies a nd h istoric a l fate. S . W rig ht - A sp ec ies in tim e a nd sp ac e is c o m p ose d o f nu m erou s lo ca l p op u latio ns, ea c h o ne in terc o m m u n ica ting a nd in terg rad ing w ith oth ers.
  • 10. Species I. Definitions: Species = the basic unit of classification > Three different ways to recognize species:
  • 11. Plant Species Definitions: > Three different ways to recognize species: 1) Morphological species = the smallest group that is consistently and persistently distinct (Clusters in morphospace) species are recognized initially on the basis of appearance; the individuals of one species look different from the individuals of another
  • 12. Species Definitions: > Three different ways to recognize species: 2) Biological species = a set of interbreeding or potentially interbreeding individuals that are separated from other species by reproductive barriers species are unable to interbreed
  • 13. Species Definitions: > Three different ways to recognize species: 3) Phylogenetic species = the boundary between reticulate (among interbreeding individuals) and divergent relationships (between lineages with no gene exchange)
  • 14. Phylogenetic species divergent boundary reticulate recognized by the pattern of ancestor - descendent relationships
  • 15. Species Definitions: > Three different ways to recognize species: 4) Phylogenomics species = ability to transmit (and maintain) a (stable) gene pool Adresses the Anopheles genome topology variations
  • 16. Branching Order in a Phylogenetic Tree • In the tree to the left, A and B share the most recent common ancestry. Thus, of the species in the tree, A and B are the most closely related. • The next most recent common ancestry is C with the group composed of A and B. Notice that the relationship of C is with the group containing A and B. In particular, C is not more closely related to B than to A. This can be emphasized by the following two trees, which are equivalent to each other:
  • 17. More definitions … Edge, Branch Leafs Tips external node Branch node, internal node • A common simplifying assumption is that the three is bifurcating, meaning that each brach node has exactly two descendents. • The edges, taken together, are sometimes said to define the topology of the tree
  • 18. Outgroups, rooted versus unrooted An unrooted reptilian phylogeny with an avian outgroup and the corresponding rooted phylogeny. The Ri represent modern reptiles; the Ai, inferred ancestors and the B a bird.
  • 20. Examples Phylogenetic methods may be used to solve crimes, test purity of products, and determine whether endangered species have been smuggled or mislabeled: – Vogel, G. 1998. HIV strain analysis debuts in murder trial. Science 282(5390): 851-853. – Lau, D. T.-W., et al. 2001. Authentication of medicinal Dendrobium species by the internal transcribed spacer of ribosomal DNA. Planta Med 67:456-460.
  • 21.
  • 22. Examples – Epidemiologists use phylogenetic methods to understand the development of pandemics, patterns of disease transmission, and development of antimicrobial resistance or pathogenicity: • Basler, C.F., et al. 2001. Sequence of the 1918 pandemic influenza virus nonstructural gene (NS) segment and characterization of recombinant viruses bearing the 1918 NS genes. PNAS, 98(5):2746-2751. • Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science 256(5060):1165-1171. • Bacillus anthracis:
  • 23.
  • 24. Examples • Conservation biologists may use these techniques to determine which populations are in greatest need of protection, and other questions of population structure: – Trepanier, T.L., and R.W. Murphy. 2001. The Coachella Valley fringe-toed lizard (Uma inornata): genetic diversity and phylogenetic relationships of an endangered species. Mol Phylogenet Evol 18(3):327-334. – Alves, M.J., et al. 2001. Mitochondrial DNA variation in the highly endangered cyprinid fish Anaecypris hispanica: importance for conservation. Heredity 87(Pt 4):463-473. • Pharmaceutical researchers may use phylogenetic methods to determine which species are most closely related to other medicinal species, thus perhaps sharing their medicinal qualities: – Komatsu, K., et al. 2001. Phylogenetic analysis based on 18S rRNA gene and matK gene sequences of Panax vietnamensis and five related species. Planta Med 67:461-465.
  • 26. Some Important Dates in History (event: billion yrs) • Origin of the Universe: 15 • Formation of the Solar System: 4.6 • First Self-replicating System: 3.5 • Prokaryotic-Eukaryotic Divergence: 2.0 • Plant-Animal Divergence: 1.0 • Invertebrate-Vertebrate Divergence: 0.5 • Mammalian Radiation Beginning: 0.1
  • 31. What Sequence to Use ? • To infer relationships that span the diversity of known life, it is necessary to look at genes conserved through the billions of years of evolutionary divergence. • The gene must display an appropriate level of sequence conservation for the divergences of interest.
  • 32. What Sequence to Use ? • If there is too much change, then the sequences become randomized, and there is a limit to the depth of the divergences that can be accurately inferred. • If there is too little change (if the gene is too conserved), then there may be little or no change between the evolutionary branchings of interest, and it will not be possible to infer close (genus or species level) relationships.
  • 33. Ribosomal RNA Genes and Their Sequences Carl Woese recognized the full potential of rRNA sequences as a measure of phylogenetic relatedness. He initially used an RNA sequencing method that determined about 1/4 of the nucleotides in the 16S rRNA (the best technology available at the time). This amount of data greatly exceeded anything else then available. Using newer methods, it is now routine to determine the sequence of the entire 16S rRNA molecule. Today, the accumulated 16S rRNA sequences (about 10,000) constitute the largest body of data available for inferring relationships among organisms.
  • 34. What Sequence to Use ? Examples of genes in this category are those that encode the ribosomal RNAs (rRNAs). Most prokaryotes have three rRNAs, called the 5S, 16S and 23S rRNA. Name (a): size (nucleotides), location • 5S: 120, large subunit of ribosome • 16S: 1500, small subunit of ribosome • 23S: 2900, large subunit of ribosome • (a) The name is based on the rate that the molecule sediments (sinks) in water. Bigger molecules sediment faster than small ones.
  • 35. Ribosomal RNA Genes and Their Sequences The extraordinary conservation of rRNA genes can be seen in these fragments of the small subunit rRNA gene sequences from organisms spanning the known diversity of life: • human: ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAAAAAG... • yeast: ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAG... • corn: ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAG... • Escherichia coli: ...GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCG... • Anacystis nidulans: ...GTGCCAGCAGCCGCGGTAATACGGGAGAGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCG... • Thermotoga maritima: ...GTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTACCCGGATTTACTGGGCGTAAAGGG... • Methanococcus vannielii: ...GTGCCAGCAGCCGCGGTAATACCGACGGCCCGAGTGGTAGCCACTCTTATTGGGCCTAAAGCG... • Thermococcus celer: ...GTGGCAGCCGCCGCGGTAATACCGGCGGCCCGAGTGGTGGCCGCTATTATTGGGCCTAAAGCG... • Sulfolobus solfataricus: ...GTGTCAGCCGCCGCGGTAATACCAGCTCCGCGAGTGGTCGGGGTGATTACTGGGCCTAAAGCG...
  • 37. Molecular Clock (MC) • Rate of evolution = rate of mutation • rate of evolution for any macromolecule is approximately constant over time (Neutral Theory of evolution) • For a given protein the rate of sequence evolution is approximately constant across lineages. Zuckerkandl and Pauling (1965) • This would allow speciation and duplication events to be dated accurately based on molecular data
  • 38. Novel trees using Hox genes
  • 39. • (a) A traditional phylogenetic tree and
  • 40. • (a) A traditional phylogenetic tree and • (b) the new phylogenetic tree, each showing the positions of selected phyla. B, bilateria; AC, acoelomates; PC, pseudocoelomates; C, coelomates; P, protostomes; L, lophotrochozoa; E, ecdysozoa; D, deuterostomes.
  • 41. Molecular Clock (MC) • Local and approximate molecular clocks are more reasonable – one amino acid substitution per 14.5 My – 1.3 × 10^-9 substitutions/nucleotide site/year – Relative rate test (see further) • For the tree ((A,B),C), measure and compare the distances (A,C) and (B,C)
  • 42. Proteins evolve at highly different rates. Protein: rate of change (PAMs / 100 myrs), theoretical lookback time (myrs) • Pseudogenes: 400, 45 • Fibrinopeptides: 90, 200 • Lactalbumins: 27, 670 • Lysozymes: 24, 850 • Ribonucleases: 21, 850 • Haemoglobins: 12, 1500 • Acid proteases: 8, 2300 • Cytochrome c: 4, 5000 • Glyceraldehyde-P dehydrogenase: 2, 9000 • Glutamate dehydrogenase: 1, 18000 • PAM = number of Accepted Point Mutations per 100 amino acids.
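The relative rate test and the clock rate quoted on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's method: the function names and the example distances are hypothetical, and the dating formula assumes the quoted rate applies per lineage, so that two sequences diverge at twice that rate.

```python
def relative_rate_gap(d_ac: float, d_bc: float) -> float:
    """Relative rate test for the tree ((A,B),C).

    Under a clock, A and B are equally far from the outgroup C,
    so d(A,C) - d(B,C) should be close to zero.
    """
    return d_ac - d_bc


def divergence_time_years(distance: float, rate_per_site_per_year: float) -> float:
    """Time since two sequences split, assuming a strict clock on both lineages."""
    # Assumption: both lineages accumulate changes, so distance = 2 * rate * time.
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical distances in substitutions per site (illustrative values only).
gap = relative_rate_gap(d_ac=0.31, d_bc=0.30)
print(f"d(A,C) - d(B,C) = {gap:+.3f}")

# Rate from the slide: 1.3e-9 substitutions per nucleotide site per year.
print(f"{divergence_time_years(0.026, 1.3e-9):.3g} years")
```

A gap near zero is compatible with a local clock for A and B; a large gap suggests one lineage evolved faster, in which case the dating step should not be trusted.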
  • 45. 4-steps • align • select method (evolutionary model) – Distance – ML – MP • generate tree • validate tree
  • 46.
  • 48. Distance matrix methods (upgma, nj, Fitch,...) • Convert sequence data into a set of discrete pairwise distance values (n*(n-1)/2 values for n sequences), arranged into a matrix. Distance methods fit a tree to this matrix. • The tree topology is constructed by using a cluster analysis method (like the upgma or nj methods).
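As a sketch of the first step, the code below turns a small alignment into the n*(n-1)/2 pairwise distances. It uses the simplest possible measure, the proportion of differing sites (p-distance), which is an assumption here; the slides do not say which distance is used. The sequences and function names are made up for illustration.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def distance_matrix(seqs: dict) -> dict:
    """Map each unordered pair of names to its p-distance."""
    return {(a, b): p_distance(seqs[a], seqs[b])
            for a, b in combinations(sorted(seqs), 2)}


# Hypothetical aligned sequences (illustrative only).
seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACGTTC", "D": "TCGAACGTTG"}
dm = distance_matrix(seqs)
for pair, d in dm.items():
    print(pair, d)
```

With four sequences the dictionary holds 4*3/2 = 6 entries, one per pair.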
  • 49.
  • 50.
  • 51.
  • 52. Distance matrix methods (upgma, nj, Fitch,...)
  • 53. Distance matrix methods (upgma, nj, Fitch,...) CGT
  • 54. Distance matrix methods (upgma, nj, Fitch,...) Since we start with A,p(A)=1
  • 55. Distance matrix methods (upgma, nj, Fitch,...) D = evolutionary distance ~ time; F = dissimilarity ~ (1 – PX(t)); F ~ 1 – d
  • 56. Distance matrix methods (upgma, nj, Fitch,...)
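The relation between observed dissimilarity and evolutionary distance can be made concrete with a substitution model. The sketch below assumes the Jukes-Cantor (JC69) model, which the slides do not name; under that assumption a site that starts as A (p(A) = 1 at time zero) is still A with probability 1/4 + 3/4·exp(-4d/3) after distance d, and the observed dissimilarity p can be inverted to estimate d.

```python
import math


def prob_same(d: float) -> float:
    """JC69 probability that a site shows the same base after distance d."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def jc69_distance(p: float) -> float:
    """Evolutionary distance (substitutions per site) from dissimilarity p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(prob_same(0.0))                 # no time elapsed: the base is unchanged
print(round(jc69_distance(0.10), 4))  # corrected distance exceeds the raw 0.10
```

The correction matters most for divergent sequences: as p approaches 0.75 the sequences look random and the estimated distance grows without bound.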
  • 57.
  • 58. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
  • 59. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
  • 60. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
  • 61. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
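A compact sketch of UPGMA clustering, assuming the standard textbook formulation: repeatedly join the two closest clusters, place the new node at half their distance, and average distances to the remaining clusters weighted by cluster size. The data structure (a dictionary keyed by frozenset pairs) and the example distances are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA.

    dist maps frozenset({x, y}) to the distance between x and y.
    Returns a list of (cluster, height) merges; the last cluster is the tree,
    written as nested tuples.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Join the pair of clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2.0
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        new = (a, b)
        for other in sizes:
            # Size-weighted arithmetic mean of the two old distances.
            d[frozenset((new, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[new] = size_a + size_b
        merges.append((new, height))
    return merges


# Hypothetical ultrametric distances (illustrative only).
pairs = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
         ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0}
dist = {frozenset(k): v for k, v in pairs.items()}
merges = upgma(["A", "B", "C", "D"], dist)
for cluster, height in merges:
    print(cluster, height)
```

Because every leaf ends up at the same distance from the root, UPGMA implicitly assumes a molecular clock; on data that violate the clock the topology it returns can be wrong.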
  • 62. Distance matrix methods: Summary http://www.bioportal.bic.nus.edu.sg/phylip/neighbor.html
  • 63. Distance matrix methods (upgma, nj, Fitch,...) • The tree gives an estimate of the distance for each pair as the sum of branch lengths in the path from one sequence to another through the tree. • Advantages: easy to perform; quick calculation; suited to sequences with high similarity scores. • Drawbacks: the sequences are not considered as such (loss of information); all sites are generally treated equally (differences in substitution rates are not taken into account); not applicable to distantly divergent sequences.
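The "sum of branch lengths in the path" can be computed directly once a tree is written down. The sketch below stores an unrooted tree as an adjacency dictionary; the tree, its internal node names X and Y, and the branch lengths are hypothetical.

```python
def path_length(tree, start, goal, came_from=None):
    """Sum of branch lengths on the unique path between two nodes of a tree."""
    if start == goal:
        return 0.0
    for neighbour, length in tree[start].items():
        if neighbour == came_from:
            continue  # do not walk back along the branch we arrived on
        rest = path_length(tree, neighbour, goal, start)
        if rest is not None:
            return length + rest
    return None  # goal is not reachable through this subtree


# Hypothetical tree: leaves A, B, C, D; internal nodes X and Y.
tree = {
    "A": {"X": 1.0}, "B": {"X": 1.0},
    "C": {"Y": 3.0}, "D": {"Y": 5.0},
    "X": {"A": 1.0, "B": 1.0, "Y": 2.0},
    "Y": {"X": 2.0, "C": 3.0, "D": 5.0},
}
print(path_length(tree, "A", "C"))
```

Distance methods choose the topology and branch lengths for which these tree-derived distances match the observed distance matrix as closely as possible.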
  • 64.
  • 65.
  • 66. Maximum likelihood • In this method, the bases (nucleotides or amino acids) of all sequences at each site are considered separately (as independent), and the log-likelihood of observing these bases is computed for a given topology by using a particular probability model. • This log-likelihood is summed over all sites, and the sum of the log-likelihoods is maximized to estimate the branch lengths of the tree.
  • 68. Maximum likelihood • This procedure is repeated for all possible topologies, and the topology that shows the highest likelihood is chosen as the final tree. • Notes: ML estimates the branch lengths of the final tree; ML methods are usually consistent; ML can be extended to allow different rates of transition and transversion. • Drawbacks: needs long computation time to construct a tree.
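The idea of summing per-site log-likelihoods and maximizing over branch length can be shown in the smallest possible case: two sequences joined by a single branch. The sketch assumes the Jukes-Cantor model with equal base frequencies and uses a plain grid search; both are simplifying assumptions, and real ML programs also search over topologies.

```python
import math


def site_likelihood(x: str, y: str, d: float) -> float:
    """JC69 likelihood of seeing bases x and y at one site, branch length d."""
    e = math.exp(-4.0 * d / 3.0)
    p_change = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e
    return 0.25 * p_change  # 0.25 = equilibrium frequency of base x


def log_likelihood(seq1: str, seq2: str, d: float) -> float:
    """Sites are treated as independent, so their log-likelihoods add up."""
    return sum(math.log(site_likelihood(x, y, d)) for x, y in zip(seq1, seq2))


def ml_branch_length(seq1: str, seq2: str, step: float = 0.001, d_max: float = 3.0) -> float:
    """Grid search for the branch length with the highest log-likelihood."""
    grid = [step * i for i in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: log_likelihood(seq1, seq2, d))


# Hypothetical aligned sequences differing at 2 of 20 sites.
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "ACGAACGTACCTACGTACGT"
d_hat = ml_branch_length(seq1, seq2)
print(d_hat)
```

For two sequences the maximum coincides (up to grid resolution) with the closed-form Jukes-Cantor distance, which is a useful sanity check before moving to larger trees where no closed form exists.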
  • 70.
  • 71. Maximum Parsimony Parsimony criterion • It consists of determining the minimum number of changes (substitutions) required to transform a sequence to its nearest neighbor. Maximum Parsimony • The maximum parsimony algorithm searches for the minimum number of genetic events (nucleotide substitutions or amino-acid changes) to infer the most parsimonious tree from a set of sequences.
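Counting the minimum number of changes on a given tree can be sketched with Fitch's algorithm, a standard method that the slides do not name explicitly. The tree is written as nested tuples of leaf names; the alignment and the two candidate topologies are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a leaf name (str) or a pair (left, right) of subtrees;
    states maps each leaf name to its base at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one substitution is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of substitutions over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical alignment and candidate topologies.
alignment = {"A": "AGT", "B": "AGT", "C": "CGA", "D": "CTA"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, alignment), parsimony_score(tree2, alignment))
```

The topology with the lower score is preferred; a full parsimony search repeats this scoring over many candidate trees, which is where the long computation time comes from.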
  • 72. Maximum Parsimony Occam’s Razor Entia non sunt multiplicanda praeter necessitatem. William of Occam (1300-1349) The best tree is the one which requires the least number of substitutions
  • 73. Maximum Parsimony • The best tree is the one which needs the fewest changes. – If the evolutionary clock is not constant, the procedure generates results which can be misleading; – within practical computational limits, this often leads to the generation of tens or more "equally most parsimonious trees", which makes it difficult to justify the choice of a particular tree; – long computation time to construct a tree.
  • 74.
  • 75.
  • 76.
  • 77.
  • 78.
  • 79.
  • 80.
  • 81. Maximum Parsimony: Branch Node A or B ?
  • 82. Maximum Parsimony: A requires 5 mutations
  • 83. Maximum Parsimony: B (and propagating A->B) requires only 4 mutations
  • 84. Maximum Parsimony • The best tree is the one which needs the fewest changes. • Problems: – If the evolutionary clock is not constant, the procedure generates results which can be misleading; – within practical computational limits, this often leads to the generation of tens or more "equally most parsimonious trees", which makes it difficult to justify the choice of a particular tree; – long computation time to construct a tree.
  • 86. Comparative evaluation of different methods There are at present no statistical methods that allow comparison of trees obtained from different phylogenetic methods; nevertheless, many studies have been made to compare the relative consistency of the existing methods.
  • 87. Comparative evaluation of different methods The consistency depends on many factors, among them the topology and branch lengths of the real tree, the transition/transversion rate and the variability of the substitution rates. One expects that if sequences have a strong phylogenetic relationship, different methods will show the same phylogenetic tree.
  • 88. Comparison of methods • Inconsistency • Neighbour Joining (NJ) is very fast but depends on accurate estimates of distance. This is more difficult with very divergent data • Parsimony suffers from Long Branch Attraction. This may be a particular problem for very divergent data • NJ can suffer from Long Branch Attraction • Parsimony is also computationally intensive • Codon usage bias can be a problem for MP and NJ • Maximum Likelihood is the most reliable but depends on the choice of model and is very slow • Methods may be combined
  • 89. Rooting the Tree • In an unrooted tree the direction of evolution is unknown • The root is the hypothesized ancestor of the sequences in the tree • The root can either be placed on a branch or at a node • You should start by viewing an unrooted tree
  • 90. Automatic rooting • Many software packages will root trees automatically (e.g. mid-point rooting in NJPlot) • Sometimes two trees may look very different but, in fact, differ only in the position of the root • This normally involves assumptions… BEWARE!
  • 91. Rooting Using an Outgroup 1. The outgroup should be a sequence (or set of sequences) known to be less closely related to the rest of the sequences than they are to each other 2. It should ideally be as closely related as possible to the rest of the sequences while still satisfying condition 1 The root must be somewhere between the outgroup and the rest (either on the node or in a branch)
  • 92. How confident am I that my tree is correct? Bootstrap values Bootstrapping is a statistical technique that can use random resampling of data to determine sampling error for tree topologies
  • 93. Bootstrapping phylogenies • Characters are resampled with replacement to create many bootstrap replicate data sets • Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML etc.) • Agreement among the resulting trees is summarized with a majority-rule consensus tree • Frequencies of occurrence of groups, bootstrap proportions (BPs), are a measure of support for those groups
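The resampling loop can be sketched as follows. To keep the example short, tree building is replaced by a crude stand-in (finding the pair of sequences with the fewest mismatches), so the resulting proportion only illustrates how a bootstrap proportion is counted; it is not a real consensus-tree analysis. The alignment, function names and replicate count are illustrative assumptions.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the pair of taxa with the fewest mismatches."""
    def mismatches(pair):
        return sum(a != b for a, b in zip(alignment[pair[0]], alignment[pair[1]]))
    return min(combinations(sorted(alignment), 2), key=mismatches)


def bootstrap_proportion(alignment, group, n_replicates=200, seed=0):
    """Fraction of bootstrap replicates in which `group` is recovered."""
    rng = random.Random(seed)
    hits = sum(closest_pair(bootstrap_alignment(alignment, rng)) == group
               for _ in range(n_replicates))
    return hits / n_replicates


# Hypothetical aligned sequences (illustrative only).
alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACGTTC", "D": "TCGAACGTTG"}
bp = bootstrap_proportion(alignment, ("A", "B"))
print(f"support for (A,B): {bp:.2f}")
```

In a real analysis each replicate would be passed to the same tree-building method as the original data, and the proportions would be read off a majority-rule consensus tree.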
  • 94. Bootstrapping - an example Ciliate SSUrDNA - parsimony bootstrap Ochromonas (1) Symbiodinium (2) 100 Prorocentrum (3) Euplotes (8) 84 Tetrahymena (9) 96 Loxodes (4) 100 Tracheloraphis (5) 100 Spirostomum (6) 100 Gruberia (7) Majority-rule consensus
  • 95. Bootstrap - interpretation • Bootstrapping is a very valuable and widely used technique (it is demanded by some journals) • BPs give an idea of how likely a given branch would be to be unaffected if additional data, with the same distribution, became available • BPs are not the same as confidence intervals. There is no simple mapping between bootstrap values and confidence intervals. There is no agreement about what constitutes a ‘good’ bootstrap value (> 70%, > 80%, > 85% ????) • Some theoretical work indicates that BPs can be a conservative estimate of confidence intervals • If the estimated tree is inconsistent all the bootstraps in the world won’t help you…..
  • 96. Jack-knifing • Jack-knifing is very similar to bootstrapping and differs only in the character resampling strategy • Jack-knifing is not as widely available or widely used as bootstrapping • Tends to produce broadly similar results
  • 97. Statistical evaluation of the obtained phylogenetic trees At present only sampling techniques allow testing the topology of a phylogenetic tree. Bootstrapping: it consists of drawing columns from a sample of aligned sequences, with replacement, until one gets a data set of the same size as the original one (usually some columns are sampled several times, others left out). Half-jackknife: this technique resamples half of the sequence sites considered and eliminates the rest. The final sample has half the initial number of sites, without duplication.
  • 98. Weblems W6.1: The growth hormones in most mammals have very similar amino acid sequences. (The growth hormones of the Alpaca, Dog, Cat, Horse, Rabbit, and Elephant each differ from that of the Pig at no more than 3 positions out of 191.) Human growth hormone is very different, differing at 62 positions. The evolution of growth hormone accelerated sharply in the line leading to humans. By retrieving and aligning growth hormone sequences from species closely related to humans and our ancestors, determine where in the evolutionary tree leading to humans the accelerated evolution of growth hormone took place. W6.2: Humans are primates, an order that we, apes and monkeys share with lemurs and tarsiers. On the basis of the Beta-globin gene cluster of a human, a chimpanzee, an old-world monkey, a new-world monkey, a lemur, and a tarsier, derive a phylogenetic tree of these groups. W6.3: Primates are mammals, a class we share with marsupials and monotremes. Extant marsupials live primarily in Australia, except for the opossum, found also in North and South America. Extant monotremes are limited to two animals from Australia: the platypus and echidna. Using the complete mitochondrial genome from human, horse (Equus caballus), wallaroo (Macropus robustus), American opossum (Didelphis virginiana), and platypus (Ornithorhynchus anatinus), draw an evolutionary tree, indicating branch lengths. Are monotremes more closely related to placental mammals or to marsupials? W6.4: Mammals are vertebrates, a subphylum that we share with fishes, sharks, birds and reptiles, amphibia, and primitive jawless fishes (example: lampreys). For the coelacanth (Latimeria chalumnae), the great white shark (Carcharodon carcharias), skipjack tuna (Katsuwonus pelamis), sea lamprey (Petromyzon marinus), frog (Rana pipiens), and Nile crocodile (Crocodylus niloticus), using sequences of cytochromes c and pancreatic ribonucleases, derive evolutionary trees of these species.
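The half-jackknife differs from the bootstrap only in how columns are drawn: half of the sites are kept, each at most once. A minimal sketch, with a hypothetical alignment and function name:

```python
import random


def half_jackknife(alignment, rng):
    """Keep a random half of the alignment columns, without duplication.

    Returns the reduced alignment and the sorted list of kept column indices.
    """
    n_sites = len(next(iter(alignment.values())))
    kept = sorted(rng.sample(range(n_sites), n_sites // 2))
    reduced = {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}
    return reduced, kept


# Hypothetical aligned sequences (illustrative only).
alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACGTTC"}
reduced, kept = half_jackknife(alignment, random.Random(0))
print(kept)
print(reduced)
```

Each reduced alignment is then analysed like a bootstrap replicate, and support is again the fraction of replicates in which a group is recovered.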