Bio305 Bacterial Genome Annotation and Analysis Professor Mark Pallen
Overview
General features of genomes
Bacterial genome organisation
Overview of a genome project
Whole-Genome Shotgun Sanger Sequencing
Bacterial chromosome
Random shearing
Size selection
Cloning into plasmid vector
Pick colonies to create shotgun library
Plasmid preps
Sequence each insert with two primers
High-throughput Sequencing
High-Throughput Shotgun Sequencing
Bacterial chromosome
Random shearing
Size selection
Add adapters
Amplify
Sequence
Illumina Sequencing
The Sequence Assembly Problem
The Repeat Problem
ATTTATGTGT GTGTGGTGTG GTGTGGTGTG CACTACTGCT ACTACTGCTGACTACT GTGTGGTGTG GTGTGGTGTG ATATCCCT
ATTTATGTGT GTGTGGTGTG GTGTGGTGTG CACTACTGCT ACTACTGCTGACTACT GTGTGGTGTG GTGTGGTGTG ATATCCCT
(Figure labels: Correct, Incorrect)
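Why repeats confuse an assembler can be sketched in a few lines. The snippet below is a minimal illustration, not a real assembler: it joins the slide's sequence blocks into one string and counts the positions at which a short read could be placed. The function name `find_all` and the two example reads are assumptions made for this sketch.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based position at which `read` occurs in `genome`
    (overlapping occurrences included)."""
    hits = []
    pos = genome.find(read)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(read, pos + 1)
    return hits


# Sequence blocks from the slide, joined without spaces.
genome = (
    "ATTTATGTGT" "GTGTGGTGTG" "GTGTGGTGTG" "CACTACTGCT"
    "ACTACTGCTGACTACT" "GTGTGGTGTG" "GTGTGGTGTG" "ATATCCCT"
)

repeat_read = "GTGTGGTGTG"  # hypothetical read lying entirely inside the repeat
unique_read = "ATATCCCT"    # hypothetical read from the unique right-hand end

repeat_hits = find_all(genome, repeat_read)
unique_hits = find_all(genome, unique_read)

print(len(repeat_hits), "possible placements for the repeat read")
print(len(unique_hits), "possible placement for the unique read")
```

A read that fits in several places gives the assembler no way to choose between them, which is how copies of a repeat can end up collapsed or joined to the wrong flanking sequence.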
Paired-end Sequencing
Bacterial chromosome
Random shearing
Size selection for 3 kb or 8 kb etc.
Add linkers
Circularise
Shear and select on size and presence of linkers
Add adapters
Obtain sequences from either side of the linker, a known distance apart in the genome
Genome Assembly
(Figure labels: Scaffold; Contig 1, Contig 2, Contig 3; Physical Gap; Sequence Gap)
Re-sequencing
SNP calling
Genome annotation
How to go from this…?
… to this?
Or this?
Caveat
Sources of information for annotation
Approaches to functional annotation
Base composition aids genome analysis
GC skew, (G-C)/(G+C): identifies the origin of replication and the leading and lagging strands
Genes coded by location & function
%G+C
Genes shared with E. coli
Genes unique to S. typhi
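The GC skew formula on this slide can be computed window by window along a chromosome. The following is a minimal sketch: the function names, window size and step are assumptions chosen for illustration, and a window containing no G or C is given a skew of 0 by convention here.

```python
def gc_skew(window: str) -> float:
    """Return (G - C) / (G + C) for one stretch of sequence."""
    window = window.upper()
    g = window.count("G")
    c = window.count("C")
    # Avoid division by zero for windows with no G or C at all.
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_gc_skew(seq: str, window: int = 1000, step: int = 1000):
    """Return a list of (window_start, skew) pairs along the sequence."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: G-rich first half, C-rich second half.
for start, skew in sliding_gc_skew("GGGGCCCC", window=4, step=4):
    print(start, skew)
```

Plotted along a real genome, the positions where the skew changes sign are the usual candidates for the origin and terminus of replication.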
Analysis of nucleotide sequence data
Gene Finding in bacteria
Identifying protein-coding sequences
The problem of conflicting ORFs
Non-coding ORFs
CDSs (note: an ORF can extend upstream of the start codon)
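The distinction between an ORF and a CDS can be made concrete with a small scanner. This is a simplified sketch under stated assumptions: it looks at one strand only, treats ATG as the sole start codon (bacteria also use alternatives such as GTG and TTG), and the function name, return tuple and `min_codons` threshold are invented for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq: str, frame: int, min_codons: int = 3):
    """Yield (orf_start, cds_start, end) for one forward reading frame.

    orf_start: first base after the previous stop codon (or the frame start)
    cds_start: position of the first ATG inside that stop-free stretch
    end:       position just past the terminating stop codon
    All positions are 0-based.
    """
    orf_start = frame
    cds_start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOPS:
            # Report the stretch only if it contains a long enough ATG-initiated CDS.
            if cds_start is not None and (i - cds_start) // 3 >= min_codons:
                yield orf_start, cds_start, i + 3
            orf_start = i + 3
            cds_start = None
        elif codon == "ATG" and cds_start is None:
            cds_start = i


toy = "CCCGGGATGAAACCCGGGTAA"  # hypothetical toy sequence
for frame in range(3):
    for orf_start, cds_start, end in orfs_in_frame(toy, frame):
        print(f"frame {frame + 1}: ORF from {orf_start}, CDS from {cds_start}, ends at {end}")
```

In the toy sequence the stop-free stretch begins at position 0 but the first ATG sits at position 6, which is the situation the slide notes: the ORF extends upstream of the start codon.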
The Problem of Frameshift Errors

Actual sequence
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
M  S  T  A  K  L  V  K  S  K  A  T  N  L  L  Y  T  R  N  D  V  S  D  S  E  K
 •  V  P  L  N  •  L  N  Q  K  R  P  I  C  F  I  P  A  T  M  S  P  T  A  R  K
  E  Y  R  •  I  S  •  I  K  S  D  Q  S  A  L  Y  P  Q  R  C  L  R  Q  R  E  K

Frameshifted sequence after single base error
        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
M  S  T  A  K  L  V  K  S  K  S  D  Q  S  A  L  Y  P  Q  R  C  L  R  Q  R  E
 •  V  P  L  N  •  L  N  Q  K  A  T  N  L  L  Y  T  R  N  D  V  S  D  S  E  K
  E  Y  R  •  I  S  •  I  K  K  R  P  I  C  F  I  P  A  T  M  S  P  T  A  R  K
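The effect shown on this slide can be reproduced with a short translation routine. The sketch below uses the standard genetic code and prints `*` for stop codons (the slide uses a bullet); the function name `translate` and the way the erroneous sequence is derived from the correct one are choices made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of `seq`, starting at 0-based offset `frame`."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


actual = (
    "ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA"
)
# One extra A after base 30, trimmed to the same length, as on the slide.
shifted = actual[:30] + "A" + actual[30:-1]

print("actual,  frame 1:", translate(actual, 0))
print("shifted, frame 1:", translate(shifted, 0))
print("shifted, frame 2:", translate(shifted, 1))
```

After the inserted base, frame 1 of the erroneous sequence no longer encodes the real protein; the correct residues reappear in frame 2, which is what makes frameshift errors detectable when a homologue is available for comparison.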
CDS Prediction: Graphical Plots
GC content by reading frame
Amino-acid composition by reading frame, compared to average for globular proteins
CDS Prediction: Markov Models
Annotation of protein-coding genes
Homology
Homology
the cat   sat  on  the mat
die Katze sass auf der Matte

vge|GBant88-2   ITLITCVSVKDNSKRYVVAG
vge|GEfae9-178  LTLITCDQATKTTGRIIVIA
vge|GSpne1-403  MTLITCDPIPTFNKRLLVNF
sortase_staur   LTLITCDDYNEKTGVWEKRK
Types of Homology
Homology Searches
What is BLAST?
The several flavours of BLAST
Choosing the right flavour
Low complexity filtering
Understanding BLAST Results
Bit scores: high is good
E-values: low is good
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/
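Why a high bit score goes with a low E-value follows from the standard Karlin-Altschul relations used by BLAST. The symbols below are not on the slide and are introduced here as notation: S is the raw alignment score, λ and K are statistical parameters of the scoring system, S′ is the bit score, and m and n are the effective lengths of the query and the database.

```latex
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m \, n \, 2^{-S'}
```

Each additional bit of score halves the expected number of chance hits, while doubling the database size doubles it, so an E-value is only meaningful relative to the database that was searched.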
Typical Blast Output

                                                               Reading  High  Sum Probability
Sequences producing High-scoring Segment Pairs:                Frame    Score P(N)       N
emb|X69337|ECDPS     E.coli dps gene for binding protein       +2       834   6.4e-109   1
gb|U04242|ECU04242   Escherichia coli core starvation p...     +3       828   2.7e-106   1
emb|X14180|ECGLNHPQ  Escherichia coli glutamine permeas...     +3       443   2.8e-53    1
gb|U18769|HDU18769   Haemophilus ducreyi fine tangled p...     +1       150   4.0e-18    2
dbj|D01016|ANALTI46  Anabaena variabilis lti46 gene. >e...     +2       129   4.8e-12    2
gb|M84990|P26BPO     Plasmid pOP2621 ORF1 gene, 5' end;...     -2       131   6.7e-09    1
gb|U16121|HPU16121   Helicobacter pylori neutrophil act...     +1       112   1.8e-06    1
gb|M32401|TRPTYF1    T.pallidum pallidum antigen TyF1 g...     +3       101   5.6e-06    2
emb|X71436|RPNTRB    R.phaseoli ntrB gene                      +1        67   0.76       2
gb|L35598|DRODGC1A   Drosophila melanogaster receptor g...     +1        48   0.97       3
Typical Blast Output

gb|U18769|HDU18769  Haemophilus ducreyi fine tangled pili major pilin subunit gene
Length = 780
Plus Strand HSPs:
Score = 150 (68.0 bits), Expect = 4.0e-18, Sum P(2) = 4.0e-18
Identities = 36/89 (40%), Positives = 46/89 (51%), Frame = +1

Query:  30 ELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGV 89
           E L  ++    +L+LI K AHWN+ G  FIAVHEMLD     + D +D +AER   LG
Sbjct: 253 EALQMRLQGLNELALILKHAHWNVVGPQFIAVHEMLDSQVDEVRDFIDEIAERMATLGVA 432

Query:  90 ALGTTQVINSKTPLKSYPLDIHNVQDHLK 118
             G +  +        YPL     QDHLK
Sbjct: 433 PNGLSGNLVETRQSPEYPLGRATAQDHLK 519
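The "Identities" line in this output is simply a count of matching positions across the aligned segments. As a minimal sketch, the snippet below recomputes it from the two alignment blocks shown on the slide; the alignment has no gaps, so the sequences can be compared position by position. Variable names are chosen for this example.

```python
# Query and subject rows from the two alignment blocks on the slide, concatenated.
query = (
    "ELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGV"
    "ALGTTQVINSKTPLKSYPLDIHNVQDHLK"
)
sbjct = (
    "EALQMRLQGLNELALILKHAHWNVVGPQFIAVHEMLDSQVDEVRDFIDEIAERMATLGVA"
    "PNGLSGNLVETRQSPEYPLGRATAQDHLK"
)

# An ungapped alignment: both rows must have the same length.
assert len(query) == len(sbjct)

aligned_length = len(query)
identities = sum(q == s for q, s in zip(query, sbjct))
percent_identity = round(100 * identities / aligned_length)

print(f"Identities = {identities}/{aligned_length} ({percent_identity}%)")
```

"Positives" is counted the same way but also includes substitutions that score above zero in the substitution matrix, which is why it is always at least as large as the identity count.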
Domain database searches
Pfam domains
Pfam search results
The Annotation Catastrophe
(Figure labels: Protein A, "a protease", with a signal peptide; Protein B; Protein C; coiled coil domain)
Homology lies in one domain.
But the functional assignment for the whole of protein A comes from another domain and is carried across in error, so proteins B and C get misannotated as proteases.
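One guard against this kind of error is to check that a homology hit actually covers the domain that confers the function before copying the annotation across. The sketch below is an illustration only: the `Domain` class, the coordinates, the 80% coverage threshold and the function names are all hypothetical, not taken from the slide or from any annotation tool.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def transferable(hit_start: int, hit_end: int, domain: Domain,
                 min_cover: float = 0.8) -> bool:
    """True if the aligned region covers at least `min_cover` of the domain."""
    covered = overlap(hit_start, hit_end, domain.start, domain.end)
    return covered / (domain.end - domain.start + 1) >= min_cover


# Hypothetical layout of protein A: the protease function lives in one domain,
# but the hit from protein B aligns only to the coiled coil region.
protease_domain = Domain("protease", 30, 250)
coiled_coil = Domain("coiled coil", 300, 420)
hit_on_a = (305, 415)

print("transfer 'protease'?   ", transferable(*hit_on_a, protease_domain))
print("transfer 'coiled coil'?", transferable(*hit_on_a, coiled_coil))
```

With these made-up coordinates the hit supports annotating protein B as containing a coiled coil domain, but gives no support for calling it a protease.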
Annotation: rules to consider
Overview

Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Bio305 genome analysis and annotation 2012

  • 1. Bio305 Bacterial Genome Annotation and Analysis Professor Mark Pallen
  • 6. Whole-Genome Shotgun Sanger Sequencing. Workflow shown in the figure: bacterial chromosome → random shearing → size selection → cloning into a plasmid vector → pick colonies to create shotgun library → plasmid preps → sequence each insert with two primers.
  • 8. High-Throughput Shotgun Sequencing. Workflow shown in the figure: bacterial chromosome → random shearing → size selection → add adapters → amplify → sequence.
  • 13. Genome Assembly. Figure labels: scaffold; contigs 1, 2 and 3; sequence gap; physical gap.
  • 23. Base composition aids genome analysis. GC skew, (G - C)/(G + C), identifies the origin of replication and the leading and lagging strands. Figure tracks: genes coded by location and function; %G+C; genes shared with E. coli; genes unique to S. typhi.
  • 27. The problem of conflicting ORFs. Figure labels: non-coding ORFs; CDSs (note that an ORF can extend upstream of the start codon).
  • 28. The Problem of Frameshift Errors (• marks a stop codon)
      Actual sequence:
        ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
        Frame 1: M S T A K L V K S K A T N L L Y T R N D V S D S E K
        Frame 2: • V P L N • L N Q K R P I C F I P A T M S P T A R K
        Frame 3: E Y R • I S • I K S D Q S A L Y P Q R C L R Q R E K
      Frameshifted sequence after single base error:
        ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
        Frame 1: M S T A K L V K S K S D Q S A L Y P Q R C L R Q R E
        Frame 2: • V P L N • L N Q K A T N L L Y T R N D V S D S E K
        Frame 3: E Y R • I S • I K K R P I C F I P A T M S P T A R K
  • 29. CDS Prediction: Graphical Plots. Plots shown: GC content by reading frame; amino-acid composition by reading frame, compared with the average for globular proteins.
  • 41. Bit scores: high is good. E-values: low is good. See http://www.ncbi.nlm.nih.gov/BLAST/tutorial/
  • 42. Typical Blast Output
      Sequences producing High-scoring Segment Pairs:            Reading frame  High score  Sum probability P(N)  N
      emb|X69337|ECDPS E.coli dps gene for binding protein       +2             834         6.4e-109              1
      gb|U04242|ECU04242 Escherichia coli core starvation p...   +3             828         2.7e-106              1
      emb|X14180|ECGLNHPQ Escherichia coli glutamine permeas...  +3             443         2.8e-53               1
      gb|U18769|HDU18769 Haemophilus ducreyi fine tangled p...   +1             150         4.0e-18               2
      dbj|D01016|ANALTI46 Anabaena variabilis lti46 gene. >e...  +2             129         4.8e-12               2
      gb|M84990|P26BPO Plasmid pOP2621 ORF1 gene, 5' end;...     -2             131         6.7e-09               1
      gb|U16121|HPU16121 Helicobacter pylori neutrophil act...   +1             112         1.8e-06               1
      gb|M32401|TRPTYF1 T.pallidum pallidum antigen TyF1 g...    +3             101         5.6e-06               2
      emb|X71436|RPNTRB R.phaseoli ntrB gene                     +1             67          0.76                  2
      gb|L35598|DRODGC1A Drosophila melanogaster receptor g...   +1             48          0.97                  3
  • 43. Typical Blast Output
      gb|U18769|HDU18769 Haemophilus ducreyi fine tangled pili major pilin subunit gene
      Length = 780
      Plus Strand HSPs:
      Score = 150 (68.0 bits), Expect = 4.0e-18, Sum P(2) = 4.0e-18
      Identities = 36/89 (40%), Positives = 46/89 (51%), Frame = +1

      Query:  30 ELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGV 89
                 E L  ++    +L+LI K AHWN+ G  FIAVHEMLD     + D +D +AER   LG
      Sbjct: 253 EALQMRLQGLNELALILKHAHWNVVGPQFIAVHEMLDSQVDEVRDFIDEIAERMATLGVA 432

      Query:  90 ALGTTQVINSKTPLKSYPLDIHNVQDHLK 118
                   G +  +        YPL     QDHLK
      Sbjct: 433 PNGLSGNLVETRQSPEYPLGRATAQDHLK 519
  • 47. The Annotation Catastrophe. Figure labels: signal peptide; protease; coiled-coil domain; Protein A ("a protease"); Protein B; Protein C. Homology lies in one domain, but the functional assignment for the whole of protein A comes from another domain and is carried across in error, so proteins B and C get misannotated as proteases.
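
The GC skew measure on slide 23, (G - C)/(G + C), can be computed in sliding windows along a sequence. The following is a minimal sketch rather than the method used in the lecture: the function names `gc_skew` and `cumulative_skew`, the window and step sizes, and the toy sequence are all assumptions made for illustration. Summing the per-window values into a cumulative curve is a commonly used way of making the sign change easier to see; in many bacterial genomes the minimum of that curve is reported to lie near the origin of replication, but this should be treated as a heuristic.

```python
def gc_skew(seq: str, window: int = 1000, step: int = 1000) -> list[tuple[int, float]]:
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C has undefined skew; report 0.0 by convention here.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results


def cumulative_skew(skews: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Running sum of per-window skew values, keeping the window start positions."""
    total = 0.0
    results = []
    for start, skew in skews:
        total += skew
        results.append((start, total))
    return results


if __name__ == "__main__":
    # Toy sequence (hypothetical): a C-rich stretch followed by a G-rich stretch.
    toy = "CCCCAT" * 10 + "GGGGAT" * 10
    per_window = gc_skew(toy, window=12, step=12)
    running = cumulative_skew(per_window)
    lowest = min(running, key=lambda pair: pair[1])
    print("per-window skew:", per_window)
    print("cumulative minimum at window start:", lowest[0])
```

For a real genome the window and step would be much larger than in this toy run, and a circular chromosome would need the windows to wrap around the end of the sequence, which this sketch does not do.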

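The frameshift example on slide 28 can be reproduced by translating both sequences in all three forward frames. The sketch below assumes the standard genetic code and uses hypothetical names (`CODON_TABLE`, `translate`, `ACTUAL`, `SHIFTED`); the two sequences are copied from the slide. It prints `*` for a stop codon where the slide uses `•`.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# Sequences from slide 28. SHIFTED carries one extra A in the run of As after TCA.
ACTUAL = "ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA"
SHIFTED = "ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA"


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq from a 0-based frame offset (0, 1 or 2); incomplete trailing codons are dropped."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    for name, seq in (("actual", ACTUAL), ("frameshifted", SHIFTED)):
        print(name)
        for frame in range(3):
            print(f"  frame {frame + 1}: {translate(seq, frame)}")
```

Downstream of the inserted base, frame 1 of the frameshifted sequence reads the residues that belong to frame 3 of the actual sequence, which is why a single base error can truncate or scramble a predicted CDS.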