MAKER
The Genome Annotation Pipeline
GMOD Summer Course
May 19, 2014
Barry Moore/Carson Holt
Yandell Lab
University of Utah
MAKER
• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER
What are Annotations?
Functional and Structural
Function: cAMP-dependent and sulfonylurea-sensitive anion transporter. Key gatekeeper influencing intracellular cholesterol transport.
Subcellular location: Membrane; multi-pass membrane protein (Ref.13, Ref.14).
Domain: Multifunctional polypeptide with two homologous halves, each containing a hydrophobic membrane-anchoring domain and an ATP-binding cassette (ABC) domain.
Genomes Online Database
http://www.genomesonline.org/
[Figure: Genome Project Status. Number of genomes (0 to 9,000) by year (1998 to 2012), split into incomplete and complete projects.]
http://www.genome.gov/
Next Gen Genome Annotation 2013-14
• Coelacanth
• Pine
• Sacred Lotus
• Conus bullatus
• Pigeon
• King Cobra
• Hymenopterids
• Fusarium circinatum
• Cardiocondyla obscurior
• Burmese Python
• Sarcocystis neurona
• Spotted Gar
• Apple maggot fly
The ‘NextGen’ Genome Project
Lab/Small Group Funding
Short-read Genome Sequencing
RNASeq Data
Genome/Transcriptome Assembly
Gene Annotation
Genome Database / Blast Server
Manual curation
New assembly
Reannotate/Merge annotations
• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER
MAKER
The Source of Annotations
RNA and
Protein
Evidence
Accurate
Gene
Annotations
Ab Initio
Computational
Evidence
Annotating the Genome – Apollo View
current evidence
gene annotations
genome assembly
http://apollo.berkeleybop.org/
Identify and mask repetitive elements
current evidence
genome assembly
http://www.repeatmasker.org
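The masking step can be illustrated with a minimal sketch. It assumes the repeat coordinates have already been parsed from a repeat finder's output into 0-based, end-exclusive intervals; the function name `mask_repeats` is hypothetical and is not part of MAKER or RepeatMasker.

```python
def mask_repeats(sequence, repeats, hard=False):
    """Mask repeat intervals (0-based, end-exclusive) in a DNA sequence.

    Soft masking lower-cases repeat bases; hard masking replaces them with N.
    """
    masked = list(sequence.upper())
    for start, end in repeats:
        # Clamp the interval to the sequence so out-of-range input is harmless.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = "N" if hard else masked[i].lower()
    return "".join(masked)


print(mask_repeats("ACGTACGTAC", [(2, 5)]))             # soft-masked
print(mask_repeats("ACGTACGTAC", [(2, 5)], hard=True))  # hard-masked
```

Soft masking keeps the underlying bases available to downstream tools, while hard masking hides them entirely; which one a given aligner or gene finder expects should be checked in its own documentation.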
Generate ab initio gene predictions
ab initio predictions
SNAP
GeneMark
Augustus
current evidence
genome assembly
http://korflab.ucdavis.edu/
Align RNA and protein evidence
ab initio predictions
protein - BLASTX
EST - BLASTN
altEST - TBLASTX
current evidence
genome assembly
http://blast.ncbi.nlm.nih.gov
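As a rough illustration of handling the alignment evidence, the sketch below parses BLAST+ tabular output (the default 12 columns of `-outfmt 6`) and keeps only the stronger hits. The thresholds and the names `Hit`, `parse_blast_tabular`, and `filter_hits` are assumptions made for this example; they are not MAKER's defaults or its internal code.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    evalue: float
    bitscore: float


def parse_blast_tabular(lines):
    """Parse BLAST+ tabular output (-outfmt 6, default 12 columns)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append(Hit(cols[0], cols[1], float(cols[2]),
                        float(cols[10]), float(cols[11])))
    return hits


def filter_hits(hits, max_evalue=1e-6, min_identity=80.0):
    """Keep hits passing illustrative (hypothetical) thresholds."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.percent_identity >= min_identity]


example = [
    "contig1\tprotA\t95.0\t120\t6\t0\t100\t459\t1\t120\t1e-50\t210",
    "contig1\tprotB\t40.0\t80\t48\t0\t900\t1139\t5\t84\t0.5\t30.1",
]
for hit in filter_hits(parse_blast_tabular(example)):
    print(hit.subject, hit.evalue)
```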
Polish BLAST alignments with Exonerate
ab initio predictions
polished protein
polished EST
current evidence
genome assembly
http://www.ebi.ac.uk/~guy/exonerate/
current evidence
Pass gene-finders evidence-based ‘hints’
ab initio predictions
Hint-based SNAP
Hint-based Augustus
genome assembly
current evidence
Identify gene model most consistent with evidence
ab initio predictions*
Hint-based SNAP
Hint-based Augustus
genome assembly
current evidence
Revise further if necessary; create new annotation
ab initio predictions
genome assembly
Compute support for each portion of gene model
Eilbeck et al BMC Bioinformatics 2009
genome assembly
Compute support for each portion of gene model
Cantarel BL et al., Genome Res 2008
genome assembly
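One simple way to picture "support for each portion of a gene model" is the fraction of each exon that is overlapped by evidence alignments. The sketch below is a deliberate simplification and not the quality metrics defined in the cited papers; it assumes 0-based, end-exclusive intervals and evidence intervals that do not overlap one another. The function names are hypothetical.

```python
def overlap(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def exon_support(exons, evidence):
    """Fraction of each exon covered by evidence.

    Assumes evidence intervals are non-overlapping, so overlaps can be summed.
    """
    support = []
    for exon in exons:
        covered = sum(overlap(exon, ev) for ev in evidence)
        support.append(covered / (exon[1] - exon[0]))
    return support


exons = [(0, 100), (200, 300), (400, 500)]
evidence = [(0, 100), (250, 300)]
print(exon_support(exons, evidence))  # one fraction per exon
```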
MAKER2 Workflow
MAKER2 Distributed Workflow
Parallelization Efficiency
Holt C, Yandell M. BMC Bioinformatics. 2011 12:491.
30 GB Pine genome annotated in 37 hrs on 6,000 CPUs at the TACC
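Annotating a large assembly in parallel depends on spreading contigs across many processes. The sketch below shows one generic way to balance such work, a greedy longest-first assignment. It is an illustrative assumption only and does not describe how MAKER schedules work under MPI; `assign_contigs` and the contig names are hypothetical.

```python
import heapq


def assign_contigs(contig_lengths, n_workers):
    """Greedily give each contig (longest first) to the least-loaded worker."""
    # Heap entries are (total_bases, worker_id, contig_names); worker_id is
    # unique, so the name lists are never compared.
    heap = [(0, worker, []) for worker in range(n_workers)]
    heapq.heapify(heap)
    for name, length in sorted(contig_lengths.items(), key=lambda kv: -kv[1]):
        load, worker, names = heapq.heappop(heap)
        names.append(name)
        heapq.heappush(heap, (load + length, worker, names))
    return {worker: (load, names) for load, worker, names in heap}


lengths = {"c1": 900, "c2": 500, "c3": 400, "c4": 300, "c5": 100}
for worker, (load, names) in sorted(assign_contigs(lengths, 2).items()):
    print(worker, load, names)
```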
• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER
MAKER
MAKER2 Use Cases
1. De novo annotation providing quality metrics
2. Merging multiple annotation sets
3. Re-annotation with new evidence
4. Mapping annotations forward to a new assembly
5. Generating GMOD Compliant Output
   1. GBrowse/JBrowse
   2. Apollo
   3. Tripal
Sensitivity, Specificity, Accuracy
As a Measure of Annotation Quality
Gold Standard Genes
SN    SP    AC
1.0   1.0   100%   Perfect Accuracy
1.0   0.5   80%    Poor Specificity
0.5   1.0   80%    Poor Sensitivity
0.5   0.5   50%    Poor Specificity and Sensitivity
Guigó R et al. Genome Biol. 2006
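The quantities in the table can be sketched from base-level counts. This assumes the usual gene-prediction convention in which sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP), and it takes accuracy as the plain average of the two. That averaging is an assumption made for this example: it gives 75% for the two middle rows, where the slide shows 80%, so the slide may be rounding or using a different accuracy measure. The function name is hypothetical.

```python
def accuracy_metrics(tp, fp, fn):
    """Return (sensitivity, specificity, accuracy) from base-level counts.

    tp: predicted bases that are in the gold-standard gene
    fp: predicted bases that are not in the gold-standard gene
    fn: gold-standard bases that the prediction missed
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2


# A prediction covering the whole 1,000-base gene plus 1,000 extra bases.
print(accuracy_metrics(tp=1000, fp=1000, fn=0))
# A prediction covering only half of the gene and nothing else.
print(accuracy_metrics(tp=500, fp=0, fn=500))
```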
MAKER vs. Predictors
Holt C, Yandell M. BMC Bioinformatics. 2011
MAKER vs. Predictors
(the wrong HMM...)
Holt C, Yandell M. BMC Bioinformatics. 2011 12:491.
Annotation Edit Distance
Gold Standard Genes
Gold Standard
Evidence
• Protein Alignments
• EST Alignments
• mRNASeq
Eilbeck et al BMC Bioinformatics 2009
Annotation Edit Distance
Gold Standard
Evidence
SN    SP    AED
1.0   1.0   0.0    Perfect Accuracy
1.0   0.5   0.2    Poor Specificity
0.5   1.0   0.2    Poor Sensitivity
0.5   0.5   0.5    Poor Specificity and Sensitivity
Eilbeck et al BMC Bioinformatics 2009
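A minimal sketch of Annotation Edit Distance, assuming the definition AED = 1 - (SN + SP) / 2 with the evidence taking the place of a gold standard. Intervals are 0-based and end-exclusive, and the function names are hypothetical. Under this formula the two middle cases come out as 0.25, where the slide lists 0.2 (presumably rounded).

```python
def covered_positions(intervals):
    """Expand (start, end) intervals into the set of covered positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def annotation_edit_distance(annotation, evidence):
    """AED between an annotation's exons and the aligned evidence.

    0.0 means perfect agreement; 1.0 means no overlap at all.
    """
    ann = covered_positions(annotation)
    evi = covered_positions(evidence)
    if not ann or not evi:
        return 1.0
    shared = len(ann & evi)
    sn = shared / len(evi)  # fraction of the evidence recovered
    sp = shared / len(ann)  # fraction of the annotation that is supported
    return 1.0 - (sn + sp) / 2.0


evidence = [(0, 100)]
for exons in ([(0, 100)], [(0, 200)], [(0, 50)], [(50, 150)]):
    print(exons, annotation_edit_distance(exons, evidence))
```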
AED as a Measure of Genome Wide Annotation Quality
Eilbeck et al BMC Bioinformatics 2009
TAIR Star Rating System
http://www.arabidopsis.org/
AED Agrees well with the TAIR star system
Evidence: mRNA-seq (17 experiments), ESTs, full length cDNAs, Swiss-Prot (minus Arabidopsis)
[Figure: cumulative fraction of annotations (y-axis, 0 to 1) versus AED (x-axis, 0 to 1), one curve per TAIR star rating; annotation counts in parentheses]
***** (7,880)
**** (12,654)
*** (2,087)
** (2,188)
* (1,788)
no stars (604)
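The curves in this kind of plot are cumulative distributions: for each AED threshold, the fraction of annotations at or below it. A minimal sketch follows; the AED values in the example are made up for illustration and the function name is hypothetical.

```python
def cumulative_fraction(aed_values, thresholds=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (threshold, fraction of annotations with AED <= threshold)."""
    if not aed_values:
        raise ValueError("need at least one AED value")
    total = len(aed_values)
    return [(t, sum(1 for v in aed_values if v <= t) / total)
            for t in thresholds]


# Made-up AED values for eight annotations.
example = [0.0, 0.1, 0.2, 0.3, 0.6, 1.0, 0.05, 0.45]
for threshold, fraction in cumulative_fraction(example):
    print(threshold, fraction)
```

A curve that rises quickly at low AED indicates that most annotations in the set agree closely with their evidence.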
Holt C, Yandell M. BMC Bioinformatics. 2011
AED as a Measure of Annotation Quality
MAKER Annotations Match the Evidence Well
[Figure: two panels, A. thaliana and Z. mays, each plotting cumulative fraction of annotations (y-axis, 0 to 1) versus AED (x-axis, 0 to 1); annotation counts in parentheses]
A. thaliana: TAIR10 rep transcripts (27,206); MAKER de novo (25,956); MAKER update of TAIR10 (26,885)
Z. mays: chr10 rep transcripts (2,688); MAKER de novo (3,056); MAKER update of v3 (2,661)
Campbell et al, 2013 submitted
Protein Domain Content
As a Measure of Annotation Quality
Holt C, Yandell M. BMC Bioinformatics. 2011
MAKER vs. Predictors
Holt C, Yandell M. BMC Bioinformatics. 2011
• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER
MAKER
http://derringer.genetics.utah.edu/cgi-bin/mwas/maker.cgi
MAKER Installation
• Automated query/answer based installation script.
• Installs Perl prerequisites.
• Installs necessary executables
– RepeatMasker (RepBase)
– BLAST+
– Exonerate
– SNAP
• Even installs MWAS and MPICH2
MAKER Runtime Features
• Fill out a config file with input data and parameters
• Parallelize:
– Running with MPI
– Simply start multiple instances in the same directory.
• Re-run MAKER in the same directory and it won't redo completed work.
• Restart aborted jobs without losing any work.
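To make the config-file step concrete, here is a small sketch that renders a few key=value lines in the style of MAKER's `maker_opts.ctl` control file. The option names shown (`genome`, `est`, `protein`, `est2genome`, `protein2genome`) are given as I understand them and should be checked against the control files that MAKER itself generates; the file names and the helper `render_maker_opts` are hypothetical.

```python
def render_maker_opts(options):
    """Render options as key=value lines, one per line."""
    return "\n".join(f"{key}={value}" for key, value in options.items()) + "\n"


options = {
    "genome": "assembly.fasta",    # hypothetical input file names
    "est": "transcripts.fasta",
    "protein": "proteins.fasta",
    "est2genome": 1,               # infer gene models directly from ESTs
    "protein2genome": 1,           # infer gene models directly from proteins
}
print(render_maker_opts(options), end="")
```

A real control file carries many more options and inline comments, so in practice it is safer to edit the generated file than to write one from scratch.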
Accessory Scripts
Over 30 accessory scripts:
•cegma2zff
•chado2gff3
•cufflinks2gff3
•gff3_2_gtf
•gff3_preds2models
•gff3_to_eval_gtf
•maker2chado
•maker2jbrowse
•maker2zff
•tophat2gff3
•compare
•evaluator
•gff3_merge
•fasta_merge
•fasta_tool
•fix_fasta
•genemark_gtf2gff3
•ipr_update_gff
•iprscan2gff3
•iprscan_batch
•iprscan_wrap
•maker_functional
•maker_functional_fasta
•maker_functional_gff
•maker_map_ids
•map2assembly
•map_data_ids
•map_fasta_ids
•map_gff_ids
•split_fasta
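Many of these scripts read or write GFF3. As a hedged illustration of working with that format, the sketch below collects an AED value per transcript from GFF3 lines. It assumes the value is stored on mRNA features in an attribute named `_AED`, which should be confirmed against actual MAKER output; the function names are hypothetical.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse the GFF3 attributes column (key=value pairs separated by ';')."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved chars
    return attrs


def mrna_aed(gff3_lines):
    """Map mRNA ID to AED, assuming an `_AED` attribute on mRNA features."""
    scores = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "_AED" in attrs:
            scores[attrs["ID"]] = float(attrs["_AED"])
    return scores


example = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=gene1",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene1-mRNA-1;Parent=gene1;_AED=0.12",
]
print(mrna_aed(example))
```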
• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER
MAKER
Acknowledgements
• Mark Yandell
– Carson Holt
– Mike Campbell
– Daniel Ence
– Steven Flygare
– Zev Kronenberg
– Qing Li
– Marc Singleton
– Brett Kennedy
– Brandi Cantarel
– Hadi Islam
– Shawn Rynearson
– Nicole Ruiz
– Keith Simmons
– Bret Heale
• Alejandro Alvarado
– Eric Ross
• Jason Stajich
• Sophia Robb
• Kevin Childs
• Shin-Han Shiu
• Ning Jiang
• Yanni Sun
