Improvements in the Tomato Reference Genome (SL3.0) and Annotation (ITAG3.0)
Prashant S Hosmani, Surya Saha, Mirella Flores, Stephane Rombauts, Florian Maumus, Henri van de Geest, Gabino Sanchez-Perez and Lukas Mueller
Boyce Thompson Institute, Ithaca, NY
VIB Department of Plant Systems Biology, Ghent University, Gent, Belgium
URGI, INRA, Université Paris-Saclay, Versailles, France
Wageningen Plant Research, Wageningen University, Netherlands
psh65@cornell.edu
Acknowledgements
Gabino Sanchez
Henri van de Geest
SGN Community (You!)
RNAseq data contributors
Stephane Rombauts
Florian Maumus
SL3.0
Solanum lycopersicum
Heinz 1706
BAC Integration Workflow
• BAC assemblies
• Align to SL2.50
  • 500 bp BAC ends
  • 100% identity
• Place BACs
Automatic integration of BACs, manual validation, NCBI validation
https://github.com/solgenomics/Bio-GenomeUpdate
1,069 full-length HTGS phase 3 BACs integrated and ~11 Mb of contig gaps removed
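The placement criteria above (500 bp BAC ends, 100% identity) can be sketched as a simple filter. This is an illustrative sketch only, not the Bio-GenomeUpdate code: the `EndHit` record, its field names, and the same-chromosome and same-strand checks are assumptions added for the example.

```python
from dataclasses import dataclass

END_LENGTH = 500       # bp taken from each BAC end (from the slide)
MIN_IDENTITY = 100.0   # percent identity required (from the slide)


@dataclass
class EndHit:
    """One alignment of a BAC end to the reference (hypothetical record)."""
    chrom: str
    ref_start: int       # 1-based start on the reference
    ref_end: int         # 1-based end on the reference
    aligned_length: int  # number of BAC-end bases in the alignment
    identity: float      # percent identity of the alignment
    strand: str          # "+" or "-"


def is_anchor(hit: EndHit) -> bool:
    """A BAC end anchors only if all 500 bp align at 100% identity."""
    return hit.aligned_length >= END_LENGTH and hit.identity >= MIN_IDENTITY


def placement(left: EndHit, right: EndHit):
    """Return (chrom, start, end) when both ends anchor consistently, else None."""
    if not (is_anchor(left) and is_anchor(right)):
        return None
    # Assumed consistency checks: both ends on one chromosome and one strand.
    if left.chrom != right.chrom or left.strand != right.strand:
        return None
    start = min(left.ref_start, right.ref_start)
    end = max(left.ref_end, right.ref_end)
    return (left.chrom, start, end)
```

A BAC whose ends both pass would replace the reference span between `start` and `end`; anything else would be left for manual validation.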
BioNano Workflow
• Assemble molecules into CMaps
• Hybrid assembly with NGS scaffolds
• Manual validation
Hybrid assembly statistics
Scaffolds: 57
Total Genome Map Length: 779.789 Mb
Avg. Genome Map Length: 13.681 Mb
Genome Map N50: 25.384 Mb
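For readers unfamiliar with the N50 statistic reported above, here is a minimal sketch of how it is usually computed: N50 is the length of the shortest map in the smallest set of longest maps that together cover at least half of the total length. The function name and the toy input are illustrative, not values from the hybrid assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence or map lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest to the shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one length")


# Toy example: total is 20, and 6 + 5 = 11 covers half, so N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```

The reported average is consistent with the other figures: 779.789 Mb across 57 scaffolds is about 13.68 Mb per scaffold.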
Chr00 Integration
[Figure labels: Chr00, Chr02, Cmap 84]
• Chr00 contig NW_004194391.1 (203,142 bp) inserted in a 150 kb scaffold gap on chr09
• Two inversions on chromosome 12
• 19 gaps resized
• Chr00 contig NW_004194387.1 (561,203 bp) integrated in a 1.4 Mb scaffold gap
ITAG3.0 Annotation
Structural annotation pipeline
• Repeat masking of the genome
• Evidence: RNA and protein
• ITAG 2.40 gene models
• Post-processing
  • Genes with functional domain support
  • Assign Solyc-ID to novel genes
Repeat identification and masking of the genome
• Generated a custom repeat library (RepeatModeler)
• Excluded repeats with similarity to known proteins in SwissProt (ProtExcluder)
• Masked 56.39% of the genome (RepeatMasker)
Repeat identification and classification
Extensive identification and classification of repeats using REPET, which masks 61% of the SL3.0 reference genome.
Florian Maumus
ITAG 2.40 processing
• ITAG2.40 protein-coding genes: 34,725
• WebApollo curated genes
• Removed contamination (56)
• Removed transposons (2,244)
• Genes remaining: 32,425
• ITAG2.40 mapped with GMAP to the SL3.0 repeat-masked genome: 31,309
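The counts on this slide can be checked with a few lines of arithmetic. The sketch below assumes that 32,425 is what remains after removing contamination and transposon genes, and that 31,309 is the subset of those that GMAP placed on SL3.0; the variable names are illustrative.

```python
itag240_genes = 34_725   # ITAG2.40 protein-coding genes
contamination = 56       # removed as contamination
transposons = 2_244      # removed as transposon genes

# Genes carried forward after both removals.
retained = itag240_genes - contamination - transposons

mapped = 31_309          # retained genes mapped to repeat-masked SL3.0
mapped_fraction = mapped / retained

print(retained)
print(f"{mapped_fraction:.1%}")
```

Under that reading, roughly 96.6% of the retained ITAG2.40 models transferred to the new assembly.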
Expression evidence for annotation
Expression data evidence
• 8 billion RNAseq reads
• Tissue and treatment specific RNAseq
• 5’ and 3’ UTR enriched RNAseq
• RENseq for NBS-LRR genes
• PacBio Iso-Seq data
• SwissProt plant proteins
Reads were mapped to SL3.0 and the transcriptome was assembled.
Mapping rate: ~85%
RNAseq data sources
• Jim Giovannoni (BTI/USDA)
• Jocelyn Rose (Cornell)
• Greg Martin (BTI)
• Zhangjun Fei (BTI/USDA)
• Jonathan Jones (The Sainsbury Laboratory)
• Asaph Aharoni (Weizmann Institute of Science)
• Neelima Sinha (University of California, Davis)
MAKER pipeline
Ab-initio gene prediction methods
• Augustus (Training using BRAKER1)
• SNAP (MAKER based training)
• GeneMark (with high quality genes)
• Eugene (Stephane Rombauts)
Updating legacy annotation (ITAG2.40)
Post-processing
Added genes only when they had functional domain support (Pfam): ~800 genes.
Removed genes with 70% overlap with repeats: 674 genes.
Assigned Solyc IDs to novel genes following the ITAG convention.
Novel genes receive Solyc IDs that fall between the existing Solyc IDs of their neighbors.
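The two post-processing rules above can be sketched as follows. This is a hedged illustration, not the ITAG pipeline: whether the 70% cutoff is inclusive, how the overlap was measured, and how the new number is chosen are all assumptions. Picking the midpoint between neighboring IDs is a choice made only for this example, and version suffixes such as `.1.1` are left out.

```python
import re

REPEAT_CUTOFF = 0.70  # fraction of the gene covered by repeats (from the slide)
SOLYC_RE = re.compile(r"^Solyc(\d{2})g(\d{6})$")


def repeat_fraction(gene, repeats):
    """Fraction of a gene (start, end; 1-based inclusive) covered by repeats."""
    g_start, g_end = gene
    covered = 0
    last_end = g_start - 1  # last gene base already counted as repeat
    for r_start, r_end in sorted(repeats):
        start = max(r_start, g_start, last_end + 1)  # skip bases counted earlier
        end = min(r_end, g_end)
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / (g_end - g_start + 1)


def is_repeat_gene(gene, repeats):
    """Assumed rule: drop the gene when repeat coverage reaches the cutoff."""
    return repeat_fraction(gene, repeats) >= REPEAT_CUTOFF


def id_between(upstream_id, downstream_id):
    """Return a Solyc-style ID numbered between two neighbors (midpoint assumed)."""
    left, right = SOLYC_RE.match(upstream_id), SOLYC_RE.match(downstream_id)
    if left is None or right is None:
        raise ValueError("expected IDs like Solyc01g008960")
    if left.group(1) != right.group(1):
        raise ValueError("neighbors must be on the same chromosome")
    low, high = int(left.group(2)), int(right.group(2))
    if high - low < 2:
        raise ValueError("no free number between the two IDs")
    return f"Solyc{left.group(1)}g{(low + high) // 2:06d}"
```

For instance, a novel gene between `Solyc01g008960` and `Solyc01g008970` would receive `Solyc01g008965` under the midpoint assumption.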
Improvements in ITAG 3.0 compared with ITAG 2.40

                    ITAG 2.40    ITAG 3.0
# of genes          34,725       34,769
Avg. gene length    1,209 bp     1,529 bp
Exons per gene      4.61         5.10
5’ UTR per gene     0.39         0.63
3’ UTR per gene     0.44         0.62

Novel genes in ITAG3.0: 5,822
Gene structure improvement example
[Figure: two panels, "Correct fusion example" and "UTR example", each comparing ITAG3.0 and ITAG2.40 gene models against an RNAseq XY plot]
Quality check - Annotation Edit Distance (AED)
AED = 0: complete support
AED = 1: lack of support
[Plot, axis label: AED]
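AED is commonly defined as 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotation bases covered by the evidence. The sketch below computes it at base level from exon intervals; the function names and the interval representation are illustrative, and MAKER's own implementation works on its aligned evidence rather than on plain interval lists.

```python
def bases(intervals):
    """Set of base positions covered by 1-based inclusive (start, end) intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def aed(annotation, evidence):
    """Annotation Edit Distance between a gene model and its evidence."""
    a, e = bases(annotation), bases(evidence)
    if not a or not e:
        return 1.0  # nothing to compare against counts as no support
    shared = len(a & e)
    sensitivity = shared / len(e)   # evidence bases recovered by the model
    specificity = shared / len(a)   # model bases backed by the evidence
    return 1 - (sensitivity + specificity) / 2
```

A model that matches its evidence exactly scores 0, a model with no overlapping evidence scores 1, and a model that shares half of its bases with equally long evidence scores 0.5.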
Functional annotation
Automated Assignment of Human Readable Descriptions (AHRD)
SwissProt plant protein database
TrEMBL plant protein database
Araport11 (latest Arabidopsis annotation)
User-curated locus information from solgenomics.net (2000+)
Unknown proteins
In ITAG 3.0, 409 genes have the functional description “Unknown protein”, compared with 7,689 in ITAG2.40.
Functional annotation
Automated Assignment of Human Readable Descriptions (AHRD)
AHRD-Version 3.3.2
Quality score (***)
Solyc08g081780.1.1 Dirigent protein (***)
Solyc01g008960.2.1 Argonaute family protein (***)
Solyc01g013880.1.1 Leucine-rich repeat receptor-like protein kinase family protein (*-*)
Position    Criteria
1           Bit score of the BLAST result is >50 and e-value is <1e-10
2           Alignment of the BLAST result is >60%
3           Human Readable Description score is >0.5
“AHRD’s quality-code consists of a three character string, where each
character is either ‘*’ if the respective criteria is met or ‘-’ otherwise.”
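The three-character quality code described above can be sketched as a small function. The thresholds come from the slide; the function and parameter names are hypothetical, and "alignment >60%" is represented here as a fraction between 0 and 1, which is an assumption about how that value is expressed.

```python
def quality_code(bit_score, e_value, alignment_fraction, description_score):
    """Build an AHRD-style three-character quality code from the slide's criteria."""
    checks = [
        bit_score > 50 and e_value < 1e-10,  # position 1
        alignment_fraction > 0.60,           # position 2
        description_score > 0.5,             # position 3
    ]
    return "".join("*" if passed else "-" for passed in checks)


# Illustrative values only: a strong hit with a short alignment.
print(quality_code(120.0, 1e-30, 0.40, 0.8))
```

A code of `*-*`, as shown for Solyc01g013880.1.1, means the first and third criteria were met and the second was not.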
Novel genes in ITAG3.0
5,822 novel genes in ITAG 3.0
Future work
Genome
Improving the genome assembly by sequencing with PacBio technology
Annotation
tRNA, non-coding RNA annotation
Multiple isoforms
Co-expression network based functional annotation
Workshop: SGN and RTB Databases
Tuesday, Jan 17 10:30 AM
Posters
Surya Saha: Improved Tomato Genome
Reference (SL3.0) using Full-Length BACs,
BioNano Optical Maps and SGN Community
Resources (P0798)
Prashant Hosmani: ITAG3.0 Annotation for the
New Tomato Reference Genome SL3.0 (P0797)
Thank you!!
Questions??
Data available to download from
FTP
• ITAG 3.0
• GFF, proteins, transcripts, CDS
• List of fused genes
SGN Workshop, SOL 2016
Gap Reduction
[Chart: BACs integrated (axis 0 to 500) and reduction in contig gaps (axis 0% to 70%) for chromosomes 1 to 12]
Repeat classification

LTR retrotransposon         Copia               64840935
                            Gypsy              260719161
                            TRIM/LARD             671571
Non-LTR retrotransposon     LINE                 9871924
Putative_retrotransposon    Putative_RT           528982
DNA                         DNA                 20712725
Helitron                    Helitron             1210271
TIR                         TIR                 12144035
Confused                    Confused            48373586
Unclassified                Unclassified        70850157
Hostgene                    Endogenous virus     5839457
Tandem repeats              Hostgene             5044454
Tandem repeats                                   8901715
Ns
SUM repeats                                    509708973
Mapping rates for different RNAseq data

RNAseq data    # of reads (millions)    REPET light    RepeatModeler light
AC_Jim         637                      86.87%         88.03%
epigenome      82                       60.77%         64.35%
UTR seq        87                       85.88%         86.57%
TEA part A     4,295                    84.41%         84.39%
TEA part B     2,449                    84.40%         84.71%
RENseq         15                       32.91%         39.83%
Yang           331                      79.94%         80.28%
Total reads    7,930
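One way to summarize this table is a read-weighted mean mapping rate per masking strategy, which can be compared with the ~85% overall rate quoted earlier. The sketch below copies the per-dataset values from the table; the weighting scheme and the names used are choices made for this example, not something stated on the slide.

```python
# (dataset, reads in millions, REPET light %, RepeatModeler light %)
ROWS = [
    ("AC_Jim", 637, 86.87, 88.03),
    ("epigenome", 82, 60.77, 64.35),
    ("UTR seq", 87, 85.88, 86.57),
    ("TEA part A", 4295, 84.41, 84.39),
    ("TEA part B", 2449, 84.40, 84.71),
    ("RENseq", 15, 32.91, 39.83),
    ("Yang", 331, 79.94, 80.28),
]


def weighted_rate(rows, column):
    """Mean mapping rate for one column, weighted by each dataset's read count."""
    total_reads = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total_reads


repet_rate = weighted_rate(ROWS, 2)
repeatmodeler_rate = weighted_rate(ROWS, 3)
print(f"REPET light: {repet_rate:.2f}%")
print(f"RepeatModeler light: {repeatmodeler_rate:.2f}%")
```

Because the two TEA datasets contribute most of the reads, they dominate both weighted means.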
