Whole genome restriction maps for
nonmodel organisms: genomic
resources where there were none.
Sue Brown
Division of Biology
Kansas State University

Tuesday, February 25, 14
Outline
• de novo genome assembly and i5K
• Improving assemblies with Bionano genome
maps
▫ Irys system
▫ File formats
▫ Assembly pipeline
▫ Alignment filtering

• Results

Genomes
• Genomes come in many sizes
• Genome assemblies come in many qualities
• Draft Assemblies
▫ Most genomes sequenced today (nonmodel)

• Finished Assemblies

▫ Model organisms (lots of resources)
  ▪ Human
  ▪ Computational
  ▪ Genetic and genomic tools

• Genomic resources increase the value of the
genome sequence
▫ Reverse genetic approaches

Many initiatives to sequence genomes
• 1,000 human genomes
▫ To provide a deep catalog of human genetic
variation

• Genome 10K: started as an initiative to
sequence 10,000 vertebrate genomes. The database
currently catalogs specimens from over 16,000
organisms
▫ To understand how complex animal life evolved
through changes in DNA and use this knowledge to
become better stewards of the planet
Letter to Science Announces i5k in 2011

Why sequence 5,000 insect genomes?
• 53% of all living species
• Maintenance and productivity of natural and agricultural
ecosystems
• Consume or damage 25% of all agricultural, forestry and
livestock production
▫ >$30 Billion in annual loss

• Vector plant, animal and human disease
▫ >$50 Billion cost worldwide

• Just as human and veterinary medicine now rely on
personal or animal genome info, revealing info stored in
their genomes will transform our ability to manage
insects that threaten our health, food supply and
economic security
• Improve our lives
Standard Draft Genome Assemblies
• Highly fragmented, even at deep coverage
• Scaffolds terminate in repetitive regions
• Relatively low N50 values
• Example: 7x Sanger-based Tribolium castaneum
genome assembly
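
Since N50 is the headline statistic on these slides, here is a minimal sketch of how it can be computed from a list of contig or scaffold lengths. The function name and the toy lengths are illustrative assumptions, not values or code from the talk.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths in bp (hypothetical, not the Tribolium data).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

A fragmented assembly has many short sequences, so the running sum reaches the halfway point only at a short length, which is why draft assemblies report low N50 values.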

Tribolium castaneum genomics
• Cot analysis
▫ Genome ~200Mb
▫ Long stretches of unique sequence
▫ Low methylation

• 9 autosomes, X and Y

Jeff Stuart, Purdue

Standard Draft


Minimally or unfiltered data, from any number of
different sequencing platforms, that are assembled
into contiguous strings of bases (AGTC), with no gaps
(contigs).

This is the minimum
standard for submission
to public databases.
Science Oct 9, 2009 pp236-237

http://compbio.pbworks.com
Molecular linkage map used to anchor scaffolds
in chromosome builds (ChLG)

Low X coverage, no Y, marker density varies
T. castaneum assembly stats
• Number of contigs: 8,814
• Contig N50: 43,511 bp
• Number of scaffolds: 481
• Scaffold N50: 975,455 bp
• Total number of chromosomes: 10 (-Y)
• Unmapped scaffolds: 352
• Single contig scaffolds: 1835
• (2321 scaffolds total, including the 481 multi-contig scaffolds)
Scaffold structure of the Tribolium
genome assembly
[Figure: scaffold structure of ChLG and unanchored scaffolds. Labels: ChLG, NW, AAJJ, 300K Ns, Unanchored, DS.]
[…] genetic recombination map to the assembly scaffolds,
anchoring greater than 90% of the assembled
sequence [1] (fig. 1).
To improve this draft assembly, we constructed
physical maps of the T. castaneum genome. Using the […]

[…] orientation of scaffolds have been corrected, and
scaffolds have been extended by spanning repetitive
regions.
[1] Nature 2008 452:949-55.

Genome assembly improvements
Figure 2 Genome refinements
• T. castaneum 3.0: Baylor Sanger 7x draft assembly and molecular genetic map
▫ length (Mb): 160.466
▫ scaffolds: 2321
▫ scaffold N50 (Mb): 0.98
▫ multicontig scaffolds: 481

• T. castaneum 4.0: Illumina long-distance jump libraries extended scaffolds into gaps and captured gaps with Atlas gap-link and gap-filler.
▫ length (Mb): 160.862
▫ scaffolds: 2219
▫ scaffold N50 (Mb): 1.16
▫ multicontig scaffolds: 411

• T. castaneum 4.0 and gam-ngs: Gam-ngs merged the Illumina assembly and T.cas 4.0, extending several unknowns and an LGX scaffold.
▫ length (Mb): 160.864
▫ scaffolds: 2219
▫ scaffold N50 (Mb): 1.16
▫ multicontig scaffolds: 411

• T. castaneum 4.0 and gam-ngs plus BioNano maps: Sequence scaffolds were aligned to maps with IrysView; the alignment was filtered and used to create new scaffolds.
▫ length (Mb): 189.629
▫ scaffolds: 2153
▫ scaffold N50 (Mb): 3.31
▫ multicontig scaffolds: 341

An independent platform to validate and improve genomes
[Figure: Mis-assemblies, with a ChLG3 example]

How to validate a de novo assembly?
• Describe assembly
▫ # contigs, # scaffolds, total bases, N50 lengths
▫ coverage, # ESTs, # orthologs found

• But is the assembly accurate?
▫ Compare to BAC sequences
▫ If you have the resources

• Need independent (reasonably priced) method

Genome maps based on
landmarks
• BioNano Genomics
▫ San Diego, California

• Imaging ultra-long molecules of DNA
• Labeled at restriction sites

Introducing the Irys system

Labeling schema
BspQI nicks at
5'-GCTCTTCN-3'
3'-CGAGAAGN-5'

10 sites / 100 Kb
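
To make the "sites per 100 Kb" figure concrete, the sketch below scans a sequence for the recognition motif on both strands and reports label density. It is only an illustration: the function names and toy sequence are assumptions, and the positions returned are motif starts rather than the exact nick or label coordinates that BioNano software reports.

```python
import re

MOTIF = "GCTCTTC"      # recognition sequence from the slide (trailing N omitted)
MOTIF_RC = "GAAGAGC"   # reverse complement: the same site read on the other strand


def label_sites(sequence):
    """Return sorted 0-based start positions of the motif on either strand."""
    seq = sequence.upper()
    # A lookahead is used so that overlapping matches are not skipped.
    pattern = re.compile("(?=(%s|%s))" % (MOTIF, MOTIF_RC))
    return [m.start() for m in pattern.finditer(seq)]


def sites_per_100kb(sequence):
    """Label density expressed as sites per 100 Kb of sequence."""
    if not sequence:
        return 0.0
    return len(label_sites(sequence)) * 100_000 / len(sequence)


# Hypothetical toy sequence with one site on each strand.
toy = "AAGCTCTTCAAAAAGAAGAGCTT"
positions = label_sites(toy)
```

Running the same scan over each sequence scaffold is the idea behind an in silico map: the ordered site positions are what get compared with the labels observed on imaged molecules.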

Chip Design

Samples loaded into 2 flow cells per chip

3 lasers 3 detection channels
Detect YOYO-1 in DNA backbone
Fluorescent nucleotides at labeled sites
DNA molecules entering channels

A long repeat in the Tribolium genome

Mapping individual images back to map

Regions flanking repeat are unique
Some sites are polymorphic

Limitations of the Irys system
• Sample prep is very specific
• Requires gram amounts of starting material
• Bacterial cells, tissue culture cells, eukaryotic
nuclei
• Less complex tissue is best
▫ Blood
▫ Embryos

• Not applicable to transcriptomics projects
• Needs sequence contig N50 >30 Kb (5 restriction sites)
Assemble images into genome maps
.tiff → .bnx → .cmap

Align BNG maps to in silico maps
(.xmap)

File formats are similar to
generating sequence data...

Sequence data: image files → (basecall) → fastq → (de novo assemble) → fasta → (align) → sam
BNG data: image files → (call labels) → bnx → (de novo assemble) → cmap → (align) → xmap

fastq
@SRR014849.2 EIXKN4201AKDUH/2
TCAAGTGGTGAACGGCAGAAA
+
<=B:==B:=<?6=B;<;=B=)

fasta
>contig1
TCAAGTGGTGAACGGCAGAAA

sam
HWI-ST330_C0NEHACXX:2:1101:17113:52802#0  69  contig1  2578  0  *  =  2578  0  ATTACGGCCCATGGTTCAGAATAATGACGAATAGAAATACTAGTACTATATCCCCTAAAAAA  <@CFFFFFHHGFHJHIJJJJJJJJJFJJJFGFHEHIHGHJGIJHIIIJJJJJJJJIJIIJIH  YT:Z:UP

bnx
0  21  202146.4
1  1096.2  8973.8
QX11  10.0565  11.7966
QX12  0.0187  0.0604

cmap
#h CMapId  ContigLength  NumSites  SiteID  LabelChannel  Position  StdDev  Coverage  Occurrence
#f int  float  int  int  int  float  float  int  int
393  225073.2  21  1  1  20.0  0.0  3  3

xmap
#h XmapEntryID  QryContigID  RefcontigID  QryStartPos  QryEndPos  RefStartPos  RefEndPos  Orientation  Confidence  HitEnum
#f int  int  int  float  float  float  float  string  float  string
1  94  1  444392.7  5839.8  57024.0  550038.8  -  28.87  1M1D2M3I4D1M3I2M1I7M1I1M1I9M1I1M1I2M1I3M1D2M
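
Because cmap and xmap are plain delimited text with a "#h" column-name line, they are easy to read with a few lines of code. The sketch below is a hedged illustration, not part of the Irys-scaffolding scripts: `parse_xmap` is a hypothetical helper, and the example row is the one shown on the slide.

```python
def parse_xmap(lines):
    """Parse XMAP-style text into a list of dicts keyed by the '#h' column names."""
    columns = None
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#h"):
            columns = line[2:].split()
            continue
        if line.startswith("#"):
            continue  # other header lines, including the '#f' type line
        if columns is None:
            raise ValueError("data row found before '#h' header line")
        # split() accepts tabs or spaces, so it works on either layout.
        records.append(dict(zip(columns, line.split())))
    return records


example = [
    "#h XmapEntryID QryContigID RefcontigID QryStartPos QryEndPos "
    "RefStartPos RefEndPos Orientation Confidence HitEnum",
    "#f int int int float float float float string float string",
    "1 94 1 444392.7 5839.8 57024.0 550038.8 - 28.87 "
    "1M1D2M3I4D1M3I2M1I7M1I1M1I9M1I1M1I2M1I3M1D2M",
]
rows = parse_xmap(example)
# Length of the reference interval covered by this alignment.
ref_span = float(rows[0]["RefEndPos"]) - float(rows[0]["RefStartPos"])
```

Values are kept as strings here; a fuller reader could use the "#f" line to convert each column to its declared type.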
Visualizing an xmap
[Figure: xmap alignment view. Labels: contig id, sequence-based scaffold, label alignment, BioNano contig map, coverage.]

K-INBRE i5K GitHub scripts:
Irys scaffolding scripts and manuals written by Jennifer Shelton and Nic Herndon
Assembly workflow was developed by Ernest Lam (BioNano)
git clone https://github.com/i5K-KINBRE-script-share/Irys-scaffolding

Assembly pipeline
developed with Ernest Lam (BioNano)

scripts available at: i5k-KINBRE script share at GitHub: Irys-scaffolding
https://github.com/i5K-KINBRE-script-share/Irys-scaffolding
Filtering alignments
Label density varies throughout the genome, so we created scripts to filter in two passes:
• Pass 1: looks for a high confidence score over at least ~30% of the total possible alignment
• Pass 2: looks for a lower confidence score over the majority (~90%) of the total possible alignment
Pass 1 finds most high-quality alignments.
Pass 2 finds high-quality, low-density alignments.
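
The two passes can be sketched as a single decision function. This is an illustration only, not the Irys-scaffolding code: the `Alignment` fields and function name are assumptions, and because the slides give the length fractions (~30% and ~90%) but not the confidence cutoffs, the two confidence thresholds are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    scaffold_id: str
    confidence: float        # XMAP Confidence column
    aligned_length: float    # length actually aligned (bp)
    possible_length: float   # longest alignment the two maps could share (bp)


# Hypothetical confidence cutoffs (not stated on the slides).
HIGH_CONFIDENCE = 20.0
LOW_CONFIDENCE = 10.0
# Length fractions taken from the slide text.
PASS1_MIN_FRACTION = 0.30
PASS2_MIN_FRACTION = 0.90


def passes_filter(aln):
    """Return 1 or 2 for the pass that keeps the alignment, or 0 if rejected."""
    if aln.possible_length <= 0:
        return 0
    fraction = aln.aligned_length / aln.possible_length
    if aln.confidence >= HIGH_CONFIDENCE and fraction >= PASS1_MIN_FRACTION:
        return 1
    if aln.confidence >= LOW_CONFIDENCE and fraction >= PASS2_MIN_FRACTION:
        return 2
    return 0
```

The second pass exists because a region with few labels cannot reach a high confidence score however well it aligns, so a lower score is accepted when nearly the whole possible overlap is aligned.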

Filtering alignments
Super-scaffolded scaffolds are joined in a new reference fasta file.
Overlapping scaffolds have a 30 bp spacing gap between them.
If a scaffold aligns more than once, only the longest alignment is used.
If two alignments have the same length, only the highest-confidence alignment is used.
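
The rule for scaffolds that align more than once (longest alignment wins, ties broken by confidence) can be sketched as follows. The tuple layout and function name are assumptions made for illustration; the published scripts may organise this differently.

```python
def best_alignments(alignments):
    """Keep one alignment per scaffold: the longest, ties broken by confidence.

    Each alignment is a (scaffold_id, aligned_length, confidence) tuple.
    """
    best = {}
    for scaffold_id, length, confidence in alignments:
        current = best.get(scaffold_id)
        # Tuple comparison checks length first, then confidence.
        if current is None or (length, confidence) > (current[1], current[2]):
            best[scaffold_id] = (scaffold_id, length, confidence)
    return best


# Hypothetical alignments for two scaffolds.
candidates = [
    ("s1", 500, 12.0),
    ("s1", 800, 9.0),    # longer, so it wins despite lower confidence
    ("s2", 300, 15.0),
    ("s2", 300, 22.0),   # same length, higher confidence wins
]
chosen = best_alignments(candidates)
```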

BNG restriction maps for
T. castaneum
• Dual nicked with BspQI and BbvCI
• 28.6 Gb of molecules >150 Kb = ~143x coverage of the 200 Mb Tribolium genome

• N contigs: 216
• Total Contig Len (Mb):   200.473
• Avg. Contig Len  (Mb):     0.928
• Contig N50       (Mb):    1.350
• Total Ref Len    (Mb):   157.186
• Total Contig Len / Ref Len  : 1.275
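
The derived figures on this slide follow directly from the raw numbers. As a quick check, the arithmetic can be reproduced like this (variable names are illustrative; all input values are taken from the slide):

```python
molecule_bases_gb = 28.6       # Gb of imaged molecules
genome_size_mb = 200.0         # approximate Tribolium genome size
n_contigs = 216                # number of genome map contigs
total_contig_len_mb = 200.473
total_ref_len_mb = 157.186

# Coverage: total molecule bases divided by genome size (both in Mb).
coverage = molecule_bases_gb * 1000 / genome_size_mb

# Average genome map contig length.
avg_contig_len_mb = total_contig_len_mb / n_contigs

# Ratio of genome map length to sequence reference length.
map_to_ref_ratio = total_contig_len_mb / total_ref_len_mb
```

A ratio above 1 means the genome maps span more of the genome than the sequence reference does.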
ChLG X
ChLGX had 13 scaffolds. Alignment to BioNano maps
captured gaps and validated order for 11 of 13 scaffolds,
incorporated 2 unplaced scaffolds and identified a
potential misplaced scaffold (scaffold 2 aligns with another
linkage group).

ChLG 7
Alignment to BioNano maps captured gaps and validated
order for 13 of 15 scaffolds. Scaffold 14 needs to be
reversed in the super-scaffold.

Additional chromosome linkage groups.
ChLG 3

ChLG 9

ChLG 2

what does it cost?
• 100-500Mb genome <$5,000
▫ 70-100x coverage

• 1Gb genome <$8,000
▫ 70-100x coverage

• completely dependent on homogeneity of starting
material
• assembly and analysis software is included in
price

Summary
• Standard Draft Genomes are highly fragmented
• BNG provides independent platform
• Whole genome restriction maps
• Validate assembly
• Extend scaffolds/Size Gaps
• Identify structural variants
• Identify haplotypes
• Comprehensive view of repetitive DNA (HORs)
• A validated genome assembly improves downstream analyses

Thanks to:
• Michelle Gordon
▫ Research Assistant: optimizing sample preps

• Jennifer Shelton
▫ Biologist turned Bioinformaticist

• Nic Herndon
▫ Computer scientist turned Bioinformaticist

• BioNano Genomics
▫ Ernest Lam
▫ Weiping Wang

Tuesday, February 25, 14

Más contenido relacionado

La actualidad más candente

Evaluation of Pool-Seq as a cost-effective alternative to GWAS
Evaluation of Pool-Seq as a cost-effective alternative to GWASEvaluation of Pool-Seq as a cost-effective alternative to GWAS
Evaluation of Pool-Seq as a cost-effective alternative to GWASAmin Mohamed
 
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...Fabio Caligaris
 
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsCross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics Christopher Mason
 
Sequencing 2016
Sequencing 2016Sequencing 2016
Sequencing 2016Surya Saha
 
Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Joe Parker
 
Genetic Analysis Solutions for Plant Sciences
Genetic Analysis Solutions for Plant SciencesGenetic Analysis Solutions for Plant Sciences
Genetic Analysis Solutions for Plant SciencesThermo Fisher Scientific
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assemblyc.titus.brown
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...VHIR Vall d’Hebron Institut de Recerca
 
How to Standardise and Assemble Raw Data into Sequences: What Does it Mean fo...
How to Standardise and Assemble Raw Data into Sequences: What Does it Mean fo...How to Standardise and Assemble Raw Data into Sequences: What Does it Mean fo...
How to Standardise and Assemble Raw Data into Sequences: What Does it Mean fo...Joseph Hughes
 
Techniques of-biotechnology-mcclean-good
Techniques of-biotechnology-mcclean-goodTechniques of-biotechnology-mcclean-good
Techniques of-biotechnology-mcclean-goodana_isa_barbosa
 
Telomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesTelomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesGenome Reference Consortium
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assemblyc.titus.brown
 
London Calling 2019: Karen Miga
London Calling 2019: Karen MigaLondon Calling 2019: Karen Miga
London Calling 2019: Karen MigaKaren Hayden Miga
 
Previewing GRCm39: Assembly Updates from the GRC
Previewing GRCm39: Assembly Updates from the GRCPreviewing GRCm39: Assembly Updates from the GRC
Previewing GRCm39: Assembly Updates from the GRCGenome Reference Consortium
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiomejukais
 

La actualidad más candente (20)

Evaluation of Pool-Seq as a cost-effective alternative to GWAS
Evaluation of Pool-Seq as a cost-effective alternative to GWASEvaluation of Pool-Seq as a cost-effective alternative to GWAS
Evaluation of Pool-Seq as a cost-effective alternative to GWAS
 
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...
 
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsCross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
 
Sequencing 2016
Sequencing 2016Sequencing 2016
Sequencing 2016
 
Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...
 
Genetic Analysis Solutions for Plant Sciences
Genetic Analysis Solutions for Plant SciencesGenetic Analysis Solutions for Plant Sciences
Genetic Analysis Solutions for Plant Sciences
 
Basics of Genome Assembly
Basics of Genome Assembly Basics of Genome Assembly
Basics of Genome Assembly
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
 
Future of metagenomics
Future of metagenomicsFuture of metagenomics
Future of metagenomics
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
 
How to Standardise and Assemble Raw Data into Sequences: What Does it Mean fo...
How to Standardise and Assemble Raw Data into Sequences: What Does it Mean fo...How to Standardise and Assemble Raw Data into Sequences: What Does it Mean fo...
How to Standardise and Assemble Raw Data into Sequences: What Does it Mean fo...
 
Techniques of-biotechnology-mcclean-good
Techniques of-biotechnology-mcclean-goodTechniques of-biotechnology-mcclean-good
Techniques of-biotechnology-mcclean-good
 
Telomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomesTelomere-to-telomere assembly of a complete human chromosomes
Telomere-to-telomere assembly of a complete human chromosomes
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
London Calling 2019: Karen Miga
London Calling 2019: Karen MigaLondon Calling 2019: Karen Miga
London Calling 2019: Karen Miga
 
Previewing GRCm39: Assembly Updates from the GRC
Previewing GRCm39: Assembly Updates from the GRCPreviewing GRCm39: Assembly Updates from the GRC
Previewing GRCm39: Assembly Updates from the GRC
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiome
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
New generation Sequencing
New generation Sequencing New generation Sequencing
New generation Sequencing
 

Similar a Bionano genome maps_feb2014

What should Bioinformatics do for EvoDevo?
What should Bioinformatics do for EvoDevo?What should Bioinformatics do for EvoDevo?
What should Bioinformatics do for EvoDevo?ylog
 
High Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can KnowHigh Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can KnowBrian Krueger
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycleRNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycleJennifer Shelton
 
Microbial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome SequencingMicrobial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome SequencingJonathan Eisen
 
High Throughput Sequencing Technologies: On the path to the $0* genome
High Throughput Sequencing Technologies: On the path to the $0* genomeHigh Throughput Sequencing Technologies: On the path to the $0* genome
High Throughput Sequencing Technologies: On the path to the $0* genomeBrian Krueger
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.Jennifer Shelton
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128GenomeInABottle
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_coursehansjansen9999
 
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.pptAdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.pptRuthMWinnie
 
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.pptAdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.pptEdizonJambormias2
 
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyTowards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyShaojun Xie
 
Review of Liao et al - A draft human pangenome reference - Nature (2023)
Review of Liao et al - A draft human pangenome reference - Nature (2023)Review of Liao et al - A draft human pangenome reference - Nature (2023)
Review of Liao et al - A draft human pangenome reference - Nature (2023)Stuart MacGowan
 
How we revealed genomes secrets?
How we revealed genomes secrets? How we revealed genomes secrets?
How we revealed genomes secrets? ehsan sepahi
 
What I learned at CSHL SynBio 2013.
What I learned at CSHL SynBio 2013.What I learned at CSHL SynBio 2013.
What I learned at CSHL SynBio 2013.Kevin Spring
 
Genome10K & Genome Science gEVAL Talk (Earlham Institute/Norwich)
Genome10K & Genome Science gEVAL Talk (Earlham Institute/Norwich)Genome10K & Genome Science gEVAL Talk (Earlham Institute/Norwich)
Genome10K & Genome Science gEVAL Talk (Earlham Institute/Norwich)William Chow
 
2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngsDin Apellidos
 
Analyzing the exome—focusing your NGS analysis with high performance target c...
Analyzing the exome—focusing your NGS analysis with high performance target c...Analyzing the exome—focusing your NGS analysis with high performance target c...
Analyzing the exome—focusing your NGS analysis with high performance target c...Integrated DNA Technologies
 

Similar a Bionano genome maps_feb2014 (20)

Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
 
What should Bioinformatics do for EvoDevo?
What should Bioinformatics do for EvoDevo?What should Bioinformatics do for EvoDevo?
What should Bioinformatics do for EvoDevo?
 
High Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can KnowHigh Throughput Sequencing Technologies: What We Can Know
High Throughput Sequencing Technologies: What We Can Know
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycleRNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle
 
Microbial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome SequencingMicrobial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
 
High Throughput Sequencing Technologies: On the path to the $0* genome
High Throughput Sequencing Technologies: On the path to the $0* genomeHigh Throughput Sequencing Technologies: On the path to the $0* genome
High Throughput Sequencing Technologies: On the path to the $0* genome
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_course
 
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.pptAdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
 
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.pptAdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
AdamAmeur_SciLife_Bioinfo_course_Nov2015.ppt
 
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyTowards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
 
Review of Liao et al - A draft human pangenome reference - Nature (2023)
Review of Liao et al - A draft human pangenome reference - Nature (2023)Review of Liao et al - A draft human pangenome reference - Nature (2023)
Review of Liao et al - A draft human pangenome reference - Nature (2023)
 
How we revealed genomes secrets?
How we revealed genomes secrets? How we revealed genomes secrets?
How we revealed genomes secrets?
 
What I learned at CSHL SynBio 2013.
What I learned at CSHL SynBio 2013.What I learned at CSHL SynBio 2013.
What I learned at CSHL SynBio 2013.
 
Ashg grc workshop2014_tg
Ashg grc workshop2014_tgAshg grc workshop2014_tg
Ashg grc workshop2014_tg
 
Alignment Approaches II: Long Reads
Alignment Approaches II: Long ReadsAlignment Approaches II: Long Reads
Alignment Approaches II: Long Reads
 
Genome10K & Genome Science gEVAL Talk (Earlham Institute/Norwich)
Genome10K & Genome Science gEVAL Talk (Earlham Institute/Norwich)Genome10K & Genome Science gEVAL Talk (Earlham Institute/Norwich)
Genome10K & Genome Science gEVAL Talk (Earlham Institute/Norwich)
 
2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs
 
Analyzing the exome—focusing your NGS analysis with high performance target c...
Analyzing the exome—focusing your NGS analysis with high performance target c...Analyzing the exome—focusing your NGS analysis with high performance target c...
Analyzing the exome—focusing your NGS analysis with high performance target c...
 

Más de Jennifer Shelton

Bioinformatic core facilities discussion
Bioinformatic core facilities discussionBioinformatic core facilities discussion
Bioinformatic core facilities discussionJennifer Shelton
 
Using BioNano Maps to Improve an Insect Genome Assembly​
Using BioNano Maps to Improve an Insect Genome Assembly​Using BioNano Maps to Improve an Insect Genome Assembly​
Using BioNano Maps to Improve an Insect Genome Assembly​Jennifer Shelton
 
Structural Variation Detection
Structural Variation DetectionStructural Variation Detection
Structural Variation DetectionJennifer Shelton
 
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...Jennifer Shelton
 
Journal club slides to discuss "Differential analysis of gene regulation at t...
Journal club slides to discuss "Differential analysis of gene regulation at t...Journal club slides to discuss "Differential analysis of gene regulation at t...
Journal club slides to discuss "Differential analysis of gene regulation at t...Jennifer Shelton
 
Applied Bioinformatics Journal Club Pacbio RNA-Seq
Applied Bioinformatics Journal Club Pacbio RNA-SeqApplied Bioinformatics Journal Club Pacbio RNA-Seq
Applied Bioinformatics Journal Club Pacbio RNA-SeqJennifer Shelton
 
RNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubRNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubJennifer Shelton
 
Translocation detection in lung cancer using mate-pair sequencing and iVIGS
Translocation detection in lung cancer using mate-pair sequencing and iVIGSTranslocation detection in lung cancer using mate-pair sequencing and iVIGS
Translocation detection in lung cancer using mate-pair sequencing and iVIGSJennifer Shelton
 
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...Jennifer Shelton
 
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...Jennifer Shelton
 
Param selection phase1summary_v2
Param selection phase1summary_v2Param selection phase1summary_v2
Param selection phase1summary_v2Jennifer Shelton
 
Bioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formalBioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formalJennifer Shelton
 

Bionano genome maps_feb2014

  • 1. Whole genome restriction maps for nonmodel organisms: genomic resources where there were none. Sue Brown Division of Biology Kansas State University Tuesday, February 25, 14
  • 2. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 3. Genomes • Genomes come in many sizes • Genome assemblies come in many qualities • Draft Assemblies ▫ Most genomes sequenced today (nonmodel) • Finished Assemblies ▫ Model organisms (lots of resources): human, computational, genetic and genomic tools • Genomic resources increase the value of the genome sequence ▫ Reverse genetic approaches
  • 4. Many initiatives to sequence genomes • 1,000 human genomes ▫ To provide a deep catalog of human genetic variation • Genome 10K: started as an initiative to sequence 10,000 vertebrate genomes; the database currently catalogs specimens from over 16,000 organisms ▫ To understand how complex animal life evolved through changes in DNA and use this knowledge to become better stewards of the planet
  • 5. Letter to Science Announces i5k in 2011
  • 6. Why sequence 5,000 insect genomes? • 53% of all living species • Maintenance and productivity of natural and agricultural ecosystems • Consume or damage 25% of all agricultural, forestry and livestock production ▫ >$30 Billion in annual loss • Vector plant, animal and human disease ▫ >$50 Billion cost worldwide • Just as human and veterinary medicine now rely on personal or animal genome information, revealing the information stored in insect genomes will transform our ability to manage insects that threaten our health, food supply and economic security • Improve our lives
  • 7. Standard Draft Genome Assemblies • Highly fragmented, even at deep coverage • Scaffolds terminate in repetitive regions • Relatively low N50 values • Example: 7x Sanger-based Tribolium castaneum genome assembly
  • 8. Tribolium castaneum genomics • Cot analysis ▫ Genome ~200 Mb ▫ Long stretches of unique sequence ▫ Low methylation • 9 autosomes, X and Y (Jeff Stuart, Purdue)
  • 9. Standard Draft: minimally filtered or unfiltered data, from any number of different sequencing platforms, that are assembled into contiguous strings of bases (AGTC) with no gaps (contigs). This is the minimum standard for submission to public databases. (Science, Oct 9, 2009, pp. 236-237; http://compbio.pbworks.com)
  • 10. Molecular linkage map used to anchor scaffolds in chromosome builds (ChLG) Low X coverage, no Y, marker density varies
  • 12. T. castaneum assembly stats • Number of contigs: 8,814 • Contig N50 (bp): 43,511 • Number of scaffolds: 481 • Scaffold N50 (bp): 975,455 • Total number of chromosomes: 10 (-Y) • Unmapped scaffolds: 352 • Single contig scaffolds: 1835 • (481 + 1830 = 2321 scaffolds total)
  • 13. Scaffold structure of the Tribolium genome assembly [Figure labels: ChLG (NW scaffolds, AAJJ contigs, 300K Ns gaps); Unanchored (DS scaffolds, AAJJ contigs)]
  • 14. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 15. Genome assembly improvements • … genetic recombination map to the assembly scaffolds, anchoring greater than 90% of the assembled sequence [1] (fig. 1). To improve this draft assembly, we constructed physical maps of the T. castaneum genome. Using … the orientation of scaffolds have been corrected, and scaffolds have been extended by spanning repetitive regions. [1] Nature 2008 452:949-55. • Figure 2, Genome refinements: ▫ T. castaneum 3.0 (Baylor Sanger 7x draft assembly and molecular genetic map): length 160.466 Mb; 2321 scaffolds; scaffold N50 0.98 Mb; 481 multicontig scaffolds ▫ T. castaneum 4.0 (Illumina long-distance jump libraries extended scaffolds into gaps, capturing gaps with Atlas gap-link and gap-filler): length 160.862 Mb; 2219 scaffolds; scaffold N50 1.16 Mb; 411 multicontig scaffolds ▫ T. castaneum 4.0 and gam-ngs (gam-ngs merged the Illumina assembly and T. cas 4.0, extending several unknowns and an LGX scaffold): length 160.864 Mb; 2219 scaffolds; scaffold N50 1.16 Mb; 411 multicontig scaffolds ▫ T. castaneum 4.0 and gam-ngs plus BioNano maps (sequence scaffolds were aligned to maps with IrysView; the alignment was filtered and used to create new scaffolds): length 189.629 Mb; 2153 scaffolds; scaffold N50 3.31 Mb; 341 multicontig scaffolds • An independent platform to validate and improve genomes • Figure panels: Assembly; Mis-assemblies; ChLG3 ("Validate and expa…", "Three scaffolds fr…", "scaffolded with ca…")
  • 16. How to validate a de novo assembly? • Describe assembly ▫ # contigs, # scaffolds, total bases, N50 lengths ▫ coverage, # ESTs, # orthologs found • But is the assembly accurate? ▫ Compare to BAC sequences ▫ If you have the resources • Need independent (reasonably priced) method
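The descriptive statistics listed on this slide (counts, total bases, N50) can be computed from a plain list of contig or scaffold lengths. The following is a minimal sketch, not code from the talk; the function names and the toy lengths are assumptions made for illustration.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def describe_assembly(lengths):
    """Summarize an assembly from a list of contig or scaffold lengths (bp)."""
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
    }


# Toy lengths for illustration only, not real Tribolium data.
toy_scaffolds = [100, 80, 60, 40, 20]
summary = describe_assembly(toy_scaffolds)
print(summary)
```

These numbers describe contiguity only; they say nothing about whether the joins are correct, which is the point the slide goes on to make.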
  • 17. Genome maps based on landmarks • BioNano Genomics ▫ San Diego, California • Imaging ultra-long molecules of DNA • Labeled at restriction sites
  • 18. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 19. Introducing the Irys system
  • 20. Labeling schema • BspQI nicks at GCTCTTCN (complementary strand: CGAGAAGN) • ~10 sites per 100 Kb
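To get a feel for the label density quoted on this slide, one can scan a sequence for the recognition motif on both strands. This is a rough sketch under stated assumptions: it reports motif start positions rather than the exact nick position, and the function names and toy sequence are hypothetical, not part of the Irys software.

```python
MOTIF = "GCTCTTC"  # recognition sequence from the slide; the trailing N is any base


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_label_sites(sequence, motif=MOTIF):
    """Return sorted 0-based start positions of the motif on either strand."""
    sequence = sequence.upper()
    targets = {motif, reverse_complement(motif)}
    width = len(motif)
    return [
        i for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] in targets
    ]


def sites_per_100kb(sequence, motif=MOTIF):
    """Label density, expressed as sites per 100,000 bases."""
    if not sequence:
        return 0.0
    return len(find_label_sites(sequence, motif)) * 100_000 / len(sequence)


# Toy sequence: one forward-strand site and one reverse-strand site.
toy = "AAAA" + "GCTCTTC" + "TTTT" + "GAAGAGC" + "CCCC"
print(find_label_sites(toy))
```

Run over real scaffolds, the same scan gives the in silico map that the measured genome maps are later aligned against.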
  • 22. Samples loaded into 2 flow cells per chip • 3 lasers • 3 detection channels • Detect YOYO-1 in DNA backbone • Fluorescent nucleotides at labeled sites
  • 23. DNA molecules entering channels
  • 25. A long repeat in the Tribolium genome
  • 26. Mapping individual images back to map • Regions flanking the repeat are unique • Some sites are polymorphic
  • 27. Limitations of the Irys system • Sample prep is very specific • Requires gram amounts of starting material • Bacterial cells, tissue culture cells, eukaryotic nuclei • Less complex tissue is best ▫ Blood ▫ Embryos • Not applicable to transcriptomics projects • Contig N50 >30 Kb (5 restriction sites)
  • 28. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 29. Assemble images into genomic maps: .tiff → .bnx → .cmap
  • 30. Align BNG maps to in silico maps (.xmap)
  • 31. File formats are similar to generating sequence data...
      Sequence data: image files → basecall → fastq → de novo assemble → fasta → align → sam
      BNG data: image files → call labels → bnx → de novo assemble → cmap → align → xmap
      fastq example:
        @SRR014849.2 EIXKN4201AKDUH/2
        TCAAGTGGTGAACGGCAGAAA
        +
        <=B:==B:=<?6=B;<;=B=)
      fasta example:
        >contig1
        TCAAGTGGTGAACGGCAGAAA
      sam example:
        HWI-ST330_C0NEHACXX:2:1101:17113:52802#0  69  contig1  2578  0  *  =  2578  0  ATTACGGCCCATGGTTCAGAATAATGACGAATAGAAATACTAGTACTATATCCCCTAAAAAA  <@CFFFFFHHGFHJHIJJJJJJJJJFJJJFGFHEHIHGHJGIJHIIIJJJJJJJJIJIIJIH  YT:Z:UP
      bnx example:
        0     21       202146.4
        1     1096.2   8973.8
        QX11  10.0565  11.7966
        QX12  0.0187   0.0604
      cmap example:
        #h CMapId ContigLength NumSites SiteID LabelChannel Position StdDev Coverage Occurrence
        #f int float int int int float float int int
        393 225073.2 21 1 1 20.0 0.0 3 3
      xmap example:
        #h XmapEntryID QryContigID RefcontigID QryStartPos QryEndPos RefStartPos RefEndPos Orientation Confidence HitEnum
        #f int int int float float float float string float string
        1 94 1 444392.7 5839.8 57024.0 550038.8 - 28.87 1M1D2M3I4D1M3I2M1I7M1I1M1I9M1I1M1I2M1I3M1D2M
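As an illustration of how the xmap columns shown on this slide might be read programmatically, here is a small sketch. It assumes whitespace-delimited fields in exactly the ten-column order of the slide's header line; the class and function names are hypothetical and other xmap versions may carry additional columns.

```python
from dataclasses import dataclass


@dataclass
class XmapAlignment:
    entry_id: int
    qry_contig_id: int
    ref_contig_id: int
    qry_start: float
    qry_end: float
    ref_start: float
    ref_end: float
    orientation: str
    confidence: float
    hit_enum: str


def parse_xmap(lines):
    """Parse xmap text lines, skipping '#' header lines and blank lines."""
    alignments = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        alignments.append(XmapAlignment(
            int(f[0]), int(f[1]), int(f[2]),
            float(f[3]), float(f[4]), float(f[5]), float(f[6]),
            f[7], float(f[8]), f[9],
        ))
    return alignments


# The example record from the slide.
example = [
    "#h XmapEntryID QryContigID RefcontigID QryStartPos QryEndPos "
    "RefStartPos RefEndPos Orientation Confidence HitEnum",
    "#f int int int float float float float string float string",
    "1 94 1 444392.7 5839.8 57024.0 550038.8 - 28.87 "
    "1M1D2M3I4D1M3I2M1I7M1I1M1I9M1I1M1I2M1I3M1D2M",
]
records = parse_xmap(example)
print(records[0])
```

Note that for a "-" orientation the query start is larger than the query end, as in the slide's record, so any length calculation should take the absolute difference.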
  • 32. Visualizing an xmap [Figure labels: contig id, sequence-based scaffold, label alignment, BioNano contig map, coverage]
  • 33. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 35. K-INBRE i5K GitHub scripts: Irys Scaffolding • Scripts and manuals written by Jennifer Shelton and Nic Herndon • Assembly workflow was developed by Ernest Lam (BioNano) • git clone https://github.com/i5K-KINBRE-script-share/Irys-scaffolding
  • 36. Assembly pipeline developed with Ernest Lam (BioNano) • Scripts available at the i5K-KINBRE script share on GitHub: Irys-scaffolding https://github.com/i5K-KINBRE-script-share/Irys-scaffolding
  • 37. Filtering alignments • Label density varies throughout the genome, so we created scripts to filter in two passes ▫ Pass 1: looks for a high confidence score over at least ~30% of the total possible alignment ▫ Pass 2: looks for a lower confidence score over the majority (~90%) of the total possible alignment • Pass 1 finds most high-quality alignments • Pass 2 finds high-quality, low-density alignments
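The two-pass idea on this slide can be sketched as a single classification function. This is not the Irys-scaffolding implementation: the two confidence cutoffs (20 and 10) are hypothetical placeholders, and `aligned_fraction` is assumed to be the aligned length divided by the total possible alignment length, computed elsewhere. Only the ~30% and ~90% fractions come from the slide.

```python
def classify_alignment(confidence, aligned_fraction,
                       high_conf=20.0, low_conf=10.0,
                       pass1_min_fraction=0.30, pass2_min_fraction=0.90):
    """Return 'pass1', 'pass2', or None for one candidate alignment.

    confidence       -- alignment confidence score (xmap Confidence column)
    aligned_fraction -- aligned length / total possible alignment length (0..1)
    high_conf, low_conf are hypothetical placeholder cutoffs.
    """
    # Pass 1: strong confidence over at least ~30% of the possible alignment.
    if confidence >= high_conf and aligned_fraction >= pass1_min_fraction:
        return "pass1"
    # Pass 2: lower confidence floor, but ~90% of the possible alignment covered.
    if confidence >= low_conf and aligned_fraction >= pass2_min_fraction:
        return "pass2"
    return None


# Toy candidates for illustration only.
candidates = [
    {"id": 1, "confidence": 28.87, "aligned_fraction": 0.45},
    {"id": 2, "confidence": 12.00, "aligned_fraction": 0.95},
    {"id": 3, "confidence": 12.00, "aligned_fraction": 0.50},
    {"id": 4, "confidence": 5.00, "aligned_fraction": 0.99},
]
labels = {c["id"]: classify_alignment(c["confidence"], c["aligned_fraction"])
          for c in candidates}
kept = {k: v for k, v in labels.items() if v is not None}
print(kept)
```

The second pass exists because a region with few labels cannot reach a high confidence score even when nearly the whole scaffold agrees with the map.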
  • 38. Filtering alignments • Super-scaffolded scaffolds are joined in a new reference fasta file • Overlapping scaffolds have a 30 bp spacing gap between them • If a scaffold aligns more than once, only the longest alignment is used • If two alignments have the same length, only the highest-confidence alignment is used
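The selection rules on this slide (longest alignment wins, ties broken by confidence, 30 bp spacer for overlaps) can be sketched as follows. The dictionary keys and function names are hypothetical, and sizing non-overlapping gaps from the map distance is an assumption of this sketch rather than something stated on the slide.

```python
MIN_GAP = 30  # spacer length (bp of N) used when neighbouring scaffolds overlap


def best_alignment_per_scaffold(alignments):
    """Keep one alignment per scaffold: the longest, ties broken by confidence.

    alignments -- iterable of dicts with 'scaffold', 'length', 'confidence'.
    """
    best = {}
    for aln in alignments:
        current = best.get(aln["scaffold"])
        key = (aln["length"], aln["confidence"])
        if current is None or key > (current["length"], current["confidence"]):
            best[aln["scaffold"]] = aln
    return best


def gap_length(prev_end, next_start, min_gap=MIN_GAP):
    """Number of Ns to place between two neighbouring scaffolds on one map.

    Assumption: a positive map distance is used as the gap size; overlapping
    or touching scaffolds get the fixed spacer instead.
    """
    distance = int(round(next_start - prev_end))
    return distance if distance > 0 else min_gap


# Toy alignments for illustration only.
alignments = [
    {"scaffold": "s1", "length": 500_000, "confidence": 25.0, "map": 7},
    {"scaffold": "s1", "length": 800_000, "confidence": 15.0, "map": 9},
    {"scaffold": "s2", "length": 300_000, "confidence": 12.0, "map": 9},
    {"scaffold": "s2", "length": 300_000, "confidence": 18.0, "map": 4},
]
chosen = best_alignment_per_scaffold(alignments)
print({name: aln["map"] for name, aln in chosen.items()})
print(gap_length(1000.0, 900.0), gap_length(1000.0, 6000.4))
```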
  • 39. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 40. BNG restriction maps for T. castaneum • Dual nicked with BspQI and BbvCI • 28.6 Gb of molecules >150 Kb = ~143x coverage of the 200 Mb Tribolium genome • N contigs: 216 • Total Contig Len (Mb): 200.473 • Avg. Contig Len (Mb): 0.928 • Contig N50 (Mb): 1.350 • Total Ref Len (Mb): 157.186 • Total Contig Len / Ref Len: 1.275
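The derived figures on this slide follow directly from the raw numbers. A short check of the arithmetic, using only values quoted on the slide (the variable names are illustrative):

```python
# Values as quoted on the slide.
molecule_bases_gb = 28.6       # total length of imaged molecules, in Gb
genome_size_mb = 200.0         # estimated Tribolium genome size, in Mb
n_contigs = 216                # number of genome map contigs
total_contig_len_mb = 200.473  # summed length of genome map contigs, in Mb
total_ref_len_mb = 157.186     # summed length of the reference sequence, in Mb

# Coverage = molecule bases / genome size (convert Gb to Mb first).
coverage = molecule_bases_gb * 1000 / genome_size_mb

# Average genome map contig length, in Mb.
avg_contig_len_mb = total_contig_len_mb / n_contigs

# Ratio of genome map length to reference length.
length_ratio = total_contig_len_mb / total_ref_len_mb

print(round(coverage), round(avg_contig_len_mb, 3), round(length_ratio, 3))
```

A ratio above 1 means the genome maps span more of the genome than the sequence reference does, which is consistent with the ~200 Mb genome size estimate.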
  • 41. ChLG X: ChLG X had 13 scaffolds. Alignment to BioNano maps captured gaps and validated the order of 11 of 13 scaffolds, incorporated 2 unplaced scaffolds, and identified a potentially misplaced scaffold (scaffold 2 aligns with another linkage group).
  • 44. ChLG 7: Alignment to BioNano maps captured gaps and validated the order of 13 of 15 scaffolds. Scaffold 14 needs to be reversed in the super-scaffold.
  • 46. Additional chromosome linkage groups.
  • 47. Additional chromosome linkage groups. ChLG 3
  • 48. Additional chromosome linkage groups. ChLG 3 ChLG 9
  • 49. Additional chromosome linkage groups. ChLG 3 ChLG 9 ChLG 2
  • 50. What does it cost? • 100-500 Mb genome: <$5,000 ▫ 70-100x coverage • 1 Gb genome: <$8,000 ▫ 70-100x coverage • Completely dependent on homogeneity of starting material • Assembly and analysis software is included in the price
  • 51. Summary • Standard Draft Genomes are highly fragmented • BNG provides independent platform • Whole genome restriction maps • Validate assembly • Extend scaffolds/Size Gaps • Identify structural variants • Identify haplotypes • Comprehensive view of repetitive DNA (HORs) • A validated genome assembly improves downstream analyses
  • 52. Thanks to: • Michelle Gordon ▫ Research Assistant: optimizing sample preps • Jennifer Shelton ▫ Biologist turned Bioinformaticist • Nic Herndon ▫ Computer scientist turned Bioinformaticist • BioNano Genomics ▫ Ernest Lam ▫ Weiping Wang