Assembly – before and after
Lex Nederbragt
lex.nederbragt@ibv.uio.no
@lexnederbragt
A warning
The list is by no means complete
Nor do we have experience with all the programs mentioned
Workflow: Sample → (DNA isolation) → DNA → (Sequencing) → Reads → (Assembly) → Genome assembly, with QC at each step
Reads → Assembly → Genome assembly
QC
FastQC
PRINSEQ
Many others…
www.nipgr.res.in/ngsqctoolkit.html
preqc (SGA)
http://arxiv.org/abs/1307.8026
Reads → Assembly → Genome assembly
Grooming
Format conversion
http://en.wikipedia.org/wiki/FASTQ_format
FASTQ format hell
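Much of the FASTQ trouble comes from quality encodings: Sanger and Illumina 1.8+ files use Phred+33, while older Illumina pipelines (1.3 to 1.7) used Phred+64. A minimal sketch for guessing the offset and converting Phred+64 to Phred+33, assuming plain Phred scores (older Solexa-scaled qualities would need a different formula). The function names are hypothetical, and the range-based guess is only a heuristic.

```python
def guess_offset(quality_strings):
    """Guess the quality offset (33 or 64) from the lowest character seen.

    Heuristic: characters below ';' (ASCII 59) only occur in Phred+33 data,
    and data whose lowest character is '@' (ASCII 64) or above is most
    likely Phred+64. Anything in between is ambiguous and returns None.
    """
    lowest = min(ord(min(q)) for q in quality_strings if q)
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None


def phred64_to_phred33(quality):
    """Shift every Phred+64 quality character down to Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError(f"{ch!r} is outside the Phred+64 range")
        converted.append(chr(score + 33))
    return "".join(converted)


# Example: the quality line of one record from an old Illumina run
old_quality = "hhhhB@"
print(guess_offset([old_quality]))        # 64
print(phred64_to_phred33(old_quality))    # IIII#!
```

A data set with only very high qualities can fool the guess, so when the sequencing provider states the encoding, trust that instead.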
Adapter/quality trimming
http://www.biostars.org/p/53528/
Celera Assembler: overlap-based trimming
FASTX-Toolkit
Seqtk
PRINSEQ
NGS QC Toolkit
Trimmomatic
Biopieces
Cutadapt
…
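The tools above differ in detail, but two basic operations recur: clipping adapter sequence (including a partial adapter at the 3' end) and cutting low-quality bases off the 3' end. A minimal sketch of both, assuming exact adapter matches and Phred+33 qualities; it is not the algorithm of any listed tool, the function names are hypothetical, and the adapter string is only an example prefix.

```python
def trim_adapter(seq, adapter, min_overlap=3):
    """Clip the read at the first full copy of the adapter, or at a partial
    adapter (a prefix of it) hanging off the 3' end. Exact matches only."""
    if not adapter:
        return seq
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest partial adapter first, down to min_overlap bases
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if k > 0 and seq.endswith(adapter[:k]):
            return seq[:len(seq) - k]
    return seq


def trim_trailing_quality(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_read(seq, qual, adapter, min_q=20):
    """Adapter clipping first, then trailing-quality trimming."""
    clipped = trim_adapter(seq, adapter)
    return trim_trailing_quality(clipped, qual[:len(clipped)], min_q)


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix; use the one from your library prep
print(trim_read("ACGTACGTAGAT", "IIIIIII#IIII", ADAPTER))  # ('ACGTACG', 'IIIIIII')
```

A small min_overlap clips reads that merely end in a few adapter-like bases; real tools weigh this against the risk of leaving adapter in.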
Mate pair splitting and orientation
Illumina paired end reads: 150 – 600 bases
Illumina mate pair reads: 2 – 40 kilobases, with a junction between the mates
454 mate pair reads: 2 – 40 kilobases, with a linker between the mates
Mate pair reads are split or trimmed at the junction/linker; read pairs without a junction are in effect paired end reads: ‘contamination’
Check what orientation your assembler expects for the reads!
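Two small operations cover much of this: splitting a 454-style read at its linker, and reverse-complementing mates so that pairs end up in the orientation the assembler expects. Illumina mate pairs typically come out in reverse-forward (outward-facing) orientation, whereas paired end reads are forward-reverse. A minimal sketch, assuming exact linker matches; the linker below is a made-up placeholder, not the real 454 linker sequence, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (IUPAC codes other than N are not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_at_linker(read, linker, min_len=20):
    """Split a single read at an exact copy of the linker.

    Returns (left, right), or None if there is no linker or either half is
    shorter than min_len (such reads cannot be used as a pair).
    """
    pos = read.find(linker)
    if pos == -1:
        return None
    left, right = read[:pos], read[pos + len(linker):]
    if len(left) < min_len or len(right) < min_len:
        return None
    return left, right


def rf_to_fr(read1, read2):
    """Turn an outward-facing (RF) pair into an inward-facing (FR) one."""
    return reverse_complement(read1), reverse_complement(read2)


LINKER = "TTTTGGGG"  # hypothetical placeholder linker
read = "ACGT" * 5 + LINKER + "CATG" * 5
print(split_at_linker(read, LINKER))   # ('ACGTACGTACGTACGTACGT', 'CATGCATGCATGCATGCATG')
print(rf_to_fr("AACG", "GGTA"))        # ('CGTT', 'TACC')
```

Which half counts as read 1, and how each half must be oriented afterwards, depends on the platform and the assembler, so check both sets of documentation before converting.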
Reads → Assembly → Genome assembly
Preparing
Error correction
Stand-alone or built into the assembler
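Many error correctors build on the k-mer spectrum: a k-mer seen only once or twice across a high-coverage data set probably contains a sequencing error. A minimal sketch of that detection step (not the correction itself), assuming exact in-memory counting and keeping the two strands separate; the function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in every read (exact counts, strands kept separate)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_positions(read, counts, k, min_count=2):
    """Start positions of k-mers seen fewer than min_count times in the data.

    A single sequencing error creates up to k such 'weak' k-mers, one for
    every window that covers the erroneous base.
    """
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]


reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]   # the last read has an error at position 4
counts = count_kmers(reads, k=4)
print(weak_kmer_positions("ACGTTCGT", counts, k=4))  # [1, 2, 3, 4]
print(weak_kmer_positions("ACGTACGT", counts, k=4))  # []
```

The overlapping weak windows pinpoint the suspect base; a corrector would then try substitutions there that turn all of them into frequent k-mers.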
Merging pairs
List from Torsten Seemann’s blog
http://thegenomefactory.blogspot.no/2012/11/tools-to-merge-overlapping-paired-end.html
COPE http://sourceforge.net/projects/coperead/
SeqPrep https://github.com/jstjohn/SeqPrep
FLASH http://www.cbcb.umd.edu/software/flash
fastq-join http://code.google.com/p/ea-utils/wiki/FastqJoin
PANDAseq https://github.com/neufeld/pandaseq
mergePairs.py http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py
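The tools above differ in scoring and in how they use quality values, but they share a core idea: reverse complement the second read and look for an overlap with the end of the first. A minimal sketch of that idea, not the algorithm of any listed tool, assuming the fragment is longer than each read (no read-through into adapter), taking the longest overlap with at most 10% mismatches, and keeping read 1's bases in the overlap instead of building a quality-aware consensus. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")  # uppercase input assumed


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair into one longer read if the mates overlap.

    Returns the merged sequence, or None when no overlap of at least
    min_overlap bases has an acceptable mismatch rate.
    """
    mate = reverse_complement(read2)
    # Test the longest possible overlap first
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail = read1[len(read1) - olen:]
        head = mate[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            return read1 + mate[olen:]
    return None


fragment = "A" * 10 + "C" * 10 + "T" * 10     # a 30 bp insert
read1 = fragment[:20]                          # forward read
read2 = reverse_complement(fragment[10:])      # reverse read from the other end
print(merge_pair(read1, read2) == fragment)    # True
```

Preferring the longest acceptable overlap is a simple choice; in repetitive sequence several overlaps can pass, which is one reason the real tools score candidates more carefully.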
Recent addition
Extend reads
http://140.116.235.124/~tliu/arf-pe/
Digital normalisation
http://arxiv.org/abs/1203.4802
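Digital normalisation streams through the reads and keeps a read only if the median abundance of its k-mers, counted over the reads kept so far, is below a coverage cutoff C; reads from already well-covered regions are dropped. A minimal sketch, assuming exact in-memory k-mer counts (khmer uses a memory-efficient probabilistic counting structure instead) and strand-specific k-mers; the function name and parameter values are illustrative only.

```python
from collections import Counter
from statistics import median


def normalize(reads, k=20, cutoff=20):
    """Single-pass digital normalisation with exact k-mer counts."""
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:          # read shorter than k: skipped in this sketch
            continue
        if median([counts[km] for km in kmers]) < cutoff:
            kept.append(read)
            counts.update(kmers)   # only kept reads contribute to the counts
    return kept


reads = ["ACGTACGTAC"] * 10 + ["TTTTTTTTTT"]
print(normalize(reads, k=4, cutoff=3))
# ['ACGTACGTAC', 'ACGTACGTAC', 'TTTTTTTTTT']
```

Because only kept reads update the counts, memory and run time grow with the information in the data rather than with the raw number of reads.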
Estimate which k-mer size to use
preqc (SGA)
http://arxiv.org/abs/1307.8026
Reads → Assembly → Genome assembly
What can the reads tell us about the genome?
kmer-based
preqc (SGA)
Kmerspectrumanalyzer
http://arxiv.org/abs/1307.8026
khmer from Titus Brown
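One classic k-mer based answer is a genome size estimate: divide the number of k-mers in the main peak region of the k-mer histogram by the depth at which that peak sits. A minimal sketch, assuming a haploid, mostly non-repetitive genome (heterozygosity and repeats add extra peaks that this ignores) and a user-chosen trough separating error k-mers from real ones. The histogram in the example is made up, the function names are hypothetical, and preqc and the other tools listed do considerably more.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth):
    """Genome size ~ (k-mers in the real peak) / (peak k-mer depth).

    min_depth should sit in the trough between the error peak at low
    multiplicity and the main coverage peak; k-mers below it are ignored.
    """
    kept = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak = max(kept, key=kept.get)
    total = sum(depth * n for depth, n in kept.items())
    return total / peak


hist = {1: 1000, 2: 50, 10: 80, 11: 100, 12: 90}   # made-up histogram
print(estimate_genome_size(hist, min_depth=2))      # 280.0
```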
Reads → Assembly → Genome assembly
This talk
Reads → Assembly → Genome assembly
QC of the genome assembly
Genome assembly
Comparing to each other, Metrics, Merging, Improvement, Visualization, Validation, Comparing to reference
Genome assembly: Metrics
Assemblathon stats
http://korflab.ucdavis.edu/datasets/Assemblathon/Assemblathon2/Basic_metrics/assemblathon_stats.pl
OR
https://github.com/lexnederbragt/sequencetools/
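The central metric in such reports is N50: the contig length L such that contigs of length L or longer make up at least half of the assembly. NG50 does the same against an (estimated) genome size, which makes assemblies of different total size comparable. A minimal sketch of both, assuming contig lengths are already parsed from the FASTA file; it is not the code of either script above, which report many more statistics, and the function names are hypothetical.

```python
def nx(lengths, fraction=0.5, genome_size=None):
    """Length L such that contigs of length >= L cover `fraction` of the total.

    With genome_size given this is the NG variant (e.g. NG50), which uses an
    (estimated) genome size instead of the assembly size and returns None if
    the assembly never reaches that fraction of it.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    target = fraction * (genome_size if genome_size is not None else sum(lengths))
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def basic_stats(lengths, genome_size=None):
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "longest": max(lengths),
        "N50": nx(lengths),
        "NG50": nx(lengths, genome_size=genome_size) if genome_size else None,
    }


contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(basic_stats(contigs, genome_size=100))
# {'count': 9, 'total': 54, 'longest': 10, 'N50': 8, 'NG50': 3}
```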
Genome assembly: Improvement
Gap closing
IMAGE2
Correcting bases
Quiver from Pacific Biosciences
Separate scaffolding
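Gap closers such as IMAGE2 work on the runs of N's that scaffolding leaves between contigs, so counting those runs before and after gap closing is a quick check of what was achieved. A minimal sketch, assuming gaps are encoded as runs of N/n in the scaffold sequences; the function names are hypothetical.

```python
import re


def find_gaps(scaffold, min_len=1):
    """Return (start, end) of every run of N's, 0-based, end exclusive."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", scaffold)
            if m.end() - m.start() >= min_len]


def gap_summary(scaffolds):
    """Number of gaps and total gap length over a {name: sequence} dict."""
    gaps = [g for seq in scaffolds.values() for g in find_gaps(seq)]
    return len(gaps), sum(end - start for start, end in gaps)


scaffolds = {"scf1": "ACGTNNNNACGTNAC", "scf2": "ACGTACGT"}
print(find_gaps(scaffolds["scf1"]))   # [(4, 8), (12, 13)]
print(gap_summary(scaffolds))         # (2, 5)
```

A single N can also be an ambiguous base call rather than a gap; raising min_len is a simple way to count only real scaffolding gaps.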
Genome assembly: Merging
Assembly merging/reconciliation
Genome assembly: Validation
Mapped genomic reads
FRCBAM
Mapped transcriptomic reads
Gene finding
Binning
Per-contig read depth (figure: contigs grouping as Bacteroides, Proteobacteria and Cyanobacteria; Nederbragt et al., 2010)
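Per-contig read depth comes from reads mapped back to the assembly; in a metagenome, contigs from the same organism tend to have similar depth, which is what makes depth usable for binning. A minimal sketch, assuming input in the three-column text format that `samtools depth` writes for a single BAM file (contig, position, depth). By default that command skips zero-depth positions, so passing contig lengths gives a mean over the whole contig. The function name is hypothetical.

```python
from collections import defaultdict


def mean_depth_per_contig(depth_lines, contig_lengths=None):
    """Mean read depth per contig from 'contig<TAB>pos<TAB>depth' lines.

    Without contig_lengths the mean is over reported positions only. With
    it, unreported positions count as zero depth, contigs without any
    reported position get 0.0, and contigs missing from contig_lengths
    are left out.
    """
    sums = defaultdict(int)
    reported = defaultdict(int)
    for line in depth_lines:
        if not line.strip():
            continue
        contig, _pos, depth = line.rstrip("\n").split("\t")[:3]
        sums[contig] += int(depth)
        reported[contig] += 1
    if contig_lengths is None:
        return {c: sums[c] / reported[c] for c in sums}
    return {c: sums[c] / length for c, length in contig_lengths.items()}


lines = ["c1\t1\t10", "c1\t2\t20", "c2\t1\t5"]
print(mean_depth_per_contig(lines))                               # {'c1': 15.0, 'c2': 5.0}
print(mean_depth_per_contig(lines, {"c1": 4, "c2": 1, "c3": 10}))  # {'c1': 7.5, 'c2': 5.0, 'c3': 0.0}
```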
Genome assembly: Visualization
Genome browser(s)
IGV
Genome assembly: Comparing to each other
Comparative measures
Log Average Probability (LAP)
Assembly Likelihood Evaluation (ALE)
See also Howison, Zapata and Dunn (2013) Toward a statistically explicit understanding of de novo sequence assembly, doi: 10.1093/bioinformatics/btt525
Genome assembly: Comparing to reference
Reference comparison
Mauve assembly metrics
Review
Too many tools…
http://seqanswers.com/wiki/Software/list
Too many tools…
http://wwwdev.ebi.ac.uk/fg/hts_mappers
88 short-read mappers
Embargo!
Benchmarking, anyone?
All-in-one assembly pipeline
doi:10.1186/1471-2105-15-126