Error correction
for next generation sequencing
Wu Chihua (Gigi)
Matsuyama Lab M2
Bioinformatics Group
October 1st, 2013

Tuesday, November 5, 2013
Agenda
Background
Existing research
Toy Experiment
Future work
References

Background
why & what

DNA Sequencing
Angelina Jolie tested for one gene; what about the other 20,000?

From 1 gene to 20,000 genes: full genome sequence

Genome
An organism's complete set of DNA

Chromosome = DNA + protein

Base pair (bp)
Bases: A, G, T, C

Chromosome
Gene: 20,000+

Human genome: 3 billion bp
Human DNA (per chromosome): 50~250 Mbp
Human gene: average 3,000 bp; largest 2,400,000 bp

Next Generation Sequencing
• cheaper
• high throughput
• short reads
• output

Elaine R. Mardis. A decade’s perspective on DNA sequencing technology. Figure 1.

Wikipedia. http://en.wikipedia.org/wiki/DNA_sequencing#cite_note-quail2012-37

Error Correction

Highly accurate sequenced reads will likely lead to higher-quality results.

Existing Research

Possible directions
To handle large genomes and larger datasets.
To handle insertion and deletion errors.
To correct hybrid datasets from multiple next generation platforms.
To develop error correction methods for datasets in population studies.

Toy experiment

Input: short reads
1. Find similar pairs of reads with SlideSort.
2. Vote on each position using the paired reads.
3. Decide the new base.
4. Correct the erroneous bases.
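
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the experiment's actual code: reads are assumed to have equal length, only substitution errors are handled, and a brute-force Hamming-distance search (the hypothetical helper `similar_pairs`) stands in for SlideSort. The names `correct_reads` and `d`, and the tie rule (keep the original base), are also chosen here for illustration.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def similar_pairs(reads, d):
    """Brute-force stand-in for SlideSort: index pairs within Hamming distance d."""
    return [(i, j) for i, j in combinations(range(len(reads)), 2)
            if hamming(reads[i], reads[j]) <= d]


def correct_reads(reads, d=1):
    # Step 1: find similar pairs and collect each read's neighbors.
    neighbors = {i: [] for i in range(len(reads))}
    for i, j in similar_pairs(reads, d):
        neighbors[i].append(j)
        neighbors[j].append(i)

    corrected = []
    for i, read in enumerate(reads):
        new_bases = []
        for pos, base in enumerate(read):
            # Step 2: the read itself and its neighbors vote on this position.
            votes = Counter(reads[j][pos] for j in neighbors[i])
            votes[base] += 1
            # Step 3: decide the new base; keep the original base on ties.
            winner, count = votes.most_common(1)[0]
            new_bases.append(winner if count > votes[base] else base)
        # Step 4: rebuild the read from the chosen bases.
        corrected.append("".join(new_bases))
    return corrected


# Toy data (assumed): three identical reads and one with a single mismatch.
reads = ["ATGCATA", "ATGCATA", "ATGCATA", "ATGCTTA"]
print(correct_reads(reads, d=1))
```

Votes are always taken from the original reads rather than from already corrected ones, so the result does not depend on the order in which reads are processed.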

SlideSort
• All pairs similarity search (APSS) for sequence datasets.
• APSS: find all similar pairs in a dataset.
• Performance of SlideSort
  • 10 minutes for 10 million reads.
  • 2~3 GB for 10 million reads.
• Complexity of SlideSort
  • Time: O(N + α)
  • Equivalence classes are found in O(N).
  • α is the number of neighbor pairs.
SlideSort

Input
• A set of short reads
• Distance threshold d
ATGCATA ATTCATT
ATGCTCA ATGCCCA
AAGTCGG ATGTATT
AAGGTCG ATGCTTA

Output
Alignments and distances of all similar pairs.
ATGCATA
ATGCTTA   ed = 1

ATGCATA
ATGCTCA   ed = 2

AAG-TCGG
AAGGTCG-  ed = 2
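
The `ed` values here appear to be edit (Levenshtein) distances, counting substitutions, insertions, and deletions. As a hedged check, the sketch below computes that distance with the textbook dynamic-programming recurrence. The function name `edit_distance` is chosen for illustration; this is not SlideSort's own code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# The three pairs shown in the output example.
print(edit_distance("ATGCATA", "ATGCTTA"))  # 1
print(edit_distance("ATGCATA", "ATGCTCA"))  # 2
print(edit_distance("AAGTCGG", "AAGGTCG"))  # 2
```

The third pair shows why gaps matter: position by position the two reads differ at three sites, but one insertion plus one deletion aligns them at a cost of 2.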
Naive approach: O(N²)
ACGC…
ATGC…
AAGT…
*Animation by Prof. Shimizu
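
A minimal sketch of the naive approach: compare every read with every other read. The helper names (`n_pairs`, `naive_apss`) and the toy reads are assumptions for illustration, and Hamming distance is used to keep the example short. The pair count shows why this does not scale: 10 million reads give about 5 × 10¹³ comparisons.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def n_pairs(n):
    """Number of unordered pairs among n reads: n(n-1)/2, which grows as O(N^2)."""
    return n * (n - 1) // 2


def naive_apss(reads, d):
    """Check all pairs and keep (i, j, distance) for those within distance d."""
    result = []
    for i, j in combinations(range(len(reads)), 2):
        dist = hamming(reads[i], reads[j])
        if dist <= d:
            result.append((i, j, dist))
    return result


reads = ["ATGCATA", "ATGCTTA", "ATGCTCA"]
print(naive_apss(reads, d=1))   # [(0, 1, 1), (1, 2, 1)]
print(n_pairs(10_000_000))      # 49999995000000
```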

How to reduce computational cost?

Basic strategy:
1. Filtering stage
Find subsets sharing common substring(s)

2. Pair-wise comparison stage
Compare all pairs within each subset.
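
The two stages can be sketched with a simple pigeonhole filter. This is not SlideSort's actual algorithm, only an illustration of the filter-then-verify idea under stated assumptions: reads have equal length, distance is Hamming distance, and each read is split into d + 1 blocks. Two reads with at most d mismatches must then agree exactly on at least one block, so only reads sharing a block need to be compared. The names `block_bounds` and `filtered_apss` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def block_bounds(length, n_blocks):
    """Split [0, length) into n_blocks contiguous, nearly equal ranges."""
    return [(k * length // n_blocks, (k + 1) * length // n_blocks)
            for k in range(n_blocks)]


def filtered_apss(reads, d):
    bounds = block_bounds(len(reads[0]), d + 1)

    # 1. Filtering stage: group reads that share an identical block.
    buckets = defaultdict(list)
    for idx, read in enumerate(reads):
        for k, (start, end) in enumerate(bounds):
            buckets[(k, read[start:end])].append(idx)

    # 2. Pair-wise comparison stage: verify pairs inside each bucket only.
    seen = set()
    result = []
    for members in buckets.values():
        for i, j in combinations(members, 2):
            if (i, j) in seen:
                continue  # already checked via another shared block
            seen.add((i, j))
            dist = hamming(reads[i], reads[j])
            if dist <= d:
                result.append((i, j, dist))
    return sorted(result)


reads = ["ATGCATA", "ATGCTTA", "ATGCTCA", "AAGTCGG"]
print(filtered_apss(reads, d=1))  # [(0, 1, 1), (1, 2, 1)]
```

In this toy run only the three reads starting with ATG land in a common bucket, so 3 of the 6 possible pairs are verified. The saving grows as the dataset becomes larger and more diverse.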

