Alignment of raw reads in Avadis NGS

Pioneering
Scientific Intelligence

DNA/Small RNA Alignment
in Avadis NGS 1.3

Strictly Confidential © Strand Life Sciences

Questions we will seek to answer in this presentation

What is an Alignment algorithm?
What issues must an Alignment algorithm consider?
How do Alignment algorithms work?
How does CoBWeb work?
How does CoBWeb compare with other algorithms?
How is CoBWeb exposed in Avadis NGS?
What is the future evolution of CoBWeb?

© Strand

What is an Alignment algorithm?


Subject’s Genome:
AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC

Reference Genome, close but not quite the same as the Subject’s Genome:
AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
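To make "close but not quite the same" concrete, the two sequences on this slide can be compared position by position. This is only an illustrative sketch: the helper name `mismatches` is made up here, and it handles substitutions only (no gaps), which is a simplification of what an alignment algorithm must do.

```python
# The two sequences shown on the slide.
subject = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def mismatches(a, b):
    """Return (position, base_in_a, base_in_b) for every differing position.

    Hypothetical helper for illustration; positions are 0-based.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = mismatches(subject, reference)
print(diffs)  # [(11, 'T', 'G'), (20, 'A', 'T'), (28, 'G', 'A')]
```

The Subject differs from the Reference at three single bases out of 38, so a read drawn from the Subject will usually match the Reference almost, but not exactly.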


What issues must an Alignment algorithm consider?


Mismatches and Gaps

[Figure: Reads aligned against the Reference Genome, with a Deletion and a SNP marked]

Handling paired reads

[Figure labels: Subject’s Genome; Reference Genome; Repeat Region (twice); one mapping marked ×]


A variety of Read Lengths

Short reads: ~50, few mismatches and gaps
Long reads: few hundreds to thousands, many more mismatches and gaps


Speed and Memory

Billions of reads.
Run in 4GB RAM
Scale speed with more memory
Allow use of multiple cores/processors


How do Alignment algorithms work?


Indexing the Genome to find Seed Matches

Scanning the Reference for each Read takes too long.

The Reference Index: the Index very quickly yields locations in the Reference where some part (seed) of the Read matches.

This Seed occurs at Reference locations x1, x2…
This Seed occurs at Reference locations x3, x4…
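The idea of an index that yields seed locations can be sketched with a plain hash table of fixed-length k-mers. This is a stand-in chosen for brevity, not the Burrows-Wheeler index the deck goes on to describe; the function names and the seed length of 5 are assumptions made for the example.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_hits(read, index, k):
    """Return (offset_in_read, reference_position) for each non-overlapping seed hit."""
    hits = []
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        # .get avoids inserting empty entries into the defaultdict on a miss.
        for ref_pos in index.get(seed, []):
            hits.append((offset, ref_pos))
    return hits


reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
index = build_kmer_index(reference, 5)

# A read taken from the Subject's Genome; its second seed spans a difference.
hits = seed_hits("TCCCATAAAG", index, 5)
print(hits)  # [(0, 12)]
```

Only the first seed ("TCCCA") is found, at Reference location 12. The second seed ("TAAAG") contains a base that differs from the Reference, so it yields no location; the detailed alignment step has to account for that difference.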


Detailed Alignment at Seed Match Locations

[Figure: the Read placed against the Reference at a Seed Match]

How many Mismatches and Gaps are needed for the Read to match around the Seed? Smith-Waterman or Dynamic Programming
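The question on this slide can be answered with a small dynamic-programming table. The sketch below is a unit-cost edit-distance variant, not Smith-Waterman's scoring scheme: it counts the fewest mismatches plus gaps needed to fit the whole read somewhere inside a window of the Reference. The function name `min_edits` and the unit costs are assumptions for illustration.

```python
def min_edits(read, window):
    """Fewest mismatches + gaps needed to align all of `read` to some substring of `window`."""
    # prev[j] is the best cost for the read prefix processed so far, ending at window[:j].
    # Row 0 is all zeros because the alignment may start anywhere in the window.
    prev = [0] * (len(window) + 1)
    for i in range(1, len(read) + 1):
        # Read prefix of length i against an empty window costs i gaps.
        curr = [i] + [0] * len(window)
        for j in range(1, len(window) + 1):
            substitution = prev[j - 1] + (read[i - 1] != window[j - 1])
            gap_in_window = prev[j] + 1    # read base with no Reference partner
            gap_in_read = curr[j - 1] + 1  # Reference base skipped by the read
            curr[j] = min(substitution, gap_in_window, gap_in_read)
        prev = curr
    # The alignment may end anywhere in the window.
    return min(prev)


reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
read = "CATTTCCCATAAAGAC"  # positions 8..23 of the Subject's Genome
print(min_edits(read, reference))  # 2
```

In practice the window would be a short stretch of the Reference around each seed location rather than the whole Reference, which is what keeps this step affordable.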


The Burrows-Wheeler based Index

The original Reference: C G A C $

All its circular shifts, sorted lexicographically:

Circular Shift Index    Sorted Circular Shift
2                       A C $ C G
0                       C G A C $
3                       C $ C G A
1                       G A C $ C
4                       $ C G A C

The last column (G $ A C C) is the BWT.

The Index comprises these along with some housekeeping data structures.

These can be sampled to fit into reduced memory at the expense of speed without sacrificing correctness.
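The small example on this slide can be reproduced in a few lines. This sketch sorts whole rotations, which is fine for a five-character string but far too slow for a genome; the function name is made up, and "$" is ranked after A, C and G only because the slide's ordering does so (many implementations sort the terminator first instead).

```python
def bwt_with_shift_indices(reference):
    """Return (sorted circular shift indices, BWT string) for reference + '$'."""
    text = reference + "$"

    def sort_key(shift):
        rotation = text[shift:] + text[:shift]
        # Rank "$" above every letter to match the ordering on the slide.
        return [(1, "") if ch == "$" else (0, ch) for ch in rotation]

    shifts = sorted(range(len(text)), key=sort_key)
    # The BWT is the last character of each sorted rotation, which is the
    # character just before that rotation's starting position in the text.
    bwt = "".join(text[s - 1] for s in shifts)
    return shifts, bwt


shifts, bwt = bwt_with_shift_indices("CGAC")
print(shifts)  # [2, 0, 3, 1, 4]
print(bwt)     # G$ACC
```

The shift indices match the slide's Circular Shift Indices column, and the BWT string is the last character of each sorted shift, read top to bottom.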


The Burrows-Wheeler based Index

[Figure: an EXACT Match of the Read in the Reference]

All Exact Matches of a Read (NO Mismatches or Gaps) in the Reference can be found in time proportional to the length of the Read and largely independent of the size of the Reference.
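The standard way to get this behaviour from a BWT is backward search: the read is consumed from its last character to its first, and each step narrows a range of rows in the sorted shift table. The sketch below is a textbook version under simplifying assumptions (naive index construction, unsampled tables, "$" sorted last as in the deck's example); the names are illustrative, and none of this is taken from CoBWeb itself.

```python
def build_fm_index(reference):
    """Build (shift indices, first-row table, occurrence counts) for reference + '$'."""
    text = reference + "$"
    order = {ch: rank for rank, ch in enumerate(sorted(set(reference)))}
    order["$"] = len(order)  # "$" sorts last, as in the deck's example
    shifts = sorted(range(len(text)),
                    key=lambda s: [order[c] for c in text[s:] + text[:s]])
    bwt = [text[s - 1] for s in shifts]
    # first[c]: row of the first sorted shift that starts with character c.
    first, total = {}, 0
    for ch in sorted(order, key=order.get):
        first[ch] = total
        total += text.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] for ch in order}
    for b in bwt:
        for ch in order:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return shifts, first, occ


def exact_matches(read, index):
    """Return the sorted start positions of every exact occurrence of `read`."""
    shifts, first, occ = index
    lo, hi = 0, len(shifts)  # half-open range of rows matching the suffix seen so far
    for ch in reversed(read):  # one constant-time step per read character
        if ch not in first or ch == "$":
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(shifts[row] for row in range(lo, hi))


index = build_fm_index("CGAC")
print(exact_matches("C", index))   # [0, 3]
print(exact_matches("AC", index))  # [2]
print(exact_matches("CC", index))  # []
```

The search loop runs once per read character regardless of the Reference length; only reporting the final positions depends on how many matches there are.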


How does CoBWeb work?


Seeding Strategy

This 15-mer occurs at locations x1, x2…
This 15-mer occurs at locations x3, x4…
This whole 30-mer occurs at location x5

Use the BW based index, augmented with additional data structures for speed, to find one or more Long Seed Matches in the Reference.

Justification: Most long Reads do not have Mismatches and Gaps strewn across their length; there are usually long stretches that match exactly. And Long Seeds will have few matching locations.
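The justification above can be illustrated by looking for the longest stretches of a read that occur exactly in the Reference. This is a rough sketch of the general idea, not CoBWeb's actual procedure: plain substring search stands in for the BW based index, and the function names and the `min_len` cutoff are assumptions made for the example.

```python
def occurrences(reference, seed):
    """All start positions of `seed` in `reference` (stand-in for the index lookup)."""
    positions, start = [], reference.find(seed)
    while start != -1:
        positions.append(start)
        start = reference.find(seed, start + 1)
    return positions


def long_seeds(read, reference, min_len=4):
    """Return (read_offset, seed, reference_positions) for maximal exact stretches."""
    seeds, furthest_end = [], 0
    for start in range(len(read)):
        end = start
        # Extend to the right while the stretch still occurs exactly somewhere.
        while end < len(read) and read[start:end + 1] in reference:
            end += 1
        # Keep only stretches that reach past every earlier one and are long enough.
        if end > furthest_end and end - start >= min_len:
            seed = read[start:end]
            seeds.append((start, seed, occurrences(reference, seed)))
        furthest_end = max(furthest_end, end)
    return seeds


reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
read = "CATTTCCCATAAAGAC"  # positions 8..23 of the Subject's Genome
print(long_seeds(read, reference))  # [(4, 'TCCCATAA', [12])]
```

The single long seed occurs at exactly one Reference location, whereas a short stretch such as "AC" occurs at several; this is the sense in which long seeds have few matching locations.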

Advantages

Seed length is not specified in advance, so Long and Short reads can be handled seamlessly.

Separating the Smith-Waterman phase from the BW Index search allows an unlimited number of gaps and mismatches.


Comparison with BWA

CoBWeb: 94% Alignment Score with up to 2 Gaps
BWA: 4% error + 1 gap of possibly multiple length

Read Length 50
Read Length 150

A little faster than BWA with comparable results


Alignment Parameters

Specify number of Mismatches and Gaps, and handling of Multiple Matching.

Specify Adaptor Trimming (only for Small RNA) and 3’, 5’ trimming based on quality.

Screen against Contaminant Databases.

