Aug 2014 Spiral Genetics Anchored Assembly
1. SV Detection via Anchored Assembly
How can we best call structural variants?
Becky Drees, Jeremy Bruestle, Cheinan Marks
2. SV Detection via Anchored Assembly
Brief Description of Anchored Assembly Method
Testing vs GIAB Variant Set & Validated SV Sets
How Do We Describe SVs from Detected Breakpoints?
Please do not distribute without permission.
3. Input data
Any species with a draft genome
Existing NGS data
No special library prep
~20x coverage per ploidy
4. Step 1: Read Correction
A* error correction
[Figure: K-mer Quality Score Distribution; axes: K-mer Count and Total K-mer Quality Score.]
• Similar to Euler or Quake
• Corrects the read without using reference information
• Reduces error from 1% to 0.01%
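The slide names A* error correction but gives no details, so the following is only a minimal sketch of the general reference-free idea it shares with Euler and Quake: k-mers seen too rarely across the read set are treated as likely errors. It is not the authors' A* method. The function names, `k`, and `min_count` are hypothetical choices for illustration, and only single-base substitutions are tried.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (the k-mer spectrum)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted(read, counts, k, min_count):
    """Number of k-mers in the read that occur fewer than min_count times."""
    return sum(
        1
        for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < min_count
    )


def correct_read(read, counts, k, min_count):
    """Try each single-base substitution; keep the first one that
    leaves the read with no untrusted k-mers."""
    if untrusted(read, counts, k, min_count) == 0:
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if untrusted(candidate, counts, k, min_count) == 0:
                return candidate
    return read  # no single-base fix found; leave the read unchanged


# Toy data (assumed): five error-free copies and one read with one error.
true_read = "ACGTTGCAAGGC"
bad_read = "ACGTTGCTAGGC"
counts = count_kmers([true_read] * 5 + [bad_read], k=4)
fixed = correct_read(bad_read, counts, k=4, min_count=2)
```

No reference sequence is consulted at any point; the correction relies only on agreement among the reads themselves.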
5. Step 2: Remove Reference Matches
• Remove reads that are an exact match to the reference
• Significantly reduces the complexity of the graph
• Reduces required memory usage (40 GB for a whole human genome)
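The filtering step can be sketched as follows. This is a toy illustration: it uses plain substring search, where a whole-genome implementation would need an index, and checking the reverse strand is an assumption that the slide does not state. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def remove_reference_matches(reads, reference):
    """Keep only reads that do not match the reference exactly
    on either strand (strand handling is an assumption)."""
    kept = []
    for read in reads:
        if read in reference or reverse_complement(read) in reference:
            continue  # exact match: carries no variant information
        kept.append(read)
    return kept


reference = "AATGACTTAGATAACATTG"
reads = ["GACTTAG", "CTAAGTC", "GACTAAG"]
remaining = remove_reference_matches(reads, reference)
```

Here the first read matches the forward strand, the second matches the reverse strand, and only the third, which differs from the reference by one base, is kept for graph construction.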
6. Step 3: Read Overlap Graph
[Figure: Read overlap assembly. Reads labeled R1 to R9 are connected by edges labeled 7, 8, or 9.]
• Construct a read overlap graph with the remaining reads
• Provides more context than a k-mer-based de Bruijn graph
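A read overlap graph of this kind can be sketched with a brute-force suffix/prefix comparison. This is a minimal illustration, not the authors' implementation: the all-pairs loop is quadratic and only suits toy input, and the read names and the `min_len` threshold are hypothetical. The read sequences are the five 10-base reads shown on the variant validation slide.

```python
def longest_overlap(a, b, min_len):
    """Length of the longest proper suffix of a that equals a prefix of b,
    or 0 if it is shorter than min_len."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads, min_len):
    """Map each read name to {successor name: overlap length}."""
    graph = {name: {} for name in reads}
    for a_name, a_seq in reads.items():
        for b_name, b_seq in reads.items():
            if a_name == b_name:
                continue
            n = longest_overlap(a_seq, b_seq, min_len)
            if n:
                graph[a_name][b_name] = n
    return graph


# Read names are placeholders; the sequences come from the slides.
reads = {
    "a": "GACTTAGATA",
    "b": "CTTAGATAAC",
    "c": "TTAGATAACA",
    "d": "AGATAACATT",
    "e": "GATAACATTG",
}
graph = build_overlap_graph(reads, min_len=7)
```

Each edge keeps the full overlap length between two whole reads, which is the extra context a fixed-size k-mer graph discards.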
7. Step 4: Anchoring
• Anchor assemblies to reference coordinates
• Provide breakpoint information while keeping reference bias low
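One way to picture anchoring is to look up the two ends of an assembled sequence in the reference by exact match. The slides do not say how anchors are chosen, so this is a sketch under that assumption; `anchor_len`, the function names, and the toy sequences are hypothetical.

```python
def find_all(reference, seq):
    """Return every 0-based start position of seq in reference."""
    positions = []
    start = reference.find(seq)
    while start != -1:
        positions.append(start)
        start = reference.find(seq, start + 1)
    return positions


def anchor_assembly(assembly, reference, anchor_len):
    """Locate the first and last anchor_len bases of the assembly
    in the reference by exact match."""
    return {
        "left": find_all(reference, assembly[:anchor_len]),
        "right": find_all(reference, assembly[-anchor_len:]),
    }


# Toy reference and an assembly that skips reference bases 10 to 15.
reference = "ACGTTGCAAGGCTTAACCGGATCGTA"
assembly = "GTTGCAAGCCGGATCG"
anchors = anchor_assembly(assembly, reference, anchor_len=6)
```

Only the ends of the assembly are tied to reference coordinates; the sequence between them comes from the reads alone. A list holding more than one position would mean that the anchor is not unique in the reference.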
8. Step 5: Variant Validation
[Figure: Variant validation. Reads R2 to R6 (GACTTAGATA, CTTAGATAAC, TTAGATAACA, AGATAACATT, GATAACATTG) tile into GACTTAGATAACATTG, shown alongside rows labeled Reference and Assembled.]
• Assemble variant sequence from read overlap graph
• Compute minimal-cost variation (similar to Smith-Waterman)
• Call variants and apply QC to remove likely false positives
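The minimal-cost comparison can be illustrated with a plain unit-cost edit distance and a traceback. This is a simplification: the slide only says the method is similar to Smith-Waterman, and the cost model here (1 per substitution, insertion, or deletion, global rather than local alignment) is an assumption, as are the function name and the toy sequences.

```python
def min_cost_alignment(ref, alt):
    """Return (cost, edits) for the cheapest way to turn ref into alt.

    edits is a list of (ref_pos, op, base), op in {"sub", "del", "ins"}.
    """
    n, m = len(ref), len(alt)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + (ref[i - 1] != alt[j - 1]),
                cost[i - 1][j] + 1,  # delete a reference base
                cost[i][j - 1] + 1,  # insert an assembled base
            )
    # Walk back from the corner to recover one optimal set of edits.
    edits = []
    i, j = n, m
    while i > 0 or j > 0:
        diag = i > 0 and j > 0
        if diag and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != alt[j - 1]):
            if ref[i - 1] != alt[j - 1]:
                edits.append((i - 1, "sub", alt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            edits.append((i - 1, "del", ref[i - 1]))
            i -= 1
        else:
            edits.append((i, "ins", alt[j - 1]))
            j -= 1
    edits.reverse()
    return cost[n][m], edits


snp_cost, snp_edits = min_cost_alignment("GACTTAGATA", "GACTAAGATA")
del_cost, del_edits = min_cost_alignment("ACGTACGT", "ACGACGT")
```

When several edit sets share the same cost, the traceback returns just one of them, so a real caller would need a rule for choosing a canonical representation.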
9. NA12878 SNP Detection vs GIAB
Anchored Assembly only: 13,307
Genome in a Bottle only: 144,463
Shared: 2,596,897
Sensitivity: 95%
Precision: 99.5%
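The reported sensitivity and precision can be checked from the three counts on this slide, assuming the unlabeled count 2,596,897 is the set of SNPs found by both Anchored Assembly and Genome in a Bottle. The function and parameter names are illustrative.

```python
def sensitivity_precision(shared, truth_only, calls_only):
    """Sensitivity = shared / all truth-set variants;
    precision = shared / all called variants."""
    sensitivity = shared / (shared + truth_only)
    precision = shared / (shared + calls_only)
    return sensitivity, precision


sens, prec = sensitivity_precision(
    shared=2_596_897,    # assumed: found by both
    truth_only=144_463,  # Genome in a Bottle only
    calls_only=13_307,   # Anchored Assembly only
)
print(f"sensitivity={sens:.1%} precision={prec:.1%}")
```

Under that assumption the counts work out to about 94.7% sensitivity and 99.5% precision, consistent with the rounded figures on the slide.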
13. How to describe SVs from breakpoints?
Breakpoints can be described as breakend records or as SV events. As breakend records:
#CHROM  POS      ID     REF  ALT           QUAL  FILTER  INFO                                         FORMAT    SAMPLE
1       1500000  bnd_A  T    T[1:1501108[  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_B;AID=1234  DP:ED:OV  26:72:89
1       1501108  bnd_B  G    ]1:1500000]G  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_A;AID=1234  DP:ED:OV  26:72:89
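The paired breakend records on this slide can be read mechanically. The sketch below parses the VCF breakend ALT notation and checks whether a mated pair has the shape consistent with a simple deletion (`t[p[` at the left breakend, `]p]t` at the right). The function names and returned fields are hypothetical, only this one pattern is handled, and a pair of this shape is merely consistent with a deletion rather than proof of one.

```python
import re

BND_ALT = re.compile(
    r"^(?P<pre>[ACGTNacgtn.]*)(?P<open>[\[\]])"
    r"(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P<close>[\[\]])(?P<post>[ACGTNacgtn.]*)$"
)


def parse_bnd_alt(alt):
    """Split a breakend ALT such as 'T[1:1501108[' into its parts."""
    m = BND_ALT.match(alt)
    if m is None or m["open"] != m["close"]:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    return {
        "mate_chrom": m["chrom"],
        "mate_pos": int(m["pos"]),
        "bracket": m["open"],
        "base_first": bool(m["pre"]),  # True for t[p[ or t]p]
    }


def simple_deletion(chrom, pos_a, alt_a, pos_b, alt_b):
    """Return (first_deleted, last_deleted, length) if the mated pair
    looks like a simple deletion on one chromosome, else None."""
    a, b = parse_bnd_alt(alt_a), parse_bnd_alt(alt_b)
    looks_like_deletion = (
        a["mate_chrom"] == chrom == b["mate_chrom"]
        and a["bracket"] == "[" and a["base_first"]
        and b["bracket"] == "]" and not b["base_first"]
        and a["mate_pos"] == pos_b and b["mate_pos"] == pos_a
        and pos_b - pos_a > 1
    )
    if not looks_like_deletion:
        return None
    return pos_a + 1, pos_b - 1, pos_b - pos_a - 1


event = simple_deletion("1", 1500000, "T[1:1501108[", 1501108, "]1:1500000]G")
```

For bnd_A and bnd_B this yields a candidate deletion of reference bases 1500001 to 1501107, 1,107 bases in total.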
14. How to describe SVs from breakpoints?
Assembled breakpoints can reveal variation that is hard to categorize
• Different events can produce similar breakpoints
• Multiple breakpoints can represent a single rearrangement event
[Figure: CHR 1 with breakends bnd_K, bnd_L, bnd_M, and bnd_N; coordinates shown: 200000, 190000, 197000, 200231.]
15. How to describe SVs from breakpoints?
A single breakpoint can contain multiple sequence changes:
• Inserted sequence at deletion breakpoints
• Deleted or duplicated sequence at insertion breakpoints
• Deleted or duplicated sequence at inversion breakpoints
[Figure: CHR 1 with an inverted sequence and regions labeled "deleted sequence" and "duplicated sequence"; coordinates shown: 1700000, 1700100, 1704100, 1704250.]
16. How to describe SVs from breakpoints?
Many assemblies anchor to multiple genome locations
• Variation in duplicated genome regions
• Variation in repetitive elements
• Transposons
[Figure: CHR 1 with an Alu element; labels: "unique anchor" and "anchors to multiple places".]
17. Contact
• More information
• Trial on own data
becky@spiralgenetics.com
niranjan@spiralgenetics.com
info@spiralgenetics.com