Exome sequencing has emerged as an economical way of focusing DNA sequencing efforts on the most functionally understood regions of the genome. Pre-capture pooling, where one bait library is used to pull down the exonic regions of several pooled samples simultaneously, reduces costs further.
However, fragments carrying rare alleles in the pool might not hybridize to the baits at the same rate as reference-conforming sequences, and may hence be underrepresented. We investigated this potential issue by sequencing a HapMap family (4 individuals) using the pre-capture protocols from Illumina and Nimblegen. We did not observe clear evidence that heterozygous variants are missed but noted a trend for indels to be imbalanced.
Our findings do not provide clear evidence to rule out allelic imbalance or bias having an impact on research findings. This may be especially critical for low-cellularity cancer tissue, where rare alleles are more common.
Allelic Imbalance for Pre-capture Whole Exome Sequencing
1. Assessment of allelic bias in pre-capture platforms
for exome sequencing
Back to the future?
Denis Bauer | Research Scientist
28 March 2012
CMIS
2. Part 1:
My Background
and a selection of bioinformatics tools developed in
Brisbane in the
Bailey Group and Boden Group
Exon Capture Comparison | Denis.Bauer@CSIRO.au | Page 2
3. My Background
Brisbane
Neustadt
Berlin
IMB Institute for Molecular Bioscience
QBI Queensland Brain Institute
Timothy Bailey
Mikael Bodén
Sumoylation Predictor
Fabian Buske
Chikako Ragan
NorahDesk
http://meme.sdsc.edu/meme/intro.html
4. Stream
Quantitative model of transcriptional regulation
Bauer, D.C., Buske, F.A., Bailey, T.L., “Dual-functioning
transcription factors in the developmental gene network of
Drosophila melanogaster”; BMC Bioinformatics 11 (1),
366; PMID: 20594356. Cited: 4
Bauer, D.C., Bailey, T.L., “Optimizing static thermodynamic
models of transcriptional regulation.”, Bioinformatics,
2009, 25, 1640-1646. PMID:19398449. Cited: 5
Bauer, D.C., Bailey, T.L., “STREAM: Static Thermodynamic
REgulAtory Model of transcription.”, Bioinformatics 2008
24: 2544-2545. PMID:18776194. Cited: 1
Bauer, D.C., Bailey T.L., “Studying the functional
conservation of cis-regulatory modules and their
transcriptional output.”, BMC Bioinformatics, Apr
29;9(1):220. PMID: 18442418. Cited: 10
http://www.bioinformatics.org.au/stream/
5. Triplexator
Sneak Preview
Search/design tool for nucleic acid triple helices
Fabian A. Buske et al., "Triplexator:
Detecting nucleic acid triple helices in
genomic and transcriptomic data",
Genome Research 2012, accepted
Fabian A. Buske et al., "Potential in vivo
...
roles of nucleic acid triple-helices", RNA
biology, 2011, PMID: 21525785
Coming soon to
http://www.bioinformatics.org.au/triplexator/
6. NORAHDESK
Detecting ncRNA in sequencing data
Ragan, C., Mowry, B.J. and Bauer, D.C. “Hybridization based
reconstruction of small non-coding RNA transcripts from
deep sequencing data”, NAR, 2012, review received.
Specifically useful for miRNA-
and piRNA-clusters that
are transcribed together
http://www.bioinformatics.org.au/norahdesk/
7. Part 2: Back to the future
Exon capture is the economical way for an unbiased
genome-wide analysis.
However, extensive sample manipulation can introduce biases that
we might not be aware of.
Is less sophistication safer?
8. Pre-capture pooling for exome capture
The business side
Economical way of focusing 2GS efforts on the most functionally
understood regions.
Whole DNA sample
Sonicate
Pull out fragments corresponding to the sequence of known “exons”
However, with sequencing costs going down, the capture reaction
becomes the bottleneck.
Solution: “Pre-capture pooling”
Apply Bait Library to more than one sample
Clark MJ, et al., Nat Biotechnol. 2011 PMID: 21947028.
9. Pre-capture pooling for exome capture
The technical side
Bait library design
NG: empirically optimized
AG: overlapping RNA baits
IL: Gapped tiles
What is an “exon” ?
Everything that is known
to be transcribed/has
function …
trust company
Now AG: 72Mb
Clark MJ, et al., Nat Biotechnol. 2011 PMID: 21947028.
10. Oddities:
Targeted exons do not follow the same length distribution as RefSeq exons
11. Oddities (cont.)
Theoretical vs actual capture efficiency of longest exon
12. Pre-capture pooling for exome capture
The potential problem
Potential issue: “Allelic Bias”/ “Allelic imbalance” ?
[Figure: a bait pulling down fragments from the pool. Labels: reference-conform + hom, bar-coded samples 1-3; het sample 4; potentially underrepresented allele]
Sequence HapMap family (4 individuals) with
• AG: Post-capture
• Ill: Pre-capture
• NG: Pre-capture
13. Allelic Bias ?
• If het variants are not captured reliably in pre-capture, the
het/hom ratio would be lower and they would not overlap with DBs
[Figure: het/hom ratio (NG: more hets in post; Ill: more hets in pre) and fraction of overlap (NG: slightly lower overlap; Ill: no difference). Categories shown: known, novel, HapMap, 1000G]
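The two checks on this slide, the het/hom ratio and the fraction of calls that overlap a database of known variants, can be sketched in a few lines. This is a minimal illustration and not the pipeline used for the talk: the `Call` record, the function names and the toy data are all assumptions made for the example.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Call(NamedTuple):
    """Hypothetical variant call record (not from the original analysis)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    genotype: str  # "het" or "hom"


Key = Tuple[str, int, str, str]


def het_hom_ratio(calls: Iterable[Call]) -> float:
    """Number of heterozygous calls divided by number of homozygous calls."""
    calls = list(calls)
    het = sum(1 for c in calls if c.genotype == "het")
    hom = sum(1 for c in calls if c.genotype == "hom")
    if hom == 0:
        raise ValueError("no homozygous calls; ratio undefined")
    return het / hom


def overlap_fraction(calls: Iterable[Call], known: Set[Key]) -> float:
    """Fraction of calls whose (chrom, pos, ref, alt) is in the known set."""
    calls = list(calls)
    if not calls:
        raise ValueError("no calls given")
    hits = sum(1 for c in calls if (c.chrom, c.pos, c.ref, c.alt) in known)
    return hits / len(calls)


# Toy data, invented purely for illustration.
pre_capture_calls = [
    Call("chr1", 100, "A", "G", "het"),
    Call("chr1", 200, "C", "T", "hom"),
    Call("chr2", 50, "G", "A", "het"),
    Call("chr2", 75, "T", "C", "hom"),
]
known_variants = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 75, "T", "C"),
}

ratio = het_hom_ratio(pre_capture_calls)
overlap = overlap_fraction(pre_capture_calls, known_variants)
print(f"het/hom ratio: {ratio:.2f}, overlap with known set: {overlap:.2f}")
```

Running the same two functions on the pre-capture and post-capture call sets of the same individual gives the pair of numbers the slide compares; a clearly lower het/hom ratio or overlap in pre-capture would point towards missed heterozygous variants.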
14. Allelic Bias ?
... they would have lower coverage
[Figure: coverage of SNPs and INDELs for the Com, Post and Pre groups; panels: Illumina and Nimblegen]
Asan, Xu Y et al. Genome Biol. 2011 PMID: 21955857
15. Conclusion
1. We (and others) did not detect any obvious allelic imbalance;
however, no one tested samples with really rare alleles (e.g. low
cellularity in cancer)
2. To be on the safe side (BACK TO THE FUTURE): we go for
post-capture whole-exome sequencing
16. Thank you
Institute for Molecular Bioscience, UQ
Timothy Bailey (MEME)
School of Chemistry and Molecular Biosciences, UQ
Mikael Bodén (Machine Learning)
Queensland Brain Institute, UQ
Vikki Marshall
Joon-Yong An
Sam Lukowski
Chikako Ragan (NorahDesk)
Garvan Institute, UNSW
John Mattick
Fabian Buske (Triplexator)
CMIS
Denis C. Bauer
t +61 2 9325 3174
e denis.bauer@csiro.au
w www.csiro.au/cmis
Editor's notes
Figure S5: Reference-allele biases at heterozygous SNP sites. Shown is a box plot of the percentage of reference allele depth. The percentage of reference allele depth at each heterozygous site was calculated for each replicate of the three platforms as well as for whole genome sequencing within the targeted and flanking regions of Agilent (AgiYH) and NimbleGen (NimYH).
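The quantity plotted in that figure, the share of read depth supporting the reference allele at a heterozygous site, is simple to compute. The sketch below also adds an exact two-sided binomial test against the 50% expected without bias; that test is an illustrative addition, not something the slides or the cited figure say was used. The function names and the depth cap are assumptions.

```python
from math import comb


def ref_allele_fraction(ref_depth: int, alt_depth: int) -> float:
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("site has no coverage")
    return ref_depth / total


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k reference reads out of n.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one. Capped at n <= 1000 (an arbitrary limit for this sketch)
    so that the binomial coefficients still fit into a float.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if n > 1000:
        raise ValueError("depth too high for this simple sketch")
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Toy example: 30 reference reads and 10 alternate reads at one site.
fraction = ref_allele_fraction(30, 10)
p_value = binomial_two_sided_p(30, 40)
print(f"reference fraction: {fraction:.2f}, p-value vs 0.5: {p_value:.4f}")
```

A reference fraction consistently above 0.5 across many heterozygous sites would be the signature of reference-allele bias; testing single sites is mainly useful for flagging outliers.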