Synthetic Spike-in mRNA-Seq Data for Cancer Gene Fusion Detection
Waibhav D. Tembe, Stephanie J.K. Pond, Christophe Legendre, Han-Yu Chuang, Winnie S. Liang, Nancy E. Kim, Valerie Montel, Shukmei Wong, Timothy K. McDaniel, David W. Craig, John D. Carpten
Research supported by a Stand Up To Cancer – Melanoma Research Alliance Melanoma Dream Team
Translational Cancer Research Grant (#SU2C-AACR-DT0612). Stand Up To Cancer is a program of the
Entertainment Industry Foundation administered by the American Association for Cancer Research.
Motivation
• Oncogenic fusions provide actionable, druggable targets with established clinical validity.
  – Examples include imatinib, tretinoin, and crizotinib, which target the BCR-ABL, PML-RAR, and EML4-ALK fusion products associated with chronic myelogenous leukemia, acute promyelocytic leukemia, and non-small cell lung carcinoma, respectively.
• Validating laboratory and analysis methods, that is, establishing analytical parameters such as the limit of detection, linearity, sensitivity, and specificity of fusion detection in tumor RNA specimens, is critical for adoption in clinical research, clinical, and diagnostic settings.
• Such validation is difficult because publicly available RNA-seq data generated specifically to capture gene fusions are lacking, as are well-characterized reference materials.
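Several of the analytical parameters listed above can be estimated from a titration series of known spike-ins. The Python sketch below assumes a hypothetical setup: the set of fusion names a tool reported, the set of spiked-in (true) fusions, and the supporting-read count for one fusion at each spike-in concentration. The function names and the simplified limit-of-detection rule are illustrative assumptions, not the authors' method.

```python
from __future__ import annotations

import math


def sensitivity(called: set[str], truth: set[str]) -> float:
    """Fraction of known spiked-in fusions that a tool reported: TP / (TP + FN)."""
    if not truth:
        raise ValueError("truth set must not be empty")
    return len(called & truth) / len(truth)


def limit_of_detection(detected_by_conc: dict[float, bool]) -> float | None:
    """Lowest concentration at which a fusion is detected at that level and at
    every higher tested level. This is a simplified, hypothetical rule; more
    formal definitions use replicate measurements. Returns None if the highest
    tested level is already missed."""
    lod = None
    for conc in sorted(detected_by_conc, reverse=True):
        if not detected_by_conc[conc]:
            break  # first miss ends the run of consecutive detections
        lod = conc
    return lod


def log_log_fit(concs: list[float], reads: list[float]) -> tuple[float, float]:
    """Least-squares slope and R^2 of log10(supporting reads) vs log10(concentration).

    A slope near 1 with R^2 near 1 suggests read support scales proportionally
    with input. Points with zero reads are dropped because log10(0) is undefined.
    """
    pts = [(math.log10(c), math.log10(r)) for c, r in zip(concs, reads) if c > 0 and r > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points with nonzero reads")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")
    return slope, r2
```

To apply the sketch, pass the supporting-read counts for one fusion across the titration levels to `log_log_fit`. The slope and R^2 then summarize linearity, and `limit_of_detection` gives the lowest level in the unbroken run of detections.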
Experimental method
• Synthetic sequences were incorporated into plasmids (IDT).
• Plasmids were linearized and transcribed with T7 RNA polymerase; the transcripts were purified, poly(A)-tailed (50-200 nt tails), and purified again.
• The resulting mRNA transcripts were characterized by concentration measurement and sequencing.
• The RNA spikes were mixed to create a high-concentration pool containing 40 nM of each spike.
• This pool was diluted and titrated into 1 µg aliquots of COLO-829 total RNA (ATCC 1974) at 10 different concentrations.
• cDNA libraries were prepared using the TruSeq Stranded mRNA LT Sample Prep Kit.
• Libraries were sequenced on the Illumina HiSeq 2500 in Rapid Run mode with 2x100 bp paired-end reads.
• We analyzed the data using three fusion detection tools: ChimeraScan [1], TopHat-Fusion [2], and SnowShoes-FTD [3].
• To verify that fusion reads were present in the original data, we used GSNAP164 as an independent tool to align the entire dataset against a combined reference consisting of the human genome (GRCh37 build) and the nine synthetic fusion transcripts.
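The pooling and titration steps reduce to standard molar arithmetic. The slides give only the 40 nM starting concentration and the 10 titration levels. The Python sketch below therefore assumes a serial dilution with a constant fold change; the fold, the volumes, and the function names are all hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules(conc_nM: float, volume_uL: float) -> float:
    """Copies of one spike in volume_uL microliters at conc_nM nanomolar."""
    moles = conc_nM * 1e-9 * volume_uL * 1e-6  # nmol/L -> mol/L, uL -> L
    return moles * AVOGADRO


def stock_volume_uL(stock_nM: float, target_nM: float, final_uL: float) -> float:
    """Volume of stock needed to reach target_nM in final_uL (C1*V1 = C2*V2)."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    return target_nM * final_uL / stock_nM


def serial_dilution(start_nM: float, fold: float, levels: int) -> list[float]:
    """Concentrations of a serial dilution: start, start/fold, start/fold**2, ..."""
    return [start_nM / fold ** i for i in range(levels)]


if __name__ == "__main__":
    # Hypothetical 10-level series with an assumed 4-fold step from the 40 nM pool.
    for conc in serial_dilution(40.0, 4.0, 10):
        print(f"{conc:.6g} nM -> {molecules(conc, 1.0):.3g} copies per uL")
```

At the 40 nM pool concentration, `molecules(40, 1)` is about 2.4 × 10^10 copies of each spike per microliter. Whatever the actual dilution scheme was, the same arithmetic converts each titration level into an expected input copy number.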
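The GSNAP verification step has two parts. First, build a combined reference FASTA (GRCh37 plus the nine synthetic fusion sequences). Second, after alignment, count reads that land on the synthetic contigs. The Python sketch below assumes GSNAP output was written in SAM format and that the synthetic sequences are stored in their own FASTA file. The file names, contig names, breakpoint coordinates, and helper functions are hypothetical, and indexing the combined FASTA for GSNAP is not shown.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path
from typing import Iterable

# Matches one CIGAR operation (length + op code) from the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def write_combined_reference(genome_fa: Path, spikes_fa: Path, out_fa: Path) -> None:
    """Concatenate the genome FASTA and the synthetic-fusion FASTA into one file.

    Contig names must be unique across both inputs.
    """
    with out_fa.open("w") as out:
        for src in (genome_fa, spikes_fa):
            with src.open() as fh:
                for line in fh:
                    out.write(line if line.endswith("\n") else line + "\n")


def ref_span(cigar: str) -> int:
    """Reference bases covered by an alignment (M, D, N, =, X consume reference)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def count_spike_reads(
    sam_lines: Iterable[str],
    spike_names: set[str],
    breakpoints: dict[str, int] | None = None,
) -> dict[str, int]:
    """Count primary mapped alignments on each synthetic fusion contig.

    If breakpoints maps a contig to the 1-based position of the last base of the
    5' partner, only alignments crossing that junction are counted.
    Mates of a pair are counted separately.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (0x4 | 0x100 | 0x800):
            continue  # unmapped, secondary, or supplementary
        name = fields[2]
        if name not in spike_names:
            continue
        if breakpoints and name in breakpoints:
            start = int(fields[3])                 # 1-based leftmost aligned base
            end = start + ref_span(fields[5]) - 1  # 1-based last aligned base
            if not (start <= breakpoints[name] < end):
                continue  # alignment does not cross the fusion junction
        counts[name] += 1
    return {name: counts[name] for name in sorted(spike_names)}
```

For example, passing an open SAM file and a set of (hypothetical) synthetic contig names to `count_spike_reads` returns a per-spike count, with zeros for spikes that attracted no reads. Supplying junction coordinates restricts the count to reads that actually span each fusion breakpoint.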