The so-called “next-generation” sequencing (NGS) technologies allow us to sequence massive amounts of DNA quickly and in parallel, overcoming the limitations of the original Sanger sequencing methods used to sequence the first human genome. NGS technologies have had an enormous impact on biomedical research within a short time frame. This talk will give an overview of these applications with specific examples from Mendelian genomics and cancer research.
Next Generation Sequencing and its Applications in Medical Research - Francesc Lopez
1. Francesc Lopez
Yale Center for Genome Analysis
Dept. of Genetics
(francesc.lopez@yale.edu)
Next-Generation Sequencing and its Applications in Medical Research
2. Brief History of DNA Sequencing
1953: Discovery of DNA structure by Watson and Crick
1973: First sequence of 24 bases published
1977: Sanger sequencing method published
1982: GenBank started
1987: 1st automated sequencer: Applied Biosystems Prism 373 (up to 600 bases)
1996: First Capillary sequencer: ABI310
2000-2003: Human Genome Sequenced
2005- : First NGS sequencers 454 Life Sciences, Solexa/Illumina, Helicos, Ion Torrent
3. Sanger VS NGS
Sequencing of the human genome using Sanger technology took more than a decade and cost an estimated $70 million.
Human Genome: 3.3 × 10^9 bases, ~20,000 genes
In 3 days (one run), an Illumina HiSeq 4000 is able to produce 1,680 × 10^9 bases for ~$32,000.
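The comparison above can be made concrete with a little arithmetic. The sketch below uses only the figures quoted on the slide (genome size, run output, run cost). The 30x mean coverage used in the last helper is an assumption for illustration and is not stated in the talk.

```python
# Figures quoted on the slide.
HUMAN_GENOME_BASES = 3.3e9   # size of one human genome, in bases
RUN_OUTPUT_BASES = 1680e9    # HiSeq 4000 output for one 3-day run, in bases
RUN_COST_USD = 32_000        # approximate cost of one run


def genome_equivalents(run_bases: float, genome_bases: float) -> float:
    """Number of 1x human-genome equivalents produced by one run."""
    return run_bases / genome_bases


def cost_per_gigabase(run_cost: float, run_bases: float) -> float:
    """Approximate cost in USD of sequencing 10^9 bases."""
    return run_cost / (run_bases / 1e9)


def genomes_at_coverage(run_bases: float, genome_bases: float, coverage: float) -> int:
    """Whole genomes that fit in one run at a given mean coverage.

    The coverage value is chosen by the caller; the slide does not give one.
    """
    return int(run_bases // (genome_bases * coverage))


equivalents = genome_equivalents(RUN_OUTPUT_BASES, HUMAN_GENOME_BASES)
usd_per_gb = cost_per_gigabase(RUN_COST_USD, RUN_OUTPUT_BASES)
genomes_30x = genomes_at_coverage(RUN_OUTPUT_BASES, HUMAN_GENOME_BASES, 30)

print(f"{equivalents:.0f} genome equivalents per run")
print(f"${usd_per_gb:.2f} per gigabase")
print(f"{genomes_30x} genomes per run at an assumed 30x coverage")
```

With the slide's numbers, one run yields roughly 509 genome equivalents at about $19 per gigabase.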
4. Yale Center for Genome Analysis (YCGA)
Production facility: 7,000 sq ft dedicated facility
25 full-time staff, including 4 PhD-level bioinformaticians
Dedicated computation infrastructure
3.5 petabytes data storage
4,500 HPC cores
5. Sequencing Platforms at YCGA
7 Illumina HiSeqs (5 HiSeq 2500, 2 HiSeq 4000)
One PacBio RS
Illumina MiSeq
Ion PGM™ Sequencer
7. Trend of sequencing data output at YCGA
Sequencers are operated at ~70% of the max capacity
Progress made at YCGA in the past years
Types of samples processed at YCGA
[Pie chart of library prep sample types: ChIP, Whole Genome, mRNA, micro RNA, Seqcap, Whole Exome. Slice values: 1%, 5%, 30%, 1%, 63%.]
8. Whole-Genome VS Whole-Exome Sequencing
Protein-coding genes (exome) constitute 1.5% of the human genome but harbor 85% of disease-causing mutations.
Significantly cheaper than sequencing the entire genome
>50,000 exomes sequenced at YCGA
Choi et al., PNAS 2009
9. FastQ format – single read
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
× 5 × 10^9 reads in a run of HiSeq 4000
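To illustrate how the four lines of a FASTQ record fit together, here is a minimal parsing sketch using the record shown on the slide. It assumes the common Phred+33 quality encoding (each quality character's ASCII code minus 33 gives the Phred score Q, and the error probability is 10^(-Q/10)). The slide does not state which encoding is used, and the names `FastqRecord`, `parse_fastq_record` and `error_probability` are illustrative only.

```python
from typing import List, NamedTuple


class FastqRecord(NamedTuple):
    seq_id: str
    sequence: str
    quality: List[int]  # one Phred score per base


def parse_fastq_record(lines: List[str], offset: int = 33) -> FastqRecord:
    """Parse one four-line FASTQ record.

    offset=33 (Phred+33) is an assumption; older Illumina files used 64.
    """
    header, sequence, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(ch) - offset for ch in qual]
    return FastqRecord(header[1:], sequence, scores)


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# The single read shown on the slide.
record = parse_fastq_record([
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
])

mean_q = sum(record.quality) / len(record.quality)
print(record.seq_id, len(record.sequence), "bases, mean Phred score", round(mean_q, 1))
```

Under this encoding the first base of the read (`!`) has a Phred score of 0, meaning the base call carries no confidence, while the final base (`5`) has a score of 20, or a 1% chance of error.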
11. Variant annotation
General parameters
Genome build: hg38
Variant caller: GATK
Annotation gene reference: refGene
Variant frequency DBs
ExAC: 61,000 exomes
dbSNP
1000 genomes
NHLBI exomes: 6,500
Yale exomes: 2,500
Conservation
PhyloP
44 vertebrate species
2 invertebrate species (fly and worm)
Functional prediction
Polyphen-2
SIFT
Gene annotation
OMIM
GO
KEGG
Jackson lab knockout
12. Sequencing a genome is simple; finding the cause of a disease is not
First clinical use of whole-genome sequencing shows just how challenging it can be
Genomes on prescription: Nature 2011
13. DNA Sequencing and Precision Medicine
• Precision Medicine: Use of genomics to tailor medical care to individuals based on their genetic makeup.
Diagnosis: Is it benign?
Classification: Which class of cancer?
Prognosis: What are my chances?
Therapeutic Choice: Which treatment?
Discovery: How and why
• Elucidation of mechanism of cause
• Identification of cancer biomarkers
• Therapeutic targets
14. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing.
Choi M, et al. (2009) PNAS 106 (45): 19096-101
A 5-month-old child presented with failure to thrive and dehydration. Treatments for kidney disease failed.
Captured 180,000 exons of 18,673 protein-coding genes comprising 34.0 Mb of genomic sequence
Identified a mutation in the SLC26A3 gene, which causes congenital chloride diarrhea; treatments for this condition have effectively managed the disease
Demonstration of the clinical utility of whole-exome sequencing and its implications for disease gene discovery and clinical diagnosis
15. In the first 362 trios (affected proband), ~2000 putative de
novo pre-filtered variants were detected.
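The idea behind calling a putative de novo variant in a trio can be sketched as follows: a variant is a candidate when the proband carries the alternate allele and neither parent does. This is a simplified illustration, not the pipeline used in the study. The genotype layout (a dictionary from variant to alternate-allele count) and the toy coordinates are hypothetical, and real callers also weigh read depth and genotype quality.

```python
from typing import Dict, List, Tuple

# A variant is keyed by (chromosome, position, ref, alt); the value is the
# number of alternate alleles carried (0, 1 or 2). This layout is hypothetical.
Variant = Tuple[str, int, str, str]
Genotypes = Dict[Variant, int]


def putative_de_novo(proband: Genotypes, mother: Genotypes, father: Genotypes) -> List[Variant]:
    """Variants present in the proband and absent from both parents."""
    candidates = []
    for variant, alt_count in proband.items():
        if alt_count == 0:
            continue  # proband does not carry the alternate allele
        if mother.get(variant, 0) == 0 and father.get(variant, 0) == 0:
            candidates.append(variant)
    return sorted(candidates)


# Toy genotypes, invented purely for illustration.
proband = {("chr1", 1000, "A", "G"): 1, ("chr2", 2000, "C", "T"): 1, ("chr3", 3000, "G", "A"): 2}
mother = {("chr2", 2000, "C", "T"): 1, ("chr3", 3000, "G", "A"): 1}
father = {("chr3", 3000, "G", "A"): 1}

print(putative_de_novo(proband, mother, father))
```

In the toy data only the chr1 variant is absent from both parents, so it is the sole candidate; the other two are inherited.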
17. Comparing variants from cases and controls per gene allows for detection of disease-causing genes
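One way to make a per-gene case/control comparison concrete is a one-sided Fisher's exact test on carrier counts. The slide does not say which statistical test was used, so the choice of test, the function names, and the gene labels and counts below are all assumptions for illustration.

```python
from math import comb
from typing import Dict, List, Tuple


def fisher_one_sided(case_carriers: int, case_total: int,
                     control_carriers: int, control_total: int) -> float:
    """P-value for seeing at least `case_carriers` carriers among the cases,
    under the hypergeometric null of no case/control difference."""
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    numerator = 0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        numerator += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numerator / denominator


def rank_genes(counts: Dict[str, Tuple[int, int]],
               case_total: int, control_total: int) -> List[Tuple[str, float]]:
    """Rank genes by p-value; counts maps gene -> (case carriers, control carriers)."""
    results = [
        (gene, fisher_one_sided(cases, case_total, controls, control_total))
        for gene, (cases, controls) in counts.items()
    ]
    return sorted(results, key=lambda item: item[1])


# Hypothetical carrier counts for two made-up genes.
toy_counts = {"GENE_A": (8, 1), "GENE_B": (3, 30)}
for gene, p_value in rank_genes(toy_counts, case_total=100, control_total=1000):
    print(gene, f"{p_value:.3g}")
```

A gene with many carriers among cases and few among controls receives a small p-value and rises to the top of the ranking. In practice the p-values would also need correction for the roughly 20,000 genes tested.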
18. Broad, Baylor/Hopkins, U of Washington, and Yale
More than 6,000 rare Mendelian disorders affecting more than 25 million individuals in the US
Discover the genes and variants responsible for as many Mendelian phenotypes as possible
Develop and disseminate improved methods for disease gene discovery and analysis
Educate colleagues and the public regarding Mendelian disease
Whole-exome/whole-genome analysis is carried out at no cost and on a collaborative basis
23. Filtering Recessive Variants

Filter                        Kindred 1               Kindred 2
                              Subject 1   Subject 2   Subject 1   Subject 2
High quality                  12,326      12,094      12,959      12,753
Protein altering              3,151       3,072       3,283       3,227
Rare in control databases*    4           1           2           5
Same gene (DGKE)              1                       1

* Yale exome database, NHLBI ESP exome, 1000 Genomes
Lemaire et al., Nature Genetics 2013
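The filtering funnel on this slide (high quality, then protein altering, then rare in control databases, then shared gene across subjects) can be sketched in a few lines. This is an illustrative sketch, not the published pipeline. The `Variant` fields, the 0.001 frequency cutoff, the two-allele rule for recessive inheritance, and every gene name other than DGKE are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Variant:
    gene: str
    high_quality: bool        # passed the caller's quality filters
    protein_altering: bool    # e.g. missense, nonsense, splice site
    control_frequency: float  # allele frequency in control databases
    homozygous: bool


def recessive_candidate_genes(variants: List[Variant],
                              max_control_frequency: float = 0.001) -> Set[str]:
    """Genes with two rare, protein-altering, high-quality alleles
    (one homozygous variant or two heterozygous variants)."""
    alleles = Counter()
    for v in variants:
        if (v.high_quality and v.protein_altering
                and v.control_frequency <= max_control_frequency):
            alleles[v.gene] += 2 if v.homozygous else 1
    return {gene for gene, count in alleles.items() if count >= 2}


def shared_candidate_genes(subjects: Dict[str, List[Variant]]) -> Set[str]:
    """Genes that are recessive candidates in every subject."""
    gene_sets = [recessive_candidate_genes(vs) for vs in subjects.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy data: GENE_X, GENE_Y and GENE_Z are made-up placeholders.
toy_subjects = {
    "subject_1": [
        Variant("DGKE", True, True, 0.0, True),
        Variant("GENE_X", True, True, 0.2, True),    # too common in controls
        Variant("GENE_Y", True, False, 0.0, True),   # not protein altering
    ],
    "subject_2": [
        Variant("DGKE", True, True, 0.0, False),
        Variant("DGKE", True, True, 0.0005, False),  # second heterozygous hit
        Variant("GENE_Z", True, True, 0.0, False),   # only one allele
    ],
}

print(shared_candidate_genes(toy_subjects))
```

Intersecting the per-subject candidate sets is what narrows thousands of variants to a single shared gene.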
24. Some Machine learning applications in genetics and genomics
Gene prediction (2002): predict which regions of the genome code for proteins.
RNA secondary structure prediction (2006): predict the base-pairing interactions within a strand of RNA.
Transcription factor target prediction (2007): predict the sequence of bases most likely to bind a specific transcription factor.
Base calling (2009): predict the base photographed by an Illumina sequencing device during a sequencing by synthesis reaction.
Enhancer prediction (2012): predict regions of the genome that act as enhancers for expression using information about the epigenetic marks present on the chromosomes.
Splicing code (2015): predict how a mutation within a gene will affect the splicing of that gene's transcript.
Pathogenicity prediction (2015): predict the functional impact of a mutation in a sample of DNA.
Pharmacogenomics (2011): predict if mutations in a person's DNA will impact how a drug works in their body.
Predicting the functions of long noncoding RNAs (2015)
Predicting effects of noncoding variants using predicted DNaseI hypersensitivity, histone modifications, and transcription factor binding (2015)
Predicting RNA editing (2016)
25. List of select publications resulting from next-generation sequencing usage at YCGA
Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Bilguvar Nature, v467, 2010
A Novel miRNA Processing Pathway Independent of Dicer Requires Argonaute2 Activity. Cifuentes Science, v328, 2010
Mitotic recombination in ichthyosis causes reversion of dominant mutations in KRT10. Choate K Science, v330, 2010
Transcriptomic analysis of avian digits reveals conserved and derived digit identities in birds. Wang S. Nature, v477, 2011
Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Lynch and Wagner Nat Genet., v43, 2011
K+ channel mutations in adrenal aldosterone-producing adenomas and hereditary hypertension. Choi M Science, v331, 2011
Recessive LAMC3 mutations cause malformations of occipital cortical development. Barak and Gunel. Nat Genet., v43, 2011
Spatio-temporal transcriptome of the human brain. Kang and Sestan Nature, v478, 2011
Langerhans cells facilitate epithelial DNA damage and squamous cell carcinoma. Modi and Girardi Science, v335, 2012
Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Boyden et al Nature, v482, 2012
De novo point mutations are strongly associated with Autism Spectrum Disorders. Sanders and State Nature, v485, 2012
Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Krauthammer Nat Genet., v44, 2012
Genomic Analysis of Non-NF2 Meningiomas Reveals Mutations in TRAF7, KLF4, AKT1,& SMO. Clark V et al Science, v339, 2013
De novo mutations in histone-modifying genes in congenital heart disease. Zaidi and Lifton Nature, v498, 2013
Recessive mutations in DGKE cause atypical hemolytic-uremic syndrome. Lemaire and Lifton Nat Genet., v45, 2013
Somatic and germline CACNA1D calcium channel mutations in aldosterone-producing adenomas. Scholl and Lifton Nat Genet., v45, 2013
The evolution of lineage-specific regulatory activities in the human embryonic limb. Cotney and Noonan Cell, v154, 2013
Mutations in DSTYK and dominant urinary tract malformations. Sanna-Cherchi and Gharavi N Eng J Med., 2013
Nanog, Pou5f1 and SoxB1 activate zygotic gene expression during the maternal-to-zygotic transition. Lee et al Nature, 2013
Co-expression networks implicate mid-fetal deep cortical projection neurons in the pathogenesis of autism. State Cell, 2013
CLP1 Founder Mutation Links tRNA Splicing and Maturation to Cerebellar Development. Schaffer and Gleeson. Cell, v157, 2014
Exome sequencing links corticospinal motor neuron disease to neurodegenerative disorders. Novarino and Gleeson Science, v343, 2014
Recurrent mutations in NF1 and RASopathy genes in sun-exposed melanomas. Krauthammer and Halaban Nat Genet., v47, 2015
Genetic Causes for Congenital Heart Disease with Neurodevelopmental and Other Deficits. Homsy J et al Science , 2015