2. What is gene sequencing?
• Sequencing means determining the primary structure of an unbranched biopolymer.
• Gene sequencing is the technique that allows researchers to read the genetic information found in DNA. Sequencing involves determining the order of bases.
3. Importance of gene sequencing:
• Sequencing a gene is an important step toward understanding it.
• A DNA sequence contains clues about where genes are located.
• Gene sequencing helps us understand how the genome as a whole works: how genes work together to direct the growth, development and maintenance of an entire organism.
• It helps scientists study the parts of the genome outside the genes, such as regulatory regions.
4. History:
• The first genome sequenced was that of bacteriophage φX174 (5,375 bp), by Fred Sanger in 1977.
• In 1995: Haemophilus influenzae (1,830,137 bp); Mycoplasma genitalium (580,000 bp)
• In 1996: Saccharomyces cerevisiae (12,068,000 bp)
• In 1997: Escherichia coli (4,639,221 bp)
• In 1999: Human chromosome (53,000,000 bp)
• In 2000: Drosophila melanogaster (180,000,000 bp)
• In 2001: Human, working draft (3,200,000,000 bp)
• In 2002: Plasmodium falciparum (23,000,000 bp); Anopheles gambiae (278,000,000 bp); Mus musculus (2,500,000,000 bp)
• In 2003: Human, finished sequence (3,200,000,000 bp)
• In 2005: Oryza sativa (489,000,000 bp)
• In 2006: Populus trichocarpa (485,000,000 bp)
6. Characteristics of an ideal nucleic acid sequencing technology:
An ideal nucleic acid sequencing technology should:
• Fully sequence natural DNA or RNA fragments of various sizes with minimal or no error.
• Generate no sequence-dependent or other biases, for optimal quantification and genotyping performance.
• Require minimal quantities of input material, ideally a single cell.
• Have high or complete efficiency in all of the sequencing steps.
• Detect modified nucleotides directly.
• Operate in laboratories and low-resource settings without special environmental requirements such as temperature and humidity control.
• Be sufficiently affordable and economically viable to allow worldwide adoption and operation in research and clinical settings.
With the introduction of second-generation sequencing technologies (SGSTs) and the subsequent rapid improvement in their capabilities, we are now closer to, but still far from, the ultimate goal of rapid, comprehensive and unbiased sequencing of DNA or RNA molecules.
10. Sanger Sequencing
Introduction
• The Sanger method is based on in vitro DNA replication.
• The Sanger sequencing technique was developed by British biochemist Frederick Sanger and his team in 1977.
• Frederick Sanger received a share of the 1980 Nobel Prize in Chemistry for the development of Sanger sequencing.
• In Sanger dideoxy (chain-termination) sequencing, the sample DNA is used as a template for a DNA polymerase.
• Four polymerase reactions are carried out, each containing the enzyme, a primer, the sample DNA and the four dNTPs.
• Each reaction also contains one of the four dideoxy NTPs (ddNTPs). When a ddNTP is added, chain lengthening terminates, because ddNTPs lack the 3' hydroxyl group needed to form the next phosphodiester bond.
• Because each reaction contains only one of the four bases as a ddNTP, it yields fragments that all terminate at that base.
• The four reactions therefore produce four collections of fragments whose lengths reflect the positions of each of the four bases in the sequence.
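The chain-termination logic above can be illustrated with a toy Python simulation. It is a sketch, not a model of the real chemistry: it assumes the template is written 3'→5' (so the new strand is simply its base-by-base complement) and that every possible termination position is represented in each reaction. The function names `sanger_fragments` and `read_gel` are hypothetical.

```python
# Toy model of Sanger dideoxy (chain-termination) sequencing.
# Assumption: the template is written 3'->5', so the newly synthesized
# strand (5'->3') is simply its base-by-base complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template_3to5: str) -> dict[str, list[int]]:
    """Return the lengths of the terminated fragments in each ddNTP reaction.

    In the ddA reaction, synthesis can stop wherever an A is added, so one
    fragment ends at every position where the new strand carries an A.
    We assume enough molecules that every such position is represented.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    reactions: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # The ddNTP lacks a 3'-OH, so a chain of this length cannot grow further.
        reactions[base].append(length)
    return reactions


def read_gel(reactions: dict[str, list[int]]) -> str:
    """Sort all fragments from shortest to longest, as a gel or capillary
    separates them, and read off which reaction each band came from."""
    bands = sorted(
        (length, base) for base, lengths in reactions.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    template = "TACGGATC"  # hypothetical template, written 3'->5'
    lanes = sanger_fragments(template)
    print(lanes)            # fragment lengths per ddNTP reaction
    print(read_gel(lanes))  # ATGCCTAG, the complement of the template
```

Sorting by length is the step that electrophoresis performs physically: the shortest fragment identifies the first base added after the primer, the next shortest the second, and so on.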
20. Introduction
The era of first-generation sequencing, dominated by Sanger sequencing, quickly came to an end with the introduction of:
1. the Roche/454 platform in 2005,
2. the Solexa/Illumina system in 2006,
3. the Applied Biosystems SOLiD system in 2007,
4. a human genome sequencing service in 2008, and
5. the Ion Torrent system in 2010.
• These SGSTs generate hundreds of thousands to billions of reads, 25–800 nucleotides long, within days and at low cost compared with Sanger sequencing.
• Following a shotgun approach, several of these technologies can today re-sequence a human genome at 10–30× coverage for under $5,000 in reagent costs in 8–10 days.
• SGSTs have been warmly welcomed by academic and industrial scientists.
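Coverage (the "10–30×" above) is the average number of reads covering each base, commonly estimated as C = N × L / G for N reads of length L on a genome of size G. The Python sketch below works through that arithmetic, together with the Lander-Waterman estimate that a fraction e^(−C) of bases is expected to receive no read at all. It assumes a 3.2 Gb genome (the human size listed in the history) and an illustrative read length of 150 nucleotides; the function names are hypothetical.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads required for a target depth, N = C * G / L, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_uncovered(depth: float) -> float:
    """Expected fraction of bases hit by no read, e^(-C), under the
    Lander-Waterman assumption that reads start uniformly at random."""
    return math.exp(-depth)


if __name__ == "__main__":
    genome_size = 3_200_000_000  # human genome size quoted in the history section
    read_length = 150            # illustrative read length (assumption)
    for depth in (10, 30):
        n = reads_needed(depth, read_length, genome_size)
        print(f"{depth}x: {n:,} reads, expected uncovered fraction {fraction_uncovered(depth):.1e}")
```

With these assumptions, 30× coverage works out to 640 million reads of 150 nucleotides each.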
22. Applications of SGSTs:
SGST applications in drug discovery have primarily been:
• whole genome sequencing
• targeted (e.g. exome) sequencing
• digital gene expression
• DNA methylation sequencing
• copy-number variation analysis
They have been expanding to:
• chromatin immunoprecipitation sequencing (ChIP-Seq)
• RNA immunoprecipitation sequencing (RIP-Seq)
• whole transcriptome analyses
• single cell analyses
• nucleic acid structure determination
• chromatin conformation (Hi-C) analysis, and many others.
SGSTs have also been adapted beyond nucleic acid sequencing, for instance to protein-DNA affinity measurements, which may find applications in drug discovery in the future.
The details of each of these applications are beyond the scope of this review.
24. NGS versus Sanger sequencing
Sanger sequencing:
• Sample preparation is slow.
• Sequencing cannot be started directly from a gDNA or cDNA library.
• It is time-intensive.
• It is expensive.
Next-generation sequencing:
• Sample preparation is faster and more straightforward.
• Sequencing can be started directly from a gDNA or cDNA library.
• More than one billion short reads can be read in a single run.
• It is a fast and inexpensive method for obtaining accurate genomic information.
25. Limitations of NGS:
• Although NGS is cheaper and faster than traditional Sanger sequencing, it is still too expensive to be affordable for small labs or individuals.
• NGS data analysis is time-consuming and requires solid knowledge of bioinformatics to extract accurate information from the sequence data.
• NGS produces short reads, which make highly repetitive sequences difficult to resolve. This short read length is one of the major shortcomings limiting its application, especially in de novo sequencing.
• Data processing is another major bottleneck for the implementation and capitalization of NGS technology.
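Why short reads struggle with repeats can be shown on a toy genome that contains the same repeat twice: a read lying entirely inside the repeat matches two locations, so its origin is ambiguous, while a longer read that reaches into unique flanking sequence matches only one. In this Python sketch, exact string matching stands in for real read alignment, and the sequences and the helper names `find_all` and `is_unique` are hypothetical.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where read occurs (overlaps allowed)."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def is_unique(genome: str, read: str) -> bool:
    """A read can be placed unambiguously only if it matches exactly once."""
    return len(find_all(genome, read)) == 1


if __name__ == "__main__":
    repeat = "ACGGCA"                                 # toy repeat element
    genome = "TTT" + repeat + "CCC" + repeat + "GGG"  # the repeat occurs twice
    short_read = genome[3:9]   # lies entirely inside the first repeat copy
    long_read = genome[1:10]   # spans the repeat plus unique flanking bases
    print(short_read, find_all(genome, short_read))  # ACGGCA [3, 12]
    print(long_read, find_all(genome, long_read))    # TTACGGCAC [1]
```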
26. Methodology of Next Generation Sequencing:
Basic Principle:
The basic principle on which NGS works is similar to that of traditional Sanger sequencing methods involving capillary electrophoresis.
1. Template preparation
Double-stranded DNA is the starting material. However, its source may vary: it can be genomic DNA, immunoprecipitated DNA, reverse-transcribed RNA or cDNA.
2. Library preparation:
Sequence library preparation involves some common steps: fragmentation, size selection and adapter ligation.
Fragmentation and size selection break the DNA template into smaller, sequenceable fragments whose size depends on the platform used.
Adapter ligation then adds platform-specific synthetic DNA sequences (adapters) to the ends of the library fragments to facilitate the sequencing reactions.
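The three library preparation steps can be mimicked on strings. This Python sketch is only an analogy: random cut positions stand in for physical or enzymatic fragmentation, the 30–80 nt size window is an arbitrary assumption, and `ADAPTER_5`/`ADAPTER_3` are hypothetical placeholder sequences, not any platform's real adapters.

```python
import random

# Hypothetical placeholder adapters; real platforms use their own sequences.
ADAPTER_5 = "ACACAC"
ADAPTER_3 = "GTGTGT"


def fragment(dna: str, num_cuts: int, rng: random.Random) -> list[str]:
    """Break a DNA string at num_cuts distinct random positions."""
    cuts = sorted(rng.sample(range(1, len(dna)), num_cuts))
    bounds = [0] + cuts + [len(dna)]
    return [dna[start:end] for start, end in zip(bounds, bounds[1:])]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep only fragments whose length falls in the platform's size window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def ligate_adapters(fragments: list[str], left: str = ADAPTER_5,
                    right: str = ADAPTER_3) -> list[str]:
    """Attach synthetic adapter sequences to both ends of each fragment."""
    return [left + f + right for f in fragments]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    template = "".join(rng.choice("ACGT") for _ in range(1_000))
    pieces = fragment(template, num_cuts=20, rng=rng)
    selected = size_select(pieces, min_len=30, max_len=80)
    library = ligate_adapters(selected)
    print(f"{len(pieces)} fragments, {len(selected)} kept after size selection")
    for molecule in library[:3]:
        print(molecule)
```

Because every library molecule now begins and ends with known sequence, a single primer design can serve all fragments, whatever their insert.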
3. Library amplification:
The library is amplified to produce a signal strong enough to detect each nucleotide addition. Depending on the PCR technique used, this step involves attaching the DNA fragments either to microbeads or to a glass slide.
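The purpose of amplification is sheer copy number. Under the standard idealized PCR model, each cycle multiplies the number of copies by (1 + E), where E is the amplification efficiency (E = 1 means perfect doubling). The Python sketch below is generic PCR arithmetic, not a model of any specific bead or slide chemistry; the 90% efficiency and the function names are illustrative assumptions.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after PCR: N = N0 * (1 + E) ** cycles.

    E (efficiency) is the fraction of templates copied per cycle;
    E = 1.0 means perfect doubling every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float,
                  efficiency: float = 1.0) -> int:
    """Smallest number of cycles whose expected yield reaches target_copies."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    cycles, copies = 0, float(initial_copies)
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    # One template molecule, perfect doubling for 20 cycles.
    print(pcr_copies(1, 20))            # 1048576.0
    # The same 20 cycles at a more modest, assumed 90% efficiency.
    print(pcr_copies(1, 20, 0.9))
    print(cycles_needed(1, 1_000_000))  # 20
```

The exponential form is also why an error introduced in an early cycle can be copied into a large share of the final molecules.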
30. Major platforms in NGS
Illumina Sequencing:
One of the major platforms used in NGS. This technology produces reads of 100–150 bp. Comparatively longer fragments from the template library are used for ligation.
33. Ion Torrent PGM (Personal Genome Machine)
This sequencing platform does not use optical signals. Instead, it exploits the fact that adding a dNTP to a growing DNA strand releases an H+ ion.
The template DNA or RNA is fragmented into pieces of ~200 bp, which are amplified by emulsion PCR.
The H+ ions released when dNTPs are added lower the pH. Because only one type of dNTP is supplied at a time, the pH changes detected and recorded in each well reveal which base was incorporated and how many copies of it were added in that well.
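This readout can be mimicked with a toy, noise-free Python simulation. It assumes that nucleotides are supplied one type at a time in a repeating order (the order `TACG` below is hypothetical) and that the pH signal is proportional to the number of bases incorporated during that flow, so a homopolymer run such as `GGG` gives roughly three times the signal of a single `G`. The function names are illustrative.

```python
FLOW_ORDER = "TACG"  # hypothetical repeating order in which dNTPs are supplied


def simulate_flows(new_strand: str,
                   flow_order: str = FLOW_ORDER) -> list[tuple[str, int]]:
    """Return (nucleotide, count) for each flow until the strand is complete.

    Only one dNTP type is present per flow, so synthesis runs through a
    homopolymer stretch and stops at the first different base. The count
    stands in for the number of H+ ions released, i.e. the size of the pH change.
    """
    if set(new_strand) - set(flow_order):
        raise ValueError("strand contains bases that are never flowed")
    flows = []
    pos = 0
    flow_index = 0
    while pos < len(new_strand):
        base = flow_order[flow_index % len(flow_order)]
        count = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            count += 1
            pos += 1
        flows.append((base, count))
        flow_index += 1
    return flows


def call_bases(flows: list[tuple[str, int]]) -> str:
    """Convert noise-free flow signals back into a base sequence."""
    return "".join(base * count for base, count in flows)


if __name__ == "__main__":
    strand = "TTAGGGC"  # hypothetical sequence being synthesized in one well
    signals = simulate_flows(strand)
    print(signals)  # [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
    print(call_bases(signals))  # TTAGGGC
```

Real signals are noisy, so deciding whether a large signal means, say, five or six identical bases in a row is harder than this idealization suggests.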
41. What is Whole Genome Sequencing?
• The NCI defines whole-genome sequencing in humans as “a laboratory process that is used to determine nearly all of the approximately 3 billion nucleotides of an individual’s complete DNA sequence, including non-coding sequence.”
• Whole-genome sequencing was originally performed for the human genome using Sanger sequencing; it took more than a decade and cost more than $1 billion.
• Today, we use newer technology referred to as “next-generation sequencing”, “massively parallel sequencing” or “high-throughput sequencing.”
• These techniques can sequence both DNA and RNA faster and more cheaply than traditional Sanger sequencing; sequencing a whole genome typically takes a few days and costs around $1,000.
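44. Whole Genome Sequencing
Hierarchical shotgun approach:
• The genome is first broken into large fragments cloned in bacterial artificial chromosomes (BACs); overlapping regions between BAC clones are identified by restriction mapping or STS (sequence-tagged site) analysis.
Whole genome shotgun approach:
• DNA is cut randomly into smaller fragments, which are cloned and then sequenced; the overlapping sequences are then assembled by computer.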
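In the whole genome shotgun approach, the randomly cut fragments have to be put back in order by finding where their sequences overlap. The Python sketch below shows a minimal greedy version of that idea: it repeatedly merges the two sequences with the longest exact suffix-prefix overlap (at least `min_len` bases). It assumes error-free reads. It is a textbook toy, not the algorithm of any particular assembler, and it can join the wrong reads when a genome contains repeats longer than the overlaps. The reads and function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the resulting contigs; several remain if no pair overlaps by
    at least min_len bases.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Hypothetical error-free reads from the genome ATCGGAAGTCCTTA, in shuffled order.
    reads = ["TCCTTA", "ATCGGAA", "GAAGTCC"]
    print(greedy_assemble(reads))  # ['ATCGGAAGTCCTTA']
```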
47. Advantages
• One lab method for all bacteria and all typing needs.
• Many different analyses are possible, using different approaches depending on the organism and needs.
• Much more economical and faster.
• Labor-saving, because the sequencing reactions are virtually fully automated and the sequences are assembled by computer programs.
49. NEED FOR THIRD GENERATION SEQUENCING
Genomes are very complex, with many repetitive regions that second-generation sequencing (SGS) technologies cannot resolve, and the relatively short reads make genome assembly more difficult.
In second-generation DNA sequencers, the template DNA must be amplified before the sequencing reaction, and errors may be introduced during amplification.
• Third-generation DNA sequencers use a single DNA molecule as the template, without needing to amplify the DNA.
• Third-generation sequencing (TGS) technologies can offer low sequencing cost and easy sample preparation, with run times significantly shorter than those of SGS technologies.
• In addition, TGS technologies produce long reads exceeding several kilobases, which help solve the assembly problem and resolve the repetitive regions of complex genomes.
51. Two Widely Used Platforms in TGS
• Pacific Biosciences single molecule real time (SMRT) sequencing (2010)
• Oxford Nanopore Technologies MinION (2014)
52. Pacific Biosciences single molecule real time (SMRT) sequencing
• Pacific Biosciences uses fluorescent labeling, like the other technologies, but it detects the signals in real time, as they are emitted when incorporations occur.
• It uses a structure composed of many SMRT cells. Each cell contains microfabricated nanostructures called zero-mode waveguides (ZMWs): wells tens of nanometers in diameter, microfabricated in a metal film that is in turn deposited onto a glass substrate.
• These ZMWs exploit the behavior of light passing through openings with a diameter smaller than its wavelength: the light cannot propagate through them.
• Because of their small diameter, the light intensity decays along the wells, so only the bottom of each well is illuminated.
• Each ZMW contains a DNA polymerase attached to its bottom, together with the target DNA fragment to be sequenced.
• During the sequencing reaction, the DNA polymerase copies the DNA fragment using fluorescently labeled nucleotides, with a different color for each base.
• Whenever a nucleotide is incorporated, it releases a light signal that is recorded by sensors. Detecting the labeled nucleotides in this way makes it possible to determine the DNA sequence.
• In this sequencing technology, the first strand of a DNA molecule is linked by a hairpin to its complementary strand.
• The DNA fragment is passed through a protein nanopore (a nanopore is a nanoscale hole made of proteins or synthetic materials).
• As the DNA fragment is translocated through the pore by a motor protein attached to the pore, the nucleotides occupying the pore cause characteristic variations in an ionic current.
• These variations in ionic current are recorded progressively as a signal trace and then interpreted to identify the sequence.
• The template strand is read first, generating the “template read”; then the hairpin is read, followed by the complementary strand, generating the “complement read”. Each of these reads is called a “1D” read.
• If the “template” and “complement” reads are combined, the resulting consensus sequence is called a “two-direction read” or “2D” read.
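Because the complement read comes from the opposite strand, it has to be reverse-complemented before it can be compared base by base with the template read. The Python sketch below shows that step and a deliberately simple way of forming a 2D consensus. It assumes the two reads are already aligned with no insertions or deletions and that each base carries a quality score. Real 2D basecalling is considerably more involved, and all names and numbers here are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the order (the opposite strand, read 5'->3')."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus_2d(template_read: str, template_qual: list[int],
                 complement_read: str, complement_qual: list[int]) -> str:
    """Combine a template read and a complement read into a 2D consensus.

    Simplifying assumptions: both reads cover the whole molecule, contain no
    insertions or deletions, and carry one quality score per base. Where the
    two reads disagree, the base with the higher quality score is kept.
    """
    comp_fwd = reverse_complement(complement_read)
    comp_qual_fwd = complement_qual[::-1]  # scores follow the reversed base order
    if not (len(template_read) == len(template_qual)
            == len(comp_fwd) == len(comp_qual_fwd)):
        raise ValueError("reads and quality lists must all have the same length")
    consensus = []
    for t_base, t_q, c_base, c_q in zip(template_read, template_qual,
                                        comp_fwd, comp_qual_fwd):
        consensus.append(t_base if t_base == c_base or t_q >= c_q else c_base)
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical molecule AACGTTTG; each read has one low-quality error.
    template_read = "AAAGTTTG"    # third base wrong (C read as A)
    template_qual = [30, 30, 5, 30, 30, 30, 30, 30]
    complement_read = "GAAACGTT"  # complement strand, first base wrong (C read as G)
    complement_qual = [5, 30, 30, 30, 30, 30, 30, 30]
    print(consensus_2d(template_read, template_qual,
                       complement_read, complement_qual))  # AACGTTTG
```

Each strand corrects the other's low-confidence position, which is why the combined read is generally more accurate than either 1D read alone.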
60. DNA Forensics:
DNA sequencing has been applied in forensic science to identify particular individuals, because every individual (except identical twins) has a unique DNA sequence.
It is used in particular to identify criminals from evidence found at the crime scene, such as hair, nail, skin or blood samples.
62. Agriculture:
DNA sequencing has played a vital role in agriculture. Mapping and sequencing the whole genomes of microorganisms has allowed agriculturists to make them useful for crops and food plants.
63. Molecular biology:
Sequencing is used in molecular biology to study genomes and the proteins they encode. Information obtained by sequencing allows researchers to identify changes in genes, their association with diseases and phenotypes, and potential drug targets.
Evolutionary biology:
Because DNA is the informative macromolecule transmitted from one generation to the next, DNA sequencing is used in evolutionary biology to study how different organisms are related and how they evolved.
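One simple way to turn sequence data into a measure of relatedness is to count the differences between aligned sequences. The Python sketch below computes the p-distance (the fraction of aligned sites that differ) and the standard Jukes-Cantor correction for multiple substitutions at the same site, d = -3/4 ln(1 - 4p/3). The sequences and species labels are made up for illustration, and the input is assumed to be already aligned.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3), valid for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Made-up, pre-aligned sequences for three hypothetical species.
    aligned = {
        "species_A": "ACGTACGTAC",
        "species_B": "ACGTACGTTC",
        "species_C": "ACCTTCGTTA",
    }
    for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
        p = p_distance(seq1, seq2)
        print(f"{name1} vs {name2}: p = {p:.2f}, Jukes-Cantor d = {jukes_cantor(p):.3f}")
```

Here species_A and species_B differ at only 1 of 10 sites (p = 0.10), whereas species_C differs from them at 4 and 3 sites, so A and B would be grouped as the closest relatives.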
64. Metagenomics:
Metagenomics is the study of genetic material recovered directly from environmental samples. The field involves identifying the organisms present in samples such as water, sewage, dirt, debris filtered from the air, or swamp material.
Example: sequencing enables researchers to determine which types of microbes may be present in a microbiome.
66. Medicine:
In medical research, DNA sequencing can be used to detect genes associated with hereditary diseases. Scientists use various genetic engineering techniques to identify the defective genes and replace them with healthy ones.
Cancer:
By comparing DNA sequences (for example, from tumor and normal tissue), mutations can be detected.
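The comparison described under Cancer can be illustrated by lining up a sample sequence against a reference (for example, tumor versus matched normal tissue) and listing the positions where they differ. This Python sketch assumes both sequences are already aligned and equal in length, so it detects only single-base substitutions. Real variant calling must also handle sequencing errors, insertions and deletions. The sequences and names are hypothetical.

```python
from typing import NamedTuple


class Substitution(NamedTuple):
    position: int  # 1-based position in the reference sequence
    ref: str       # base in the reference (e.g. normal tissue)
    alt: str       # base observed in the sample (e.g. tumor)


def find_substitutions(reference: str, sample: str) -> list[Substitution]:
    """List single-base differences between two aligned, equal-length sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Substitution(pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


if __name__ == "__main__":
    normal = "ATGGCCAAGT"  # hypothetical matched-normal sequence
    tumor = "ATGGTCAAGA"   # hypothetical tumor sequence
    for sub in find_substitutions(normal, tumor):
        print(f"{sub.ref}{sub.position}{sub.alt}")  # C5T, then T10A
```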
69. Epigenetics:
Epigenetics is the study of changes in organisms caused by modification of gene expression rather than alteration of the genetic code itself.
Lifestyle, nutrition and environmental factors can lead to epigenetic changes. The mechanisms that produce these changes include:
• DNA methylation
• Histone modification