Sequencing @ BitLab
1. My Experience with 454 Sequencing
Sequencing @ BitLab
Dec 17, 2010
Raoul Jean Pierre Bonnal <bonnal@ingm.it>
2. Topics
• Bacteria
  – De novo
  – Re-sequencing
• Human
  – SNP
  – Mitochondria
    • Ancient
    • Characterization of mutations & deletions
  – Etc…
3. Topics
• Bacteria
  – De novo
  – Re-sequencing
• Human
  – Mitochondria
    • Ancient
    • Characterization of mutations & deletions
  – SNP
  – Etc…
4. A case study: Acinetobacter baumannii
• Epidemic multidrug-resistant strain
• Single chromosome
  – 3,904,116 bp
• 2 plasmids
  – pACICU1: 28,279 bp
  – pACICU2: 64,366 bp
5. Sequencing & Assembly
• Sequencing:
  – Mixed data from GS20 and GS FLX, with read lengths of 100 and ~250 bp
  – 900,363 reads
  – 96,454,548 total bp
• Assembly
  – 23-fold coverage of the genome
  – 1,036 contigs with a maximum length of 200,179 bp (automatic)
  – 47 scaffolds using Paired End and a semi-automatic assembly
  – Manual check
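As a quick sanity check on the figures above, the raw fold coverage can be recomputed from the read totals and the replicon sizes given in the case study. This is only arithmetic on the numbers in the slides; the function name is illustrative. The raw value comes out slightly above the reported 23-fold. The slides do not explain the gap; one plausible reason (an assumption) is that some reads were screened out or left unassembled.

```python
# Figures copied from the slides; nothing here is measured independently.
TOTAL_BP = 96_454_548          # total sequenced bases
N_READS = 900_363              # number of reads
CHROMOSOME_BP = 3_904_116      # single chromosome
PLASMID_BP = {"pACICU1": 28_279, "pACICU2": 64_366}


def fold_coverage(sequenced_bp: int, target_bp: int) -> float:
    """Raw fold coverage: sequenced bases divided by target size."""
    return sequenced_bp / target_bp


mean_read_length = TOTAL_BP / N_READS
raw_chromosome = fold_coverage(TOTAL_BP, CHROMOSOME_BP)
raw_all_replicons = fold_coverage(
    TOTAL_BP, CHROMOSOME_BP + sum(PLASMID_BP.values())
)

print(f"mean read length:            {mean_read_length:.1f} bp")
print(f"raw coverage, chromosome:    {raw_chromosome:.1f}x")
print(f"raw coverage, all replicons: {raw_all_replicons:.1f}x")
```

The mean read length of roughly 107 bp suggests that most reads in the mixed dataset are of the shorter, 100 bp kind.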
6. Newbler's weakness is the same as the others'
• Hey, there is something strange in my sequences. Clean!
• Keep in mind that living beings are complex. Repeated regions.
7. Cleaning
• With Newbler by Roche/454 you can define a database of known sequences, and the reads matching it will be removed from the alignment
• What to clean up?
  – Plasmids
  – Contaminants
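The idea of screening reads against a database of known sequences can be illustrated with a toy filter. This is not how Newbler implements it; it is a minimal sketch that assumes exact k-mer matching on both strands, with made-up sequences and hypothetical names (`build_screen_index`, `screen_reads`, `min_fraction`).

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> set:
    """All overlapping k-mers of seq (empty set if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_screen_index(known_sequences: dict, k: int) -> set:
    """Collect k-mers from both strands of every known sequence."""
    index = set()
    for seq in known_sequences.values():
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq), k)
    return index


def screen_reads(reads: dict, index: set, k: int, min_fraction: float = 0.5):
    """Split reads into (kept, removed) by the share of k-mers found in index."""
    kept, removed = {}, {}
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:
            kept[name] = seq  # too short to judge, keep it
            continue
        hit_fraction = len(read_kmers & index) / len(read_kmers)
        target = removed if hit_fraction >= min_fraction else kept
        target[name] = seq
    return kept, removed


# Toy data, invented for illustration only.
toy_plasmid = "ATGCGTACGTTAGCCTAGGCTTAACGGATCCGATTACA"
known = {"toy_plasmid": toy_plasmid}
reads = {
    "read_1": toy_plasmid[7:27],                       # forward-strand plasmid read
    "read_2": reverse_complement(toy_plasmid[10:30]),  # reverse-strand plasmid read
    "read_3": "TTTTGGGGCCCCAAAATTTTGGGG",              # unrelated read
}

index = build_screen_index(known, k=8)
kept, removed = screen_reads(reads, index, k=8)
print("kept:   ", sorted(kept))
print("removed:", sorted(removed))
```

Exact k-mer matching is a deliberate simplification: 454 reads carry homopolymer errors, so a real screen would need an alignment that tolerates insertions and deletions.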
8. Cleaning: Plasmids
• They are mixed in with our target genome
• We can try to reduce the complexity
  – A MidiPrep kit separates the plasmids from the main chromosome
    • Easy, quick, cheap
  – Sanger sequencing
  – Remove the plasmid sequence from the dataset
• A public database?
  – VectorDB (old)
  – http://www.lablife.org/vectordb
9. Repeated regions
• Insertion sequences, transposases, rRNA operons, …
• Software tends to collapse repeated regions into single contigs
  – For a few variations, we used the degenerate code (IUB)
  – Re-assemble each contig separately with more stringent parameters
  – Check for wrong frame shifts due to homopolymeric stretches
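Two of the points above lend themselves to a small sketch: encoding the few positions where collapsed repeat copies differ with the IUB/IUPAC degenerate code, and flagging homopolymeric stretches where a 454 frame shift is most likely. The function names and example sequences are hypothetical, and the copies are assumed to be already aligned, gap-free and of equal length.

```python
import re

# Standard IUB/IUPAC nucleotide ambiguity codes.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned_copies):
    """Column-wise consensus of equal-length, gap-free repeat copies."""
    if len({len(copy) for copy in aligned_copies}) != 1:
        raise ValueError("copies must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_copies):
        bases = frozenset(base.upper() for base in column)
        if bases not in IUPAC:
            raise ValueError(f"unsupported symbols in column: {sorted(bases)}")
        consensus.append(IUPAC[bases])
    return "".join(consensus)


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of at least min_len bases."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group()))
        for m in pattern.finditer(seq.upper())
    ]


# Toy data, invented for illustration only.
copies = ["ACGTTAGC", "ACGTCAGC", "ACATTAGC"]
print(degenerate_consensus(copies))
print(homopolymer_runs("ACGAAAAATTGCCCCT"))
```

Reported runs are candidates to inspect by eye, not confirmed errors.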
10. Final approach
• 7 contigs, each one with an interruption at the 4.4-kb rRNA gene clusters, then? Who knows!
• Brute force PCR strategy
  1. 14 primer pairs designed inside the flanking regions
  2. All combinations of primers and contigs using Elongase
  3. 7 amplicons sequenced with an ABI 3730 DNA sequencer
  4. Manual & final alignment
• We can't skip the PCR: we must test our hypotheses, like in programming
  – Usually I don't trust myself
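The size of the brute-force search can be sketched by enumerating the candidate junctions between contig ends. The slides do not describe the exact primer layout, so this sketch assumes a simplified model: one priming site per contig end (7 contigs, 14 ends) and every pairing of ends from different contigs counted as one PCR to try. The contig names are placeholders.

```python
from itertools import combinations

# Placeholder names; the slides do not name the 7 contigs.
contigs = [f"contig_{i}" for i in range(1, 8)]

# Assumption: one priming site in the flanking region of each contig end.
ends = [(contig, side) for contig in contigs for side in ("left", "right")]

# Every pairing of ends from different contigs is a junction to test.
candidate_junctions = [
    (a, b) for a, b in combinations(ends, 2) if a[0] != b[0]
]

print(f"{len(ends)} contig ends, {len(candidate_junctions)} candidate junctions")
for a, b in candidate_junctions[:3]:
    print(f"  test {a[0]}:{a[1]} x {b[0]}:{b[1]}")
```

Under this model, 7 contigs closing into a single circular chromosome would leave exactly 7 true junctions, which is consistent with the 7 amplicons reported on the slide.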
11. General problems
• Original design
  – Which Paired End?
    • None
    • 3kb
    • 8kb
    • 20kb
• Multi assembly
  – New data are coming in
  – New software update
• In the meantime the experts are making hypotheses and they want to keep track of them
  – Comparison of different assemblies
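One simple way to keep successive assemblies comparable is to record the same summary for each run: number of contigs, longest contig, and N50. The sketch below is not the workflow used in the talk. The run labels and contig lengths are invented, and N50 is a commonly used statistic that the slides themselves do not mention.

```python
def n50(lengths):
    """Smallest contig length such that contigs at least that long
    cover half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(contig_lengths):
    """Basic figures to track for one assembly run."""
    return {
        "contigs": len(contig_lengths),
        "total_bp": sum(contig_lengths),
        "max_bp": max(contig_lengths, default=0),
        "n50_bp": n50(contig_lengths),
    }


# Toy contig lengths, invented for illustration only.
assemblies = {
    "run_a": [100, 80, 60, 40, 20],
    "run_b": [200, 50, 30, 20],
}

summaries = {name: summarize(lengths) for name, lengths in assemblies.items()}
for name, stats in summaries.items():
    print(name, stats)
```

Storing such a summary next to the assembler version and the input dataset for each run makes it easier to trace which hypothesis was based on which assembly.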
12. Chat
• We don't talk to each other enough!
  – Avoid black boxes
• The team is made of
  – Bioinformaticians
    • Output is not always user friendly
  – Microbiologists
    • Are there similar strains or parents?
    • Are there important elements that we must find?
13. Software used
• Assembler
  – 454 Newbler
  – DNAStar Lasergene software http://www.dnastar.com/products/lasergene.php
    • Good visual tool for exploring
  – Applied MATHS
  – Celera assemblers
• Annotation
  – FGENESB http://www.softberry.com/
  – GeneMark
  – GLIMMER
  – IS Finder http://www-is.biotoul.fr
  – TIGR/JCVI & Manatee
• Synteny
  – Mauve
• View
  – Circos for representing information in a circular way, cute images.
  – Consed