C. elegans Genetics: Understanding Life at the Molecular Level
1. C. elegans Genetics
C. elegans has two sexes: self-fertilizing hermaphrodites and males.
Sex is determined chromosomally: XX hermaphrodite, XO male (a single X).
Diploid for 5 autosomes.
Standard classical genetic techniques can be applied.
Life cycle – Zygote to adult ~3 days.
Grow on petri dish – they eat bacteria.
Can be stored frozen in liquid nitrogen indefinitely.
Why might the hermaphrodite sex be useful for genetics?
2. Chromosome I Genetic mapping.
m.u. = map unit.
Genetic mapping – recombination.
1 m.u. is 1% recombination per meiosis.
[Figure: genetic map of chromosome I, left arm to right arm, scale -15 to 25 m.u., with a central cluster around 0. Genes shown, left to right: bli-3, egl-30, mab-20, fog-1, unc-73, unc-57, dpy-5, dpy-14, fer-1, lin-11, unc-29, unc-75, unc-101, glp-4, unc-54. Inset: parent and recombinant chromosomes for the fog-1 and glp-4 markers.]
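As a rough illustration of the definition above (1 m.u. = 1% recombination per meiosis), a map distance can be estimated from a cross as recombinant progeny divided by total progeny. The function name and the counts below are made up for illustration, and the simple formula is only a reasonable approximation for closely linked genes, because double crossovers go undetected over longer intervals.

```python
def map_distance_mu(recombinants: int, total: int) -> float:
    """Estimate map distance in map units (m.u.) as percent recombinant progeny."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must be between 0 and total")
    return 100.0 * recombinants / total


# Hypothetical cross: 25 recombinant progeny out of 1000 scored.
distance = map_distance_mu(25, 1000)
print(f"{distance:.1f} m.u.")  # 2.5 m.u.
```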
3. We want to understand how life works – at the molecular level.
We had mutant genes with informative phenotypes.
The mutated genes were mapped onto linkage groups – chromosomes.
What kinds of proteins do these genes encode and how do these proteins
function?
In 1983, identifying the molecular sequence of a gene
defined by mutation was a complicated and time-consuming
business, even in the worm.
If only we knew the sequence of the genome!
4. As the term applies to recombinant DNA, what is a clone?
Starting with DNA extracted from any organism, how can you take that and get one
single fragment into a vector and grow billions of copies of that
single “cloned” molecule?
[Figure: vector carrying a cloned DNA insert.]
5. C. elegans Genome Project
[Figure: schematic linking mutants (function), the genetic map (bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54; scale -15 to 25 m.u.), chromosomes, cloned DNA fragments, and DNA sequence (AACGTTCCACG.......), genes and proteins.]
Identify DNA sequences corresponding to genes defined by mutation.
6. If you wanted to clone sections of chromosomes for
sequencing, how many copies of each chromosome
would you start with?
Of the order of millions: the starting DNA contains millions of copies of each chromosome.
8. Cloning methods used by the C. elegans genome project
Cosmid clones – ~40 Kb insert size – Genomic Library.
Cosmid cloning vector: drug resistance marker, E. coli origin of replication, cos site, useful restriction sites.
Linearised cosmid vector + random fragments of genomic DNA (millions of them) + DNA ligase.
Result: long concatemers of cosmid vectors interspersed with random fragments of genomic DNA.
9. In vitro lambda packaging
Mixed population of “inserts”, with cos sites in the cosmid vector.
In vitro lambda packaging extracts: lambda terminase and other phage proteins.
Critical step: each phage “transfects” a single cosmid into an E. coli cell.
10. CLONING
Cells are plated onto medium with antibiotic selection.
Cells grown up to form bacterial colonies.
Each colony is derived from a single transfected cell.
Each colony is a clonal population.
This is a clone: an E. coli clonal population with a single cosmid clone, i.e. a single genomic DNA fragment (insert X).
Billions of copies of one cloned insert.
From solid medium on plates to liquid culture:
Freeze it for storage.
Purify cosmid DNA.
Sequence the insert.
Sub-clone fragments etc.
11. Started with many millions of different fragments of chromosomal DNA in
one tube.
End up with potentially millions of CLONED fragments, each in a different
E. coli colony – or culture.
12. We have got as far as random cloned fragments of genomic DNA.
What next?
Average cosmid insert size – 40 Kb
C. elegans genome ~100.3 Mb = 100,300 Kb
100,300/40 = 2,507.5
i.e. ~2,500 cosmid clones could contain the entire C. elegans
genome – but WOULD they?
13. In principle, 2500 cosmid clones could contain all the DNA of the C. elegans
genome.
Why not just start sequencing ~2500 clones picked at random?
Imagine this:
I give you a large and awkwardly shaped dice with 2500 faces, with a single
number on each face, the numbers 1-2500.
Roll the dice and write down the number on top.
Repeat this, again and again...
How many times would you have to roll the dice so that every face of the dice
would have been on top at least once?
~4x2500 rolls will give ~95% probability of any one side, or DNA fragment, appearing.
~10x2500 raises the probability to ~99%.
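The dice analogy can be checked numerically. The sketch below assumes an idealised library in which every one of the N fragments is equally likely to be picked each time, so the probability that a given fragment has appeared after n picks is 1 - (1 - 1/N)^n. The function names are illustrative only. Under this uniform assumption the probabilities come out somewhat higher than the rounded figures quoted on the slide; real libraries are not uniformly random, so the idealised numbers are best read as an upper bound.

```python
import math


def p_fragment_present(n_fragments: int, n_picks: int) -> float:
    """Probability that one particular fragment is seen at least once."""
    return 1.0 - (1.0 - 1.0 / n_fragments) ** n_picks


def picks_for_probability(n_fragments: int, p: float) -> int:
    """Smallest number of picks giving probability >= p for one fragment."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - 1.0 / n_fragments))


def expected_picks_to_see_all(n_fragments: int) -> float:
    """Expected picks until every fragment has appeared (coupon collector)."""
    return n_fragments * sum(1.0 / k for k in range(1, n_fragments + 1))


N = 2500  # idealised number of distinct cosmid-sized fragments
for fold in (1, 2, 4, 10):
    print(f"{fold:>2}x{N}: P = {p_fragment_present(N, fold * N):.4f}")
print("picks for 95%:", picks_for_probability(N, 0.95))
print("picks for 99%:", picks_for_probability(N, 0.99))
print("expected picks to see every face:", round(expected_picks_to_see_all(N)))
```

The last line answers the question as posed for the dice: the expected number of rolls before every face has come up at least once.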
14. The Golden Path
What if you could identify clones that overlapped slightly with one another?
How can we get these clones?
Cloned DNA fragments – moderate overlaps.
With this approach you could sequence the entire genome by
sequencing fewer than 5000 cosmid clones (2x2500).
15. Cosmid fingerprinting
1. Restriction digest of cosmid DNA.
2. Separate fragments according to size by gel electrophoresis.
3. Digitise the ladder of different sized DNA fragments obtained.
Multiple common fragments – clones probably overlap.
C. elegans genome project, ~17,000 cosmid clones fingerprinted.
Assembled into “contigs” – overlapping clones.
~17,000 random cosmid clones, after fingerprinting: ~700 contigs.
[Figure: fingerprint lanes for clones A, B and C; a “contig” built from overlapping clones A, B, C and D; C. elegans genome 100 Mb, ~2,500 cosmid clones.]
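The fingerprinting logic above (shared restriction fragments suggest overlap, overlapping clones form contigs) can be sketched in a few lines. This is a toy model, not the software used by the genome project: the fragment sizes, the 2% size tolerance, and the threshold of three shared bands are all assumptions chosen for illustration.

```python
from itertools import combinations


def shared_bands(fp_a, fp_b, tol=0.02):
    """Count bands in fp_a that match a distinct band in fp_b within a relative tolerance."""
    remaining = sorted(fp_b)
    count = 0
    for size in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tol * max(size, other):
                count += 1
                del remaining[i]  # each band may be matched only once
                break
    return count


def build_contigs(fingerprints, min_shared=3, tol=0.02):
    """Group clones into contigs: clones sharing >= min_shared bands are linked."""
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if shared_bands(fingerprints[a], fingerprints[b], tol) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical fingerprints: restriction fragment sizes in base pairs.
fingerprints = {
    "A": [500, 1200, 2300, 3100, 4700],
    "B": [2300, 3100, 4700, 5200, 6100],
    "C": [4700, 5200, 6100, 7400, 8800],
    "D": [900, 1500, 2700, 3600],
}
print(build_contigs(fingerprints))  # [['A', 'B', 'C'], ['D']]
```

Clones A and C share only one band, yet they land in the same contig because each overlaps B. Clone D shares no bands with the others and stays a contig of one.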
16. 700 contigs.
What is the minimum number of contigs the C. elegans genome could be
contained in?
Or – how would we know when we had succeeded in joining all the contigs?
A method of filling the gaps – joining the contigs – was needed.
17. YACs – Yeast Artificial Chromosomes
DNA inserts of ~100 kb – 2 Mb.
Grown in yeast.
Clonal growth of yeast colonies, much like cosmids in E. coli.
YAC DNA separated by pulsed-field gel electrophoresis.
C. elegans genome is ~100 Mb.
Cosmid clones – approximately 40 kb inserts.
YAC clones – selected for an average insert size of ~500 kb.
~2500 cosmid clones would permit 1x coverage of the genome.
~200 YAC clones would permit 1x coverage of the genome.
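The 1x coverage figures above are simply genome size divided by insert size. A minimal sketch, using the round numbers from this slide (the function name is illustrative):

```python
import math


def clones_for_1x(genome_kb: float, insert_kb: float) -> int:
    """Minimum number of non-overlapping clones needed to span the genome once."""
    return math.ceil(genome_kb / insert_kb)


GENOME_KB = 100_000  # ~100 Mb

print(clones_for_1x(GENOME_KB, 40))   # cosmids: 2500
print(clones_for_1x(GENOME_KB, 500))  # YACs: 200
```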
19. Joining up the contigs
~700 contigs – grids of representative cosmid clones.
• Large YAC clones (> 1 Mb).
• Purify YAC DNA – (PFGE).
• Radio-label YAC DNA.
• Hybridise to cosmid grid.
• Expose to X-ray film.
[Figure: a YAC clone bridging Contig X and Contig Y; linked cosmid clones.]
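The bookkeeping behind the hybridisation experiment can be sketched as follows: if a labelled YAC hybridises to cosmids from more than one contig, those contigs are bridged by that YAC. All clone and contig names below are hypothetical, and the data structures are only one possible way to record the grid results.

```python
def link_contigs(cosmid_to_contig, yac_hits):
    """Return, for each YAC, the contigs it bridges (only YACs hitting 2+ contigs)."""
    links = {}
    for yac, cosmids in yac_hits.items():
        contigs = sorted({cosmid_to_contig[c] for c in cosmids})
        if len(contigs) > 1:
            links[yac] = contigs
    return links


# Hypothetical grid: which contig each representative cosmid belongs to.
cosmid_to_contig = {
    "cosA": "contig X",
    "cosB": "contig X",
    "cosC": "contig Y",
    "cosD": "contig Z",
}
# Hypothetical hybridisation results: cosmids giving a signal for each YAC probe.
yac_hits = {
    "YAC1": ["cosB", "cosC"],
    "YAC2": ["cosD"],
}
print(link_contigs(cosmid_to_contig, yac_hits))  # {'YAC1': ['contig X', 'contig Y']}
```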
20. A physical map of the genome – the “Golden Path”
Chromosomes represented in ordered, overlapping clones or “clone contigs”.
[Figure: the genetic map (bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54; scale -15 to 25 m.u.) aligned with the physical map of YACs and cosmids, and with the sequence of the genome.]
21. Sequencing the C. elegans Genome
Individual cosmid clone.
Randomly fragmented and shotgun cloned into
sequencing vectors.
Generally smaller insert size is best for primary
sequence determination – 2-10 Kb.
Sequence of cosmid or YAC etc, determined and compiled in silico.
Finishing – directed cloning to fill in any gaps.
Check for overlap of sequence with overlapping cosmids.
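To give a feel for what compiling shotgun reads in silico involves, here is a toy greedy assembler that repeatedly merges the two reads with the longest suffix/prefix overlap. It is a sketch under strong assumptions (error-free reads, one strand, no repeats) and is not the software used by the genome project; the reads and the minimum overlap of 3 bases are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge reads by best overlap until no overlaps remain; returns the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining reads do not overlap: separate contigs (gaps)
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads sampled from a 20-base sequence.
reads = ["GGATTACA", "AACGTTCC", "TTACAGT", "TCCACGGA"]
print(greedy_assemble(reads))  # ['AACGTTCCACGGATTACAGT']
```

If a region is missing from the reads, the loop stops with more than one contig. That corresponds to the gaps that the finishing step fills by directed cloning.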
22. Gaps between cosmid contigs ~20% of genome.
Most of these gaps were not random. They contained regions that could not be
cloned in cosmids.
YAC clones covered most of the gaps.
YAC DNA shotgun cloned into M13 or plasmid vectors.
Most of the DNA contained in these awkward regions was successfully sub-cloned
into small insert size vectors, and sequenced.
The sequence as published in December 1998 was generated from:
2527 cosmids, 257 YACs, 113 fosmids, 44 PCR products.
24. Genome sequence of C. elegans.
Sequence of entire genome.
Sequence of cDNA clones.
Approximately 19,500 predicted protein-coding
gene sequences.
Large number of various kinds of functional
RNAs – not discussed further.
For this lecture – focus on predicted proteins.
Gene prediction? How?
Science, December 1998.
25. Computer based predictions
GENEFINDER
Biases in coding sequence: in C. elegans, non-coding sequence is AT-rich.
Splice site signals, initiator methionines, termination codons.
Likely exons and probable/possible splice patterns.
• Evidence that a prediction is correct?
• Homology with genes in other organisms – homologues.
• Known protein families.
• Experimental evidence.
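Two of the sequence signals listed on this slide, base composition and the initiator methionine / termination codon pair, are easy to illustrate. The sketch below is not GENEFINDER and ignores introns and splice sites entirely; it only measures AT content and scans the forward strand for simple ATG-to-stop open reading frames. The example sequence and the minimum length of 3 codons are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def at_fraction(seq: str) -> float:
    """Fraction of bases that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq) if seq else 0.0


def find_orfs(seq: str, min_codons: int = 3):
    """Find ATG...stop ORFs on the forward strand.

    Returns (start, end) index pairs, end exclusive and including the stop codon.
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical sequence: 2 flanking bases, ATG GCC GAA CGT TAA, 2 flanking bases.
seq = "TTATGGCCGAACGTTAAAT"
print(find_orfs(seq))  # [(2, 17)]
print(seq[2:17])       # ATGGCCGAACGTTAA
print(at_fraction("ATGC"))  # 0.5
```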