2. Outline
• My background & track record
• Environmental metagenomics – existing problems
• Phylogenomics adds maximum value to datasets
• Illustrative study
3. Joe Parker: Novel methods for cutting-edge science
• High-throughput phylogenomics
• Parallelised analyses
• Bayesian statistics
• Information-theoretic measures
• NGS datasets
• Integrating clinical, genetic & molecular data
• Machine-learning and antigen modeling
• BaTS software: >100 citations; thousands of downloads
• HADPACK framework: in silico HIV vaccine design; clinical trial
• ABCDet API: first genomic demonstration of convergent evolution (Nature, Oct 2013); public alpha
5. Metagenomics of an environmental disaster
Mason et al. (2014) Metagenomics reveals sediment microbial community response to Deepwater Horizon oil spill. The ISME J (epub ahead of print, 23rd Jan 2014; retrieved 1st Mar 2014). doi:10.1038/ismej.2013.254
9. Deepwater Horizon, revisited
Continuous analyses with immediate results
Iterative sample collection / analysis; rapid cline detection
• Exploit phylogenetic methods
• Detect: population dynamics, adaptive evolution, migration
• Facilitate NGS
• Gene functions and ecosystem services
• Explicitly model errors
• Account for paralogy & horizontal transfer
• Reduce ascertainment bias
• Unbiased taxon / gene discovery
10. Dr. Joe Parker
• Dr. Elizabeth Clare: Environmental metagenomics
• Dr. Steve Rossiter: Phylogenomics
• Prof. Richard Nichols: Population genetics
• Prof. Steve Lloyd: Parallel computing
• Prof. Mark Trimmer: Biogeochemistry
• Dr. Jon Grey: Aquatic ecology
• Prof. Alfried Vogler (NHM): Metagenomics & turbotaxonomy
• Mr. Tim Booth (NEBC): Bio-Linux & virtual machines
• Prof. Jonathan Eisen (US): Microbial phylogenomics
• Prof. Alexei Drummond (NZ): Bayesian phylogenetics, Geneious CSO
• Dr. Matthew Hahn (US): Genomics
• Dr. Aris Katzourakis (Oxford): Phylodynamics modelling
• GridPP HTC: 3,000+ cores
• MidPlus HPC: 2,000+ cores
• Genome Centre: Sequencing expertise
11. Deepwater Horizon, revisited
RB: more explanation of basic ideas
RK: not here – arctic microbes slide
RB: ok
Me, problem, solution:
My track record and why I can take this field forward
Current analyses in environmental metagenomics are falling short
Why phylogenetics adds value
Demonstration
Throughout my career: a track record of novel models, implemented in apps for others, doing cutting-edge science
BaTS: >100 citations, thousands of downloads, weekly/daily user contact
HADPACK initiated an entirely novel approach to HIV analysis / vaccine design with machine learning, phylogenetics and a GUI
Current work package: high-throughput phylogenomics; detected convergent evolution (Nature)
**Throughput** is usually discussed in terms of sequencing; analysis is not limited by CPU
The intersection of able developers who are also users is very small
Access drives impact
Fundamental to my goals
Distributed / cloud infrastructures – no bar to entry
MinION etc. exacerbate this
Of the hundreds of examples I could pick, this is one typical environmental metagenomics question: what are the effects of an oil spill on marine microbes?
Sediment cores: 64 sites sequenced for a single gene (16S rRNA), plus a handful (14) of shotgun metagenomes
MDS could distinguish some signal associated with geochemical variables; found some taxa, some of them new
How many more are new? Identification is similarity-based, and slow
Sequences embody information, including important information on adaptation etc., which is wasted
***
Deepwater Horizon (DWH) oil spill – spring 2010
~4.1 million barrels of oil released into the Gulf of Mexico; >22% of this oil is unaccounted for
Sediments from 64 sites characterised by targeted sequencing of 16S rRNA genes, plus shotgun metagenomic sequencing of 14 samples
16S rRNA: the most heavily oil-impacted sediments were enriched in an uncultured Gammaproteobacterium and a Colwellia species, both highly similar to sequences from the DWH deep-sea hydrocarbon plume.
The primary drivers in structuring the microbial community were nitrogen and hydrocarbons. Annotation of unassembled metagenomic data revealed the most abundant hydrocarbon degradation pathway encoded genes involved in degrading aliphatic and simple aromatics via butane monooxygenase.
Further, analysis of metagenomic sequence data revealed an increase in abundance of genes involved in denitrification pathways in samples that exceeded the Environmental Protection Agency (EPA)’s benchmarks for polycyclic aromatic hydrocarbons (PAHs) compared with those that did not. Importantly, these data demonstrate that the indigenous sediment microbiota contributed an important ecosystem service for remediation of oil in the Gulf. However, PAHs were more recalcitrant to degradation, and their persistence could have deleterious impacts on the sediment ecosystem.
Given the observed microbial diversity:
Phylogeny reveals evolutionary history; was a trait acquired once?
Or multiple times? This is biologically significant…
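The once-versus-many question can be posed directly on a tree. As a minimal sketch (a standard technique, not necessarily the analysis intended here), Fitch small parsimony counts the minimum number of trait-state changes needed to explain the tip states of a rooted binary tree. The tree, taxon names and the "degrader" trait below are hypothetical examples; a parsimony count is only a lower bound that ignores branch lengths and tree uncertainty, which a Bayesian analysis would integrate over.

```python
from typing import Dict, Set, Tuple, Union

# Rooted binary tree as nested tuples; leaves are taxon names (hypothetical).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, trait: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, minimum changes) for a subtree.

    Fitch small-parsimony pass: keep the intersection of the children's
    state sets when it is non-empty; otherwise take the union and count
    one extra change.
    """
    if isinstance(tree, str):
        return {trait[tree]}, 0
    left_set, left_cost = fitch(tree[0], trait)
    right_set, right_cost = fitch(tree[1], trait)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def min_changes(tree: Tree, trait: Dict[str, str]) -> int:
    """Minimum number of state changes needed to explain the tip states."""
    return fitch(tree, trait)[1]


tree = ((("A", "B"), ("C", "D")), ("E", "F"))
once = {"A": "degrader", "B": "degrader", "C": "other",
        "D": "other", "E": "other", "F": "other"}
twice = {**once, "B": "other", "E": "degrader"}
print(min_changes(tree, once))   # 1: consistent with a single acquisition
print(min_changes(tree, twice))  # 2: at least two independent changes
```

When the trait sits in one clade a single change suffices; scattering it across distant clades forces at least two, which is the signal of repeated acquisition (or, in microbes, possibly horizontal transfer).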
Why isn’t there more phylogenetics in environmental microbiology? Orthology assumptions inherited from classical phylogenetics
Simple case: genes are defined as orthologous when gene and species histories are identical; genes = taxa, and vice versa
Gene duplications give rise to paralogous copies, which may confuse analyses, especially similarity matching
Secondary copies, or deletions, confuse things further
In microbial communities especially: horizontal transfer!
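One standard way to make the orthology/paralogy distinction explicit, rather than relying on similarity matching, is LCA reconciliation of a gene tree against a species tree. Each gene-tree node maps to the most recent common ancestor of the species beneath it, and a node is inferred to be a duplication when it maps to the same species-tree node as one of its children. This is a minimal sketch with hypothetical trees and gene names. Plain LCA reconciliation has no horizontal-transfer event, so transfers surface as spurious duplications, which is exactly the problem flagged for microbial communities.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Rooted binary trees as nested tuples; leaves are names (hypothetical examples).
Tree = Union[str, Tuple["Tree", "Tree"]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node; the last element is the whole tree's clade."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = clades(tree[0]), clades(tree[1])
    return left + right + [left[-1] | right[-1]]


def lca(species: FrozenSet[str], species_clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species-tree clade containing all of `species` (clades are nested)."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree,
                       gene_to_species: Dict[str, str]) -> int:
    """Number of gene-tree nodes inferred as duplications by LCA mapping."""
    species_clades = clades(species_tree)

    def walk(node: Tree) -> Tuple[FrozenSet[str], int]:
        # Returns (species-tree clade this gene node maps to, duplications below it).
        if isinstance(node, str):
            return lca(frozenset([gene_to_species[node]]), species_clades), 0
        left_map, left_dups = walk(node[0])
        right_map, right_dups = walk(node[1])
        here = lca(left_map | right_map, species_clades)
        duplication = here == left_map or here == right_map
        return here, left_dups + right_dups + int(duplication)

    return walk(gene_tree)[1]


species_tree = (("human", "chimp"), "mouse")
mapping = {"h1": "human", "h2": "human", "c1": "chimp", "c2": "chimp",
           "m1": "mouse", "m2": "mouse"}
print(count_duplications((("h1", "c1"), "m1"), species_tree, mapping))  # 0: orthologues
print(count_duplications(((("h1", "c1"), "m1"), (("h2", "c2"), "m2")),
                         species_tree, mapping))  # 1: duplication before speciation
```

In the second gene tree, similarity matching alone could pair h1 with h2; the reconciliation instead places one duplication at the root and treats each subtree as a set of orthologues.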
This is a COMPLETELY NOVEL approach
Continuous, agent-based analysis: outputs available instantly, with increasing resolution
How I envisage it working:
[1] Collection of short-read environmental metagenomic sequences (low complexity)
[2] Reads are tiled into pseudo-assemblies by similarity clustering. These pseudo-assemblies:
may be chimeric
may be orthologous or paralogous
I CALL THESE RAFT-ALIGNED-READS, and this step CRYSTALLISATION
Each raft is handled by an agent
Local order increases, but the system is still globally disordered
[3] We can compute phylogenetic measures along sliding windows within a raft. These measure the coherence of the evolutionary signal along the raft
[4] Areas of great incoherence I CALL PHYLOGENETIC STRESS. These might correspond to chimeric reads (e.g. from other taxa), paralogues, or horizontal transfer
[5] Agents can compare stress values and attempt to exchange reads in proportion to their stress. I CALL THIS DISLOCATION
[6] Iterate towards a maximally globally ordered state
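Steps [2] to [6] can be sketched in code to make the intended data flow concrete. This is a toy sketch under loud assumptions, not the method itself: reads are taken as pre-aligned, equal-length strings; the function names (`crystallise`, `stress_profile`, `dislocate`) and the `threshold` and `window` parameters are hypothetical; stress is approximated by mean pairwise p-distance per sliding window rather than a true phylogenetic coherence measure; and stress-proportional exchange between agents is replaced by a greedy search that accepts the single-read move that most reduces total stress.

```python
from itertools import combinations
from typing import List, Optional

# Toy assumption: all reads are aligned to the same coordinates, so they are
# equal-length strings that can be compared column by column.
Raft = List[str]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned reads."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def crystallise(reads: List[str], threshold: float) -> List[Raft]:
    """Step [2] sketch: greedy similarity clustering of reads into rafts."""
    rafts: List[Raft] = []
    for read in reads:
        for raft in rafts:
            if p_distance(read, raft[0]) <= threshold:  # compare to the raft's seed read
                raft.append(read)
                break
        else:
            rafts.append([read])  # no raft close enough: seed a new one
    return rafts


def stress_profile(raft: Raft, window: int) -> List[float]:
    """Steps [3]-[4] sketch: stress in each sliding window (step size 1).

    Hypothetical proxy: mean pairwise p-distance within the window. A real
    implementation would score phylogenetic coherence instead. Assumes
    window <= read length and a non-empty raft.
    """
    profile = []
    for start in range(len(raft[0]) - window + 1):
        cols = [read[start:start + window] for read in raft]
        pairs = list(combinations(cols, 2))
        profile.append(sum(p_distance(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0)
    return profile


def total_stress(rafts: List[Raft], window: int) -> float:
    """Sum over rafts of their mean window stress."""
    total = 0.0
    for raft in rafts:
        profile = stress_profile(raft, window)
        total += sum(profile) / len(profile)
    return total


def dislocate(rafts: List[Raft], window: int, max_rounds: int = 100) -> List[Raft]:
    """Steps [5]-[6] sketch: move single reads between rafts while total stress falls.

    Greedy simplification: each round tries every single-read move and keeps
    the one that lowers total stress most; stops when no move helps.
    """
    rafts = [list(r) for r in rafts]
    for _ in range(max_rounds):
        best: Optional[List[Raft]] = None
        best_score = total_stress(rafts, window)
        for src, raft in enumerate(rafts):
            if len(raft) < 2:
                continue  # never empty a raft in this toy version
            for i in range(len(raft)):
                for dst in range(len(rafts)):
                    if dst == src:
                        continue
                    trial = [list(r) for r in rafts]
                    trial[dst].append(trial[src].pop(i))
                    score = total_stress(trial, window)
                    if score < best_score - 1e-12:
                        best, best_score = trial, score
        if best is None:
            break  # step [6]: no move reduces stress, so stop iterating
        rafts = best
    return rafts


x = ["AAAAAAAA", "AAAAAAAT", "TAAAAAAA"]
y = ["CCCCCCCC", "CCCCCCCG", "GCCCCCCC"]
print(crystallise(x[:1] + y[:1] + x[1:2] + y[1:2] + x[2:] + y[2:], threshold=0.3))
mixed = [[x[0], x[1], y[0]], [y[1], y[2], x[2]]]
print(dislocate(mixed, window=4))  # regroups into an A-rich and a C-rich raft
```

Greedy acceptance guarantees termination, because total stress strictly decreases with every accepted move, but it can stall in local optima; a stochastic, stress-weighted proposal closer to the description in step [5] would be needed to escape them.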
More taxa / genes
Full evolutionary information extracted
Explicit modelling
NGS-ready
Fast / instant
Compute / sequencing resources
QM experts, collaborators & mentors
International collaborators
Leave it there for questions