Informatics and inference in a
sequenced world
Dr. Joe Parker
Early Career Research Fellow (Phylogenomics)
Royal Botanic Gardens, Kew
@lonelyjoeparker:
Joe Parker - background 2
[Figure: average VL 1 length and VL 4 length (categories ≤34, ≤27, >27, >34), with Neut − and Neut + groups]
Incredible times for bioscience 3
Images – Wikimedia commons CC BY-SA
(clockwise from top left: Jeroen Rouwkema, @aGastya, author’s own, @RE73)
Step back: molecular evolution 4
“Horizontal gene transfer occurs x times more frequently in these lineages, because of this biology”
“Convergent evolution is rare in most genes, in most organisms, but y times greater in these gene families …because of this biology”
“New chromosomes are created & destroyed at rates z, q in this reproductive strategy …because of this biology”
Field-based DNA
sequencing
Snowdonia, HelloWorld & ‘tent-seq’ 6
Arabidopsis thaliana and Arabidopsis lyrata
Congeneric species; reference genomes available
Field-sequenced (MinION) & lab-sequenced (Illumina™)
Orthogonal BLAST: 4 sample × sequencer combinations
Compare TRUE & FALSE rates for varying ID statistic cutoffs
Tasty pics 7
Conditions: 100% humidity; 6–13 °C
Essential kit: 800 W generator; 3x laptops; centrifuge; waterbath; polystyrene boxes (lots); kettle (…!)
Yield: >400 Mbp data in three days; A. thaliana ~2.01x coverage
Field- vs. lab-sequenced sample ID 8
Match individual reads to each reference with BLAST
Compare match lengths in TRUE and FALSE cases
‘Length bias’ ID stat: length_TRUE − length_FALSE
Compare TRUE & FALSE rates as length bias cutoff varies
[Figure panels: MiSeq (lab); MinION (field)]
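The ‘length bias’ statistic and the cutoff sweep can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the talk: the function names, the symmetric decision rule (call TRUE above the cutoff, FALSE below its negative, otherwise unassigned) and the example match lengths are all assumptions.

```python
from typing import Iterable, Tuple


def length_bias(len_true: float, len_false: float) -> float:
    """Length-bias ID statistic for one read: BLAST match length against the
    correct (TRUE) reference minus match length against the other (FALSE)."""
    return len_true - len_false


def rates_at_cutoff(biases: Iterable[float], cutoff: float) -> Tuple[float, float]:
    """Return (true_rate, false_rate) over all reads for one cutoff.

    Assumed decision rule: bias > cutoff is a TRUE identification,
    bias < -cutoff is a FALSE identification, anything else is unassigned.
    """
    values = list(biases)
    if not values:
        return (0.0, 0.0)
    true_calls = sum(1 for b in values if b > cutoff)
    false_calls = sum(1 for b in values if b < -cutoff)
    return (true_calls / len(values), false_calls / len(values))


# Hypothetical (length_TRUE, length_FALSE) match lengths in bp for five reads.
example_hits = [(950, 120), (400, 380), (60, 300), (1200, 0), (210, 190)]
example_biases = [length_bias(t, f) for t, f in example_hits]

for cutoff in (0, 50, 500):
    print(cutoff, rates_at_cutoff(example_biases, cutoff))
```

Raising the cutoff trades sensitivity for specificity: fewer reads are assigned at all, but those that are assigned are less likely to be wrong.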
Bitty data (1) partial queries 9
Subsample MinION output
Repeat ID pipeline, record mean ID stat s_bias
Replicates: N = 30
Simulate from 10^0 to 10^4 reads (≈instant → hours)
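A rough sketch of the subsampling experiment, assuming the per-read ID statistics have already been computed. The function names, the fixed random seed and the particular subsample sizes are illustrative assumptions; only the 30 replicates per size comes from the slide.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def subsample_mean_bias(biases: Sequence[float], n_reads: int,
                        replicates: int = 30, seed: int = 0) -> List[float]:
    """Draw `replicates` random subsamples of up to `n_reads` reads (without
    replacement) and return the mean ID statistic of each subsample."""
    if not biases or n_reads < 1:
        raise ValueError("need at least one read and n_reads >= 1")
    rng = random.Random(seed)
    k = min(n_reads, len(biases))  # cannot draw more reads than exist
    return [mean(rng.sample(list(biases), k)) for _ in range(replicates)]


def sweep(biases: Sequence[float],
          sizes: Sequence[int] = (1, 10, 100, 1000, 10000),
          replicates: int = 30, seed: int = 0) -> Dict[int, List[float]]:
    """Repeat the subsampling for each subsample size."""
    return {n: subsample_mean_bias(biases, n, replicates, seed) for n in sizes}
```

The spread of the replicate means at each size indicates how many reads are needed before the mean statistic settles.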
Bitty data (2) partial references 10
Take reference genome at high contiguity
Fragment randomly to target (low) contiguity
Repeat read identification using fragmented DB
Simulate N50 ≈ 1,000 bp to N50 ≈ 10 Mbp
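One way to picture the fragmentation step is to work on contig lengths only. The sketch below computes N50 in the usual way and then repeatedly splits the longest contig at a random position until the target is reached. The splitting scheme is an assumption made for illustration; the slide does not say how the random fragmentation was done.

```python
import random
from typing import List


def n50(lengths: List[int]) -> int:
    """N50: the largest length L such that contigs of length >= L together
    contain at least half of all bases. Returns 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def fragment_to_n50(lengths: List[int], target_n50: int, seed: int = 0) -> List[int]:
    """Split the longest contig at a random point until N50 <= target_n50.

    Assumed scheme for illustration; quadratic, so only suitable for small
    examples rather than whole genomes.
    """
    if target_n50 < 1:
        raise ValueError("target_n50 must be >= 1")
    rng = random.Random(seed)
    pieces = list(lengths)
    while pieces and n50(pieces) > target_n50:
        idx = max(range(len(pieces)), key=pieces.__getitem__)
        size = pieces.pop(idx)           # size >= 2 here because N50 > 1
        cut = rng.randint(1, size - 1)   # both halves keep at least 1 bp
        pieces.extend([cut, size - cut])
    return pieces
```

The total number of bases is unchanged by fragmentation, so any drop in identification performance reflects contiguity alone.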
Keeping it simple: Kew Science Festival 11
Six species: whole genome-skim samples with MinION in preparation
Build BLAST DBs from skimmed data
Select ‘unknown’ (blinded) sample, extract DNA and resequence in real-time
Compare to partial DBs in six-way BLAST competition
Live ID?
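The six-way competition can be thought of as a vote: each read is credited to whichever database gives it the best hit, and the tally across reads suggests the identity of the blinded sample. The sketch below assumes the BLAST hits have already been parsed into (read, database, score) tuples; the scoring field, the tie rule and the database names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def best_db_per_read(hits: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Map each read to the database with its single highest score.

    `hits` holds (read_id, db_name, score) tuples. Reads whose top score is
    tied between databases are left out (an assumed rule).
    """
    per_read: Dict[str, Dict[str, float]] = defaultdict(dict)
    for read_id, db_name, score in hits:
        best = per_read[read_id].get(db_name)
        if best is None or score > best:
            per_read[read_id][db_name] = score

    winners: Dict[str, str] = {}
    for read_id, scores in per_read.items():
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            winners[read_id] = ranked[0][0]
    return winners


def tally(winners: Dict[str, str]) -> Counter:
    """Count how many reads each database won."""
    return Counter(winners.values())
```

Because the tally can be updated as each read arrives, the leading candidate is available while sequencing is still running.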
de novo genome assembly 12
Data                                         | MiSeq only | MiSeq + MinION
Assembler                                    | ABySS      | hybridSPAdes
Illumina reads, 300 bp paired-end            | 8,033,488  | 8,033,488
Illumina data (yield)                        | 2,418 Mbp  | 2,418 Mbp
MinION reads, R7.3 + R9 kits, N50 ~ 4,410 bp | -          | 96,845
MinION data (yield)                          | -          | 240 Mbp
Approx. coverage                             | 19.49x     | 19.49x + 2.01x
Assembly key statistics:
# contigs                                    | 24,999     | 10,644
Longest contig                               | 90 Kbp     | 414 Kbp
N50 contiguity                               | 7,853 bp   | 48,730 bp
Fraction of reference genome (%)             | 82         | 88
Errors per 100 kbp: # N’s                    | 1.7        | 5.4
Errors per 100 kbp: # mismatches             | 518        | 588
Errors per 100 kbp: # indels                 | 120        | 130
Largest alignment                            | 76,935 bp  | 264,039 bp
CEGMA gene completeness estimate:
# genes                                      | 219 of 248 | 245 of 248
% genes                                      | 88%        | 99%
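The coverage and completeness figures in the table can be sanity-checked with simple arithmetic, since coverage is conventionally yield divided by genome size. The sketch below inverts that relation to show the genome size each coverage figure implies; the function names are illustrative, and the slide does not state which genome size was actually used.

```python
def implied_genome_size_mbp(yield_mbp: float, coverage: float) -> float:
    """Genome size (Mbp) implied by coverage = yield / genome size."""
    return yield_mbp / coverage


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


illumina_size = implied_genome_size_mbp(2418, 19.49)  # Illumina yield, coverage
minion_size = implied_genome_size_mbp(240, 2.01)      # MinION yield, coverage

print(f"Implied genome size (Illumina): {illumina_size:.1f} Mbp")
print(f"Implied genome size (MinION):   {minion_size:.1f} Mbp")
print(f"CEGMA, MiSeq only:     {percent(219, 248):.1f}%")
print(f"CEGMA, MiSeq + MinION: {percent(245, 248):.1f}%")
```

The two implied sizes differ by a few Mbp, which is what one would expect when yields and coverages are quoted as approximate values.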
Wait – genes? 13
Entire chloroplast genome (~150 kbp)
Plastid coding loci
Individual field-sequenced MinION reads
Real-time phylogenomics 14
Pipeline stages: filtered reads → gene models → multiple sequence alignments → gene trees → consensus tree
Inputs and tools: TAIR10 CDS code; annotation with SNAP; 1:1 reciprocal BLAST; MUSCLE; trimAl; *BEAST; RAxML, TreeAnnotator
[Figure: cumulative counts of unique genes and all genes; annotation: ‘Lab’ being transported!]
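The cumulative counts of all genes and unique genes can be tracked with a running set as gene hits arrive. This is a minimal sketch under the assumption that each annotated read yields one gene identifier; the function name and the locus IDs in the usage example are illustrative only.

```python
from typing import Iterable, List, Tuple


def cumulative_gene_counts(gene_hits: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (all_genes, unique_genes) after each hit, in arrival order."""
    seen = set()
    counts: List[Tuple[int, int]] = []
    for total, gene in enumerate(gene_hits, start=1):
        seen.add(gene)  # repeated hits to the same gene do not grow the set
        counts.append((total, len(seen)))
    return counts


# Illustrative stream of gene identifiers, one per annotated read.
stream = ["AT1G01010", "AT1G01020", "AT1G01010", "ATCG00490", "AT1G01020"]
for all_genes, unique_genes in cumulative_gene_counts(stream):
    print(all_genes, unique_genes)
```

The gap between the two curves widens as the same genes are sampled repeatedly, which gives a quick feel for how close the run is to saturating the loci it can recover.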
Emerging health threats & globalisation 15
Acute oak decline: a syndrome-type oak disease
• Unknown cause, no treatment
• ca. 200 million oaks in GB; amenity & timber value: ~£500/tree
• Emerged ca. 2004, spreading rapidly
• Significant morbidity and mortality
Defra ‘Futureproofing Plant Health’ initiative
• Test field-based methods
• Balanced survey of microbial community composition (healthy & affected individuals)
• Overcome ascertainment bias
• Pilot training of non-experts
• Draw conclusions relevant to rapid-response plant health monitoring in the UK
© 2016 Katy Reed / Forest Research
Recap 16
From lab-based…
… to ‘app store’ genomics
Problems with phylogeny… and comparative genomics 17
Suh (2016) Zool. Scripta.
doi:10.1111/zsc.12213
Zapata et al. (2016) PNAS
113:E4052-E4060
©2016 National Academy of Sciences
Three-colour graphs: phylogeny, synteny & identity 18
Key:
Extant node
Inferred node
Synteny edge (physical connection)
Phylogeny edge (evolutionary connection)
Identity edge (organismal connection)
[Figure: example three-colour graph with node labels a–e and x–z]
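As a data structure, a three-colour graph needs two node kinds (extant, inferred) and three edge kinds (synteny, phylogeny, identity). The sketch below is one possible encoding, not the author's implementation: the class names are hypothetical and all edges are treated as undirected for simplicity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class EdgeColour(Enum):
    SYNTENY = "synteny"      # physical connection
    PHYLOGENY = "phylogeny"  # evolutionary connection
    IDENTITY = "identity"    # organismal connection


@dataclass(frozen=True)
class Node:
    name: str
    extant: bool  # True for an extant node, False for an inferred node


@dataclass
class ThreeColourGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Set[Tuple[str, str, EdgeColour]] = field(default_factory=set)

    def add_node(self, name: str, extant: bool) -> None:
        self.nodes[name] = Node(name, extant)

    def add_edge(self, u: str, v: str, colour: EdgeColour) -> None:
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        a, b = sorted((u, v))  # store undirected edges in canonical order
        self.edges.add((a, b, colour))

    def neighbours(self, name: str, colour: EdgeColour) -> List[str]:
        """Nodes joined to `name` by an edge of the given colour."""
        found = []
        for a, b, c in self.edges:
            if c is colour and name in (a, b):
                found.append(b if a == name else a)
        return sorted(found)
```

Keeping the colour on the edge rather than in separate graphs means a single traversal can follow physical, evolutionary and organismal connections together, or filter to just one kind.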
Three-colour graphs: phylogeny, synteny & identity 19
[Figure: example three-colour graphs illustrating duplication, tetraploid hybrid formation and diploidization, inversion, and HGT]
Final thoughts 20
Portable sequencing, by anyone, means really Big Data
Informatics connecting this data through explicit models is inference
Scalable, reproducible, sustainable research: bionode.js, bioboxes.org, Singularity
Thanks, funders, contacts and questions 21
Oxford Nanopore Technologies Ltd.
Dan Turner, Richard Ronan, Gerrard Coyne
RBG Kew:
Alexander S.T. Papadopulos (@metallophyte)
Andrew Helmstetter (@ajhelmstetter)
Dion Devey, Robyn Cowan, Tim Wilkinson, Stephen Dodsworth, Pepijn Kooij, Felix Forest, Bill Baker, Jan T. Kim, Jenny Williams, Abigail Barker, Mark Lee, Jim Clarkson, Mike Chester, Ester Gaya, Lisa Pokorny, Laszlo Csiba, Paul Wilkin, Richard Buggs, Mike Fay, Mark Chase, Ilia Leitch
QMUL
Laura Kelly, Kalina Davies, Steve Rossiter
Oxford
Aris Katzourakis, Oli Pybus, Jayna Raghwani
Others
Forest Research: Daegan Inward, Katy Reed
Dstl: Claire Lonstale, James Taylor
Birmingham: Nick Loman, Josh Quick
U. Utah: Bryn Dentinger
Imperial: James Rosindell
This research was conducted in the Sackler Phylogenomics Laboratory and was supported by the Calleva Foundation Phylogenomic Research Programme and the Sackler Trust
@lonelyjoeparker:
joe.parker@kew.org

Inference and informatics in a 'sequenced' world

  • 1. Informatics and inference in a sequenced world Dr. Joe Parker Early Career Research Fellow (Phylogenomics) Royal Botanic Gardens, Kew @lonelyjoeparker:
  • 2. Joe Parker - background 2 VL 4 length Average VL 1 length ≤ 3 4 ≤ 2 7 > 2 7 > 3 4 Neut - Neut +
  • 3. Incredible times for bioscience 3 Images – Wikimedia commons CC BY-SA (clockwise from top left: Jeroen Rouwkema, @aGastya, author’s own, @RE73)
  • 4. Step back: molecular evolution 4 “Horizontal gene transfer occurs x more frequently in these lineages, because of this biology” “Convergent evolution is rare in most genes, in most organisms, but y times greater in these gene families …because of this biology” “New chomosomes are created & destroyed at z, q, rates in this reproductive strategy …because of this biology”
  • 6. Snowdonia, HelloWorld & ‘tent-seq’ 6 A. thaliana Arabidopsis lyrata Congeneric species; Reference genomes available Field-sequenced (MinION) & Lab-sequenced (Illumina™) Orthogonal BLAST: 4 sample*sequencer combinations Compare TRUE & FALSE rates for varying ID statistic cutoffs
  • 7. Tasty pics 7 Conditions 100% humidity; 6-13ºC Essential kit 800w generator 3x laptops Centrifuge Waterbath Polystyrene boxes (lots) Kettle(…!) Yield >400Mbp data in three days; A. thaliana ~2.01x coverage
  • 8. Field- vs. lab-sequenced sample ID 8 Match individual reads to each reference with BLAST Compare match lengths in TRUE and FALSE cases ‘Length bias’ ID stat: lengthTRUE - lengthFALSE Compare TRUE & FALSE rates as length bias cutoff varies MiSeq (lab) MinION (field)
  • 9. Bitty data (1) partial queries 9 Subsample MinION output Repeat ID pipeline, record mean ID stat sbias Replicates: N = 30 Simulate from 100 – 104 reads (≈instant → hours)
  • 10. Bitty data (2) partial references 10 Take reference genome at high contiguity Fragment randomly to target (low) contiguity Repeat read identification using fragmented DB Simulate N50 ≈1,000bp to N50 ≈ 10Mbp
  • 11. Keeping it simple: Kew Science Festival 11 Six species: whole genome- skim samples with MinION in preparation Build BLAST DBs from skimmed data Select ‘unknown’ (blinded) sample, extract DNA and resequence in real-time Compare to partial DBs in six- way BLAST competition Live ID ?
  • 12. de novo genome assembly 12 Data MiSeq only MiSeq + MinION Assembler Abyss hybridSPAdes Illumina reads, 300bp paired-end 8,033,488 8,033,488 Illumina data (yield) 2,418 Mbp 2,418 Mbp MinION reads, R7.3 + R9 kits, N50 ~ 4,410bp - 96,845 MinION data (yield) - 240 Mbp Approx. coverage 19.49x 19.49x + 2.01x Assembly key statistics: # contigs 24,999 10,644 Longest contig 90 Kbp 414 Kbp N50 contiguity 7,853 bp 48,730 bp Fraction of reference genome (%) 82 88 Errors, per 100 kbp: #N’s 1.7 5.4 # mismatches 518 588 # indels 120 130 Largest alignment 76,935 bp 264,039 bp CEGMA gene completeness estimate: # genes 219 of 248 245 of 248 % genes 88% 99%
  • 13. Wait – genes? 13 Entire chloroplast genome (~150kbp) Plastid coding loci Individual field-sequenced MinION reads
  • 14. Real-time phylogenomics 14 Filtered reads Gene models TAIR10 CDS code Annotation SNAP 1:1 reciprocal BLAST Multiple sequence alignments MUSCLE Trimal Gene trees → Consensus tree *BEAST RAxML, TreeAnnotator Cumulative counts: Unique genes All genes (‘Lab’ being transported!)
  • 15. Emerging health threats & globalisation 15 Acute oak decline: A syndrome-type oak disease • Unknown cause, no treatment • ca. 200 million oaks in GB …amenity & timber value: ~£500/tree • Emerged ca. 2004, spreading rapidly • Significant morbidity and mortality Defra ‘Futureproofing Plant Health’ initiative • Test field-based methods • Balanced survey of microbial community composition (healthy & affected individuals) • Overcome ascertainment bias • Pilot training of non-experts. • Draw conclusions relevant to rapid-response plant health monitoring in the UK. © 2016 Katy Reed / Forest Research
  • 16. Recap 16 From lab-based… … to ‘app store’ genomics
  • 17. Problems with phylogeny… and comparative genomics 17 Suh (2016) Zool. Scripta. doi:10.1111/zsc.12213 Zapata et al. (2016) PNAS 113:E4052-E4060 ©2016 National Academy of Sciences
  • 18. Three-colour graphs: phylogeny, synteny & identity 18 Key: Extant node; Inferred node; Synteny edge (physical connection); Phylogeny edge (evolutionary connection); Identity edge (organismal connection) [Figure: example graph on nodes a, b, c, d, e, x, y, z]
  • 19. Three-colour graphs: phylogeny, synteny & identity 19 Key: Extant node; Inferred node; Synteny edge (physical connection); Phylogeny edge (evolutionary connection); Identity edge (organismal connection) [Figure panels: Duplication; Tetraploid hybrid formed, then Diploidization; Inversion; HGT]
  • 20. Final thoughts 20 bionode.js bioboxes.org Singularity Portable sequencing, by anyone means really Big Data Informatics connecting this data through explicit models is inference Scalable, reproducible, sustainable research:
  • 21. Thanks, funders, contacts and questions 21 Oxford Nanopore Technologies Ltd. Dan Turner, Richard Ronan, Gerrard Coyne RBG Kew: Alexander S.T. Papadopulos (@metallophyte) Andrew Helmstetter (@ajhelmstetter) Dion Devey, Robyn Cowan, Tim Wilkinson, Stephen Dodsworth, Pepijn Kooij, Felix Forest, Bill Baker, Jan T. Kim, Jenny Williams, Abigail Barker, Mark Lee, Jim Clarkson, Mike Chester, Ester Gaya, Lisa Pokorny, Laszlo Csiba, Paul Wilkin, Richard Buggs, Mike Fay, Mark Chase, Ilia Leitch QMUL Laura Kelly, Kalina Davies, Steve Rossiter Oxford Aris Katzourakis, Oli Pybus, Jayna Raghwani Others Forest Research: Daegan Inward, Katy Reed Dstl: Claire Lonstale, James Taylor Birmingham: Nick Loman, Josh Quick U. Utah: Bryn Dentinger Imperial: James Rosindell This research was conducted in the Sackler Phylogenomics Laboratory and was supported by the Calleva Foundation Phylogenomic Research Programme and the Sackler Trust @lonelyjoeparker: joe.parker@kew.org
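The ‘length bias’ statistic on slide 8 (lengthTRUE - lengthFALSE, with TRUE and FALSE rates compared as the cutoff varies) can be sketched roughly as below. This is a minimal illustration, not the pipeline used in the study: the `ReadHit` record, the function names, and the rule that a read counts as a TRUE call when its bias is at least the cutoff (and a FALSE call when it is at most minus the cutoff) are all assumptions made for the example. The alignment lengths are taken as already extracted from BLAST output.

```python
from typing import NamedTuple, Sequence, Tuple


class ReadHit(NamedTuple):
    """Hypothetical record: best BLAST match lengths for one read."""
    len_true: int   # match length against the read's own (TRUE) reference
    len_false: int  # match length against the other (FALSE) reference


def length_bias(hit: ReadHit) -> int:
    """Length bias as on the slide: lengthTRUE - lengthFALSE."""
    return hit.len_true - hit.len_false


def rates_at_cutoff(hits: Sequence[ReadHit], cutoff: int) -> Tuple[float, float]:
    """Return (TRUE rate, FALSE rate) for a positive length-bias cutoff.

    Assumed rule: a read is called for whichever reference it matches
    better by at least `cutoff` bases; otherwise it stays unassigned.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    if not hits:
        raise ValueError("no reads supplied")
    n_true = sum(1 for h in hits if length_bias(h) >= cutoff)
    n_false = sum(1 for h in hits if length_bias(h) <= -cutoff)
    return n_true / len(hits), n_false / len(hits)
```

Calling `rates_at_cutoff` over a range of cutoffs gives the kind of TRUE-versus-FALSE curve the slide describes. Longer reads can produce larger biases, which is consistent with the speaker's note that MinION allows long matches.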

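The partial-reference simulation on slide 10 (fragment a contiguous reference at random down to a target N50) might look something like the sketch below. It works on contig lengths only, not sequences, and the breakpoint scheme (one uniformly random cut at a time until N50 falls to the target or below) is an assumption for illustration. The slides do not say how the fragmentation was actually done, and the function names are hypothetical.

```python
import random
from typing import List


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of all bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def fragment_to_n50(lengths: List[int], target_n50: int,
                    rng: random.Random) -> List[int]:
    """Cut contigs at uniformly random breakpoints until N50 <= target_n50."""
    if target_n50 < 1:
        raise ValueError("target_n50 must be at least 1")
    pieces = list(lengths)
    while n50(pieces) > target_n50:
        # A contig of length L has L - 1 internal breakpoints, so weighting
        # by L - 1 picks a breakpoint uniformly across the whole assembly.
        weights = [length - 1 for length in pieces]
        i = rng.choices(range(len(pieces)), weights=weights)[0]
        cut = rng.randint(1, pieces[i] - 1)
        pieces[i:i + 1] = [cut, pieces[i] - cut]
    return pieces
```

The result lands at or just below the target rather than exactly on it. Recomputing N50 after every cut is slow for large genomes, which is acceptable for a sketch but would need rethinking at the N50 ≈ 10 Mbp end of the range on the slide.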
Editor's notes

  1. Welcome, thanks, menu Formal introduction and thanks; Lay out the menu / journey I’ll mainly be talking about work in the last 2.5 yrs since taking up my ECRF at Kew
  2. Wide range of taxa, techniques and questions. Enough to set my scene without taking ages, confusing/losing audience, or giving the impression I’m just a tools-bot.
  3. Start of… Incredible times Traditional to start bioinformatics talks with a slide about Moore’s law, sequencing costs, and the data deluge Actually this is a fantastic age to be living in, ever bigger analyses – and I’ll talk a lot about “real-time” phylogenomics But why? What are we attempting to discover?
  4. We need enough data to turn observations into empirical comparisons, and those into models and laws. We know a lot about evolutionary mechanisms, and a lot about a handful of genomes. What we know tells us “it’s complicated”: most genes don’t have simple orthologues, horizontal transfer, etc. But we don’t, really, have an empirical understanding of how they fit together, e.g.: - “horizontal gene transfer occurs x more frequently in these lineages, because of this biology” - adaptive molecular convergence is rare in most genes, in most organisms, but y times greater in these gene families because of this biology - new chromosomes are created (by duplication, endogenisation, polyploidy) and destroyed (by diploidization) at z rates in this reproductive strategy because of this biology
  5. Portable sequencing: also long reads and real-time
  6. Direct, explicit, orthogonal test – and can it work? Picture of experimental design Outline of the study In terms of bioinformatics questions Funding: a first pot and timeline…
  7. Data in terrible conditions but anyone can do it Social media reach The Atlantic, Economist
  8. We compare match lengths, and MinION allows long matches
  9. EXPLAIN AXES: precision improves rapidly
  10. EXPLAIN AXES: a partial REFERENCE would work, too
  11. MORE FUNDING. So simple a kid could do it? Yes. The challenge I set myself: OK, it’s a simple experiment. Can I build a test simple enough that a child can understand it? SOCIAL MEDIA Funding: NANOPORE
  12. Data from one time and place can and should be useful elsewhere. Flash a bit of proper genomics
  13. Single reads match whole genes – meat & drink
  14. EXPLAIN AXES postdoc-years PAPER ACCEPTED
  15. FUNDED tailor made for health research/application need to mention it somewhere because of: strategic links Building the ‘momentum narrative’ Other related stuff; VIPs etc Plant health and emerging threats A connected world means new diseases can spread globally, fast. Lay out the problem, e.g. opportunities – look! Health! Ascertainment bias! Field-portable! etc Funding: yet another pot, this one also bigger. Software etc to improve UI (ahem)
  16. HPCs to apps: Exponential data, linear understanding. Pause – to recap This is important because it’s where we tie it together and show my contribution: Portable, mass sequencing is really here Massive potential for de novo genomics; phylogenomics But while we’re accumulating information at an exponential rate, we’re integrating it linearly, in essence … where are we going?
  17. Nature is cruel: more data only muddies the water Bifurcating phylogenies are decreasingly useful and complicated to get ‘Comparative’ genomics actually uses relatively few datapoints (e.g. Encode…) In part because most phylogenetic methods require variations on homology assumption
  18. Here’s a common framework for all these studies. How to infer – sounds like a nightmare. Many of the edges in this network are really there already. Shifting paradigms, making linking easier. Explicitly model phylogeny, synteny and identity. Edge support reflects evidence; deviations from neutrality reflect hypotheses/models/phenomena. Any nodes connecting to an identity edge are considered completely connected. Maximum # edges ~n (2n-1)/2; digraphs ~n!! Possible ancestors from one locus on n taxa: essentially an inverse function of when they coalesce (can have m generations of n ancestors until an event where n(m)<n(t)).
  19. EXAMPLES: Gene duplication, e.g. paralogue in animal. Tetraploid formed, then secondary diploidization, e.g. plant. Inversion in a genome. Unlinked loci (e.g. bacterial plasmids) and HGT. How to infer – sounds like a nightmare. Many of the edges in this network are really there already. Shifting paradigms, making linking easier. Explicitly model phylogeny, synteny and identity. Edge support reflects evidence; deviations from neutrality reflect hypotheses/models/phenomena.
  20. Formally linking datasets and models is inferring the network of life. It shifts the job for bioinformatics from something it’s good at (sophisticated, incremental analysis) to something computers in general are great at: linking elements. In this case informatics doesn’t enable research, it is the process of inference. It’s relatively easy to write a new standalone app to do x, or analyse some big dataset. Reproducibility and scaling-up science mean we must work harder on the links. Informatics as inference. The lonely astronomers.
  21. Funders Thanks Reach out
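The three-colour graph in notes 18 and 19 (extant and inferred nodes joined by synteny, phylogeny and identity edges, with edge support reflecting evidence) could be represented along the lines below. This is only a sketch of one possible data structure: the class, method names and `support` weight are hypothetical, and grouping nodes by identity edges is one reading of "any nodes connecting to an identity edge are considered completely connected". The example loci are invented.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class EdgeKind(Enum):
    SYNTENY = "physical connection"
    PHYLOGENY = "evolutionary connection"
    IDENTITY = "organismal connection"


@dataclass
class ThreeColourGraph:
    # node name -> True for an extant node, False for an inferred one
    nodes: dict = field(default_factory=dict)
    # (node_a, node_b, kind, support); support is a hypothetical evidence weight
    edges: list = field(default_factory=list)

    def add_node(self, name, extant=True):
        self.nodes[name] = extant

    def add_edge(self, a, b, kind, support=1.0):
        for name in (a, b):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name}")
        self.edges.append((a, b, kind, support))

    def neighbours(self, node, kind):
        """Nodes joined to `node` by an edge of the given colour, sorted."""
        found = set()
        for a, b, k, _ in self.edges:
            if k is kind and node in (a, b):
                found.add(b if a == node else a)
        return sorted(found)

    def identity_groups(self):
        """Connected components over identity edges only."""
        seen, groups = set(), set()
        for start in self.nodes:
            if start in seen:
                continue
            group, queue = {start}, deque([start])
            while queue:
                current = queue.popleft()
                for other in self.neighbours(current, EdgeKind.IDENTITY):
                    if other not in group:
                        group.add(other)
                        queue.append(other)
            seen |= group
            groups.add(frozenset(group))
        return groups


# Invented example: loci a and b in two organisms, one inferred ancestor of a.
g = ThreeColourGraph()
for name in ("a1", "b1", "a2", "b2"):
    g.add_node(name)
g.add_node("anc_a", extant=False)
g.add_edge("a1", "b1", EdgeKind.SYNTENY)
g.add_edge("a2", "b2", EdgeKind.SYNTENY)
g.add_edge("a1", "b1", EdgeKind.IDENTITY)
g.add_edge("a2", "b2", EdgeKind.IDENTITY)
g.add_edge("anc_a", "a1", EdgeKind.PHYLOGENY, support=0.9)
g.add_edge("anc_a", "a2", EdgeKind.PHYLOGENY, support=0.9)
```

Keeping all three edge colours in one edge list means an event such as an inversion or HGT shows up as a change in which colours connect a pair of nodes, rather than needing a separate data structure per analysis.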