SlideShare una empresa de Scribd logo
1 de 40
Repeatable plant pathology
bioinformatic analysis:
Not everything is NGS data.
Peter Cock & Leighton Pritchard
Genomics for Non-Model Organisms
ISMB/ECCB - Workshop 6
Vienna, Austria
19 July 2011
JHI Plant Pathology
 We work on a range of
organisms
 Plant Viruses
 Bacteria
 Oomycetes
 Fungi
 Nematodes
 Aphids (as virus vectors)
 Many genome sequences
now available…
JHI Dundee site, formerly SCRI
(Scottish Crop Research Institute)
Plant Pathogen Genomics
 Sequencing of plant pathogens becoming common
 I will focus on the protein complement today
(i.e. What happens after assembly of NGS data)
 Often looking at first genome for a genus
 Automated gene annotation from other species limited
 In particular we want to find “effectors”
 i.e. proteins used to manipulate host
Effector Protein Analysis
e.g. Type III
secretion
Pathogen
Planthost
Host nucleus
e.g. transcriptional
regulators
Export
Import
Injection e.g. protease inhibitors
(block plant defence)
LocalisationGolgi
Effector in transit:
Active effector:
Example: Effector Protein Analysis
 Want to identify effector genes, e.g.
 Similarity to known effectors (e.g. with BLAST)
 Signal peptides
 Localization signals
 Possible horizontal gene transfer (e.g. different GC%)
 Part of larger task of automated gene annotation, e.g.
 HMMER or RPS-BLAST domain searches
 GO annotation with Blast2GO
Example: Nematode Effector Proteins
 To identify candidate effector genes:
 Want a signal peptide (for export)
 Don’t want a transmembrane domain (not secreted)
 How? Initially used a Python script calling
 SignalP 3.0
 TMHMM 2.0
 Now we make this easy to use, thanks to Galaxy…
See Jones et al. (2009), Molecular Plant Pathology
What is Galaxy?
 Web system for running many analysis tools remotely:
 Upload your data to Galaxy (or access shared data)
 Run analysis on Galaxy server/cluster
 Download results (or share them)
 Designed to link multiple analysis steps as workflows:
 Repeatable
 Shareable
Our local Galaxy Server
Will upload a FASTA file
of predicted proteins
Step 1 – Upload FASTA file
Step 1 – Upload FASTA file
Step 2 – Run workflow
Step 2 – Run workflow
Step 2 – Run workflow
Step 2 – Run workflow
Step 2 – Run workflow
Step 2 – Run workflow
Step 3 – Get results
Example: Effector Protein Analysis
 To identify candidate effector genes:
 Want a signal peptide (for export)
 Don’t want a transmembrane domain (not secreted)
 Now use Galaxy workflow calling:
 SignalP 3.0
 TMHMM 2.0
 I can share this workflow:
 Easily available to anyone using our Server
 If tools installed, could share with other Galaxy Servers
 Easy for our Biologists to apply workflow to their own data
See Jones et al. (2009), Molecular Plant Pathology & Kikuchi et al. (2011), PLoS Pathogens
Workflow Editor – Effector finding
Why Galaxy?
 Hi Peter, could you run a big BLAST job for me?
 Everyone using standalone BLAST is not practical
 Want a local BLAST web interface with multiple-query support
 Group XXX have just published the YYY genome – could you
look for ZZZ proteins please?
 With a suitable interface, lots of analyses are simple enough for
non-bioinformaticians to run and interpret
 You remember that analysis we did last year? I want to do it
again on this new genome
 Running old scripts on new data is tedious
 Workflows should be reproducible
Why Galaxy? Plus Points
 Don’t have to worry about local software installation
 Mostly Windows here at JHI, most tools need Linux
 Uniform web based GUI for wrapped tools
 Web interfaces are all different
 Command line tools are scary
 Coupling tools together as sharable repeatable workflows
 Can share data files (better than email/shared drives)
 Open Source (extendable, free)
 Almost any tool can be added
Why Galaxy? Downsides
 Investment in training users
 But interface is consistent across tools
 Bugs in Galaxy
 Most issues arise when wrapping tools
 Missing tools
 Have to invest time wrapping things we need
Protein Analysis Tools in our Galaxy
All take a FASTA protein file as input, return a tabular file.
 Sequence similarity
 NCBI BLAST+
 Blast2GO
 Transmembrane domains
 TMHMM
 Signal Peptides/Motifs
 SignalP
 EffectiveT3
 RXLR
 Nuclear Localisation
 PredictNLS
 NLStradamus
 Nucleolus Localisation
 NoD
 Sub-cellular Localisation
 PSORTB (*)
 WoLF PSORT
(*) Konrad Paszkiewicz’s wrapper from Galaxy Tool Shed, other wrappers written by us.
Observations from Wrapping Tools
 Tabular output for Galaxy
 Most tools’ output needed reformatting
 Some tools are not threaded
 Galaxy team working on “embarrassing parallel” case
 Interaction with tool authors can be productive and
informative, and improve their tools
 To tool authors
 Offer tabular output (if appropriate)
 Better error handling (e.g. zero length sequences)
Workflow example - RXLR motifs
 Important translocation motif in oomycetes
 We have implemented three methods in Galaxy:
 Bhattacharjee et al. (2006)
 Win et al. (2007)
 Whisson et al. (2007)
 Venn Diagram comparing the three methods
Next few steps omitted…
 Repeated RXLR search & filter using other two models
 Labelled some history entries
Python script using rpy
R library limma handles
the plotting
Wrapper is on the
Galaxy Tool Shed
Save this analysis
as a workflow
Workflow editor view
RXLRs in Phytophthora draft genomes
P. sojae
(19,027 proteins)
P. ramorum
(15,743 proteins)
P. capsici
(19,805 proteins)
Acknowledgements
 Helpful tool authors:
 Alex Nguyen (NLStradamus)
 Laszlo Kajan (PredictNLS)
 Peter Troshin, Michelle Scott (NoD)
 JHI Testers:
 John Jones, Remco Stam, Julietta Jupe
 The Galaxy Developers & mailing list community
Conclusions
 We like Galaxy
 Consistent web interface for combining many tools
 Open source and extensible
 We can add tools of local interest
 Our Galaxy tool wrappers are on the Galaxy Tool Shed
 http://usegalaxy.org/community
 Questions?

Más contenido relacionado

La actualidad más candente

Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Peter van Heusden
 
Loughborough research forum 2010 data overload presentation
Loughborough research forum 2010 data overload presentationLoughborough research forum 2010 data overload presentation
Loughborough research forum 2010 data overload presentation
Nicola Louise Beddall-Hill
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOA
EBI
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkit
BOSC 2010
 

La actualidad más candente (20)

Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformatics
 
Fairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsFairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology views
 
Loughborough research forum 2010 data overload presentation
Loughborough research forum 2010 data overload presentationLoughborough research forum 2010 data overload presentation
Loughborough research forum 2010 data overload presentation
 
A guided tour of Araport
A guided tour of AraportA guided tour of Araport
A guided tour of Araport
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
 
Plant ontology web services on Araport
Plant ontology web services on AraportPlant ontology web services on Araport
Plant ontology web services on Araport
 
Getting Started with RNA-Seq Data Analysis
Getting Started with RNA-Seq Data AnalysisGetting Started with RNA-Seq Data Analysis
Getting Started with RNA-Seq Data Analysis
 
2014 10-01-assembly summaryvariantsoverview
2014 10-01-assembly summaryvariantsoverview2014 10-01-assembly summaryvariantsoverview
2014 10-01-assembly summaryvariantsoverview
 
Introducing ProtAnnot - Araport workshop at PAG 2016
Introducing ProtAnnot - Araport workshop at PAG 2016Introducing ProtAnnot - Araport workshop at PAG 2016
Introducing ProtAnnot - Araport workshop at PAG 2016
 
ResearchKit
ResearchKitResearchKit
ResearchKit
 
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOA
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkit
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Project
 
BioTorrents: A File Sharing Service for Scientific Data
BioTorrents: A File Sharing Service for Scientific DataBioTorrents: A File Sharing Service for Scientific Data
BioTorrents: A File Sharing Service for Scientific Data
 
Scientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible researchScientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible research
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene Ontology
 
2016 Summer - Araport Project Overview Leaflet
2016 Summer - Araport Project Overview Leaflet2016 Summer - Araport Project Overview Leaflet
2016 Summer - Araport Project Overview Leaflet
 
ICAR 2015 Workshop - Agnes Chan
ICAR 2015 Workshop - Agnes ChanICAR 2015 Workshop - Agnes Chan
ICAR 2015 Workshop - Agnes Chan
 
247th ACS Meeting: The Eureka Research Workbench
247th ACS Meeting: The Eureka Research Workbench247th ACS Meeting: The Eureka Research Workbench
247th ACS Meeting: The Eureka Research Workbench
 

Destacado

In-Silico Identification of inhibitors for Controlling Rice Blast
In-Silico Identification of inhibitors for Controlling Rice BlastIn-Silico Identification of inhibitors for Controlling Rice Blast
In-Silico Identification of inhibitors for Controlling Rice Blast
Nisha Juyal
 
Defense mechanism in plants
Defense mechanism in plantsDefense mechanism in plants
Defense mechanism in plants
Vinit Pimputkar
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
Dayananda Salam
 

Destacado (18)

In-Silico Identification of inhibitors for Controlling Rice Blast
In-Silico Identification of inhibitors for Controlling Rice BlastIn-Silico Identification of inhibitors for Controlling Rice Blast
In-Silico Identification of inhibitors for Controlling Rice Blast
 
Prime-ome: "A molecular approach towards defense priming"
Prime-ome: "A molecular approach towards defense priming"Prime-ome: "A molecular approach towards defense priming"
Prime-ome: "A molecular approach towards defense priming"
 
Receptor-like proteins (RLPs) and kinases are major classes of resistance gen...
Receptor-like proteins (RLPs) and kinases are major classes of resistance gen...Receptor-like proteins (RLPs) and kinases are major classes of resistance gen...
Receptor-like proteins (RLPs) and kinases are major classes of resistance gen...
 
NGS: bioinformatic challenges
NGS: bioinformatic challengesNGS: bioinformatic challenges
NGS: bioinformatic challenges
 
Cold stress
Cold stressCold stress
Cold stress
 
Agrobacterium mediated Transformation in rice
Agrobacterium mediated Transformation in riceAgrobacterium mediated Transformation in rice
Agrobacterium mediated Transformation in rice
 
Hypersensitive Reaction in plant and their mechanism
Hypersensitive Reaction in plant and their mechanism Hypersensitive Reaction in plant and their mechanism
Hypersensitive Reaction in plant and their mechanism
 
New Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewNew Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overview
 
Omics in plant breeding
Omics in plant breedingOmics in plant breeding
Omics in plant breeding
 
Defense mechanism in plants
Defense mechanism in plantsDefense mechanism in plants
Defense mechanism in plants
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
transforming clinical microbiology by next generation sequencing
transforming clinical microbiology by next generation sequencingtransforming clinical microbiology by next generation sequencing
transforming clinical microbiology by next generation sequencing
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applications
 
Ngs presentation
Ngs presentationNgs presentation
Ngs presentation
 
Ngs ppt
Ngs pptNgs ppt
Ngs ppt
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 

Similar a Repeatable plant pathology bioinformatic analysis: Not everything is NGS data

Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical Research
David Ruau
 
Java Introductie
Java IntroductieJava Introductie
Java Introductie
mbruggen
 
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
Alejandra Gonzalez-Beltran
 
Munoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMunoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ss
Monica Munoz-Torres
 

Similar a Repeatable plant pathology bioinformatic analysis: Not everything is NGS data (20)

Folker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data AnnotationFolker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data Annotation
 
Apollo provides collaborative genome annotation editing with the power of jbr...
Apollo provides collaborative genome annotation editing with the power of jbr...Apollo provides collaborative genome annotation editing with the power of jbr...
Apollo provides collaborative genome annotation editing with the power of jbr...
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical Research
 
Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
ISMB Workshop 2014
ISMB Workshop 2014ISMB Workshop 2014
ISMB Workshop 2014
 
The Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentThe Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environment
 
Java Introductie
Java IntroductieJava Introductie
Java Introductie
 
Ugene
UgeneUgene
Ugene
 
B.3.5
B.3.5B.3.5
B.3.5
 
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
 
SFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free software
 
Munoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMunoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ss
 
PYTHON PPT.pptx
PYTHON PPT.pptxPYTHON PPT.pptx
PYTHON PPT.pptx
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
 
Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...
 
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious Disease
 
3rd presentation
3rd presentation3rd presentation
3rd presentation
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
 

Más de Leighton Pritchard

Más de Leighton Pritchard (20)

In a Different Class?
In a Different Class?In a Different Class?
In a Different Class?
 
RDVW Hands-on session: Python
RDVW Hands-on session: PythonRDVW Hands-on session: Python
RDVW Hands-on session: Python
 
Little Rotters: Adventures With Plant-Pathogenic Bacteria
Little Rotters: Adventures With Plant-Pathogenic BacteriaLittle Rotters: Adventures With Plant-Pathogenic Bacteria
Little Rotters: Adventures With Plant-Pathogenic Bacteria
 
Pathogen Genome Data
Pathogen Genome DataPathogen Genome Data
Pathogen Genome Data
 
Reverse-and forward-engineering specificity of carbohydrate-processing enzymes
Reverse-and forward-engineering specificity of carbohydrate-processing enzymesReverse-and forward-engineering specificity of carbohydrate-processing enzymes
Reverse-and forward-engineering specificity of carbohydrate-processing enzymes
 
Comparative Genomics and Visualisation BS32010
Comparative Genomics and Visualisation BS32010Comparative Genomics and Visualisation BS32010
Comparative Genomics and Visualisation BS32010
 
Whole genome taxonomic classi cation for prokaryotic plant pathogens
Whole genome taxonomic classication for prokaryotic plant pathogensWhole genome taxonomic classication for prokaryotic plant pathogens
Whole genome taxonomic classi cation for prokaryotic plant pathogens
 
Microbial Genomics and Bioinformatics: BM405 (2015)
Microbial Genomics and Bioinformatics: BM405 (2015)Microbial Genomics and Bioinformatics: BM405 (2015)
Microbial Genomics and Bioinformatics: BM405 (2015)
 
Microbial Agrogenomics 4/2/2015, UK-MX Workshop
Microbial Agrogenomics 4/2/2015, UK-MX WorkshopMicrobial Agrogenomics 4/2/2015, UK-MX Workshop
Microbial Agrogenomics 4/2/2015, UK-MX Workshop
 
BM405 Lecture Slides 21/11/2014 University of Strathclyde
BM405 Lecture Slides 21/11/2014 University of StrathclydeBM405 Lecture Slides 21/11/2014 University of Strathclyde
BM405 Lecture Slides 21/11/2014 University of Strathclyde
 
Sequencing and Beyond?
Sequencing and Beyond?Sequencing and Beyond?
Sequencing and Beyond?
 
Highly Discriminatory Diagnostic Primer Design From Whole Genome Data
Highly Discriminatory Diagnostic Primer Design From Whole Genome DataHighly Discriminatory Diagnostic Primer Design From Whole Genome Data
Highly Discriminatory Diagnostic Primer Design From Whole Genome Data
 
ICSB 2013 - Visits Abroad Report
ICSB 2013 - Visits Abroad ReportICSB 2013 - Visits Abroad Report
ICSB 2013 - Visits Abroad Report
 
Adventures in Bioinformatics (2012)
Adventures in Bioinformatics (2012)Adventures in Bioinformatics (2012)
Adventures in Bioinformatics (2012)
 
Golden Rules of Bioinformatics
Golden Rules of BioinformaticsGolden Rules of Bioinformatics
Golden Rules of Bioinformatics
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In Sequences
 
What makes the enterobacterial plant pathogen Pectobacterium atrosepticum dif...
What makes the enterobacterial plant pathogen Pectobacterium atrosepticum dif...What makes the enterobacterial plant pathogen Pectobacterium atrosepticum dif...
What makes the enterobacterial plant pathogen Pectobacterium atrosepticum dif...
 
Rapid generation of E.coli O104:H4 PCR diagnostics
Rapid generation of E.coli O104:H4 PCR diagnosticsRapid generation of E.coli O104:H4 PCR diagnostics
Rapid generation of E.coli O104:H4 PCR diagnostics
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
Mining Plant Pathogen Genomes for Effectors
Mining Plant Pathogen Genomes for EffectorsMining Plant Pathogen Genomes for Effectors
Mining Plant Pathogen Genomes for Effectors
 

Último

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
AlMamun560346
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 

Último (20)

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 

Repeatable plant pathology bioinformatic analysis: Not everything is NGS data

  • 1. Repeatable plant pathology bioinformatic analysis: Not everything is NGS data. Peter Cock & Leighton Pritchard Genomics for Non-Model Organisms ISMB/ECCB - Workshop 6 Vienna, Austria 19 July 2011
  • 2. JHI Plant Pathology  We work on a range of organisms  Plant Viruses  Bacteria  Oomycetes  Fungi  Nematodes  Aphids (as virus vectors)  Many genome sequences now available… JHI Dundee site, formerly SCRI (Scottish Crop Research Institute)
  • 3. Plant Pathogen Genomics  Sequencing of plant pathogens becoming common  I will focus on the protein complement today (i.e. What happens after assembly of NGS data)  Often looking at first genome for a genus  Automated gene annotation from other species limited  In particular we want to find “effectors”  i.e. proteins used to manipulate host
  • 4. Effector Protein Analysis e.g. Type III secretion Pathogen Planthost Host nucleus e.g. transcriptional regulators Export Import Injection e.g. protease inhibitors (block plant defence) LocalisationGolgi Effector in transit: Active effector:
  • 5. Example: Effector Protein Analysis  Want to identify effector genes, e.g.  Similarity to known effectors (e.g. with BLAST)  Signal peptides  Localization signals  Possible horizontal gene transfer (e.g. different GC%)  Part of larger task of automated gene annotation, e.g.  HMMER or RPS-BLAST domain searches  GO annotation with Blast2GO
  • 6. Example: Nematode Effector Proteins  To identify candidate effector genes:  Want a signal peptide (for export)  Don’t want a transmembrane domain (not secreted)  How? Initially used a Python script calling  SignalP 3.0  TMHMM 2.0  Now we make this easy to use, thanks to Galaxy… See Jones et al. (2009), Molecular Plant Pathology
  • 7. What is Galaxy?  Web system for running many analysis tools remotely:  Upload your data to Galaxy (or access shared data)  Run analysis on Galaxy server/cluster  Download results (or share them)  Designed to link multiple analysis steps as workflows:  Repeatable  Shareable
  • 8. Our local Galaxy Server Will upload a FASTA file of predicted proteins
  • 9. Step 1 – Upload FASTA file
  • 10. Step 1 – Upload FASTA file
  • 11. Step 2 – Run workflow
  • 12. Step 2 – Run workflow
  • 13. Step 2 – Run workflow
  • 14. Step 2 – Run workflow
  • 15. Step 2 – Run workflow
  • 16. Step 2 – Run workflow
  • 17. Step 3 – Get results
  • 18. Example: Effector Protein Analysis  To identify candidate effector genes:  Want a signal peptide (for export)  Don’t want a transmembrane domain (not secreted)  Now use Galaxy workflow calling:  SignalP 3.0  TMHMM 2.0  I can share this workflow:  Easily available to anyone using our Server  If tools installed, could share with other Galaxy Servers  Easy for our Biologists to apply workflow to their own data See Jones et al. (2009), Molecular Plant Pathology & Kikuchi et al. (2011), PLoS Pathogens
  • 19. Workflow Editor – Effector finding
  • 20. Why Galaxy?  Hi Peter, could you run a big BLAST job for me?  Everyone using standalone BLAST is not practical  Want a local BLAST web interface with multiple-query support  Group XXX have just published the YYY genome – could you look for ZZZ proteins please?  With a suitable interface, lots of analyses are simple enough for non-bioinformaticians to run and interpret  You remember that analysis we did last year? I want to do it again on this new genome  Running old scripts on new data is tedious  Workflows should be reproducible
  • 21. Why Galaxy? Plus Points  Don’t have to worry about local software installation  Mostly Windows here at JHI, most tools need Linux  Uniform web based GUI for wrapped tools  Web interfaces are all different  Command line tools are scary  Coupling tools together as sharable repeatable workflows  Can share data files (better than email/shared drives)  Open Source (extendable, free)  Almost any tool can be added
  • 22. Why Galaxy? Downsides  Investment in training users  But interface is consistent across tools  Bugs in Galaxy  Most issues arise when wrapping tools  Missing tools  Have to invest time wrapping things we need
  • 23. Protein Analysis Tools in our Galaxy All take a FASTA protein file as input, return a tabular file.  Sequence similarity  NCBI BLAST+  Blast2GO  Transmembrane domains  TMHMM  Signal Peptides/Motifs  SignalP  EffectiveT3  RXLR  Nuclear Localisation  PredictNLS  NLStradamus  Nucleolus Localisation  NoD  Sub-cellular Localisation  PSORTB (*)  WoLF PSORT (*) Konrad Paszkiewicz’s wrapper from Galaxy Tool Shed, other wrappers written by us.
  • 24. Observations from Wrapping Tools  Tabular output for Galaxy  Most tools’ output needed reformatting  Some tools are not threaded  Galaxy team working on “embarrassing parallel” case  Interaction with tool authors can be productive and informative, and improve their tools  To tool authors  Offer tabular output (if appropriate)  Better error handling (e.g. zero length sequences)
  • 25. Workflow example - RXLR motifs  Important translocation motif in oomycetes  We have implemented three methods in Galaxy:  Bhattacharjee et al. (2006)  Win et al. (2007)  Whisson et al. (2007)  Venn Diagram comparing the three methods
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33. Next few steps omitted…  Repeated RXLR search & filter using other two models  Labelled some history entries
  • 34. Python script using rpy R library limma handles the plotting Wrapper is on the Galaxy Tool Shed
  • 35.
  • 36. Save this analysis as a workflow
  • 38. RXLRs in Phytophthora draft genomes P. sojae (19,027 proteins) P. ramorum (15,743 proteins) P. capsici (19,805 proteins)
  • 39. Acknowledgements  Helpful tool authors:  Alex Nguyen (NLStradamus)  Laszlo Kajan (PredictNLS)  Peter Troshin, Michelle Scott (NoD)  JHI Testers:  John Jones, Remco Stam, Julietta Jupe  The Galaxy Developers & mailing list community
  • 40. Conclusions  We like Galaxy  Consistent web interface for combining many tools  Open source and extensible  We can add tools of local interest  Our Galaxy tool wrappers are on the Galaxy Tool Shed  http://usegalaxy.org/community  Questions?