SlideShare una empresa de Scribd logo
1 de 19
An Introduction to Running, Reusing and 
Sharing Workflows with Taverna 
Stian Soiland-Reyes and Christian Brenninkmeijer 
University of Manchester 
materials by Katy Wolstencroft, Aleksandra Pawlik, Alan Williams 
http://orcid.org/0000-0001-9842-9718 
http://orcid.org/0000-0002-2937-7819 
http://orcid.org/0000-0002-1279-5133 
http://orcid.org/0000-0001-8418-6735 
http://orcid.org/0000-0003-3156-2105 
Bonn University, 2014-09-01 / 2014-09-03 
This work is licensed under a http://www.taverna.org.uk/ 
Creative Commons Attribution 3.0 Unported License
 This tutorial will give you a basic introduction to reusing 
workflows in Taverna and my Experiment. 
 Other tutorials will explore nested workflows, the workflow 
engine (iteration, looping, parallel invocation) 
 Like in the previous tutorial workflows in this practical use 
small data-sets and are designed to run in a few minutes. In 
the real world, you would be using larger data sets and 
workflows would typically run for longer
Bigger Workflows: Enrichment Analysis 
The previous examples were trivial, small tasks. Taverna’s real 
power is in iterating over large data sets 
 Many experiments result in a list of genes (e.g. microarray 
analysis, Chip-Seq, SNP identification etc). 
 In this exercise, we will use Taverna to analyse a gene set 
from a Chip-Seq experiment by finding and reusing existing 
workflows 
 We will enrich our dataset by discovering: 
1. Which pathways our genes are involved in 
2. The functions of the genes 
3. Literature evidence for the phenotype/trait of interest
Re-using workflows from myExperiment 
 Go to http://www.myexperiment.org and click on ‘find 
workflows’ 
 You will see a list of the most viewed and downloaded workflow 
– see what the most popular workflow does by reading the 
description 
 Change the rank to ‘Latest’ and see what has been uploaded in 
the last few weeks 
 We will now find and download a workflow to identify the 
pathways each gene in our gene set is involved in
Re-using workflows from myExperiment 
 Find the workflow called “UnigeneID to KEGG Pathways” and look 
at the workflow entry page (uploaded by “Aleksandra Pawlik”) 
 Download the workflow by clicking on the link: “Download 
Workflow” and find out what it does by reading the descriptions in 
myExperiment 
 Open the workflow in Taverna by going to ‘File ->Open Workflow’ 
 Run the workflow using the example values supplied (Hint: when 
you run the workflow you can add the example values by clicking 
on “Use examples”) 
 Look at the workflow output – now you will see pathway 
information and pathway diagrams
Combining workflows from myExperiment 
 To analyse all the genes from our ChipSeq study, we need to 
extract the gene list from our results file 
 To make it easier to work through the example, we have 
provided a Chip-Seq gene list on myExperiment, you can find 
it under “GalaxyGeneList - short : datafile for training” 
 Save this file to your local machine 
 Open the file in Excel 
 Save the file with a .csv extension 
 As you can see, the list of genes is in column D 
 Taverna can process and extract this column automatically
Combining workflows from myExperiment 
 In myExperiment, find and download the workflow called 
“Import and convert gene list” 
 This workflow will extract the list of genes in column D 
using Taverna’s built-in spreadsheet import tool (which can 
be found in the services panel, for future reference) 
 The next step in the workflow converts RefSeq IDs into 
unigene IDs (required for the pathways workflow – 
converting between different types of identifiers is a 
common problem in bioinformatics!) 
 Run the workflow. This time, in the input window, select 
“set file location” and set the location to the saved .csv 
gene list. 
 Look at the workflow results
Combining workflows from myExperiment 
 We will now combine the two 
workflows 
 While you are still in the “import 
and convert” workflow, go to the 
top of the workbench and select 
“insert -> Nested workflow” 
 In the pop-up window, select 
“import from file” and find the 
pathways workflow you 
downloaded earlier. 
 Click on “import workflow” and 
the pathways workflow will 
appear in the main workflow 
diagram.
Combining workflows from myExperiment 
 Connect the workflows up by linking the output of the 
‘Merge_Gene_List’ with the nested workflow input
Combining workflows from myExperiment 
 Create new output ports for the Nested workflow and connect the 
Nested workflow outputs to the new outputs 
NOTE: you don’t need to connect them all, just pathway descriptions, 
pathway images and gene descriptions 
 Save the workflow 
 Run the workflow (it may take a few minutes)
GO Associations 
There are many different tools we could use to find Gene 
Ontology associations for your gene list 
For example, we could simply modify the BioMart/Ensembl 
service in the ‘Import and convert gene list’ workflow we have 
already used 
Reload the ‘Import and Convert gene list’ workflow 
Right-click on the ‘mmusculus_gene_ensembl’ service and 
select ‘Copy’ 
Paste an extra copy of this service into the same workflow 
diagram
GO Associations 
This is a BioMart service. It allows you to retrieve omics data 
from ENSEMBL and other genomics resources. If you are familiar 
with BioMart, you will see the interface in Taverna is very similar 
to the web interface 
We will modify the BioMart query to find all GO associations for 
each gene associated with a Chip-Seq peak 
Right-click on the new copy of the service and select ‘Configure 
BioMart Query’
GO Associations 
The inputs (or filters) already accept RefSeq Ids from our input 
file, but we need to modify the outputs (or attributes) 
Select ‘Attributes’ and expand the …‘External’… section. 
Select ‘Go Term Accession’, ‘GO Term Definition’ and ‘Go 
Domain’ 
Unselect ‘UniGeneID’ and select ‘RefSeq mRNA’… 
 (See screenshot on the next slide for an example)
GO Associations
GO Associations 
Click ‘apply’ to save your changes, and ‘close’, to go back to 
Taverna 
At the top of the workflow diagram, change the workflow view 
to show all ports by clicking on the table icon
Connect your new service to 
the workflow by linking the 
‘D’ output port of the 
spreadsheet service to the 
input of your new service 
Make the new output ports 
and connect them as shown 
to your new service 
GO Associations
GO Associations 
Save the workflow by going to ‘File -> Save Workflow’ 
Run the workflow 
Set file location to the GalaxyGeneList file saved earlier 
(Download and) view the GO report 
In “GO_REPORT” tab select “Value 1” and press “Save Value” 
Save as a .txt or as a .tsf (tab separated) file
Simple Text Mining 
So far we have looked at enriching the genomic information, but 
we could also use workflows for running data analyses (e.g. 
aligning mouse genes with human homologues) or performing 
literature searches 
Think about the ways you could extend this analysis with 
literature searches (e.g. Correlations between pathways, genes, 
GO terms, phenotypes etc) 
Search myExperiment for workflows involving text mining, 
using the search terms “text mining” and “Pubmed”
Text Mining 
 Find and open the workflow “Phenotype to pubmed” 
One of the services is no longer available in the nested workflow 
(the faded-out service). Taverna checks the availability of each 
service when you load the workflow and when you run it 
In this case, the workflow will still run without the final nested 
workflow (clean text) 
Delete the ‘clean text’ nested workflow (by selecting it and 
right-clicking), and reconnect the workflow output 
Run the workflow with the search term ‘erythropoiesis’ (or a 
phenotype term to describe the disease you are studying)

Más contenido relacionado

La actualidad más candente

performancetestingjmeter-121109061704-phpapp02 (1)
performancetestingjmeter-121109061704-phpapp02 (1)performancetestingjmeter-121109061704-phpapp02 (1)
performancetestingjmeter-121109061704-phpapp02 (1)
QA Programmer
 

La actualidad más candente (13)

2014 Taverna tutorial Advanced Taverna
2014 Taverna tutorial Advanced Taverna2014 Taverna tutorial Advanced Taverna
2014 Taverna tutorial Advanced Taverna
 
Less13 3 e_loadmodule_3
Less13 3 e_loadmodule_3Less13 3 e_loadmodule_3
Less13 3 e_loadmodule_3
 
Anypoint Batch Processing and Polling Scope With Mulesoft
Anypoint Batch Processing and Polling Scope With MulesoftAnypoint Batch Processing and Polling Scope With Mulesoft
Anypoint Batch Processing and Polling Scope With Mulesoft
 
Create & Execute First Hadoop MapReduce Project in.pptx
Create & Execute First Hadoop MapReduce Project in.pptxCreate & Execute First Hadoop MapReduce Project in.pptx
Create & Execute First Hadoop MapReduce Project in.pptx
 
Ajax and xml
Ajax and xmlAjax and xml
Ajax and xml
 
Mule batch processing
Mule batch processingMule batch processing
Mule batch processing
 
Change tracking
Change trackingChange tracking
Change tracking
 
performancetestingjmeter-121109061704-phpapp02 (1)
performancetestingjmeter-121109061704-phpapp02 (1)performancetestingjmeter-121109061704-phpapp02 (1)
performancetestingjmeter-121109061704-phpapp02 (1)
 
BD-zero lecture.pptx
BD-zero lecture.pptxBD-zero lecture.pptx
BD-zero lecture.pptx
 
exp-7-pig installation.pptx
exp-7-pig installation.pptxexp-7-pig installation.pptx
exp-7-pig installation.pptx
 
Less14 3 e_loadmodule_4
Less14 3 e_loadmodule_4Less14 3 e_loadmodule_4
Less14 3 e_loadmodule_4
 
Database example
Database exampleDatabase example
Database example
 
Using groovy component
Using groovy componentUsing groovy component
Using groovy component
 

Similar a 2014 Taverna tutorial myExperiment

Project
ProjectProject
Project
Xu Liu
 
Bhagat Myexperiment Bosc2008
Bhagat Myexperiment Bosc2008Bhagat Myexperiment Bosc2008
Bhagat Myexperiment Bosc2008
bosc_2008
 
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
Stian Soiland-Reyes
 
User Manual_SQLVB Report Tool v2
User Manual_SQLVB Report Tool v2User Manual_SQLVB Report Tool v2
User Manual_SQLVB Report Tool v2
Russell Frearson
 

Similar a 2014 Taverna tutorial myExperiment (20)

2014 Taverna Tutorial Biodiversity example
2014 Taverna Tutorial Biodiversity example2014 Taverna Tutorial Biodiversity example
2014 Taverna Tutorial Biodiversity example
 
IMPACT/myGrid Hackathon - Introduction to Taverna
IMPACT/myGrid Hackathon - Introduction to TavernaIMPACT/myGrid Hackathon - Introduction to Taverna
IMPACT/myGrid Hackathon - Introduction to Taverna
 
2014 Taverna Tutorial Components
2014 Taverna Tutorial Components2014 Taverna Tutorial Components
2014 Taverna Tutorial Components
 
Taverna tutorial
Taverna tutorialTaverna tutorial
Taverna tutorial
 
Test Strategy Utilising Mc Useful Tools
Test Strategy Utilising Mc Useful ToolsTest Strategy Utilising Mc Useful Tools
Test Strategy Utilising Mc Useful Tools
 
Code transformation With Spoon
Code transformation With SpoonCode transformation With Spoon
Code transformation With Spoon
 
Project
ProjectProject
Project
 
Bhagat Myexperiment Bosc2008
Bhagat Myexperiment Bosc2008Bhagat Myexperiment Bosc2008
Bhagat Myexperiment Bosc2008
 
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
 
2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)
 
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
 
Tools and connectors
Tools and connectorsTools and connectors
Tools and connectors
 
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
 
Learning to Rank Relevant Files for Bug Reports using Domain Knowledge
Learning to Rank Relevant Files for Bug Reports using Domain KnowledgeLearning to Rank Relevant Files for Bug Reports using Domain Knowledge
Learning to Rank Relevant Files for Bug Reports using Domain Knowledge
 
OpenML NeurIPS2018
OpenML NeurIPS2018OpenML NeurIPS2018
OpenML NeurIPS2018
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENTMULTIPLE SEQUENCE ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
 
Data Mining using Weka
Data Mining using WekaData Mining using Weka
Data Mining using Weka
 
White Paper On ConCurrency For PCMS Application Architecture
White Paper On ConCurrency For PCMS Application ArchitectureWhite Paper On ConCurrency For PCMS Application Architecture
White Paper On ConCurrency For PCMS Application Architecture
 
User Manual_SQLVB Report Tool v2
User Manual_SQLVB Report Tool v2User Manual_SQLVB Report Tool v2
User Manual_SQLVB Report Tool v2
 
Java 7 Features and Enhancements
Java 7 Features and EnhancementsJava 7 Features and Enhancements
Java 7 Features and Enhancements
 

Más de myGrid team

Más de myGrid team (12)

Taverna summary
Taverna summaryTaverna summary
Taverna summary
 
2014 Taverna Tutorial Introduction to eScience and workflows
2014 Taverna Tutorial Introduction to eScience and workflows2014 Taverna Tutorial Introduction to eScience and workflows
2014 Taverna Tutorial Introduction to eScience and workflows
 
2014 Taverna Tutorial Interactions
2014 Taverna Tutorial Interactions2014 Taverna Tutorial Interactions
2014 Taverna Tutorial Interactions
 
2014 Taverna Tutorial Nested workflows
2014 Taverna Tutorial Nested workflows2014 Taverna Tutorial Nested workflows
2014 Taverna Tutorial Nested workflows
 
2014 Taverna tutorial Xpath
2014 Taverna tutorial Xpath2014 Taverna tutorial Xpath
2014 Taverna tutorial Xpath
 
SWeDe - Scientific Webservice Description
SWeDe - Scientific Webservice DescriptionSWeDe - Scientific Webservice Description
SWeDe - Scientific Webservice Description
 
Taverna workflows in the cloud
Taverna workflows in the cloudTaverna workflows in the cloud
Taverna workflows in the cloud
 
The Taverna Software Suite
The Taverna Software SuiteThe Taverna Software Suite
The Taverna Software Suite
 
The Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, FutureThe Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, Future
 
2014-06-03-Taverna-IS-ENES2
2014-06-03-Taverna-IS-ENES22014-06-03-Taverna-IS-ENES2
2014-06-03-Taverna-IS-ENES2
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
 
If we build it will they come?
If we build it will they come?If we build it will they come?
If we build it will they come?
 

Último

%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Último (20)

%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 

2014 Taverna tutorial myExperiment

  • 1. An Introduction to Running, Reusing and Sharing Workflows with Taverna Stian Soiland-Reyes and Christian Brenninkmeijer University of Manchester materials by Katy Wolstencroft, Aleksandra Pawlik, Alan Williams http://orcid.org/0000-0001-9842-9718 http://orcid.org/0000-0002-2937-7819 http://orcid.org/0000-0002-1279-5133 http://orcid.org/0000-0001-8418-6735 http://orcid.org/0000-0003-3156-2105 Bonn University, 2014-09-01 / 2014-09-03 This work is licensed under a http://www.taverna.org.uk/ Creative Commons Attribution 3.0 Unported License
  • 2.  This tutorial will give you a basic introduction to reusing workflows in Taverna and my Experiment.  Other tutorials will explore nested workflows, the workflow engine (iteration, looping, parallel invocation)  Like in the previous tutorial workflows in this practical use small data-sets and are designed to run in a few minutes. In the real world, you would be using larger data sets and workflows would typically run for longer
  • 3. Bigger Workflows: Enrichment Analysis The previous examples were trivial, small tasks. Taverna’s real power is in iterating over large data sets  Many experiments result in a list of genes (e.g. microarray analysis, Chip-Seq, SNP identification etc).  In this exercise, we will use Taverna to analyse a gene set from a Chip-Seq experiment by finding and reusing existing workflows  We will enrich our dataset by discovering: 1. Which pathways our genes are involved in 2. The functions of the genes 3. Literature evidence for the phenotype/trait of interest
  • 4. Re-using workflows from myExperiment  Go to http://www.myexperiment.org and click on ‘find workflows’  You will see a list of the most viewed and downloaded workflow – see what the most popular workflow does by reading the description  Change the rank to ‘Latest’ and see what has been uploaded in the last few weeks  We will now find and download a workflow to identify the pathways each gene in our gene set is involved in
  • 5. Re-using workflows from myExperiment  Find the workflow called “UnigeneID to KEGG Pathways” and look at the workflow entry page (uploaded by “Aleksandra Pawlik”)  Download the workflow by clicking on the link: “Download Workflow” and find out what it does by reading the descriptions in myExperiment  Open the workflow in Taverna by going to ‘File ->Open Workflow’  Run the workflow using the example values supplied (Hint: when you run the workflow you can add the example values by clicking on “Use examples”)  Look at the workflow output – now you will see pathway information and pathway diagrams
  • 6. Combining workflows from myExperiment  To analyse all the genes from our ChipSeq study, we need to extract the gene list from our results file  To make it easier to work through the example, we have provided a Chip-Seq gene list on myExperiment, you can find it under “GalaxyGeneList - short : datafile for training”  Save this file to your local machine  Open the file in Excel  Save the file with a .csv extension  As you can see, the list of genes is in column D  Taverna can process and extract this column automatically
  • 7. Combining workflows from myExperiment  In myExperiment, find and download the workflow called “Import and convert gene list”  This workflow will extract the list of genes in column D using Taverna’s built-in spreadsheet import tool (which can be found in the services panel, for future reference)  The next step in the workflow converts RefSeq IDs into unigene IDs (required for the pathways workflow – converting between different types of identifiers is a common problem in bioinformatics!)  Run the workflow. This time, in the input window, select “set file location” and set the location to the saved .csv gene list.  Look at the workflow results
  • 8. Combining workflows from myExperiment  We will now combine the two workflows  While you are still in the “import and convert” workflow, go to the top of the workbench and select “insert -> Nested workflow”  In the pop-up window, select “import from file” and find the pathways workflow you downloaded earlier.  Click on “import workflow” and the pathways workflow will appear in the main workflow diagram.
  • 9. Combining workflows from myExperiment  Connect the workflows up by linking the output of the ‘Merge_Gene_List’ with the nested workflow input
  • 10. Combining workflows from myExperiment  Create new output ports for the Nested workflow and connect the Nested workflow outputs to the new outputs NOTE: you don’t need to connect them all, just pathway descriptions, pathway images and gene descriptions  Save the workflow  Run the workflow (it may take a few minutes)
  • 11. GO Associations There are many different tools we could use to find Gene Ontology associations for your gene list For example, we could simply modify the BioMart/Ensembl service in the ‘Import and convert gene list’ workflow we have already used Reload the ‘Import and Convert gene list’ workflow Right-click on the ‘mmusculus_gene_ensembl’ service and select ‘Copy’ Paste an extra copy of this service into the same workflow diagram
  • 12. GO Associations This is a BioMart service. It allows you to retrieve omics data from ENSEMBL and other genomics resources. If you are familiar with BioMart, you will see the interface in Taverna is very similar to the web interface We will modify the BioMart query to find all GO associations for each gene associated with a Chip-Seq peak Right-click on the new copy of the service and select ‘Configure BioMart Query’
  • 13. GO Associations The inputs (or filters) already accept RefSeq Ids from our input file, but we need to modify the outputs (or attributes) Select ‘Attributes’ and expand the …‘External’… section. Select ‘Go Term Accession’, ‘GO Term Definition’ and ‘Go Domain’ Unselect ‘UniGeneID’ and select ‘RefSeq mRNA’…  (See screenshot on the next slide for an example)
  • 15. GO Associations Click ‘apply’ to save your changes, and ‘close’, to go back to Taverna At the top of the workflow diagram, change the workflow view to show all ports by clicking on the table icon
  • 16. Connect your new service to the workflow by linking the ‘D’ output port of the spreadsheet service to the input of your new service Make the new output ports and connect them as shown to your new service GO Associations
  • 17. GO Associations Save the workflow by going to ‘File -> Save Workflow’ Run the workflow Set file location to the GalaxyGeneList file saved earlier (Download and) view the GO report In “GO_REPORT” tab select “Value 1” and press “Save Value” Save as a .txt or as a .tsf (tab separated) file
  • 18. Simple Text Mining So far we have looked at enriching the genomic information, but we could also use workflows for running data analyses (e.g. aligning mouse genes with human homologues) or performing literature searches Think about the ways you could extend this analysis with literature searches (e.g. Correlations between pathways, genes, GO terms, phenotypes etc) Search myExperiment for workflows involving text mining, using the search terms “text mining” and “Pubmed”
  • 19. Text Mining  Find and open the workflow “Phenotype to pubmed” One of the services is no longer available in the nested workflow (the faded-out service). Taverna checks the availability of each service when you load the workflow and when you run it In this case, the workflow will still run without the final nested workflow (clean text) Delete the ‘clean text’ nested workflow (by selecting it and right-clicking), and reconnect the workflow output Run the workflow with the search term ‘erythropoiesis’ (or a phenotype term to describe the disease you are studying)

Notas del editor

  1. ###