1. RNA-Seq Analysis
Using Pathogen Portal’s RNA-Seq analysis pipeline RNARocket
Overview
Creating an account
Exploring the site
Getting data
Checking quality
Starting analysis
Further analysis
2. Create an account
Step 1: Create a login account:
I. Go to http://pathogenportal.org
II. Click on RNA Rocket.
III. Click on Create account
IV. Fill in the required information.
3. Exploring the site: Launch Pad
- Interactive concept diagram
- Task-oriented menu system
- Designed for novice users
4. Exploring the site: Launch Pad → Trim Reads
- User guide to why, what, and how
- Details required inputs and expected outputs
- Helps organize files into project spaces
5. Exploring the site: Project View
- View existing projects
- Download files
- View metadata
- Stream to BRC sites
- Manage space allocation
- Share projects
6. Exploring the site: Shared Data → Published Projects
- View shared projects
- Import into your project space
- Share with collaborators
- Provide data for presentations…
7. Getting Data
A. Importing shared data
B. Transferring ENA/SRA data
C. Uploading your own
1. Click on “Shared Data” → “Published Projects”
2. Click on the title of the Project you wish to import
8. 3. Click “Import History” to import the Project into your Project View
9. Getting Data
B. Transferring ENA/SRA data
1. Navigate to the ‘Launch Pad’ page and click the ‘Get fastq files from SRA/ENA’ link
2. Click the ‘Continue’ button
10. 3. Search for the SRA or ENA accession in the search box provided. Alternatively, search for
the GEO, ArrayExpress, SRA, or ENA identifiers in the global search box at the top.
4. Click on the Nucleotide Sequences Record title you wish to import.
11. 5. On the subsequent ENA record page click the ‘File’ link in the ‘Fastq files (galaxy)’ column
for the files you wish to transfer.
12. Getting Data
C. Uploading your own
1. To upload data from your computer or a remote computer, click the ‘Upload Files’ link on
the Launch Pad page.
2. On the subsequent page, use the ‘Choose File’ button to upload files from your own
computer (limited to 2 GB), the ‘URL/Text’ box to paste URLs for files on remote
computers, or the FTP instructions to transfer files over FTP (better for larger files).
Screenshot callouts: Choose files from your computer here; Paste the FastQ URLs here; Instructions for using FTP
13. Checking quality
Read base quality can affect how the reads map to the genome. Different sequencing
technologies can have different quality and base-call error profiles. Depending on the quality
of the base calls, you may wish to trim your read sequences or adjust the alignment
parameters to account for this.
Two tools are available for checking quality: FastQC reports the average base call quality in a
FASTQ file, and SAMStat reports the number of reads aligned.
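To make "average base call quality" concrete, here is a minimal Python sketch that computes the mean Phred score of each read. It assumes four-line FASTQ records and Phred+33 quality encoding. The function name `mean_quality_per_read` is hypothetical, and this is not how FastQC itself is implemented.

```python
import io

def mean_quality_per_read(handle, offset=33):
    """Yield (read_id, mean Phred quality) for each 4-line FASTQ record.

    Assumes Phred+33 encoding (offset=33); adjust for older encodings.
    """
    while True:
        header = handle.readline()
        if not header:
            break                      # end of file
        handle.readline()              # sequence line (not needed here)
        handle.readline()              # '+' separator line
        quality = handle.readline().rstrip("\n")
        scores = [ord(ch) - offset for ch in quality]
        mean = sum(scores) / len(scores) if scores else 0.0
        yield header[1:].strip(), mean

# Tiny in-memory example: 'I' encodes Q40 and '#' encodes Q2.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII##\n")
results = list(mean_quality_per_read(example))
```

For a real file, pass an open file handle instead of the `io.StringIO` object. A read whose mean falls well below the rest is a candidate for trimming.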
An example is provided in Shared Data → Published Projects → RNASeq_QC_Demo.
Here we show two classes of files:
1. the original reads
2. a trimmed version of those reads with low-quality ends removed
For these two classes we give both the FastQC and SAMStat reports.
Screenshot callouts: Original fastq & analysis; Trimmed fastq & analysis
Click the eye icon to see the contents of a file or report.
14. From the FastQC report we see that the average base call quality is improved by trimming
the reads.
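The effect of trimming on average quality can be illustrated with a naive Python sketch that removes bases from both ends of a read while they fall below a quality threshold. The threshold of 20, the Phred+33 encoding, and the function name `trim_low_quality_ends` are assumptions for illustration. The trimming tool offered on the site may use a different algorithm.

```python
def trim_low_quality_ends(sequence, quality, threshold=20, offset=33):
    """Strip bases below `threshold` from both ends of a read.

    Returns the trimmed (sequence, quality) pair. Assumes Phred+33 encoding.
    """
    scores = [ord(ch) - offset for ch in quality]
    start, end = 0, len(scores)
    # Walk inward from the 3' end while quality is too low.
    while end > start and scores[end - 1] < threshold:
        end -= 1
    # Walk inward from the 5' end while quality is too low.
    while start < end and scores[start] < threshold:
        start += 1
    return sequence[start:end], quality[start:end]

# '#' encodes Q2 and 'I' encodes Q40 in Phred+33.
trimmed_seq, trimmed_qual = trim_low_quality_ends("ACGTACGT", "#IIII###")
```

In this toy read, only the four Q40 bases remain, so the mean quality rises from 21 to 40 at the cost of a shorter read.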
From the SAMStat report we see that the number of unaligned reads only shows a slight
improvement with trimming. Modern alignment software is often able to account for the base
call quality in determining alignments. Also of note is that the ‘Mean Base Quality’ profile is
not substantially different for MAPQ >=30 and MAPQ < 3.
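For readers unfamiliar with MAPQ, the SAM format defines it as -10·log10 of the probability that the mapping position is wrong. The short Python sketch below converts a MAPQ value back to that probability. The function name is hypothetical, and individual aligners estimate MAPQ in their own ways, so treat the numbers as nominal.

```python
def mapq_to_error_probability(mapq):
    """Convert a MAPQ score to the nominal probability of a wrong mapping.

    Follows the SAM definition MAPQ = -10 * log10(P_wrong).
    A value of 255 means 'not available' in SAM and is not handled here.
    """
    return 10 ** (-mapq / 10)

p_high = mapq_to_error_probability(30)   # confidently placed read
p_low = mapq_to_error_probability(3)     # ambiguously placed read
```

Under this definition, MAPQ 30 corresponds to about a 1-in-1000 chance of a wrong placement, while MAPQ 3 corresponds to roughly a 50% chance.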
15. Starting Analysis
Test datasets have been provided for the purpose of starting an alignment and transcript assembly
job at Shared Data → Published Projects → RNASeq_Run_Demo.
- To begin, import this history into your own workspace by using the ‘Import history’ functionality
demonstrated previously.
- After the Project is imported, it should appear in your ‘Project View’.
16. - Proceed to the ‘Launch Pad’ page and click the ‘Align Reads & Assemble Transcripts’ link.
- On the next page, choose the type of analysis (we are analyzing a paired-end prokaryotic
sample).
- Next, select the target project from the drop-down menu. You should have a project called
‘imported: RNASeq_Run_Demo’. Once you select the correct project, you should see the
two FASTQ files listed. Then click ‘Continue’.
17. The following page allows you to configure the parameters for the various tools that will run
as part of the analysis you have selected. Here we describe the bare minimum for running a
job. Take more care when customizing the analysis for your own data.
First populate the Upstream and Downstream Read Files with READ1_SHORT.fastq and
READ2_SHORT.fastq respectively.
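Paired-end aligners generally expect the upstream and downstream files to list the same reads in the same order. As an optional sanity check before uploading your own data, the following Python sketch compares read identifiers between two FASTQ streams. It assumes four-line records and that mates share an identifier apart from an optional trailing `/1` or `/2`. The helper names `read_ids` and `pairs_in_sync` are hypothetical and not part of the site.

```python
import io

def read_ids(handle):
    """Return read identifiers from a 4-line-per-record FASTQ stream."""
    ids = []
    for index, line in enumerate(handle):
        if index % 4 == 0:             # header line of each record
            fields = line[1:].split()  # drop '@' and any trailing comment
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]       # strip the mate suffix
            ids.append(name)
    return ids

def pairs_in_sync(upstream, downstream):
    """True if both streams contain the same read IDs in the same order."""
    return read_ids(upstream) == read_ids(downstream)

# In-memory stand-ins for an upstream and a downstream read file.
r1 = io.StringIO("@r1/1\nAC\n+\nII\n@r2/1\nGT\n+\nII\n")
r2 = io.StringIO("@r1/2\nGT\n+\nII\n@r2/2\nAC\n+\nII\n")
in_sync = pairs_in_sync(r1, r2)
```

With files on disk, the same check would be `pairs_in_sync(open("READ1_SHORT.fastq"), open("READ2_SHORT.fastq"))`. This loads all identifiers into memory, which is fine for small test files but may be heavy for full sequencing runs.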
Select the reference organism ‘Salmonella enterica subsp. Typhimurium 14028S’ from the
dropdown. It may take a moment for the dropdown to appear once clicked due to the number
of organisms.
18. Select ‘Run Workflow’ at the bottom of the page
If the workflow is successfully queued you should see the following
19. Next, go to the ‘Project View’ page to see the status of your jobs.
In the rightmost panel, grey jobs are pending, yellow jobs are running, and green jobs are
complete.
20. Further Analysis
Test datasets have been provided for the purpose of testing the RNA-Seq visualization capabilities at
PATRIC. Navigate to Shared Data → Published Projects → RNASeq_Analysis_Demo.
Each of the files displayed has a visualization component on the PATRIC site. To open it, first
click the dataset title to expand the dataset section, then click the ‘display at PATRIC’ link.
Screenshot captions: Read Quality View; Displaying BAM at PATRIC; Expression View