QIIME: Quantitative Insights Into Microbial Ecology (part 1)
Thomas Jeffries
Federico M. Lauro
Grazia Marina Quero
Tiziano Minuzzo
The Omics Analysis Sydney Tutorial
Australian Museum
23rd-24th February 2015
QIIME
• Open-source software package for taxonomic analysis of 16S rRNA sequences
• Developed at the University of Colorado Boulder and Northern Arizona University (Knight and Caporaso labs)
• www.qiime.org (great resource)
• Good community support
• Most problems can be solved with a web search
• Multi-platform
• Widely used
Getting QIIME
Linux: https://github.com/qiime/qiime-deploy
Mac: http://www.wernerlab.org/software/macqiime
Ubuntu VirtualBox: http://qiime.org/install/virtual_box.html
Linux remote machine, e.g. UTS FEIT cluster or NeCTAR: http://nectar.org.au/research-cloud
General install instructions: http://qiime.org/install/install.html
Data formats
• 454:
DNA sequences (FASTA, .fna)
Quality (.qual)
Mapping file (.txt)
• Illumina
Sequences and quality in same file (.fastq)
Also supports paired end
Getting into QIIME
• Command line interface
• Some very basic commands needed for QIIME:
example:
/folder$ programme.py -i file_in -o file_out
ls : lists files in the working directory
cd : changes directory
cd .. : goes back to the parent directory
‘tab’ key : auto-completes file names
mkdir : makes a directory
pwd : tells you where you are
QIIME tutorial and example data
• Many tutorials @ http://qiime.org/tutorials/index.html
• Good place to start: http://qiime.org/tutorials/tutorial.html
• Great Microbial Ecology course (includes QIIME): http://edamame-course.org/
• A few of the commands have changed in the new version. The current commands are used in this talk, and I have renamed the files to make it easier to follow.
Some useful terminology
α-diversity
Alpha diversity is the diversity within ONE sample
α-diversity: Richness
α-diversity: Evenness
Common metric: Pielou’s evenness
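The two terms above can be made concrete with a small calculation. This is a minimal sketch, not QIIME's own code: the function names and the example counts are made up for illustration, and Pielou's evenness is taken as J' = H' / ln(S), where H' is the Shannon index and S is the richness.

```python
import math

def richness(counts):
    """Number of OTUs (or species) observed at least once in one sample."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over the observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou's J' = H' / ln(S); values near 1 mean OTUs are equally abundant."""
    s = richness(counts)
    if s < 2:
        # Evenness is undefined for a single OTU; returning 0.0 is a choice made here.
        return 0.0
    return shannon(counts) / math.log(s)

# Hypothetical samples: same richness, very different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, richness(sample), round(pielou_evenness(sample), 3))
```

Both samples have a richness of 4, but the skewed sample scores far lower on evenness because one OTU dominates.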
Tutorial dataset
1. Check mapping file format
• Checks that the format of the mapping file is OK
validate_mapping_file.py -m my_mapping_file.txt -o validate_mapping_file_output
“No errors or warnings were found in mapping file”
1. Check mapping file
Mapping file columns:
• Name (ID) of sample
• Primer
• Sequencing barcode
• Sample categories (treatments)
Tab separated!
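To show why the tab separation matters, here is a rough sketch of the kind of check a validator performs. It is not validate_mapping_file.py itself: the function name and the sample rows are invented, and it only assumes the usual QIIME header layout (#SampleID, BarcodeSequence, LinkerPrimerSequence first, Description last).

```python
import csv
import io

REQUIRED_FIRST = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]

def check_mapping(text):
    """Return a list of problems found in a tab-separated mapping file."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if header[:3] != REQUIRED_FIRST:
        problems.append("first three columns should be " + ", ".join(REQUIRED_FIRST))
    if header[-1] != "Description":
        problems.append("last column should be Description")
    seen = set()
    for row in rows[1:]:
        if row[0].startswith("#"):
            continue  # comment line
        if len(row) != len(header):
            problems.append(f"{row[0]}: expected {len(header)} fields, found {len(row)}")
        if row[0] in seen:
            problems.append(f"{row[0]}: duplicate sample ID")
        seen.add(row[0])
    return problems

# Hypothetical two-line mapping file, once with tabs and once with spaces.
good = ("#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDescription\n"
        "S1\tAGCACGAGCCTA\tYATGCTGCCTCCCGTAGGAGT\tControl\tfirst_sample\n")
bad = good.replace("\t", " ")
print(check_mapping(good))
print(check_mapping(bad))
```

A file saved with spaces instead of tabs collapses into a single column, so the header checks fail immediately.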
Hands on – validate your mapping file
validate_mapping_file.py \
  -o moving_pictures_tutorial-1.8.0/illumina/cid_l1/ \
  -m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt
2. De-multiplex - 454
• Using sample-specific barcodes, assigns each sequence to a sample (renames sequences)
• Performs some QC:
  - Removes sequences shorter than 200 bp
  - Removes sequences with a mean quality score below 25
  - Removes sequences with more than 6 ambiguous bases or a homopolymer run longer than 6 bases
split_libraries.py -m my_mapping_file.txt -f my_sequence_file.fna -q my_quality_file.qual -o split_library_output
• Produces seqs.fna
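The idea behind de-multiplexing can be sketched in a few lines. This is a toy illustration only, not what split_libraries.py actually runs: the function name, the barcodes and the reads are hypothetical, all barcodes are assumed to have the same length, and only the length filter from the QC list is applied.

```python
def demultiplex(reads, barcode_to_sample, min_length=200):
    """Assign reads to samples by their leading barcode.

    reads: iterable of (read_id, sequence) pairs.
    barcode_to_sample: dict mapping barcode -> sample ID (equal-length barcodes).
    Yields (new_id, sequence_without_barcode). Reads with an unknown barcode
    or fewer than min_length bases are dropped.
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    counter = 0
    for read_id, seq in reads:
        sample = barcode_to_sample.get(seq[:barcode_len])
        if sample is None or len(seq) < min_length:
            continue
        # Renamed in the SampleID_N style, keeping the original read ID after it.
        yield f"{sample}_{counter} {read_id}", seq[barcode_len:]
        counter += 1

# Hypothetical data; min_length is lowered so the toy reads pass the filter.
barcodes = {"ACGT": "S1", "GGCC": "S2"}
reads = [("r1", "ACGT" + "A" * 10),   # known barcode, kept
         ("r2", "TTTT" + "C" * 10),   # unknown barcode, dropped
         ("r3", "ACGT" + "G" * 2)]    # too short, dropped
for new_id, seq in demultiplex(reads, barcodes, min_length=10):
    print(new_id, seq)
```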
2. De-multiplex - Illumina (Step 1)
• If the samples contain paired-end reads, you first need to join them and update the barcodes using:
join_paired_ends.py -f my_forw_reads.fastq -r my_rev_reads.fastq -b my_barcodes.fastq -o my_joined.fastq
2. De-multiplex - Illumina (Step 2)
• Then you can proceed to the split libraries step. If the sequences are NOT paired-end, go directly to split_libraries_fastq.py. This step also performs QC on the Illumina reads:
split_libraries_fastq.py -m my_mapping_file.txt -i my_sequence_file.fastq -b my_barcodes.fastq -o split_library_output
• Data from multiple lanes can be processed together by separating
inputs with a comma (,)
• Produces seqs.fna
Hands on: split your libraries
… 1.8.0/illumina/raw/subsampled_s_1_sequence.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_2_sequence.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_3_sequence.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_4_sequence.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_5_sequence.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_6_sequence.fastq \
-b moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_1_sequence_barcodes.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_2_sequence_barcodes.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_3_sequence_barcodes.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_4_sequence_barcodes.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_5_sequence_barcodes.fastq,\
moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_6_sequence_barcodes.fastq \
-m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l2.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l3.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l4.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l5.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l6.txt
count_seqs.py -i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna
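After splitting libraries it is worth checking how many sequences survived, which is what the count_seqs.py call on seqs.fna is for. As a rough stand-in for that check (a sketch, not the QIIME script; the function name and the toy records are made up), counting FASTA records only needs the header lines:

```python
def count_fasta(lines):
    """Return (number of records, mean sequence length) for FASTA-formatted lines."""
    n_records = 0
    n_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_records += 1        # every record starts with one '>' header line
        else:
            n_bases += len(line)  # sequence may be wrapped over several lines
    mean_length = n_bases / n_records if n_records else 0.0
    return n_records, mean_length

# Hypothetical two-record FASTA, the second record wrapped over three lines.
example = [">S1_0 r1", "ACGT", ">S1_1 r2", "AC", "GT", "AA"]
print(count_fasta(example))
```

A file handle can be passed in place of the list, for example `with open("seqs.fna") as fh: count_fasta(fh)`.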
3. OTU picking strategies
• De novo OTU picking: clustering of sequences at 97%
  - Requires overlapping sequences
  - No reference database necessary
  - Computationally expensive
• Closed-reference
  - Works with non-overlapping reads
  - Needs a reference database
  - Discards sequences with no match, e.g. erroneous reads
• Open-reference
  - Requires overlapping reads
  - Reads are clustered against the reference, and non-matching reads are clustered de novo
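The difference between the closed- and open-reference strategies can be sketched as follows. This is a toy model under strong assumptions, not how QIIME or UCLUST compute similarity: identity here is a simple ungapped comparison from the first base, and the function names, the `New.N` OTU labels and the data are all hypothetical.

```python
def identity(a, b):
    """Ungapped identity over the shorter sequence, compared from the first base."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def pick_otus(reads, reference, threshold=0.97, open_reference=False):
    """Return (otu_map, discarded_read_ids).

    reads: list of (read_id, sequence); reference: dict OTU ID -> sequence.
    Closed-reference: reads without a reference hit are discarded.
    Open-reference: those reads are clustered de novo into new OTUs instead.
    """
    otu_map = {}
    discarded = []
    new_centroids = {}  # de novo OTU ID -> centroid sequence
    for read_id, seq in reads:
        hit = next((otu for otu, ref in reference.items()
                    if identity(seq, ref) >= threshold), None)
        if hit is None and open_reference:
            hit = next((otu for otu, cen in new_centroids.items()
                        if identity(seq, cen) >= threshold), None)
            if hit is None:
                hit = f"New.{len(new_centroids)}"
                new_centroids[hit] = seq
        if hit is None:
            discarded.append(read_id)
        else:
            otu_map.setdefault(hit, []).append(read_id)
    return otu_map, discarded

# Hypothetical reference with one OTU, and three reads.
reference = {"ref1": "AAAAAAAAAA"}
reads = [("a", "AAAAAAAAAA"), ("b", "CCCCCCCCCC"), ("c", "CCCCCCCCCC")]
print(pick_otus(reads, reference, threshold=0.9))
print(pick_otus(reads, reference, threshold=0.9, open_reference=True))
```

In closed-reference mode reads b and c are lost; in open-reference mode they form a new OTU.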
Hands on – picking OTUs
pick_open_reference_otus.py \
  -o moving_pictures_tutorial-1.8.0/illumina/otus/ \
  -i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna \
  -r gg_13_8_otus/rep_set/97_otus.fasta \
  -p moving_pictures_tutorial-1.8.0/uc_fast_params.txt
pick_de_novo_otus.py \
  -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/ \
  -i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna
3. Pick OTUs
Note: the following steps can be automated with the workflow script (this is what we are doing):
pick_de_novo_otus.py -i seqs.fna -o otus
The individual OTU-picking step is:
pick_otus.py -i seqs.fna -o picked_otus_default
• Clusters your sequences at 97% similarity (you can change this if you wish) and produces ‘seqs_otus.txt’, which maps each sequence to a cluster
• Uses the UCLUST algorithm (Edgar, 2010, Bioinformatics)
3. Pick OTUs
Generate OTUs by clustering reads based on similarity (default is 97%)
Sort reads according to size (long -> short)
Cluster -> OTU1, OTU2, OTU3, OTU4, OTU5
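The sort-then-cluster idea on this slide can be illustrated with a greedy centroid clustering. This is only a sketch of the general approach: it is not the UCLUST implementation, the identity measure is a crude ungapped comparison, and the function names and sequences are invented.

```python
def identity(a, b):
    """Ungapped identity over the shorter sequence, compared from the first base."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_cluster(seqs, threshold=0.97):
    """Cluster sequences greedily; returns a list of OTUs (lists of IDs, centroid first).

    seqs: dict mapping sequence ID -> sequence.
    """
    # Long sequences first, so they become the centroids (long -> short).
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters = []  # list of (centroid_sequence, member_ids)
    for sid in order:
        for centroid_seq, members in clusters:
            if identity(seqs[sid], centroid_seq) >= threshold:
                members.append(sid)  # joins the first centroid it matches
                break
        else:
            clusters.append((seqs[sid], [sid]))  # no match: seed a new OTU
    return [members for _, members in clusters]

# Hypothetical reads; the threshold is lowered because the toy reads are short.
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTACGG", "s3": "TTTTTTTTTT"}
print(greedy_cluster(seqs, threshold=0.9))
```

Here s2 is the longest read and seeds the first OTU, s1 joins it, and s3 is too different and seeds a second OTU.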
4. Pick representative sequences
• We want a representative sequence for each OTU: annotating every sequence is time consuming, and they are already clustered
• This will take the most abundant sequence in each OTU and make a file that has 1 sequence for each OTU (rep_set1.fna)
pick_rep_set.py -i seqs_otus.txt -f seqs.fna -o rep_set1.fna
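Picking the most abundant sequence per OTU, as described above, might look like this in outline. It is an illustrative sketch rather than the pick_rep_set.py code; the function name and the toy OTU map are assumptions.

```python
from collections import Counter

def pick_rep_set(otu_map, seqs):
    """Pick the most abundant sequence in each OTU.

    otu_map: dict OTU ID -> list of member sequence IDs.
    seqs: dict sequence ID -> sequence.
    Returns dict OTU ID -> (representative sequence ID, sequence).
    """
    rep_set = {}
    for otu_id, members in otu_map.items():
        # Count how often each distinct sequence occurs within the OTU.
        abundance = Counter(seqs[sid] for sid in members)
        best_seq, _ = abundance.most_common(1)[0]
        # Report the first member that carries the winning sequence.
        rep_id = next(sid for sid in members if seqs[sid] == best_seq)
        rep_set[otu_id] = (rep_id, best_seq)
    return rep_set

# Hypothetical OTU with three reads, two of them identical.
otu_map = {"0": ["S1_0", "S1_1", "S2_0"]}
seqs = {"S1_0": "ACGT", "S1_1": "ACGA", "S2_0": "ACGA"}
print(pick_rep_set(otu_map, seqs))
```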
5. Annotate (assign taxonomy to each OTU)
• Compare each representative sequence to a database using one of several algorithms:
• UCLUST, BLAST, RDP Classifier, and others
• New default: UCLUST against the Greengenes database
assign_taxonomy.py -i rep_set1.fna
(output in directory: uclust_assigned_taxonomy)
• BLAST example (reference sequences and taxonomy downloaded from database):
assign_taxonomy.py -i rep_set1.fna -r ref_seq_set.fna -t id_to_taxonomy.txt -m blast
5. Annotate
• Some useful databases that are compatible with QIIME:
http://greengenes.secondgenome.com
  Good for everything and default in QIIME
http://unite.ut.ee
  Fungal Internal Transcribed Spacer (ITS)
  Good for soil fungi
http://www.arb-silva.de
  Contains both 16S and 18S rRNA (eukaryotes)
  Good representation of marine taxa
Recap
[Diagram: mixed amplicons from Species A, B and C -> Samples 1, 2, 3 -> OTUs 1, 2, 3]
• Split library into samples using barcodes
• Used clustering to choose OTUs
• Picked representative sequences and assigned taxonomy (reference database)
6. Putting it all together: making an OTU table
• Need to combine the OTU identity with the abundance information in the clusters and link back to each sample so we can do ECOLOGY
make_otu_table.py -i seqs_otus.txt -t rep_set1_tax_assignments.txt -o otu_table.biom
• The table is in .biom format:
• http://biom-format.org/documentation/biom_format.html
• Convert to text file:
• biom convert -i otu_table.biom -o otu_table.txt --table-type "otu table" --header-key taxonomy -b
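What the OTU table holds can be shown with a small sketch that builds one from an OTU map. This is not make_otu_table.py and does not write .biom: the function names are made up, the OTU map is assumed to be tab-separated (OTU ID followed by member sequence IDs), and sequence IDs are assumed to follow the SampleID_N pattern produced by de-multiplexing.

```python
from collections import defaultdict

def make_otu_table(otu_map_lines):
    """Build {otu_id: {sample_id: count}} from tab-separated OTU-map lines."""
    table = {}
    for line in otu_map_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        otu_id, seq_ids = fields[0], fields[1:]
        counts = defaultdict(int)
        for seq_id in seq_ids:
            # The sample is everything before the last underscore.
            sample = seq_id.rsplit("_", 1)[0]
            counts[sample] += 1
        table[otu_id] = dict(counts)
    return table

def to_text(table, taxonomy):
    """Render the table as tab-separated text with a trailing taxonomy column."""
    samples = sorted({s for counts in table.values() for s in counts})
    lines = ["\t".join(["#OTU ID"] + samples + ["taxonomy"])]
    for otu_id, counts in table.items():
        row = [otu_id] + [str(counts.get(s, 0)) for s in samples]
        row.append(taxonomy.get(otu_id, "Unassigned"))
        lines.append("\t".join(row))
    return "\n".join(lines)

# Hypothetical OTU map and taxonomy assignments.
otu_map_lines = ["0\tS1_0\tS1_1\tS2_0", "1\tS2_1"]
taxonomy = {"0": "k__Bacteria"}
table = make_otu_table(otu_map_lines)
print(to_text(table, taxonomy))
```

Each row is an OTU, each column a sample, and each cell the number of sequences from that sample that fell into that OTU.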
Closed-reference OTU picking
pick_closed_reference_otus.py -i seqs.fna -r reference.fna -o otus_w_tax/ -t taxa_map.txt
• Reference is a database, i.e. Greengenes unaligned 97% OTUs and the matching taxa map (same files as for BLAST)
• Output has all of your sequences clustered against Greengenes and an OTU table
• So this picks OTUs and assigns taxonomy in 1 step (but you lose non-matching sequences. Do we care? For taxa summaries no, for beta-diversity maybe.)
• Quick: good for Illumina
7. Aligning sequences
• Back to our representative sequences
• How closely related are the organisms present in the samples, i.e. what is the phylogeny of our community and how does this shift between samples?
• Default: PyNAST aligns sequences to a reference set of pre-aligned sequences (e.g. Greengenes ALIGNED), which is more computationally efficient than de novo alignment
• Can also select other methods, e.g. MUSCLE
align_seqs.py -i rep_set1.fna -o pynast_aligned/
7. Aligning sequences
• Not all regions of the rRNA gene are informative or useful for phylogenetic inference
• Gaps: short-length sequences vs full-length rRNA gene
• filter_alignment.py -i rep_set1_aligned.fasta -o filtered_alignment/
• Optional lanemask template that defines informative regions for some databases
• filter_alignment.py -i seqs_rep_set_aligned.fasta -m lanemask_in_1s_and_0s -o filtered_alignment/
• If you are going to use this alignment for making a phylogenetic tree, this step is essential
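The two filtering ideas above (a lanemask of 1s and 0s, and removal of gap-only columns) can be sketched as column selection on the alignment. This is an illustration under simple assumptions, not the filter_alignment.py implementation; the function names and the tiny alignment are made up.

```python
def apply_lanemask(alignment, lanemask):
    """Keep only the alignment columns whose lanemask character is '1'.

    alignment: dict sequence ID -> aligned sequence (all the same length).
    lanemask: string of '1' and '0', one character per alignment column.
    """
    keep = [i for i, flag in enumerate(lanemask) if flag == "1"]
    return {sid: "".join(seq[i] for i in keep) for sid, seq in alignment.items()}

def drop_gap_columns(alignment, gap_chars="-."):
    """Remove columns that contain a gap character in every sequence."""
    seqs = list(alignment.values())
    if not seqs:
        return {}
    keep = [i for i in range(len(seqs[0]))
            if any(s[i] not in gap_chars for s in seqs)]
    return {sid: "".join(seq[i] for i in keep) for sid, seq in alignment.items()}

# Hypothetical two-sequence alignment with two gap-only columns.
alignment = {"otu0": "AC--GT", "otu1": "A---GA"}
print(apply_lanemask(alignment, "110011"))
print(drop_gap_columns(alignment))
```

In this toy case both filters remove the same two columns, leaving the gap in otu1 that is informative relative to otu0.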
A note on chimera removal
• Chimeras: sequences formed from the DNA of 2 or more organisms (an artifact of PCR amplification)
• QIIME uses ChimeraSlayer to detect chimeric sequences using your alignment and a reference database
identify_chimeric_seqs.py -m ChimeraSlayer -i rep_set_aligned.fasta -a reference_set1_aligned.fasta -o chimeric_seqs.txt
• You should then remove these OTUs from your OTU table and alignment before proceeding with tree building and visualization of results:
• -e chimeric_seqs.txt when making the OTU table, filter_fasta.py for the alignment
8. Make a phylogenetic tree
make_phylogeny.py -i rep_set1_aligned_pfiltered.fasta -o rep_phylo.tre
• Builds a tree from the alignment using FastTree
• Outputs a tree in Newick format (.tre), which can be opened with software such as FigTree or used to calculate phylogenetic metrics
• Also filter chimeras from the tree
We now have 2 final outputs:
• OTU table
1. Taxonomic composition
2. α-diversity (e.g. ‘species’ richness)
3. β-diversity (e.g. abundance similarity between samples)
• Phylogenetic tree
1. Phylogenetic β-diversity
QIIME has powerful visualization and statistical tools
Hands on – reformatting outputs
We have automated (piped) most of the steps I have talked about.
We need to convert the OTU table to a text file and filter the alignment:
biom convert \
  -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom \
  -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.txt \
  --table-type "otu table" --header-key taxonomy -b
filter_alignment.py \
  -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/pynast_aligned_seqs/seqs_rep_set_aligned.fasta \
  -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/pynast_aligned_seqs/filtered_alignment
9. Merging the mapping files
• We started with 6 lanes of Illumina data but now have a single OTU table. The merged mapping file will have duplicated barcodes, but these are no longer used (the reads are already demultiplexed):
• merge_mapping_files.py -o combined_mapping_file.txt -m mapfile1.txt,mapfile2.txt…,mapfilexxx.txt
Hands on – merge your mapping files
merge_mapping_files.py \
  -o moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt \
  -m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l2.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l3.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l4.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l5.txt,\
moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l6.txt
biom summarize-table \
  -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom \
  -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.summary
Visualizing diversity 1 – community composition
biom summarize-table -i otu_table.biom -o otu_table_summary.txt
Counts/Sample detail:
L3S237: 138.0
L3S235: 187.0
L3S372: 205.0
L3S373: 228.0
L3S367: 259.0
L3S370: 273.0
L3S368: 274.0
L3S369: 284.0
• Summary of OTU table: we want to standardize the number of sequences (sampling depth) to allow accurate comparison, i.e. 138 sequences (the smallest sample listed above)
single_rarefaction.py -i otu_table.biom -o otu_table_even138.biom -d 138
alpha_rarefaction.py -i otu_table.biom -m combined_mapping_file.txt -o rarefaction/ -t rep_set.tre
• How ‘deep’ do we need to go to adequately sample the community? = Rarefaction analysis
• The number of observed species increases until a point where producing more sequences does not significantly increase it
• Rarefaction repeatedly subsamples your data at different depths and plots the subsample size against the number of observed species. If the curves flatten, then you have sequenced at sufficient depth.
• Rarefaction is a trade-off between ‘keeping’ samples that fall below a given sequence cut-off and losing diversity
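The subsampling behind rarefaction can be sketched as drawing reads without replacement. This is a simplified illustration, not the single_rarefaction.py or alpha_rarefaction.py code: the function names, the depths, the number of repeats and the example counts are all assumptions.

```python
import random

def rarefy(counts, depth, rng):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    counts: dict OTU ID -> number of reads. Returns a dict of subsampled
    counts, or None if the sample has fewer than `depth` reads (in which
    case the sample would be dropped from the even-depth table).
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied

def rarefaction_curve(counts, depths, rng, repeats=10):
    """Mean number of observed OTUs at each depth, over repeated subsamples."""
    total = sum(counts.values())
    curve = []
    for depth in depths:
        if depth > total:
            continue  # cannot subsample deeper than the sample itself
        observed = [len(rarefy(counts, depth, rng)) for _ in range(repeats)]
        curve.append((depth, sum(observed) / repeats))
    return curve

# Hypothetical sample with 100 reads in 4 OTUs; seeded for repeatability.
rng = random.Random(0)
sample = {"otu1": 50, "otu2": 30, "otu3": 15, "otu4": 5}
for depth, mean_observed in rarefaction_curve(sample, [1, 10, 50, 100], rng):
    print(depth, mean_observed)
```

The mean number of observed OTUs rises with depth and levels off once every OTU has been seen, which is the flattening of the curve described above.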
Hands on - Rarefaction
single_rarefaction.py \
  -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom \
  -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table_even138.biom \
  -d 138
alpha_rarefaction.py \
  -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom \
  -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/rarefaction/ \
  -m moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt \
  -t moving_pictures_tutorial-1.8.0/illumina/otus_denovo/rep_set.tre
Tomorrow: visualizing and comparing diversity
Software references:
QIIME: Caporaso et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7(5):335-336.
UCLUST: Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460-2461.
BLAST: Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215(3):403-410.
Greengenes: McDonald et al. 2012. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 6(3):610-618.
RDP Classifier: Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73(16):5261-5267.
PyNAST: Caporaso JG et al. 2010. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26:266-267.
ChimeraSlayer: Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research 21:494-504.
MUSCLE: Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792-1797.
FastTree: Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3):e9490.
UniFrac: Lozupone C, Knight R. 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71(12):8228-8235.
Emperor: Vazquez-Baeza Y, Pirrung M, Gonzalez A, Knight R. 2013. Emperor: A tool for visualizing high-throughput microbial community data. Gigascience 2(1):16.

More Related Content

What's hot

2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekingeProf. Wim Van Criekinge
 
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...Jan Aerts
 
Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Morgan Langille
 
2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekinge2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekingeProf. Wim Van Criekinge
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...Barbera van Schaik
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologiesProf. Wim Van Criekinge
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS
 
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...Alejandra Gonzalez-Beltran
 
DNA analysis on your laptop: Spot the differences
DNA analysis on your laptop: Spot the differencesDNA analysis on your laptop: Spot the differences
DNA analysis on your laptop: Spot the differencesBarbera van Schaik
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issuesDongyan Zhao
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vstQiang Kou
 

What's hot (20)

2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge
 
Cloud bioinformatics 2
Cloud bioinformatics 2Cloud bioinformatics 2
Cloud bioinformatics 2
 
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
 
Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...
 
Introduction to METAGENOTE
Introduction to METAGENOTE Introduction to METAGENOTE
Introduction to METAGENOTE
 
2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekinge2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekinge
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...Initial steps towards a production platform for DNA sequence analysis on the ...
Initial steps towards a production platform for DNA sequence analysis on the ...
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra tool
 
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
 
DNA analysis on your laptop: Spot the differences
DNA analysis on your laptop: Spot the differencesDNA analysis on your laptop: Spot the differences
DNA analysis on your laptop: Spot the differences
 
Variant analysis and whole exome sequencing
Variant analysis and whole exome sequencingVariant analysis and whole exome sequencing
Variant analysis and whole exome sequencing
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vst
 
2014 bangkok-talk
2014 bangkok-talk2014 bangkok-talk
2014 bangkok-talk
 
Whole exome sequencing(wes)
Whole exome sequencing(wes)Whole exome sequencing(wes)
Whole exome sequencing(wes)
 
MicrobeDB Overview
MicrobeDB OverviewMicrobeDB Overview
MicrobeDB Overview
 
Paprica course
Paprica coursePaprica course
Paprica course
 
D02-NextGenSeq-MOLGENIS
D02-NextGenSeq-MOLGENISD02-NextGenSeq-MOLGENIS
D02-NextGenSeq-MOLGENIS
 

Viewers also liked

CCBC tutorial beiko
CCBC tutorial beikoCCBC tutorial beiko
CCBC tutorial beikobeiko
 
Bacterial Identification by 16s rRNA Sequencing.ppt
Bacterial Identification by 16s rRNA Sequencing.pptBacterial Identification by 16s rRNA Sequencing.ppt
Bacterial Identification by 16s rRNA Sequencing.pptRakesh Kumar
 
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.jennomics
 
Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsAndrea Telatin
 
Policy Brief-Costly Disease: How to reduce out of pocket expenditure in Diabe...
Policy Brief-Costly Disease: How to reduce out of pocket expenditure in Diabe...Policy Brief-Costly Disease: How to reduce out of pocket expenditure in Diabe...
Policy Brief-Costly Disease: How to reduce out of pocket expenditure in Diabe...Anupam Singh
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisJosh Neufeld
 
Amplicon sequencing slides - Trina McMahon - MEWE 2013
Amplicon sequencing slides - Trina McMahon - MEWE 2013Amplicon sequencing slides - Trina McMahon - MEWE 2013
Amplicon sequencing slides - Trina McMahon - MEWE 2013mcmahonUW
 
Dr. Tom Burkey - Host-Microbe Interactions: Effects on nutrition and physiology
Dr. Tom Burkey - Host-Microbe Interactions: Effects on nutrition and physiologyDr. Tom Burkey - Host-Microbe Interactions: Effects on nutrition and physiology
Dr. Tom Burkey - Host-Microbe Interactions: Effects on nutrition and physiologyJohn Blue
 
[13.09.19] 16S workshop introduction
[13.09.19] 16S workshop introduction[13.09.19] 16S workshop introduction
[13.09.19] 16S workshop introductionMads Albertsen
 
16S Ribosomal DNA Sequence Analysis
16S Ribosomal DNA Sequence Analysis16S Ribosomal DNA Sequence Analysis
16S Ribosomal DNA Sequence AnalysisAbdulrahman Muhammad
 
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesTools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesSurya Saha
 

Viewers also liked (14)

CCBC tutorial beiko
CCBC tutorial beikoCCBC tutorial beiko
CCBC tutorial beiko
 
Bacterial Identification by 16s rRNA Sequencing.ppt
Bacterial Identification by 16s rRNA Sequencing.pptBacterial Identification by 16s rRNA Sequencing.ppt
Bacterial Identification by 16s rRNA Sequencing.ppt
 
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
 
Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR Genomics
 
Policy Brief-Costly Disease: How to reduce out of pocket expenditure in Diabe...
Policy Brief-Costly Disease: How to reduce out of pocket expenditure in Diabe...Policy Brief-Costly Disease: How to reduce out of pocket expenditure in Diabe...
Policy Brief-Costly Disease: How to reduce out of pocket expenditure in Diabe...
 
Thesis
ThesisThesis
Thesis
 
16s
16s16s
16s
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysis
 
Amplicon sequencing slides - Trina McMahon - MEWE 2013
Amplicon sequencing slides - Trina McMahon - MEWE 2013Amplicon sequencing slides - Trina McMahon - MEWE 2013
Amplicon sequencing slides - Trina McMahon - MEWE 2013
 
Dr. Tom Burkey - Host-Microbe Interactions: Effects on nutrition and physiology
Dr. Tom Burkey - Host-Microbe Interactions: Effects on nutrition and physiologyDr. Tom Burkey - Host-Microbe Interactions: Effects on nutrition and physiology
Dr. Tom Burkey - Host-Microbe Interactions: Effects on nutrition and physiology
 
[13.09.19] 16S workshop introduction
[13.09.19] 16S workshop introduction[13.09.19] 16S workshop introduction
[13.09.19] 16S workshop introduction
 
Biodiversity
BiodiversityBiodiversity
Biodiversity
 
16S Ribosomal DNA Sequence Analysis
16S Ribosomal DNA Sequence Analysis16S Ribosomal DNA Sequence Analysis
16S Ribosomal DNA Sequence Analysis
 
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesTools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
 

Similar to Toast 2015 qiime_talk

GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopMorgan Langille
 
RNA-Seq_analysis_course(2).pptx
RNA-Seq_analysis_course(2).pptxRNA-Seq_analysis_course(2).pptx
RNA-Seq_analysis_course(2).pptxBiancaMoreira45
 
Lightning
LightningLightning
LightningArvados
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqTimothy Tickle
 
System Programming - Interprocess communication
System Programming - Interprocess communicationSystem Programming - Interprocess communication
System Programming - Interprocess communicationHelpWithAssignment.com
 
Scheduler Activations - Effective Kernel Support for the User-Level Managemen...
Scheduler Activations - Effective Kernel Support for the User-Level Managemen...Scheduler Activations - Effective Kernel Support for the User-Level Managemen...
Scheduler Activations - Effective Kernel Support for the User-Level Managemen...Kasun Gajasinghe
 
Making powerful science: an introduction to NGS and beyond
Making powerful science: an introduction to NGS and beyondMaking powerful science: an introduction to NGS and beyond
Making powerful science: an introduction to NGS and beyondAdamCribbs1
 
Operating-System-(1-3 group) Case study on windows Mac and linux among variou...
Operating-System-(1-3 group) Case study on windows Mac and linux among variou...Operating-System-(1-3 group) Case study on windows Mac and linux among variou...
Operating-System-(1-3 group) Case study on windows Mac and linux among variou...ssuser4a97d3
 
20120907 microbiome-intro
20120907 microbiome-intro20120907 microbiome-intro
20120907 microbiome-introLeo Lahti
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksHannes Hapke
 
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekGenomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekData Driven Innovation
 
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...The University of Queensland
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Ben Busby
 
Toast 2015 qiime_talk

  • 1. QIIME: Quantitative Insights Into Microbial Ecology (part 1) • Thomas Jeffries, Federico M. Lauro, Grazia Marina Quero, Tiziano Minuzzo • The Omics Analysis Sydney Tutorial, Australian Museum, 23rd–24th February 2015
  • 2. QIIME • Open source software package for taxonomic analysis of 16S rRNA sequences • UC Colorado & Northern Arizona • www.qiime.org (great resource…..) • Good community support • Can google most problems • Multi-platform • Widely used Caporaso Knight
  • 3. Getting QIIME Linux: https://github.com/qiime/qiime-deploy Mac: http://www.wernerlab.org/software/macqiime Ubuntu virtualbox: http://qiime.org/install/virtual_box.html Linux remote machine e.g UTS FEIT cluster, NECTAR: http://nectar.org.au/research-cloud http://qiime.org/install/install.html
  • 4. Data formats • 454: DNA sequences (FASTA, .fna) Quality (.qual) Mapping file (.txt) • Illumina Sequences and quality in same file (.fastq) Also supports paired end
  • 5. Getting into QIIME • Command line interface • Some very basic commands needed for QIIME: example: /folder$ programme.py -i file_in -o file_out ls :list files in working directory cd : changes directory cd .. : goes back to parent directory ‘tab’ key: magically fills out file names mkdir : makes a directory pwd : tells you where you are
  • 6. QIIME tutorial and example data • Many tutorials @ http://qiime.org/tutorials/index.html • Good place to start: http://qiime.org/tutorials/tutorial.html • Great Microbial Ecology course (includes QIIME): http://edamame-course.org/ • A few of the commands have changed in the new version – the current commands are in this talk - and I have renamed the files to make it easier to follow
  • 8. α-diversity • Alpha diversity is the diversity within ONE sample
  • 15. 1. Check mapping file format • Checks that the format of the mapping file is ok • validate_mapping_file.py -m my_mapping_file.txt -o validate_mapping_file_output • If the file is valid, the script reports: “No errors or warnings were found in mapping file”
  • 16. 1. Check mapping file • Columns: name (ID) of sample; sequencing barcode; primer; sample categories (treatments) • The file must be tab separated!
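To make the mapping file requirements concrete, here is a minimal Python sketch of the kind of checks involved. It is not QIIME's validate_mapping_file.py and covers far less; the function name `check_mapping` and the example data are hypothetical, and the header names assume the usual QIIME 1.x mapping file convention (`#SampleID`, `BarcodeSequence`, `LinkerPrimerSequence` first, `Description` last).

```python
import csv
import io

# Column names assumed from the QIIME 1.x mapping file convention.
REQUIRED_FIRST = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_LAST = "Description"


def check_mapping(text):
    """Return a list of problems found in a tab-separated mapping file."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        return ["file is empty"]
    header, samples = rows[0], rows[1:]
    problems = []
    if header[:3] != REQUIRED_FIRST:
        problems.append("first three columns should be " + ", ".join(REQUIRED_FIRST))
    if header[-1] != REQUIRED_LAST:
        problems.append("last column should be " + REQUIRED_LAST)
    seen_ids, seen_barcodes = set(), set()
    for line_no, row in enumerate(samples, start=2):
        # A line typed with spaces instead of tabs collapses into one field.
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
            continue
        sample_id, barcode = row[0], row[1]
        if sample_id in seen_ids:
            problems.append(f"line {line_no}: duplicate sample ID {sample_id}")
        if barcode in seen_barcodes:
            problems.append(f"line {line_no}: duplicate barcode {barcode}")
        seen_ids.add(sample_id)
        seen_barcodes.add(barcode)
    return problems


# Hypothetical two-sample mapping file.
example = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDescription\n"
    "S1\tAGCACGAGCCTA\tGTGCCAGCMGCCGCGGTAA\tcontrol\tfirst sample\n"
    "S2\tAACTCGTCGATG\tGTGCCAGCMGCCGCGGTAA\ttreated\tsecond sample\n"
)
print(check_mapping(example) or "No problems found")
```

The real script checks considerably more (for example, characters allowed in sample IDs), so treat this only as an illustration of why the tab separation matters.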
  • 17. Hands on – validate your mapping file • validate_mapping_file.py -o moving_pictures_tutorial-1.8.0/illumina/cid_l1/ -m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt
  • 18. 2. De-multiplex - 454 • Using sample-specific barcodes, assign each sequence to a sample (renames sequences) • Performs some QC: removes sequences < 200 bp; removes sequences with a mean quality score < 25; removes sequences with > 6 ambiguous bases or a homopolymer run longer than 6 bases • split_libraries.py -m my_mapping_file.txt -f my_sequence_file.fna -q my_quality_file.qual -o split_library_output • Produces seqs.fna
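The QC thresholds on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of split_libraries.py: the function names are hypothetical, the quality threshold is taken to apply to the mean quality of the read, and only `N` is counted as an ambiguous base.

```python
import re

MIN_LENGTH = 200        # minimum read length in bases
MIN_MEAN_QUALITY = 25   # minimum mean quality score (assumed to be a per-read mean)
MAX_AMBIGUOUS = 6       # maximum number of ambiguous bases (only N counted here)
MAX_HOMOPOLYMER = 6     # maximum length of a single-base run


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def passes_qc(seq, quals):
    """Return True if a read (sequence plus per-base qualities) passes all filters."""
    seq = seq.upper()
    if len(seq) < MIN_LENGTH:
        return False
    if not quals or sum(quals) / len(quals) < MIN_MEAN_QUALITY:
        return False
    if seq.count("N") > MAX_AMBIGUOUS:
        return False
    if longest_homopolymer(seq) > MAX_HOMOPOLYMER:
        return False
    return True


# Hypothetical reads: a clean 240 bp read and a copy with a long run of A.
good_read = "ACGT" * 60
bad_read = "A" * 7 + good_read
print(passes_qc(good_read, [30] * len(good_read)))
print(passes_qc(bad_read, [30] * len(bad_read)))
```

Barcode matching and primer removal, which the real script also performs, are left out to keep the example short.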
  • 19. 2. De-multiplex - Illumina (Step 1) • If the samples contain paired-end reads, you first need to join them and update the barcodes using: join_paired_ends.py -f my_forw_reads.fastq -r my_rev_reads.fastq -b my_barcodes.fastq -o my_joined.fastq
  • 20. 2. De-multiplex - Illumina (Step 2) • Then you can proceed to the split libraries step. If the sequences are NOT paired-end, go directly to split_libraries_fastq.py. This step also performs the QC of the Illumina reads: • split_libraries_fastq.py -m my_mapping_file.txt -i my_sequence_file.fastq -b my_barcodes.fastq -o split_library_output • Data from multiple lanes can be processed together by separating inputs with a comma (,) • Produces seqs.fna
  • 21. Hands on: split your libraries • […] 1.8.0/illumina/raw/subsampled_s_1_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_2_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_3_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_4_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_5_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_6_sequence.fastq -b moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_1_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_2_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_3_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_4_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_5_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_6_sequence_barcodes.fastq -m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l2.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l3.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l4.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l5.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l6.txt • count_seqs.py -i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna
  • 22. 3. OTU picking strategies • De novo OTU picking: clustering of sequences at 97%; for overlapping sequences; no reference database necessary; computationally expensive • Closed-reference: for non-overlapping reads; needs a reference database; discards sequences with no match (so erroneous reads are removed as well) • Open-reference: for overlapping reads; reads are clustered against the reference and non-matching reads are then clustered de novo
  • 23. Hands on – picking OTUs • pick_open_reference_otus.py -o moving_pictures_tutorial-1.8.0/illumina/otus/ -i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna -r gg_13_8_otus/rep_set/97_otus.fasta -p moving_pictures_tutorial-1.8.0/uc_fast_params.txt • pick_de_novo_otus.py -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/ -i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna
  • 24. 3. Pick OTUs • Note: the following steps can be automated with a single workflow script (which is what we are doing): pick_de_novo_otus.py -i seqs.fna -o otus • The individual step: pick_otus.py -i seqs.fna -o picked_otus_default • Clusters your sequences at 97% similarity (you can change this if you wish) and produces ‘seqs_otus.txt’, which maps each sequence to a cluster • Uses the UCLUST algorithm (Edgar, 2010, Bioinformatics)
  • 25. 3. Pick OTUs • Generate OTUs by clustering reads based on similarity (default is 97%) • Sort reads according to size (long -> short) • Cluster (figure: reads grouped into OTU1, OTU2, OTU3, OTU4, OTU5)
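The sort-then-cluster idea on this slide can be illustrated with a small greedy clustering sketch in Python. This is not UCLUST: the similarity measure below is `difflib.SequenceMatcher.ratio()` from the standard library, used as a crude stand-in for alignment-based identity, and the function names and reads are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(reads, threshold=0.97):
    """Cluster reads (dict of read ID -> sequence) around seed sequences.

    Returns a list of (seed_id, [member_ids]) tuples, one per OTU.
    """
    clusters = []
    # Longest reads first, so that they become the cluster seeds.
    for read_id in sorted(reads, key=lambda r: len(reads[r]), reverse=True):
        for seed_id, members in clusters:
            if identity(reads[seed_id], reads[read_id]) >= threshold:
                members.append(read_id)   # joins the first matching OTU
                break
        else:
            clusters.append((read_id, [read_id]))  # no match: start a new OTU
    return clusters


# Hypothetical reads: r2 differs from r1 at one position, r3 is unrelated.
r1 = "ACGT" * 25
reads = {
    "r1": r1,
    "r2": r1[:50] + "T" + r1[51:],
    "r3": "TTGCA" * 20,
}
for seed, members in greedy_cluster(reads):
    print(seed, members)
```

Each read is compared only with existing seeds, which is why this style of clustering is fast, and also why the order in which reads are processed can change the resulting OTUs.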
  • 26. 4. Pick representative sequences • We want one representative sequence for each OTU: annotating every sequence is time consuming and they are already clustered • This makes a file with 1 sequence for each OTU (rep_set1.fna). By default pick_rep_set.py takes the first sequence in each OTU (the cluster seed when OTUs were picked with UCLUST); use -m most_abundant to take the most abundant sequence instead • pick_rep_set.py -i seqs_otus.txt -f seqs.fna -o rep_set1.fna
  • 27. 5. Annotate (assign taxonomy to each OTU) • Compare each representative sequence to a database using one of several algorithms: UCLUST, BLAST, RDP Classifier and others • New default: UCLUST against the Greengenes database: assign_taxonomy.py -i rep_set1.fna (output in directory: uclust_assigned_taxonomy) • BLAST example (reference sequences and taxonomy downloaded from database): assign_taxonomy.py -i rep_set1.fna -r ref_seq_set.fna -t id_to_taxonomy.txt -m blast
  • 28. 5. Annotate • Some useful databases that are compatible with QIIME: • http://greengenes.secondgenome.com : good for everything and the default in QIIME • http://unite.ut.ee : fungal Internal Transcribed Spacer (ITS); good for soil fungi • http://www.arb-silva.de : contains both 16S and 18S rRNA (eukaryotes); good representation of marine taxa
  • 29. Recap
  • 30. Recap (figure: Species A, Species B, Species C → mixed amplicons → Sample 1, Sample 2, Sample 3 → OTU 1, OTU 2, OTU 3 → reference database) • Split the library into samples using barcodes • Used clustering to choose OTUs • Picked representative sequences and assigned taxonomy
  • 31. 6. Putting it all together: making an OTU table • Need to combine the OTU identity with the abundance information in the clusters and link back to each sample so we can do ECOLOGY • make_otu_table.py -i seqs_otus.txt -t rep_set1_tax_assignments.txt -o otu_table.biom • The table is in .biom format: http://biom-format.org/documentation/biom_format.html • Convert to text file: biom convert -i otu_table.biom -o otu_table.txt --table-type "otu table" --header-key taxonomy -b
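To show what "combining OTU identity with abundance per sample" amounts to, here is a minimal Python sketch that turns an OTU map into a tab-separated count table. It is not make_otu_table.py and does not write .biom files; the function names and data are hypothetical. It assumes the OTU map layout of one OTU per line (OTU ID followed by tab-separated sequence IDs) and sequence IDs of the form `SampleID_number`, as produced by the split libraries step.

```python
from collections import Counter, defaultdict


def build_otu_table(otu_map_lines):
    """Count sequences per OTU and sample from OTU map lines."""
    table = defaultdict(Counter)
    for line in otu_map_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        otu_id, seq_ids = fields[0], fields[1:]
        for seq_id in seq_ids:
            # "S1_42" -> sample "S1"; split on the last underscore only.
            sample_id = seq_id.rsplit("_", 1)[0]
            table[otu_id][sample_id] += 1
    return table


def to_text(table):
    """Render the table as tab-separated text, one row per OTU."""
    samples = sorted({s for counts in table.values() for s in counts})
    lines = ["#OTU ID\t" + "\t".join(samples)]
    for otu_id in sorted(table):
        row = [str(table[otu_id].get(s, 0)) for s in samples]
        lines.append(otu_id + "\t" + "\t".join(row))
    return "\n".join(lines)


# Hypothetical OTU map with three OTUs across three samples.
otu_map = [
    "0\tS1_1\tS1_2\tS2_1",
    "1\tS2_2",
    "2\tS1_3\tS3_1\tS3_2",
]
print(to_text(build_otu_table(otu_map)))
```

The real OTU table also carries the taxonomy assigned to each OTU as metadata, which is what the `--header-key taxonomy` option exports when converting to text.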
  • 32. Closed-reference OTU picking • pick_closed_reference_otus.py -i seqs.fna -r reference.fna -o otus_w_tax/ -t taxa_map.txt • The reference is a database, e.g. the Greengenes unaligned 97% OTUs and the matching taxonomy map (same files as for BLAST) • The output has all of your sequences matched against Greengenes, plus an OTU table • So this picks OTUs and assigns taxonomy in 1 step, but you lose non-matching sequences. Do we care? For taxa summaries, no; for beta-diversity, maybe • Quick: good for Illumina
  • 33. 7. Aligning sequences • Back to our representative sequences • How closely related are the organisms present in the samples, i.e. what is the phylogeny of our community and how does it shift between samples? • Default: PyNAST aligns sequences to a reference set of pre-aligned sequences (e.g. Greengenes ALIGNED), which is more computationally efficient than de novo alignment • Can also select other methods, e.g. MUSCLE • align_seqs.py -i rep_set1.fna -o pynast_aligned/
  • 34. 7. Aligning sequences • Not all regions of the rRNA gene are informative or useful for phylogenetic inference • Gaps: short sequences vs the full-length rRNA gene • filter_alignment.py -i rep_set1_aligned.fasta -o filtered_alignment/ • Optional lanemask template that defines informative regions for some databases • filter_alignment.py -i seqs_rep_set_aligned.fasta -m lanemask_in_1s_and_0s -o filtered_alignment/ • If you are going to use this alignment for making a phylogenetic tree, this step is essential
  • 35. A note on chimera removal • Chimeras: sequences formed from the DNA of 2 or more organisms (an artifact of PCR amplification) • QIIME uses ChimeraSlayer to detect chimeric sequences using your alignment and a reference database • identify_chimeric_seqs.py -m ChimeraSlayer -i rep_set_aligned.fasta -a reference_set1_aligned.fasta -o chimeric_seqs.txt • You should then remove these OTUs from your OTU table and alignment before proceeding with tree building and visualization of results: use -e chimeric_seqs.txt when making the OTU table, and filter_fasta.py for the alignment
  • 36. 8. Make a phylogenetic tree • make_phylogeny.py -i rep_set1_aligned_pfiltered.fasta -o rep_phylo.tre • Builds a tree from the alignment using FastTree • Outputs a tree in Newick format (.tre), which can be opened with software such as FigTree or used to calculate phylogenetic metrics • Also filter chimeras from the tree
  • 37. We now have 2 final outputs: • OTU table: 1. Taxonomic composition 2. α-diversity (e.g. ‘species’ richness) 3. β-diversity (e.g. abundance similarity between samples) • Phylogenetic tree: 1. Phylogenetic β-diversity • QIIME has powerful visualization and statistical tools
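The two diversity measures named on this slide can be illustrated from OTU counts alone. The Python sketch below computes observed richness for one sample (an α-diversity measure) and a dissimilarity between two samples (a β-diversity measure). Bray-Curtis is used here as one common abundance-based choice; the slides do not name a specific metric, and the function names and counts are hypothetical.

```python
def observed_richness(counts):
    """Alpha diversity: number of OTUs with at least one sequence in a sample."""
    return sum(1 for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    0 means identical composition, 1 means no shared OTUs.
    """
    otus = set(a) | set(b)
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


# Hypothetical OTU counts for two samples.
sample_a = {"OTU1": 10, "OTU2": 5, "OTU3": 0}
sample_b = {"OTU1": 5, "OTU2": 5, "OTU4": 10}

print(observed_richness(sample_a), observed_richness(sample_b))
print(round(bray_curtis(sample_a, sample_b), 3))
```

Phylogenetic β-diversity measures such as UniFrac additionally need the tree, which is why the tree is listed as the second final output.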
  • 38. Hands on – reformatting outputs • We have automated (piped) most of the steps I have talked about • We need to convert the OTU table to a text file and filter the alignment • biom convert -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.txt --table-type "otu table" --header-key taxonomy -b • filter_alignment.py -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/pynast_aligned_seqs/seqs_rep_set_aligned.fasta -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/pynast_aligned_seqs/filtered_alignment
  • 39. 9. Merging the mapping files • We started with 6 lanes of Illumina but now we have a single OTU table. The merged mapping file will have duplicated barcodes but these are not used anymore (already demultiplexed): • merge_mapping_files.py -o combined_mapping_file.txt -m mapfile1.txt,mapfile2.txt…,mapfilexxx.txt
  • 40. Hands on – merge your mapping files • merge_mapping_files.py -o moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt -m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l2.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l3.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l4.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l5.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l6.txt • biom summarize-table -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.summary
  • 41. Visualizing diversity 1 – community composition • biom summarize-table -i otu_table.biom -o otu_table_summary.txt • Counts/Sample detail: L3S237: 138.0 L3S235: 187.0 L3S372: 205.0 L3S373: 228.0 L3S367: 259.0 L3S370: 273.0 L3S368: 274.0 L3S369: 284.0 • Summary of OTU table: we want to standardize the number of sequences per sample (sampling depth) to allow accurate comparison, i.e. to 138 sequences, the smallest count shown above • single_rarefaction.py -i otu_table.biom -o otu_table_even138.biom -d 138 • alpha_rarefaction.py -i otu_table.biom -m combined_mapping_file.txt -o rarefaction/ -t rep_set.tre
  • 42. Visualizing diversity 1 – community composition • How ‘deep’ do we need to go to adequately sample the community? = Rarefaction analysis • The number of observed species increases up to a point where producing more sequence does not significantly increase it • Rarefaction repeatedly subsamples your data at different depths and plots each subsample depth against the number of observed species. If the curves flatten, you have sequenced at sufficient depth • Choosing the rarefaction depth is a trade-off between keeping samples that would fall below a given sequence cut-off and losing diversity
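The repeated subsampling described on this slide can be sketched in Python as follows. This is a simplified illustration, not the implementation behind single_rarefaction.py or alpha_rarefaction.py; the function names, the OTU counts and the choice of 10 repetitions are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` sequences without replacement from one sample.

    Returns the subsampled OTU counts, or None if the sample has fewer
    sequences than `depth` (such a sample would be dropped).
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        return None
    return dict(Counter(rng.sample(pool, depth)))


def rarefaction_curve(counts, depths, reps=10, seed=0):
    """Mean number of observed OTUs at each depth, over `reps` subsamples."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        richness = []
        for _ in range(reps):
            sub = rarefy(counts, depth, rng)
            if sub is None:
                break
            richness.append(len(sub))
        if richness:
            curve.append((depth, sum(richness) / len(richness)))
    return curve


# Hypothetical sample with 138 sequences spread over 4 OTUs.
sample = {"OTU1": 100, "OTU2": 30, "OTU3": 5, "OTU4": 3}
for depth, mean_richness in rarefaction_curve(sample, [10, 50, 100, 138]):
    print(depth, mean_richness)
```

Plotting depth against mean richness gives the rarefaction curve; rare OTUs such as OTU4 are often missed at shallow depths, which is the loss of diversity the trade-off refers to.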
  • 43. Hands on - Rarefaction • single_rarefaction.py -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table_even138.biom -d 138 • alpha_rarefaction.py -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/rarefaction/ -m moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt -t moving_pictures_tutorial-1.8.0/illumina/otus_denovo/rep_set.tre
  • 45. Software references: • QIIME: Caporaso et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7(5):335-336. • UCLUST: Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460-2461. • BLAST: Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215(3):403-410. • Greengenes: McDonald et al. 2012. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 6(3):610-618. • RDP Classifier: Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73(16):5261-5267. • PyNAST: Caporaso JG et al. 2010. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26:266-267. • ChimeraSlayer: Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research 21:494-504. • MUSCLE: Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792-1797. • FastTree: Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One 5(3):e9490. • UniFrac: Lozupone C, Knight R. 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71(12):8228-8235. • Emperor: Vazquez-Baeza Y, Pirrung M, Gonzalez A, Knight R. 2013. Emperor: A tool for visualizing high-throughput microbial community data. Gigascience 2(1):16.