QIIME: Quantitative Insights Into Microbial Ecology (part 1)
Thomas Jeffries
Federico M. Lauro
Grazia Marina Quero
Tiziano Minuzzo
The Omics Analysis Sydney Tutorial
Australian Museum
23rd-24th February 2015
QIIME
• Open source software package for taxonomic analysis of 16S rRNA sequences
• University of Colorado & Northern Arizona University (Caporaso, Knight)
• (great resource…..)
• Good community support
• Can google most problems
• Multi-platform
• Widely used
Getting QIIME
Linux:
Mac:
Ubuntu virtualbox:
Linux remote machine, e.g. UTS FEIT cluster, NECTAR:
Data formats
• 454:
DNA sequences (FASTA, .fna)
Quality (.qual)
Mapping file (.txt)
• Illumina
Sequences and quality in same file (.fastq)
Also supports paired end
Getting into QIIME
• Command line interface
• Some very basic commands needed for QIIME:
example: /folder$ -i file_in -o file_out
ls : lists files in working directory
cd : changes directory
cd .. : goes back to parent directory
‘tab’ key : auto-completes file names
mkdir : makes a directory
pwd : tells you where you are
QIIME tutorial and example data
• Many tutorials @
• Good place to start:
• Great Microbial Ecology course (includes QIIME):
• A few of the commands have changed in the new version. The current commands are in this talk, and I have renamed the files to make it easier to follow
Some useful terminology
α-diversity (alpha diversity) is the diversity within ONE sample
α-diversity: richness
Common evenness metric: Pielou’s evenness
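To make the two terms concrete, here is a minimal sketch of richness and Pielou's evenness for a single sample. It is an illustration only, not QIIME's implementation; the function names and the example counts are made up for this sketch.

```python
import math

def richness(counts):
    """Number of taxa observed at least once in one sample."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over the observed taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou's J = H' / ln(S). Returned as 0.0 here when fewer than 2 taxa are present."""
    s = richness(counts)
    if s < 2:
        return 0.0
    return shannon(counts) / math.log(s)

# Hypothetical samples: same richness, very different evenness.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, richness(sample), round(pielou_evenness(sample), 3))
```

Both samples contain four taxa, so richness alone cannot tell them apart; evenness drops when one taxon dominates.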
Tutorial dataset
1. Check mapping file format
• Checks that the format of the mapping file is OK:
-m my_mapping_file.txt -o validate_mapping_file_output
• Message on success: “No errors or warnings were found in mapping file”
1. Check mapping file
Name (ID) of sample
Primer
Sequencing barcode
Sample categories (treatments)
Tab separated!
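The most common mapping file problem is a space where a tab should be. The sketch below shows the kind of check involved, assuming a header line followed by one row per sample. It is not QIIME's own validator, and the column names in the example are only illustrative.

```python
import csv
import io

def check_mapping(text):
    """Return a list of problems found in a tab-separated mapping file."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if len(header) < 3:
        problems.append("header has fewer than 3 tab-separated columns")
    seen = set()
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            # Usually a space typed instead of a tab.
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        if row[0] in seen:
            problems.append(f"line {line_no}: duplicate sample ID {row[0]}")
        seen.add(row[0])
    return problems

# Hypothetical mapping file contents.
good = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\n"
    "S1\tAACC\tGTGCCAGC\tcontrol\n"
    "S2\tGGTT\tGTGCCAGC\tfast\n"
)
bad = good.replace("S2\tGGTT", "S2 GGTT")  # a space instead of a tab
print(check_mapping(good))
print(check_mapping(bad))
```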
Hands on – validate your mapping file
-o moving_pictures_tutorial-1.8.0/illumina/cid_l1/ -m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt
2. De-multiplex - 454
• Using sample specific barcodes, identify each sequence
with a sample (renames sequences)
• Performs some QC:
  • Removes sequences < 200 bp
  • Removes sequences with a quality score < 25
  • Removes sequences with > 6 ambiguous bases or with homopolymer runs > 6
-m my_mapping_file.txt -f my_sequence_file.fna -q my_quality_file.qual -o split_library_output
• Produces seqs.fna
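The idea behind this step can be sketched in a few lines: read the barcode at the start of each read, look it up in the mapping file, rename the read after its sample, and drop reads that fail QC. This is a simplified illustration that assumes all barcodes have the same length and applies only the length and ambiguous-base filters; quality scores, primers and homopolymers are ignored, and the function name is hypothetical.

```python
def demultiplex(reads, barcode_to_sample, min_length=200, max_ambiguous=6):
    """Assign reads to samples by their leading barcode and apply two QC filters.

    reads: iterable of (read_id, sequence) pairs.
    Returns a list of (new_id, sequence_without_barcode); new IDs look like SampleID_counter.
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    kept = []
    for read_id, seq in reads:
        sample = barcode_to_sample.get(seq[:barcode_len])
        if sample is None:
            continue  # barcode is not in the mapping file
        insert = seq[barcode_len:]
        if len(insert) < min_length or insert.count("N") > max_ambiguous:
            continue  # fails QC
        kept.append((f"{sample}_{len(kept)}", insert))
    return kept

# Hypothetical barcodes and reads.
barcodes = {"AACC": "S1", "GGTT": "S2"}
reads = [
    ("r1", "AACC" + "ACGT" * 60),            # kept
    ("r2", "GGTT" + "ACGT" * 10),            # too short
    ("r3", "TTTT" + "ACGT" * 60),            # unknown barcode
    ("r4", "GGTT" + "N" * 7 + "ACGT" * 60),  # too many ambiguous bases
]
for new_id, seq in demultiplex(reads, barcodes):
    print(new_id, len(seq))
```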
2. De-multiplex - Illumina (Step 1)
• If the samples contain paired-end reads, you first need to join them and update the barcodes using:
-f my_forw_reads.fastq -r my_rev_reads.fastq -b my_barcodes.fastq -o my_joined.fastq
2. De-multiplex - Illumina (Step 2)
• Then you can proceed to the split libraries step. If the sequences are NOT paired-end, go directly to this step. It also performs QC on the Illumina reads:
-m my_mapping_file.txt -i my_sequence_file.fastq -b my_barcodes.fastq -o split_library_output
• Data from multiple lanes can be processed together by separating inputs with a comma (,)
• Produces seqs.fna
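Hands on: split your libraries
1.8.0/illumina/raw/subsampled_s_1_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_2_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_3_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_4_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_5_sequence.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_6_sequence.fastq
-b moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_1_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_2_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_3_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_4_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_5_sequence_barcodes.fastq,moving_pictures_tutorial-1.8.0/illumina/raw/subsampled_s_6_sequence_barcodes.fastq
-m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l2.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l3.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l4.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l5.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l6.txt
-i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna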
3. OTU picking strategies
• De novo OTU picking: clustering of sequences at 97% similarity
  • Needs overlapping sequences
  • No reference database necessary
  • Computationally expensive
• Closed-reference
  • Works with non-overlapping reads
  • Needs a reference database
  • Discards sequences with no match, which also removes erroneous reads
• Open-reference
  • Needs overlapping reads
  • Reads are clustered against the reference, and non-matching reads are clustered de novo
Hands on – picking OTUs
-o moving_pictures_tutorial-1.8.0/illumina/otus/ -i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna -r gg_13_8_otus/rep_set/97_otus.fasta -p moving_pictures_tutorial-1.8.0/uc_fast_params.txt
-o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/ -i moving_pictures_tutorial-1.8.0/illumina/slout/seqs.fna
3. Pick OTUs
Note: the following steps can be automated with a single workflow command (this is what we are doing in the hands-on):
-i seqs.fna -o otus
The OTU picking step on its own:
-i seqs.fna -o picked_otus_default
• Clusters your sequences at 97% similarity (you can change this threshold) and produces ‘seqs_otus.txt’, which maps each sequence to a cluster
• Uses the UCLUST algorithm (Edgar, 2010, Bioinformatics)
3. Pick OTUs
Generate OTUs by clustering reads based on similarity (default is 97%)
Sort reads according to size (long -> short)
Cluster into OTUs (OTU1, OTU2, OTU3, OTU4, OTU5)
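The sort-then-cluster idea can be sketched as a greedy loop: take reads from longest to shortest, add each read to the first existing cluster whose seed it matches at 97% or better, and otherwise start a new cluster. This is a toy illustration, not UCLUST: the identity function simply compares positions without alignment, which assumes all reads start at the same position and are non-empty.

```python
def identity(a, b):
    """Fraction of matching positions over the shorter sequence (no alignment, no gaps)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_cluster(seqs, threshold=0.97):
    """seqs: dict of read_id -> sequence. Returns dict of OTU id -> list of read ids."""
    centroids = []  # (otu_id, seed sequence), in order of creation
    otus = {}
    # Longest reads first, so that they become the cluster seeds.
    for read_id, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        for otu_id, centroid in centroids:
            if identity(seq, centroid) >= threshold:
                otus[otu_id].append(read_id)
                break
        else:
            otu_id = f"OTU{len(centroids) + 1}"
            centroids.append((otu_id, seq))
            otus[otu_id] = [read_id]
    return otus

# Hypothetical reads: two near-identical sequences and one unrelated sequence.
base = "ACGT" * 25
reads = {
    "S1_0": base,
    "S1_1": ("T" + base[1:])[:90],  # shorter, one mismatch
    "S2_0": "TTGCA" * 20,
}
print(greedy_cluster(reads))
```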
4. Pick representative sequences
• We want a representative sequence for each OTU: it is time-consuming to annotate every sequence, and they are already clustered
• This takes the most abundant sequence in each OTU and makes a file that has 1 sequence for each OTU (rep_set1.fna):
-i seqs_otus.txt -f seqs.fna -o rep_set1.fna
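As a rough sketch of "most abundant sequence per OTU", assuming the OTU map and the sequences are already loaded into dictionaries (the names and data below are hypothetical):

```python
from collections import Counter

def pick_rep_set(otu_map, seqs):
    """otu_map: OTU id -> read ids; seqs: read id -> sequence.

    Returns OTU id -> the most frequent sequence among that OTU's reads.
    """
    rep_set = {}
    for otu_id, read_ids in otu_map.items():
        counts = Counter(seqs[r] for r in read_ids)
        rep_set[otu_id] = counts.most_common(1)[0][0]
    return rep_set

# Hypothetical OTU map and reads.
otu_map = {"OTU1": ["S1_0", "S1_1", "S2_0"], "OTU2": ["S2_1"]}
seqs = {"S1_0": "ACGTAC", "S1_1": "ACGTAA", "S2_0": "ACGTAA", "S2_1": "TTGCAT"}
print(pick_rep_set(otu_map, seqs))
```

Only the representative sequences are then carried forward to taxonomy assignment and alignment, which is what keeps those steps fast.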
5. Annotate (assign taxonomy to each OTU)
• Compare each representative sequence to a database using one of several algorithms:
• UCLUST, BLAST, RDP Classifier, etc.
• New default: UCLUST against the Greengenes database:
-i rep_set1.fna
(output in directory: uclust_assigned_taxonomy)
• BLAST example (reference sequences and taxonomy downloaded from the database):
-i rep_set1.fna -r ref_seq_set.fna -t id_to_taxonomy.txt -m blast
5. Annotate
• Some useful databases that are compatible with QIIME:
Good for everything and default in QIIME
Fungal Internal Transcribed Spacer (ITS)
Good for soil fungi
Contains both 16S and 18S rRNA (Eukaryotes…)
Good representation of marine taxa
Recap
[Figure: Species A, B, C; mixed amplicons; Samples 1, 2, 3; OTUs 1, 2, 3; reference database]
• Split library into samples using barcodes
• Used clustering to choose OTUs
• Picked representative sequences and assigned taxonomy
6. Putting it all together: making an OTU table
• Need to combine the OTU identity with the abundance information in the clusters and link it back to each sample so we can do ECOLOGY:
-i seqs_otus.txt -t rep_set1_tax_assignments.txt -o otu_table.biom
• The table is in .biom format:
• Convert to text file:
biom convert -i otu_table.biom -o otu_table.txt --table-type "otu table" --header-key taxonomy -b
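Conceptually, the OTU table is a count of reads per OTU per sample with the taxonomy attached. The sketch below builds such a table as plain rows, assuming read IDs have the form SampleID_number as produced by demultiplexing. It does not read or write the .biom format, and all names and data are hypothetical.

```python
from collections import defaultdict

def make_otu_table(otu_map, taxonomy):
    """Count reads per OTU per sample and append a taxonomy column."""
    table = defaultdict(lambda: defaultdict(int))
    samples = set()
    for otu_id, read_ids in otu_map.items():
        for read_id in read_ids:
            sample = read_id.rsplit("_", 1)[0]  # "S1_12" -> "S1"
            table[otu_id][sample] += 1
            samples.add(sample)
    samples = sorted(samples)
    rows = [["#OTU ID"] + samples + ["taxonomy"]]
    for otu_id in sorted(table):
        counts = [table[otu_id][s] for s in samples]
        rows.append([otu_id] + counts + [taxonomy.get(otu_id, "Unassigned")])
    return rows

# Hypothetical OTU map and taxonomy assignments.
otu_map = {"OTU1": ["S1_0", "S1_1", "S2_0"], "OTU2": ["S2_1"]}
taxonomy = {"OTU1": "k__Bacteria; p__Firmicutes"}
for row in make_otu_table(otu_map, taxonomy):
    print("\t".join(str(cell) for cell in row))
```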
Closed reference OTU picking
-i seqs.fna -r reference.fna -o otus_w_tax/ -t taxa_map.txt
• The reference is a database, e.g. the Greengenes unaligned 97% OTUs and the matching taxa map (same files as for BLAST)
• Output has all of your sequences aligned to Greengenes and an OTU table
• So this picks OTUs and assigns taxonomy in 1 step, but you lose non-matching sequences. Do we care? For taxa summaries no, for beta-diversity maybe.
• Quick: good for Illumina
7. Aligning sequences
• Back to our representative sequences….
• How closely related are the organisms present in the samples, i.e. what is the phylogeny of our community and how does it shift between samples?
• Default: PyNAST aligns the sequences to a reference set of pre-aligned sequences (e.g. Greengenes ALIGNED), which is more computationally efficient than de novo alignment
• Can also select other methods, e.g. MUSCLE
-i rep_set1.fna -o pynast_aligned/
7. Aligning sequences
• Not all regions of the rRNA gene are informative or useful for phylogenetic inference
• Gaps: short sequences vs the full-length rRNA gene
• -i rep_set1_aligned.fasta -o filtered_alignment/
• Optional lanemask template that defines informative regions for some databases
• -i seqs_rep_set_aligned.fasta -m lanemask_in_1s_and_0s -o filtered_alignment/
• If you are going to use this alignment to make a phylogenetic tree, this step is essential
A note on chimera removal
• Chimeras: sequences formed from the DNA of 2 or more organisms (an artifact of PCR amplification)
• QIIME uses ChimeraSlayer to detect chimeric sequences using your alignment and a reference database:
-m ChimeraSlayer -i rep_set_aligned.fasta -a reference_set1_aligned.fasta -o chimeric_seqs.txt
• You should then remove these OTUs from your OTU table and alignment before proceeding with tree building and visualization of results:
• Pass -e chimeric_seqs.txt when making the OTU table, and filter the alignment as well
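Removing the flagged OTUs is a plain filtering step. The sketch below assumes the chimera report lists one flagged ID as the first field of each line (an assumption about the file layout) and that the table or alignment is held as a dictionary keyed by OTU ID; the function names are hypothetical.

```python
def read_chimera_ids(text):
    """Take the first whitespace-separated field of each non-empty line as a flagged ID."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]

def drop_chimeras(records, chimeric_ids):
    """Keep only records (dict of id -> value) whose id was not flagged as chimeric."""
    flagged = set(chimeric_ids)
    return {key: value for key, value in records.items() if key not in flagged}

# Hypothetical chimera report and aligned representative set.
chimera_report = "OTU2\tparentA\tparentB\n"
aligned_rep_set = {"OTU1": "AC-GT", "OTU2": "TT-GC", "OTU3": "ACCGT"}
clean = drop_chimeras(aligned_rep_set, read_chimera_ids(chimera_report))
print(sorted(clean))
```

The same filter has to be applied to both the OTU table and the alignment so that the tree and the table refer to the same set of OTUs.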
8. Make a phylogenetic tree
-i rep_set1_aligned_pfiltered.fasta -o rep_phylo.tre
• Builds a tree from the alignment using FastTree
• Outputs a tree in Newick format (.tre), which can be opened with software such as FigTree or used to calculate phylogenetic metrics
• Also filter chimeras from the tree
We now have 2 final outputs:
• OTU table
1. Taxonomic composition
2. α-diversity (e.g. ‘species’ richness)
3. β-diversity (e.g. abundance similarity between samples)
• Phylogenetic tree
1. Phylogenetic β-diversity
QIIME has powerful visualization and statistical tools
Hands on – reformatting outputs
We have automated (piped) most of the steps I have talked about
We need to convert the OTU table to a text file and filter the alignment
biom convert -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.txt --table-type "otu table" --header-key taxonomy -b
-i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/pynast_aligned_seqs/seqs_rep_set_aligned.fasta -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/pynast_aligned_seqs/filtered_alignment
9. Merging the mapping files
• We started with 6 lanes of Illumina data but now we have a single OTU table. The merged mapping file will have duplicated barcodes, but these are no longer used (the reads are already demultiplexed):
• -o combined_mapping_file.txt -m mapfile1.txt,mapfile2.txt…,mapfilexxx.txt
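What merging means in practice can be sketched as: take the union of the columns, stack the rows, and fill cells that a lane's file did not have. This is an illustration only, not QIIME's merge command; the `fill` placeholder value and the example files are assumptions.

```python
import csv
import io

def merge_mapping(texts, fill="no_data"):
    """Merge tab-separated mapping files: union of columns, rows concatenated."""
    header, merged = [], []
    for text in texts:
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            for column in row:
                if column not in header:
                    header.append(column)
            merged.append(row)
    return [header] + [[row.get(c, fill) for c in header] for row in merged]

# Hypothetical per-lane mapping files; note the same barcode is reused in both lanes.
lane1 = "#SampleID\tBarcodeSequence\tTreatment\nL1S1\tAACC\tcontrol\n"
lane2 = "#SampleID\tBarcodeSequence\tDay\nL2S1\tAACC\t3\n"
for row in merge_mapping([lane1, lane2]):
    print("\t".join(row))
```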
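Hands on – merge your mapping files
-o moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt -m moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l1.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l2.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l3.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l4.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l5.txt,moving_pictures_tutorial-1.8.0/illumina/raw/filtered_mapping_l6.txt
biom summarize-table -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.summary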
Visualizing diversity 1 – community composition
biom summarize-table -i otu_table.biom -o otu_table_summary.txt
Counts/Sample detail:
L3S237: 138.0
L3S235: 187.0
L3S372: 205.0
L3S373: 228.0
L3S367: 259.0
L3S370: 273.0
L3S368: 274.0
L3S369: 284.0
• Summary of OTU table: we want to standardize the number of sequences (sampling depth) to allow accurate comparison, i.e. 138 sequences per sample (the smallest sample above):
-i otu_table.biom -o otu_table_even138.biom -d 138
-i otu_table.biom -m combined_mapping_file.txt -o rarefaction/ -t rep_set.tre
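Standardizing to an even depth means drawing the same number of reads at random, without replacement, from every sample and dropping samples that have fewer reads than that. A minimal sketch follows; the per-sample totals 138 and 187 match the summary above, but the split across OTUs is invented for illustration, as is the "tiny" sample.

```python
import random

def rarefy(sample_counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    Returns None if the sample has fewer than `depth` reads.
    """
    pool = [otu for otu, n in sample_counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    picked = random.Random(seed).sample(pool, depth)
    return {otu: picked.count(otu) for otu in sample_counts if otu in picked}

# Hypothetical OTU counts per sample.
table = {
    "L3S237": {"OTU1": 100, "OTU2": 38},
    "L3S235": {"OTU1": 150, "OTU2": 30, "OTU3": 7},
    "tiny": {"OTU1": 50},
}
even = {}
for sample, counts in table.items():
    rarefied = rarefy(counts, depth=138)
    if rarefied is not None:  # samples below the depth are dropped
        even[sample] = rarefied
print({sample: sum(counts.values()) for sample, counts in even.items()})
```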
Visualizing diversity 1 – community composition
• How ‘deep’ do we need to go to adequately sample the community? = Rarefaction analysis
• The number of observed species increases up to a point where producing more sequences does not significantly increase it
• Rarefaction repeatedly subsamples your data at different depths and plots the subsample size against the number of observed species. If the curves flatten, you have sequenced at sufficient depth.
• Choosing the rarefaction depth is a trade-off: a low cut-off keeps more samples but loses diversity, while a high cut-off discards the samples that fall below it
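The points of a rarefaction curve can be sketched for one sample as follows: at each depth, subsample several times and average the number of distinct OTUs seen. The counts, depths and number of repeats below are made up for illustration, and plotting is left out.

```python
import random

def rarefaction_curve(sample_counts, depths, repeats=10, seed=0):
    """Mean number of observed OTUs at each subsampling depth for one sample."""
    pool = [otu for otu, n in sample_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        if depth > len(pool):
            break  # cannot subsample deeper than the sample itself
        observed = [len(set(rng.sample(pool, depth))) for _ in range(repeats)]
        curve.append((depth, sum(observed) / repeats))
    return curve

# Hypothetical sample with 138 reads spread over 4 OTUs.
sample = {"OTU1": 100, "OTU2": 30, "OTU3": 5, "OTU4": 3}
curve = rarefaction_curve(sample, depths=range(10, 139, 32))
for depth, mean_otus in curve:
    print(depth, mean_otus)
```

If the mean number of observed OTUs stops rising over the last few depths, more sequencing of that sample is unlikely to reveal many new OTUs.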
Hands on - Rarefaction
-i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table_even138.biom -d 138
-i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/rarefaction/ -m moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt -t moving_pictures_tutorial-1.8.0/illumina/otus_denovo/rep_set.tre
Visualizing and comparing diversity
Software references:
QIIME: Caporaso et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7(5): 335-336.
UCLUST: Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19): 2460-2461.
BLAST: Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215(3): 403-410.
Greengenes: McDonald et al. 2012. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 6(3): 610-618.
RDP Classifier: Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microb 73(16): 5261-5267.
PyNAST: Caporaso JG et al. 2010. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26: 266-267.
ChimeraSlayer: Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research 21: 494-504.
MUSCLE: Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5): 1792-1797.
FastTree: Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One 5(3).
UniFrac: Lozupone C, Knight R. 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71(12): 8228-8235.
Emperor: Vazquez-Baeza Y, Pirrung M, Gonzalez A, Knight R. 2013. Emperor: A tool for visualizing high-throughput microbial community data. Gigascience 2(1): 16.
