1. Statistical and Visualization Methods for Metagenomic Analysis
Héctor Corrada Bravo
Center for Bioinformatics and Computational Biology
2. • metagenomeSeq
– 16S differential abundance
– R/Bioconductor infrastructure for
metagenomic assays
– Longitudinal data
• metagenomeFeatures
– Early attempt at standardizing 16S feature
annotations in R/Bioconductor
– E.g., greengenes13.5MgDb
• msd16s
– Example data, as infrastructure object
3. R/Bioconductor Strengths
• Infrastructure objects
– Interoperability; shorter startup time for method development
• Strict development practices
– Documentation, use cases, vignettes
• Annotation infrastructure
– Again, interoperability across experiments and data types
• Exploratory analysis
• Reproducibility
– Vignettes, Rmarkdown, etc.
• Recently, exploratory and interactive visualization
– Shiny, epiviz
4. Integrative, visual and computational
exploratory analysis of genomic data
• Browser-based
• Interactive
• Integration of data
• Reproducible dissemination
• Communication with R/Bioconductor: epivizr package
software systems to support creative exploratory analysis of large genome-wide datasets...
7. Dynamically extensible: Easily integrate new data sources, data
types and add new visualizations.
Data providers define coordinate
space
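The extensibility idea on this slide can be illustrated with a minimal sketch in Python. It is not the Epiviz or epivizr API: the class names `ListProvider` and `Workspace`, the `space` attribute, and the `query` method are all hypothetical, chosen only to show how each data provider could declare its own coordinate space and be added without changing existing code.

```python
class ListProvider:
    """Hypothetical provider backed by an in-memory list of (position, value) rows."""

    def __init__(self, name, space, rows):
        self.name = name
        self.space = space        # coordinate space declared by the provider itself
        self.rows = sorted(rows)  # keep rows ordered by position

    def query(self, start, end):
        # Return rows whose position falls in the half-open range [start, end).
        return [(pos, val) for pos, val in self.rows if start <= pos < end]


class Workspace:
    """Hypothetical registry: new data sources are added without touching existing ones."""

    def __init__(self):
        self.providers = {}

    def add_provider(self, provider):
        self.providers[provider.name] = provider

    def query(self, space, start, end):
        # Ask only the providers that live in the requested coordinate space.
        return {
            name: p.query(start, end)
            for name, p in self.providers.items()
            if p.space == space
        }


ws = Workspace()
ws.add_provider(ListProvider("expression", "genome", [(10, 1.5), (50, 2.0)]))
ws.add_provider(ListProvider("otu_counts", "taxonomy", [(1, 7)]))
genome_hits = ws.query("genome", 0, 20)
```

In this sketch a query over the "genome" space ignores the taxonomy provider entirely, which is the point of letting providers define their coordinate space.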
8. One interpretation of Big Data is many sources of relevant
contextual data
• Easily access/integrate contextual data
• Driven by exploratory analysis of immediate
data
• Iterative process
• Visual and computational exploration go
hand in hand
9. Visualization design goals
Context
• Integrate and align multiple data sources;
navigate; search
• Connect: brushing
• Encode: map visualization properties to
data on the fly
• Reconfigure: multiple views of the same
data
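Brushing ("Connect") can be sketched as a small publish/subscribe pattern. The Python below is an illustrative assumption, not the Epiviz implementation: `View`, `BrushHub`, and their methods are hypothetical names, and all views are assumed to share one aligned coordinate space.

```python
class View:
    """Hypothetical view holding (start, end, label) items in a shared coordinate space."""

    def __init__(self, name, items):
        self.name = name
        self.items = items
        self.highlighted = []

    def on_brush(self, start, end):
        # Highlight every item that overlaps the brushed half-open range [start, end).
        self.highlighted = [
            label for s, e, label in self.items if s < end and e > start
        ]


class BrushHub:
    """Hypothetical hub that forwards a brush event to all other registered views."""

    def __init__(self):
        self.views = []

    def register(self, view):
        self.views.append(view)

    def brush(self, source, start, end):
        for view in self.views:
            if view is not source:  # the source view already shows the selection
                view.on_brush(start, end)


genes = View("genes", [(0, 100, "geneA"), (150, 300, "geneB")])
peaks = View("peaks", [(90, 120, "peak1"), (400, 450, "peak2")])
hub = BrushHub()
hub.register(genes)
hub.register(peaks)
hub.brush(genes, 80, 160)  # brushing in the gene track highlights overlapping peaks
```

Because every view reacts to the same range, aligning the data sources on a common coordinate space is what makes the linked highlighting possible.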
10. Visualization design goals
Data
• Select and filter: tight-knit integration with
R/Bioconductor
• (current work) filters on visualization
propagate to data environment
Model
• New 'measurements' are the result of
modeling, suggested by data context
11. Metagenomic Visualization
• How to effectively navigate large datasets
where features are organized hierarchically?
• Metaviz: browser-based, interactive
exploratory analysis of metagenomic
data
• Connection to R/Bioconductor with
metavizr package
• Built on metagenomeSeq and
metagenomeFeatures infrastructure
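One way to navigate hierarchically organized features is to aggregate counts at a chosen taxonomic level and expand only the branches of interest. The following Python sketch illustrates that idea under stated assumptions: the lineages, counts, and the function `aggregate_by_level` are made up for illustration and are not part of Metaviz, metavizr, or metagenomeSeq.

```python
from collections import defaultdict

# Hypothetical toy data: each feature (OTU) has a lineage and per-sample counts.
LEVELS = ("kingdom", "phylum", "class")
LINEAGE = {
    "otu1": ("Bacteria", "Firmicutes", "Clostridia"),
    "otu2": ("Bacteria", "Firmicutes", "Bacilli"),
    "otu3": ("Bacteria", "Bacteroidetes", "Bacteroidia"),
}
COUNTS = {
    "otu1": [10, 0],
    "otu2": [5, 3],
    "otu3": [2, 7],
}


def aggregate_by_level(lineage, counts, level):
    """Sum per-sample counts over all features that share a taxon at `level`."""
    depth = LEVELS.index(level)
    n_samples = len(next(iter(counts.values())))
    totals = defaultdict(lambda: [0] * n_samples)
    for feature, sample_counts in counts.items():
        taxon = lineage[feature][depth]
        for i, count in enumerate(sample_counts):
            totals[taxon][i] += count
    return dict(totals)


phylum_counts = aggregate_by_level(LINEAGE, COUNTS, "phylum")
class_counts = aggregate_by_level(LINEAGE, COUNTS, "class")
```

Moving from `phylum_counts` to `class_counts` corresponds to expanding one level of the hierarchy: the same data, shown at a finer resolution.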
15. Metaviz
• Exploration of hierarchically organized
features
• Geared towards 16S for now
– Hierarchical organization is also relevant to WGS
• Integration is a big part of design
– Framework designed for data integration
16. Acknowledgements
Brianna Lindsey, O. Colin Stine, Owen White, Anup Mahurkar: University of Maryland Baltimore
Jim Nataro: University of Virginia
NIGMS, Genentech
Florin Chelaru (now @ MIT)
Joseph Paulson (now @ Harvard)
Mihai Pop (@ UMD)