R tools for HiC data visualization
Nathalie Vialaneix, INRAE/MIAT
Chrocogen, January 31st 2020
1 / 27
First of all... a bit of bibliography
2 / 27
What did I use to make this presentation?
a GitHub repository with a bunch of references, classified by theme:
https://github.com/mdozmorov/HiC_tools#visualization
two reviews on the topic: [Yardimci & Noble, 2017] (5 tools, no R package) and
[Ing-Simmons & Vaquerizas, Development 2019] (12 tools, 2/3 R packages, half of the
tools are interactive)
the Bioconductor search tool
and I identified...
HiTC - HiCBricks - DNA_Rchitect (with a shiny interface) - GENOVA - Gviz
and GenomicInteractions - Sushi - HiCeekR (the last one sent by a colleague
the day after I finished my slides)
(+ Pierre's package adjclust, which is on CRAN)
3 / 27
What did I learn from that?
a large number of interactive tools already exist
in another review [Lin et al, WSBM 2019], you can also find tools for 3D
visualization of Hi-C data
however, most tools, even the interactive ones, seem to offer the same
standard approaches for Hi-C data visualization
one problem remains: finding appropriate standards to store the data and
load them into the software (multiple standards currently exist, with no
clear consensus yet)
4 / 27
Format specifications
5 / 27
Import and format of the different tools
HiTC (Bioconductor, 7.5 years): (function importC)
  input: mandatory: a CSV (tab-separated) file with bin pairs and a BED
  file describing the bins (chr | start | end | bin nb) ⇒ outputs of HiC-Pro
  class: HTCexp (for submatrices) or HTClist (for all matrices) with
  slots intdata (interaction matrix; can be sparse) and xgi/ygi
  (GRanges objects describing the bins) ⇒ can be used directly
HiCBricks (Bioconductor, 1 year): (functions Create_many_Bricks +
Brick_load_matrix / Create_many_Bricks_from_mcool)
  input: mandatory TXT (space-separated) files with the count matrices
  for every chromosome and a BED file describing the bins (chr | start |
  end) in order of appearance in the matrix OR .mcool files AND, soon,
  .hic files
  class: BrickContainer, which does not hold the data
  themselves but only information on the chromosomes (names and
  lengths) and on the files in which the information (bin description and
  interactions) is stored. When this object is created, a directory is
  created with HDF (Hierarchical Data Format) files that contain the data
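To make the bin-pair layout described for HiTC more concrete, here is a minimal base R sketch that turns the two tables (bin pairs and bin description) into a symmetric interaction matrix. The toy values and the column names (bin1, bin2, count, bin) are assumptions chosen for illustration; this is not the code of importC or of any package above.

```r
# Toy stand-ins for the two input files (values are made up for illustration).
# Bin description: chr | start | end | bin nb
bins <- data.frame(chr   = "7",
                   start = c(0, 40000, 80000),
                   end   = c(40000, 80000, 120000),
                   bin   = 1:3)
# Bin pairs: bin nb 1 | bin nb 2 | count (upper triangle only)
pairs <- data.frame(bin1  = c(1, 1, 2, 3),
                    bin2  = c(1, 2, 3, 3),
                    count = c(15, 55, 19, 8))

# Fill a dense symmetric matrix from the triplets; missing pairs stay at 0.
n <- nrow(bins)
labels <- as.character(bins$bin)
mat <- matrix(0, n, n, dimnames = list(labels, labels))
mat[cbind(pairs$bin1, pairs$bin2)] <- pairs$count
mat[cbind(pairs$bin2, pairs$bin1)] <- pairs$count
```

With real data the two data frames would come from read.table() on the tab-separated files, and a sparse matrix would be preferable for whole chromosomes.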
6 / 27
Import and format of the different tools
DNA_Rchitect (web shiny interface at
http://shiny.immgen.org/DNARchitect/)
  input: TXT file, separated by commas, semicolons or tabs, with
  the following columns (chrom1 | start1 | end1 | chrom2 | start2 |
  end2 | score | samplenumber) ⇒ BEDPE files
GENOVA (GitHub repository https://github.com/robinweide/GENOVA, not
properly documented and full of bugs): (functions read_bedpe,
read.hicpro.matrix)
  input: mandatory: a CSV (tab-separated) file with bin pairs and a BED
  file describing the bins (chr | start | end | bin nb) ⇒ outputs of
  HiC-Pro OR BEDPE files. It is said that it can handle .cool files OR .hic
  files but I haven't found where
  class(?): contacts, which contains the slots MAT (triplet interaction
  matrix), IDX (bin descriptions, BED), CHRS (chr description),
  CENTROMERES (location of the centromeres, BED)
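Since several tools expect BEDPE-like input while HiC-Pro produces bin pairs plus a bin description, a conversion between the two is often needed. The following is a minimal base R sketch of that conversion under assumed column names and toy values; it does not reproduce the import code of any tool listed here.

```r
# Toy bin description and bin pairs (values made up for illustration).
bins <- data.frame(chr   = "7",
                   start = c(0, 40000, 80000),
                   end   = c(40000, 80000, 120000),
                   bin   = 1:3)
pairs <- data.frame(bin1  = c(1, 1, 2),
                    bin2  = c(1, 2, 3),
                    count = c(15, 55, 19))

# Look up the genomic coordinates of both anchors of each pair.
i <- match(pairs$bin1, bins$bin)
j <- match(pairs$bin2, bins$bin)

# BEDPE-like table: chrom1 | start1 | end1 | chrom2 | start2 | end2 | score
bedpe <- data.frame(chrom1 = bins$chr[i], start1 = bins$start[i], end1 = bins$end[i],
                    chrom2 = bins$chr[j], start2 = bins$start[j], end2 = bins$end[j],
                    score  = pairs$count)
```

The resulting data.frame can be written with write.table(); whether a given tool also needs extra columns (such as samplenumber) has to be checked in its documentation.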
7 / 27
Import and format of the different tools
GenomicInteractions (Bioconductor, 5 years, based on Gviz): (functions
makeGenomicInteractionsFromFile or directly using
GenomicInteractions)
  input: mandatory: BEDPE files OR HOMER files (TXT files with 20
  columns;
  http://homer.ucsd.edu/homer/interactions/HiCinteractions.html)
  class: GenomicInteractions, which contains two GRanges objects (bin
  pairs) and a count object (numeric vector) ⇒ can be used directly
Sushi (Bioconductor, 5.5 years):
  input: BEDPE files or interaction matrix (with genomic coordinates in
  row/column names) as TXT files. No dedicated import function; data are
  passed to the package functions as a simple data.frame
8 / 27
Import and format of the different tools
HiCeekR (GitHub repository https://github.com/lucidif/HiCeekR, 1 year,
well documented, shiny application to run locally)
  input: BAM file and FASTA reference. Performs all the processing,
  creates local files and stores intermediate results (a report is also created)
adjclust (CRAN, 2 years)
  input: a CSV (tab-separated) file with bin pairs OR an interaction
  matrix OR HTCexp objects. No dedicated import function; data are
  passed to the package functions as simple (sparse) matrices
9 / 27
Summary
X: possible
XX: tested (by myself)
~: possible but not quite direct
Columns: TXT file (bin pairs) | TXT file (matrix) | BEDPE | .cool | .hic | custom
HiTC XX ~XX ~X XX
HiCBricks X X X?
DNA_Rchitect X
GENOVA X X X X ?
GenomicInteractions ~XX X XX
Sushi X X X X
adjclust XX XX ~X XX
Only very recent (and still immature) tools handle HiC-specific formats like
.cool and .hic. HiCeekR handles only raw BAM files.
10 / 27
Visualization of HiC data
organized by types of visualization and tasks
11 / 27
Heatmaps
# using GENOVA
maria_90_chr7 <- load_contacts(signal_path = "../../data/forTests/ch
indices_path = "../../data/forTests/c
sample_name = "dg90-chr7", colour =
hic.matrixplot(maria_90_chr7, chr = 7, start = 0, end = 5000000)
12 / 27
Recommendations for heatmaps
whole-genome heatmaps are used to highlight genomic rearrangements /
zoomed heatmaps are used to highlight TADs and loops
colour coding should scale with log10 rather than linearly and should
use a single-hue colour scale to avoid artificial transitions (if several
hues are used, choose them to be colourblind-friendly). Two-colour scales can
be used to represent a correlation matrix (compartments) or a comparison
between matrices (see below)
comparisons can be made with side-by-side heatmaps or (better) with a
heatmap of the log2 ratio
linear tracks can be added to heatmaps and in this case triangular
heatmaps should be preferred (the tracks are then placed below)
tools that contain heatmaps: HiTC, HiCBricks, GENOVA, Sushi and
adjclust (no heatmaps in DNA_Rchitect or in GenomicInteractions)
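The two colour recommendations above (log10 scale with a single hue, log2 ratio with two colours for comparisons) can be sketched in base R as follows. The function names, the palettes and the +1 pseudocount are assumptions made for this illustration, not the defaults of any of the packages discussed.

```r
# Single-hue heatmap of log10 counts.
# The +1 pseudocount is an assumption that keeps empty bins finite.
log_heatmap <- function(counts, n_col = 50) {
  z <- log10(counts + 1)
  image(z, col = colorRampPalette(c("white", "darkred"))(n_col),
        axes = FALSE, main = "log10(count + 1)")
  invisible(z)
}

# Two-colour heatmap of the log2 ratio between two matrices,
# with a colour range symmetric around 0 so that white means "no change".
ratio_heatmap <- function(a, b, n_col = 51) {
  z <- log2((a + 1) / (b + 1))
  lim <- max(abs(z))
  if (lim == 0) lim <- 1  # identical matrices: avoid a zero-width range
  image(z, zlim = c(-lim, lim),
        col = colorRampPalette(c("blue", "white", "red"))(n_col),
        axes = FALSE, main = "log2 ratio")
  invisible(z)
}

# Toy symmetric count matrices whose signal decays with distance to the diagonal.
set.seed(1)
n <- 20
d <- abs(outer(1:n, 1:n, "-"))
toy_matrix <- function() {
  m <- matrix(rpois(n * n, lambda = 200 / (d + 1)), n, n)
  m[lower.tri(m)] <- t(m)[lower.tri(m)]  # mirror the upper triangle
  m
}
a <- toy_matrix()
b <- toy_matrix()
```

Calling log_heatmap(a) and ratio_heatmap(a, b) draws the two plots; both functions return the transformed matrix invisibly so that the values can be inspected.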
13 / 27
Features for heatmaps
Columns: rectangular | triangular | custom colors | zoom | comparison | linear tracks
(entries are listed in column order; empty cells are omitted)
HiTC: XX (genome) ; XX (chr) ; log, pos/neg col ; prior to plot (start/end) ; X (2, triangular) ; X (only genomic int.)
HiCBricks: X ; X ; X (palette and log) ; X (start/end/dist) ; X (2)
GENOVA: XX ; X(?) ; X (but limited) ; X (2) ; X (?)
Sushi: X ; X ; X (palette) ; X (start/end/dist)
HiCeekR: X ; X (start/end) ; X (numeric/2)
adjclust: XX ; XX (palette and log)
14 / 27
Features for heatmaps
In addition: HiCBricks and adjclust can show TADs on the heatmap (maybe
also GENOVA) and GENOVA can highlight loops with circles on the maps.
15 / 27
Example of visualization with annotation tracks
16 / 27
Critical assessment of the tools
The simplest, most complete and nicest visualization function for heatmaps is
in HiCBricks (even if it cannot display linear tracks) but, unfortunately, the
import format of the tool is rather hard to use.
GENOVA is promising (it includes many functions to extract features (IS, TADs,
loops, ...) from HiC matrices) but impossible to use at that stage because of the [...]
17 / 27
Interactions as arcs (or networks)
# with GenomicInteractions: how to create the data?
genomic_pos <- read.table("../../data/forTests/chr7_index.bed", sep
genomic_pos <- GRanges(genomic_pos[ ,1],
IRanges(genomic_pos[ ,2], width = 40000,
names = genomic_pos[ ,4]))
bin_pairs <- read.table("../../data/forTests/chr7_90.matrix", sep =
bins1 <- match(bin_pairs[ ,1], as.numeric(names(ranges(genomic_pos)
bins1 <- genomic_pos[bins1, ]
bins2 <- match(bin_pairs[ ,2], as.numeric(names(ranges(genomic_pos)
bins2 <- genomic_pos[bins2, ]
maria_90_chr7 <- GenomicInteractions(bins1, bins2, counts = bin_pai
18 / 27
Interactions as arcs (or networks)
maria_90_chr7
## GenomicInteractions object with 905906 interactions and 1 metadata column
## seqnames1 ranges1 seqnames2 ranges
## <Rle> <IRanges> <Rle> <IRanges
## [1] 7 0-39999 --- 7 0-3999
## [2] 7 0-39999 --- 7 40000-7999
## [3] 7 0-39999 --- 7 80000-11999
## [4] 7 0-39999 --- 7 120000-15999
## [5] 7 0-39999 --- 7 160000-19999
## ... ... ... ... ... .
## [905902] 7 134640000-134679999 --- 7 134680000-13471999
## [905903] 7 134640000-134679999 --- 7 134720000-13475999
## [905904] 7 134680000-134719999 --- 7 134680000-13471999
## [905905] 7 134680000-134719999 --- 7 134720000-13475999
## [905906] 7 134720000-134759999 --- 7 134720000-13475999
## | counts
## | <integer>
## [1] | 15
## [2] | 55
## [3] | 19
## [4] | 8
19 / 27
Interactions as arcs (or networks)
interaction_track <- InteractionTrack(maria_90_chr7, name = "HiC",
chromosome = "7")
plotTracks(interaction_track, chromosome = "7", from = 0, to = 50000
20 / 27
Interactions as arcs (or networks)
plotTracks(interaction_track, chromosome = "7", from = 0, to = 50000
21 / 27
Recommendations for arcs
useful mainly to superimpose annotations or qualitative/quantitative
tracks (Gviz offers plenty of solutions to do so)
but becomes unreadable for large regions and is unable to show the
interaction intensity (a solution would be to threshold the interaction
intensity beforehand)
alternatives display the data as networks (but the genome linear structure
is lost and it is also restricted to very small regions) or as circos plots
(thresholding of interactions to keep only the strongest is mandatory, even
for a single chromosome)
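The thresholding step mentioned above can be done before the interactions are handed to any arc, network or circos plot. Below is a minimal base R sketch that keeps only the strongest interactions; the function name keep_strongest, the quantile-based cutoff and the removal of self-interactions are assumptions chosen for illustration.

```r
# Keep only the strongest interactions of a bin-pair table.
# pairs: data.frame with columns bin1, bin2, count (hypothetical names).
# prob:  quantile of the counts used as cutoff (0.9 keeps roughly the top 10%).
keep_strongest <- function(pairs, prob = 0.9, drop_self = TRUE) {
  if (drop_self) {
    # A bin interacting with itself would be drawn as a degenerate arc.
    pairs <- pairs[pairs$bin1 != pairs$bin2, , drop = FALSE]
  }
  cutoff <- quantile(pairs$count, probs = prob, names = FALSE)
  pairs[pairs$count >= cutoff, , drop = FALSE]
}

# Toy example: one self-interaction and ten off-diagonal pairs.
toy_pairs <- data.frame(bin1  = c(1, 1:10),
                        bin2  = c(1, 2:11),
                        count = c(100, 1:10))
strong <- keep_strongest(toy_pairs, prob = 0.8)
```

The filtered table can then be converted to whatever object the plotting package expects; the right value of prob depends on the size of the region to display.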
22 / 27
Critical assessment of tools
DNA_Rchitect, Sushi and GenomicInteractions display the
interactions as arcs
DNA_Rchitect is interactive but I never managed to use it, even on the
example dataset (too much annotation information is required for
proper use)
the other two offer approximately the same types of features
(GenomicInteractions is maybe more complete but Sushi is easier to
customize)
HiCeekR can represent the data as a(n interactive) network, for a whole
chromosome or a selected region and with/without a threshold for the
edge value
23 / 27
Example of visualization with annotation tracks
24 / 27
Example of visualization with annotation tracks
with circlize (a bit sophisticated to use, similar to Gviz)
25 / 27
Other (quality control) graphics
in HiCeekR: quality control of the alignment (fragment length
distribution, insert size distributions)
in HiTC: inter/intra interaction barplot, interaction versus distance dot
plot, interaction distribution (histogram) for CIS/TRANS
in GenomicInteractions: inter/intra donut graphs (forget them!),
interaction distribution (histogram but cut; also forget them), donut
graphs with annotation of the interactions
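As an illustration of the "interaction versus distance" quality-control plot listed above, the following base R sketch computes the mean interaction count per genomic distance from a bin-pair table. The column names and toy counts are assumptions; the 40 kb bin size matches the bin width used in the GenomicInteractions example of this deck. It is not the HiTC implementation.

```r
# Toy intra-chromosomal bin pairs (values made up for illustration).
pairs <- data.frame(bin1  = c(1, 1, 1, 2, 2, 3),
                    bin2  = c(1, 2, 3, 2, 3, 3),
                    count = c(20, 10, 2, 30, 6, 40))
bin_size <- 40000

# Genomic distance between the two bins of each pair, in base pairs.
pairs$distance <- abs(pairs$bin2 - pairs$bin1) * bin_size

# Mean count at each distance: the curve usually decreases with distance.
decay <- aggregate(count ~ distance, data = pairs, FUN = mean)
```

A call such as plot(decay$distance, decay$count) then gives the dot plot; on real data a log scale on the count axis is usually easier to read.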
26 / 27
References
Ing-Simmons E, Vaquerizas JM (2019) Visualising three-dimensional genome
organisation in two dimensions. Development, 146(19): dev177162.
Lin D, Bonora G, Yardimci GG, Noble WS (2019) Computational methods for
analyzing and modeling genome structure and organization. WIREs Systems
Biology and Medicine, 11: e1435.
Yardimci GG, Noble WS (2017) Software tools for visualizing Hi-C data. Genome
Biology, 18: 26.
27 / 27

Más contenido relacionado

La actualidad más candente

Tetrad analysis, positive and negative interference, mapping through somatic ...
Tetrad analysis, positive and negative interference, mapping through somatic ...Tetrad analysis, positive and negative interference, mapping through somatic ...
Tetrad analysis, positive and negative interference, mapping through somatic ...
Promila Sheoran
 
KARYOTYPING, CHROMOSOME BANDING AND CHROMOSOME PAINTING.pptx
KARYOTYPING, CHROMOSOME BANDING AND CHROMOSOME PAINTING.pptxKARYOTYPING, CHROMOSOME BANDING AND CHROMOSOME PAINTING.pptx
KARYOTYPING, CHROMOSOME BANDING AND CHROMOSOME PAINTING.pptx
PABOLU TEJASREE
 

La actualidad más candente (20)

Neurospora Tetrad Analysis.pptx
Neurospora Tetrad Analysis.pptxNeurospora Tetrad Analysis.pptx
Neurospora Tetrad Analysis.pptx
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysis
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysis
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
 
Protein Structure Alignment and Comparison
Protein Structure Alignment and ComparisonProtein Structure Alignment and Comparison
Protein Structure Alignment and Comparison
 
Blast
BlastBlast
Blast
 
Fungal genetics
Fungal geneticsFungal genetics
Fungal genetics
 
Tech Talk: UCSC Genome Browser
Tech Talk: UCSC Genome BrowserTech Talk: UCSC Genome Browser
Tech Talk: UCSC Genome Browser
 
Fasta
FastaFasta
Fasta
 
Clustal X
Clustal XClustal X
Clustal X
 
Artificial chromosomes
Artificial chromosomesArtificial chromosomes
Artificial chromosomes
 
Protein database
Protein  databaseProtein  database
Protein database
 
Gemome annotation
Gemome annotationGemome annotation
Gemome annotation
 
Blast
BlastBlast
Blast
 
Tetrad analysis, positive and negative interference, mapping through somatic ...
Tetrad analysis, positive and negative interference, mapping through somatic ...Tetrad analysis, positive and negative interference, mapping through somatic ...
Tetrad analysis, positive and negative interference, mapping through somatic ...
 
clustal omega.pptx
clustal omega.pptxclustal omega.pptx
clustal omega.pptx
 
KARYOTYPING, CHROMOSOME BANDING AND CHROMOSOME PAINTING.pptx
KARYOTYPING, CHROMOSOME BANDING AND CHROMOSOME PAINTING.pptxKARYOTYPING, CHROMOSOME BANDING AND CHROMOSOME PAINTING.pptx
KARYOTYPING, CHROMOSOME BANDING AND CHROMOSOME PAINTING.pptx
 
Interactomeee
InteractomeeeInteractomeee
Interactomeee
 
Mapping the genome of bacteria
Mapping the genome of bacteriaMapping the genome of bacteria
Mapping the genome of bacteria
 
Major databases in bioinformatics
Major databases in bioinformaticsMajor databases in bioinformatics
Major databases in bioinformatics
 

Similar a R tools for HiC data visualization

Frac paq user guide v2.0 release
Frac paq user guide v2.0 releaseFrac paq user guide v2.0 release
Frac paq user guide v2.0 release
joseicha
 
intro to knitr with RStudio
intro to knitr with RStudiointro to knitr with RStudio
intro to knitr with RStudio
Ben Bolker
 

Similar a R tools for HiC data visualization (20)

An introduction to knitr and R Markdown
An introduction to knitr and R MarkdownAn introduction to knitr and R Markdown
An introduction to knitr and R Markdown
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC data
 
EBtree - Design for a Scheduler and Use (Almost) Everywhere
EBtree - Design for a Scheduler and Use (Almost) EverywhereEBtree - Design for a Scheduler and Use (Almost) Everywhere
EBtree - Design for a Scheduler and Use (Almost) Everywhere
 
Postgres indexes
Postgres indexesPostgres indexes
Postgres indexes
 
La famille *down
La famille *downLa famille *down
La famille *down
 
Graph operations in Git version control system
Graph operations in Git version control systemGraph operations in Git version control system
Graph operations in Git version control system
 
Seeing Like Software
Seeing Like SoftwareSeeing Like Software
Seeing Like Software
 
Frac paq user guide v2.0 release
Frac paq user guide v2.0 releaseFrac paq user guide v2.0 release
Frac paq user guide v2.0 release
 
R basics
R basicsR basics
R basics
 
La famille *down
La famille *downLa famille *down
La famille *down
 
Graph computation
Graph computationGraph computation
Graph computation
 
Crosstalk
CrosstalkCrosstalk
Crosstalk
 
Opensource gis development - part 3
Opensource gis development - part 3Opensource gis development - part 3
Opensource gis development - part 3
 
Developing R Graphical User Interfaces
Developing R Graphical User InterfacesDeveloping R Graphical User Interfaces
Developing R Graphical User Interfaces
 
Postgres indexes: how to make them work for your application
Postgres indexes: how to make them work for your applicationPostgres indexes: how to make them work for your application
Postgres indexes: how to make them work for your application
 
intro to knitr with RStudio
intro to knitr with RStudiointro to knitr with RStudio
intro to knitr with RStudio
 
Data visualization in python/Django
Data visualization in python/DjangoData visualization in python/Django
Data visualization in python/Django
 
Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17
 
Designing Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.ProtoDesigning Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.Proto
 
Class[3][5th jun] [three js]
Class[3][5th jun] [three js]Class[3][5th jun] [three js]
Class[3][5th jun] [three js]
 

Más de tuxette

Más de tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 

Último

Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 

Último (20)

Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 

R tools for HiC data visualization

  • 1. R tools for HiC data visualizationR tools for HiC data visualization Nathalie Vialaneix, INRAE/MIATNathalie Vialaneix, INRAE/MIAT Chrocogen, January 31st 2020Chrocogen, January 31st 2020 1 / 271 / 27
  • 2. First ofall... a bit ofbibliographyFirst ofall... a bit ofbibliography 2 / 272 / 27
  • 3. What did I use to make this presentation? a github repository with a bunch of references, classified by themes https://github.com/mdozmorov/HiC_tools#visualization two reviews on the topic: [Yardimci & Noble, 2017] (5 tools, no R package), [Ing-Simons & Vaquerizas, CB 2019] (12 tools, 2/3 R packages, half of the tools are interactive) bioconductor research tool and I identi ed... HiT-C - HiCBricks - DNARchitect (with a shiny interface) - GENOVA - Gviz and GenomicInteraction - Sushi - HiCeekR (the last one sent by a colleague the day after I finished my slides) (+ Pierre's package adjclust that is on CRAN) 3 / 27
  • 4. What did I learn fromthat? a large number of interactive tools already exist in another review [Lin et al, WSBM 2019], you can also find tools for 3D visualization of Hi-C data however, most tools seem to propose very common approaches for Hi-C data visualization, even the interactive tools a problem still remains: find appropriate standards to store the data and load them into the software (multiple standards currently exist, with no clear consensus yet) 4 / 27
  • 5. Format speci cationsFormat speci cations 5 / 275 / 27
  • 6. Import and format ofthe di erent tools HiTC (bioconductor, 7.5 years): (function importC) input: mandatory: a CSV (tab separated) file with bin pairs and a BED file describing the bins (chr | start | end | bin nb) outputs of HiC- Pro class: HTCexp (for submatrices) or HTClist (for all matrices) with slots intdata (interaction matrix; can be sparse) and xgi/ygi (GRanges objects describing the bins) can be used directly HiCBricks (bioconductor, 1 year): (functions Create_many_Bricks + Brick_load_matrix / Create_many_Bricks_from_mcool) input: mandatory TXT (space separated) files with the count matrices for every chromosome and a BED file describing the bins (chr | start | end) by order of appearance in the matrix OR .mcool files AND soon available .hic files class: BrickContainer that does not incorporate the data themselves but only information on the chromosomes (names and lengths) and on files in which the information (bin description and interactions) is stored. When creating this object, a directory is created with HDF (Hierarchical Data Format) files with the data in them ⇒ ⇒ 6 / 27
  • 7. Import and format ofthe di erent tools DNA_Rchitect (web shiny interface at http://shiny.immgen.org/DNARchitect/) input: TXT file, separated by comma, semicolumn or tabulations, with the following columns (chrom1 | start1 | end1 | chrom2 | start2 | end2 | score | samplenumber) BEDPE files GENOVA (github repository https://github.com/robinweide/GENOVA, not properly documented and full of bugs): (functions read_bedpe, read.hicpro.matrix) input: mandatory: a CSV (tab separated) file with bin pairs and a BED file describing the bins (chr | start | end | bin nb) outputs of HiC- Pro OR BEDPE files. It is said that it can handle .cool files OR .hic files but I haven't found where class(?): contacts that contains the slots MAT (triplet interaction matrix), IDX (bin descriptions, BED), CHRS (chr description), CENTROMERES (location of the centromeres, BED) ⇒ ⇒ 7 / 27
  • 8. Import and format ofthe di erent tools GenomicInteractions (bioconductor, 5 years, based on Gviz): (functions makeGenomicInteractionsFromFile or directly using GenomicInteractions) input: mandatory: BEDPE files OR HOMER files (TXT files with 20 columns; http://homer.ucsd.edu/homer/interactions/HiCinteractions.html) class: GenomicInteractions that contains two GRanges objects (bin pairs) and a count object (numeric vector) can be used directly Sushi (bioconductor, 5.5 years): input: BEDPE files or interaction matrix (with genomic coordinates in row/column names) as TXT files. No dedicated import function; data passed to the package functions as simple data.frame ⇒ 8 / 27
  • 9. Import and format ofthe di erent tools HiCeekR (github repository https://github.com/lucidif/HiCeekR, 1 year, well documented, shiny application to run locally) input: BAM file and FASTA reference. Makes all the processing and creates local files and stores intermediate results (report also created) adjclust (CRAN, 2 years) input: a CSV (tab separated) file with bin pairs OR an interaction matrix OR HTC-exp objects. No dedicated import function; data passed to the package functions as simple (sparse) matrices 9 / 27
  • 10. Summary
      Legend: X: possible; XX: tested (by myself); ~: possible but not quite direct
      Columns: TXT file (bin pairs) | TXT file (matrix) | BEDPE | .cool | .hic | custom
      Entries per tool, in slide order (empty cells not shown):
        HiTC: XX, ~XX, ~X, XX
        HiCBricks: X, X, X?
        DNA_Rchitect: X
        GENOVA: X, X, X, X, ?
        GenomicInteractions: ~XX, X, XX
        Sushi: X, X, X, X
        adjclust: XX, XX, ~X, XX
      Only very recent (and still immature) tools handle Hi-C specific formats like .cool and .hic. HiCeekR handles only raw BAM files.
  • 11. Visualization of Hi-C data, organized by types of visualization and tasks
  • 12. Heatmaps
      # using GENOVA
      maria_90_chr7 <- load_contacts(signal_path = "../../data/forTests/ch
                                     indices_path = "../../data/forTests/c
                                     sample_name = "dg90-chr7", colour =
      hic.matrixplot(maria_90_chr7, chr = 7, start = 0, end = 5000000)
  • 13. Recommendations for heatmaps
      whole-genome heatmaps are used to highlight genomic rearrangements; zoomed heatmaps are used to highlight TADs and loops
      colour coding should scale with log10 rather than linearly and should use a colour scale made of a single colour to avoid artificial transitions (multiple hues can also be used for colour-blind readers). Two-colour scales can be used to represent a correlation matrix (compartments) or a comparison between matrices (see below)
      comparisons can be made with side-by-side heatmaps or (better) with a heatmap of the log2 ratio
      linear tracks can be added to heatmaps; in this case, triangular heatmaps should be preferred (the tracks are then placed below)
      tools that contain heatmaps: HiTC, HiCBricks, GENOVA, Sushi and adjclust (no heatmaps in DNA_Rchitect or in GenomicInteractions)
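The colour recommendations above can be tried without any Hi-C package, using only base R graphics. The following is a rough sketch on simulated counts: the matrices, the pseudo-count of 1 added before taking logarithms and the palettes are all assumptions made for the illustration, not choices taken from the tools discussed here.

```r
set.seed(1)
n <- 20

# Toy symmetric count matrices whose signal decays with the distance between bins
dist_bins <- abs(outer(seq_len(n), seq_len(n), "-"))
make_counts <- function(scale) {
  m <- matrix(rpois(n * n, lambda = scale / (1 + dist_bins)), n, n)
  m[lower.tri(m)] <- t(m)[lower.tri(m)]   # copy the upper triangle to the lower one
  m
}
counts_a <- make_counts(200)
counts_b <- make_counts(100)

# log10 scale for a single matrix (pseudo-count of 1 to handle zeros: an assumption)
log_a <- log10(counts_a + 1)

# log2 ratio for the comparison of two matrices
log_ratio <- log2((counts_a + 1) / (counts_b + 1))

# One hue for a single matrix, two hues centred on zero for the comparison
one_hue  <- colorRampPalette(c("white", "darkred"))(50)
two_hues <- colorRampPalette(c("blue", "white", "red"))(51)
limit <- max(abs(log_ratio))

op <- par(mfrow = c(1, 2))
image(log_a, col = one_hue, axes = FALSE, main = "log10(counts + 1)")
image(log_ratio, col = two_hues, zlim = c(-limit, limit), axes = FALSE,
      main = "log2 ratio A / B")
par(op)
```

Using a symmetric `zlim` keeps white at a ratio of exactly 1, so that enrichments and depletions are shown with comparable intensity.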
  • 14. Features for heatmaps
      Columns: rectangular | triangular | custom colors | zoom | comparison | linear tracks
      Entries per tool, in slide order (empty cells not shown):
        HiTC: XX (genome) | XX (chr) | log, pos/neg col | prior to plot (start/end) | X (2, triangular) | X (only genomic int.)
        HiCBricks: X | X | X (palette and log) | X (start/end/dist) | X (2)
        GENOVA: XX | X(?) | X (but limited) | X (2) | X (?)
        Sushi: X | X | X (palette) | X (start/end/dist)
        HiCeekR: X | X (start/end) | X (numeric/2)
        adjclust: XX | XX (palette and log)
  • 15. Features for heatmaps
      In addition: HiCBricks and adjclust can show TADs on the heatmap (maybe also GENOVA), and GENOVA can highlight loops with circles on the maps.
  • 16. Example of visualization with annotation tracks
  • 17. Critical assessment of the tools
      The simplest, most complete and nicest visualization function for heatmaps is in HiCBricks (even if it cannot display linear tracks) but, unfortunately, the import format of the tool is rather hard to use.
      GENOVA is promising (it includes many functions to extract features (IS, TADs, loops, ...) from Hi-C matrices) but impossible to use at this stage because of the …
  • 18. Interactions as arcs (or networks)
      # with GenomicInteractions: how to create the data?
      genomic_pos <- read.table("../../data/forTests/chr7_index.bed", sep
      genomic_pos <- GRanges(genomic_pos[ ,1],
                             IRanges(genomic_pos[ ,2], width = 40000,
                                     names = genomic_pos[ ,4]))
      bin_pairs <- read.table("../../data/forTests/chr7_90.matrix", sep =
      bins1 <- match(bin_pairs[ ,1], as.numeric(names(ranges(genomic_pos)
      bins1 <- genomic_pos[bins1, ]
      bins2 <- match(bin_pairs[ ,2], as.numeric(names(ranges(genomic_pos)
      bins2 <- genomic_pos[bins2, ]
      maria_90_chr7 <- GenomicInteractions(bins1, bins2, counts = bin_pai
  • 19. Interactions as arcs (or networks)
      maria_90_chr7
      ## GenomicInteractions object with 905906 interactions and 1 metadata column
      ##            seqnames1             ranges1     seqnames2             ranges
      ##                <Rle>           <IRanges>         <Rle>           <IRanges
      ##        [1]         7             0-39999 ---         7              0-3999
      ##        [2]         7             0-39999 ---         7          40000-7999
      ##        [3]         7             0-39999 ---         7         80000-11999
      ##        [4]         7             0-39999 ---         7        120000-15999
      ##        [5]         7             0-39999 ---         7        160000-19999
      ##        ...       ...                 ... ...       ...                   .
      ##   [905902]         7 134640000-134679999 ---         7  134680000-13471999
      ##   [905903]         7 134640000-134679999 ---         7  134720000-13475999
      ##   [905904]         7 134680000-134719999 ---         7  134680000-13471999
      ##   [905905]         7 134680000-134719999 ---         7  134720000-13475999
      ##   [905906]         7 134720000-134759999 ---         7  134720000-13475999
      ##            |    counts
      ##            | <integer>
      ##        [1] |        15
      ##        [2] |        55
      ##        [3] |        19
      ##        [4] |         8
  • 20. Interactions as arcs (or networks)
      interaction_track <- InteractionTrack(maria_90_chr7, name = "HiC",
                                            chromosome = "7")
      plotTracks(interaction_track, chromosome = "7", from = 0, to = 50000
  • 21. Interactions as arcs (or networks)
      plotTracks(interaction_track, chromosome = "7", from = 0, to = 50000
  • 22. Recommendations for arcs
      useful mainly to superimpose annotations or qualitative/quantitative tracks (Gviz offers plenty of solutions to do so), but the plot becomes unreadable for large regions and is unable to show the interaction intensity (a solution would be to threshold the interaction intensity beforehand)
      alternatives display the data as networks (but the linear structure of the genome is lost and this is also restricted to very small regions) or as circos plots (thresholding the interactions to keep only the strongest ones is mandatory, even for a single chromosome)
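The thresholding step suggested above can be sketched in base R before the interactions are passed to an arc, network or circos plot. The quantile-based cutoff, the helper name `keep_strongest` and the toy values are assumptions chosen for the illustration; other rules (fixed count, distance-dependent cutoff) would work the same way.

```r
# Toy interactions between 40 kb bins of one chromosome
interactions <- data.frame(
  start1 = c(0, 0, 0, 40000, 40000, 80000),
  start2 = c(40000, 80000, 120000, 80000, 120000, 120000),
  count  = c(55, 19, 8, 60, 21, 48)
)

# Hypothetical helper: keep the interactions whose count is at least
# the given quantile of all counts
keep_strongest <- function(interactions, prob = 0.5) {
  cutoff <- quantile(interactions$count, probs = prob, names = FALSE)
  interactions[interactions$count >= cutoff, , drop = FALSE]
}

strong <- keep_strongest(interactions, prob = 0.5)
print(strong)
```

With real data, a much higher quantile than the median used here would probably be needed to keep an arc plot readable.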
  • 23. Critical assessment of tools
      DNA_Rchitect, Sushi and GenomicInteractions display the interactions as arcs
      DNA_Rchitect is interactive but I never managed to use it, even on the example dataset (too much annotation information is required for a proper use)
      the other two offer approximately the same types of features (GenomicInteractions is maybe more complete but Sushi is easier to customize)
      HiCeekR can represent the data as a(n interactive) network, for a whole chromosome or a selected region, and with/without a threshold for the edge value
  • 24. Example of visualization with annotation tracks
  • 25. Example of visualization with annotation tracks
      with circlize (a bit sophisticated to use, similar to Gviz)
  • 26. Other (quality control) graphics
      in HiCeekR: quality control of the alignment (fragment length distribution, insert size distributions)
      in HiTC: inter/intra interaction barplot, interaction versus distance dot plot, interaction distribution (histogram) for CIS/TRANS
      in GenomicInteractions: inter/intra donut graphs (forget them!), interaction distribution (histogram but cut; also forget them), donut graphs with annotation of the interactions
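Two of the quality control summaries listed above (inter/intra counts and interaction versus distance) are easy to compute by hand from a table of interactions. The sketch below uses base R on toy data; the column names and values are assumptions, and it does not reproduce the plotting functions of HiTC or GenomicInteractions.

```r
# Toy interactions (BEDPE-like, reduced to the columns needed here)
interactions <- data.frame(
  chrom1 = c("7", "7", "7", "7", "8"),
  start1 = c(0, 0, 40000, 0, 0),
  chrom2 = c("7", "7", "7", "8", "8"),
  start2 = c(0, 40000, 120000, 0, 40000),
  count  = c(15, 55, 8, 3, 40),
  stringsAsFactors = FALSE
)

# Total counts for intra-chromosomal (cis) and inter-chromosomal (trans) pairs
is_cis <- interactions$chrom1 == interactions$chrom2
cis_trans <- tapply(interactions$count, ifelse(is_cis, "cis", "trans"), sum)

# Mean count as a function of the distance between bins (cis pairs only)
cis <- interactions[is_cis, ]
cis$distance <- abs(cis$start2 - cis$start1)
mean_by_distance <- aggregate(count ~ distance, data = cis, FUN = mean)

barplot(cis_trans, main = "cis / trans counts")
plot(mean_by_distance$distance, mean_by_distance$count, log = "y",
     xlab = "distance (bp)", ylab = "mean count")
```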
  • 27. References
      Ing-Simmons E, Vaquerizas JM (2019) Visualising three-dimensional genome organisation in two dimensions. Development, 146(19): dev177162.
      Lin D, Bonora G, Yardimci GG, Noble WS (2019) Computational methods for analyzing and modeling genome structure and organization. WIREs Systems Biology and Medicine, 11: e1435.
      Yardimci GG, Noble WS (2017) Software tools for visualizing Hi-C data. Genome Biology, 18: 26.