Making	sense	of	cancer	
somatic	SNVs	and	indels:	
from	variant	effects	to	pathways
Thu	24	May
Daniele	Merico,	PhD
Director	of	Molecular	Genetics,	Deep	Genomics	Inc.
Visiting	Scientist,	The	Hospital	for	Sick	Children
(Toronto,	Canada)
Outline
1. Functional	interpretation	of	somatic	variants:	overview	[5	min]
2. From	variants	to	genes	[10	min]
1. Variant	gene	product	effect	
2. Missense	impact	prediction	and	beyond
3. Genes	with	significant	somatic	burden
3. From	genes	to	functions,	pathways	&	networks	[40	min]
1. Gene-set	analysis	[30	min]
1. Overview
2. Gene-set	types,	Gene	Ontology	&	pathway	resources
3. Gene-set	results	visualization:	Cytoscape Enrichment	Map
4. Types	of	gene-set	analysis	tests
5. Competitive	tests:	GSEA	for	gene	expression	data
6. Self-contained	tests:	gene-set	somatic	burden
7. General	tips
2. Network	analysis	[10	min]
1. Network	visualization	and	gene	network	types
2. GeneMANIA
3. Reactome FI
4. Q&A	[5	min]
1.	Functional	Interpretation	of	
Somatic	Variants:	Overview
Criteria	to	Interpret	Somatic	Variants	
• What’s	the	effect	on	the	gene	product?
• Stop-gain,	frameshift,	splice	site	alteration,	missense,	splicing	consensus	sequence,	synonymous,	5’UTR,	3’UTR,	
intronic,	upstream,	downstream,	ncRNA exon,	ncRNA intron
• Truncating	loss-of-function,	missense	(loss-of-function	or	gain-of-function?)
• Is	a	missense	variant	recurrent,	or	overlapping	a	known	mutation	hotspot?
• Is	a	missense	variant	predicted	damaging	by	impact	predictors?
• Is	the	gene	an	established	oncogene	or	tumour	suppressor?
• Is the gene significantly mutated → could act as a novel oncogene or tumour suppressor?
• Otherwise, is the gene under negative selection for truncating loss-of-function or missense variants?
• Does the gene belong to a pathway or subnetwork with other cancer driver genes or enriched in somatic mutation → could act as a novel oncogene or tumour suppressor?
[Flowchart: cancer somatic mutation data are interpreted along two branches.
Established cancer genes: tumour suppressor (truncating LOF; missense LOF?) or oncogene (missense GOF?), with missense variants assessed by established hotspot, recurrence and impact prediction.
Novel cancer genes: significant burden (or genetic constraint) for truncating LOF or missense variants; gene-set and network analysis.]
2.	From	Variants	to	Genes
2.1.	Variant	gene	product	effect
SNV	and	Indel Variant	Annotations
• Variant	database	mapping
• Germline	allele	frequencies,	dbSNP
• COSMIC	(somatic	variant	database)
• Gene	mapping
• Gene	product	effect	type
• Stop-gain,	frameshift,	splice	site	alteration,	missense,	splicing	consensus	sequence,	synonymous,	
5’UTR,	3’UTR,	intronic,	upstream,	downstream,	ncRNA exon,	ncRNA intron
• Stop-gain, frameshift, splice site alteration → expected to cause complete loss-of-function (LOF)
• Missense, other → can act as gain-of-function
• Missense	impact	prediction
• SIFT,	PolyPhen2,	MutationAssessor,	…
• Other	impact	predictions
• Splicing	(e.g.	MaxEntScan,	dbscSNV,	SPIDEX,	…)
• Genomic	conservation (e.g.	phyloP,	PhastCons,	…)
• Omnibus	meta-predictors	(CADD,	Eigen,	…)
2.2.	Missense	impact	prediction	
and	beyond
SIFT
• Broadly	used,	relatively	old	(2001)
• Based	uniquely	on	protein	sequence	(amino	acid)	conservation
1. Start	from	query	protein	sequence
2. Identify	similar	protein	sequences	(PSI-BLAST)
3. Multiple	alignment	of	protein	sequences	(orthologs and	paralogs)
4. Amino	acid	x	residue	probability	matrix	(PSSM)
5. For	every	residue,	amino	acid	probability	reweighted	by	amino	acid	diversity	at	the	position	(sum	of	
frequency	rank	*	frequency)
→ Score: probability of observing amino acid normalized by residue conservation
cut-off:	0.05	(based	on	case	studies)
Predicting	deleterious	amino	acid	substitutions.
Ng	PC,	Henikoff S.	Genome	Res.	2001	May;11(5):863-74.
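The idea of scoring a substitution by how often the new amino acid is seen at a conserved position can be illustrated with a toy example. This is only a sketch of conservation-based scoring, not the actual SIFT computation: the alignment columns, the pseudocount and the function names are hypothetical, and the homolog search and diversity reweighting of steps 2 to 5 are not reproduced.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def conservation_scores(column, pseudocount=0.05):
    """Toy scores for one alignment column (one residue position).

    column: residues observed at this position across aligned homologs.
    Returns {amino acid: probability scaled by the most frequent residue}.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    return {aa: p / top for aa, p in probs.items()}

def predict(column, substitution, cutoff=0.05):
    """Apply the 0.05 cut-off from the slide to the toy score."""
    score = conservation_scores(column)[substitution]
    return score, ("deleterious" if score < cutoff else "tolerated")

conserved = "GGGGGGGGGGGGGGGGGGGG"   # hypothetical invariant glycine
variable = "AVILAVLIAVTSAVILMAVS"    # hypothetical variable position

print(predict(conserved, "D"))  # substitution at the invariant position
print(predict(variable, "V"))   # commonly observed residue at a variable position
```

An amino acid never seen at an invariant position gets a score close to zero, whereas a residue that is common at a variable position scores close to one.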
PolyPhen2
• Integrates	multiple	features
• 8	sequence-based,	3	structure-based	(nucleotide	and	amino	acid	level)
(e.g.	side	chain	volume	change,	overlap	with	PFAM	domain,	multiple	alignment	metrics)
• Supervised machine learning method (Naïve Bayes) → Requires training set
• Set 1: HumDiv
• Positive: damaging alleles for known Mendelian disorders (UniProt)
• Negative: non-damaging differences between human proteins and related mammalian homologs
• Performance, 5-fold cross-validation: (TP ~ 80%, FP ~ 10%), (TP ~ 90%, FP ~ 20%)
• Set 2: HumVar
• Positive: all human disease-causing mutations (UniProt)
• Negative: non-synonymous SNPs without disease association
→ Richer model than SIFT
→ More biased towards training set(s) than SIFT
A	method	and	server	for	predicting	damaging	missense	mutations.
Adzhubei IA,	Schmidt	S,	Peshkin L,	[…],	Bork	P,	Kondrashov AS,	Sunyaev SR.	Nat	Methods.	2010	Apr;7(4):248-9.
CADD
• Intended	as	a	measure	of	“deleteriousness”	for	coding	and	non-coding	sequence,	
not	biased	to	known	disease	variation
• However, not particularly effective for non-coding regulatory sequence (see lecture)
• Supervised machine learning model (Linear SVM)
• Negative training set: nearly fixed human alleles that differ from the inferred human-chimp ancestral genome
• Positive training set: simulated variants based on a mutation model aware of sequence context and primate substitution rates
• Predictive features (63): VEP (Variant Effect Predictor) output, UCSC tracks, ENCODE tracks → includes missense predictions and nucleotide-level conservation
• Performance assessment: on pathogenic variants from ClinVar, performs a bit better than PhyloP for all sites and than PolyPhen/SIFT for missense coding variants
A	general	framework	for	estimating	the	relative	pathogenicity	of	human	genetic	variants.
Kircher M,	Witten	DM,	Jain	P,	O'Roak BJ,	Cooper	GM,	Shendure J.	Nat	Genet.	2014	Mar;46(3):310-5.
Example	of	Mutation	Hotspots
EGFR: L858R
KRAS: G12D, V, C, A, S, R; G13D
2.3.	Genes	with	significant	
somatic	burden
MutSigCV
• Goal:	identify	significantly	mutated	genes	
→ Important to model the mutational background
• Tumour-specific global mutation rate
• Trinucleotide context and substitution
• Expression level (impacting transcription-coupled repair)
• Replication timing (later-replicating regions have higher mutation rates)
• Residual	local	genomic	region	mutation	rate
Lawrence	MS,	...,	Getz	G.	Mutational	heterogeneity	in	cancer	and	the	
search	for	new	cancer-associated	genes.	Nature	2013.	PMID:	23770567
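Why the background model matters can be seen with a deliberately simplified burden test. The sketch below is not MutSigCV: it assumes a single background rate per gene (the hypothetical parameter `rate_per_bp`), treats mutation counts as Poisson, and uses made-up numbers for gene length, cohort size and rates.

```python
from math import exp

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X < k)."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

def gene_burden_pvalue(n_mutations, coding_length_bp, n_tumours, rate_per_bp):
    """Probability of at least n_mutations under the assumed background rate."""
    expected = coding_length_bp * n_tumours * rate_per_bp
    return poisson_tail(n_mutations, expected)

# Hypothetical gene: 1,500 coding bp, 6 mutations observed in 200 tumours.
p_low_background = gene_burden_pvalue(6, 1500, 200, 2e-6)   # expected count 0.6
p_high_background = gene_burden_pvalue(6, 1500, 200, 1e-5)  # expected count 3.0

print(p_low_background, p_high_background)
```

The same six mutations look far less surprising when the gene sits in a region with a higher background rate (for example a late-replicating or poorly expressed gene), which is the reason for modelling these covariates.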
3.	From	genes	to	functions,	
pathways	&	networks
[Figure: overview diagram. Labels: Activity Profiles / Somatic Mutations; Prior Knowledge about genes, organised as GENE SETS, NETWORKS and PATHWAYS (examples: Spindle, Apoptosis, Ca++ Channels, MAPK, with placeholder genes Gene.A to Gene.N); Informatics (scoring models, search algorithms); Activity Maps.]
3.1.	Gene-set	Analysis
3.1.1.	Gene-set	analysis	overview
Gene-set Analysis Overview
[Figure: an experiment yields experimentally “positive” genes (e.g. UP-regulated) and experimentally “detectable” genes (aka background set). Together with gene-set databases, these are the input of the ENRICHMENT TEST, whose output is an enrichment table:]
Set        p-value
Spindle    0.00001
Apoptosis  0.00025
Gene-sets for Gene-set Analysis
[Figure: from cell biology to gene-sets. Cellular structures and processes (Nuclear Pore, Cell Cycle, Ribosome, P53 signaling) are represented in gene-set databases as lists of genes, with placeholder members such as Gene.AAA, Gene.ABA, Gene.ABC; Gene.CC1 to Gene.CC5; Gene.RP1 to Gene.RP4; Gene.CC1, Gene.CK1, Gene.PPP. Gene.CC1 is listed twice in the figure.]
Gene-set Analysis: Overview
[Figure: gene-sets (e.g. FADD, TRADD, CYTC1, BAX, BAXL, CASP9, CASP10, …; SPP1, SPP2, CCCP, MTC1, …) and experimental data (e.g. gene expression table) are the input; the output is an enrichment table (Spindle 0.00001; Apoptosis 0.00025).]
Gene-set	Enrichment	Test
The output of an enrichment test is a P-value.
The P-value assesses the probability that, by randomly sampling the “detectable” genes, the overlap is at least as large as observed.
[Figure: random samples of array genes]
Most used statistical model: Fisher’s Exact Test
Fisher’s Exact Test does not require actually performing the random sampling; it is based on a theoretical null-hypothesis distribution (Hypergeometric Distribution)
Fisher’s	Exact	Test	(FET)
Fisher’s Exact Test: 2 x 2 Contingency Table

               Exp_positive=yes   Exp_positive=no
Gene-Set=yes          a                  b
Gene-Set=no           c                  d

Probability of one table occurring by random sampling (hypergeometric distribution formula), with n = a + b + c + d:
$$P(a, b, c, d) = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}$$
Test p-value: sum of random sampling probabilities for tables as extreme or more extreme than the real table
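A minimal sketch of the one-sided (enrichment) version of the test, using only the standard library. The cell names a, b, c, d follow the contingency table; the function names are illustrative, and "more extreme" is taken here to mean a larger overlap with the row and column totals held fixed.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_enrichment_pvalue(a, b, c, d):
    """One-sided p-value: probability of an overlap at least as large as a."""
    gene_set_size = a + b
    n_positive = a + c
    n_background = a + b + c + d
    max_overlap = min(gene_set_size, n_positive)
    p = 0.0
    for k in range(a, max_overlap + 1):
        # Vary the overlap k while keeping both margins fixed.
        p += table_probability(
            k,
            gene_set_size - k,
            n_positive - k,
            n_background - gene_set_size - n_positive + k,
        )
    return p

# Hypothetical example: 8 of 100 positive genes fall in a 40-gene set,
# out of 10,000 detectable genes.
print(fisher_enrichment_pvalue(8, 32, 92, 9868))
```

Libraries such as SciPy provide an equivalent test; the explicit loop is shown only to make the "sum over more extreme tables" step visible.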
The	Background	is	Important!
• Inappropriate modeling of the background will lead to biased results
– What genes are detectable by the experiment? E.g.: in a kinase phosphorylation assay, only kinases can be detected
– Fisher’s Exact Test, GSEA and other tests assume all genes have the same “prior” probability of being experimentally positive → they can be used only in the absence of systematic selection biases (example of bias: if you select genes with at least one mutation, then longer genes are systematically more likely to be selected)
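The effect of the background choice can be made concrete with a small calculation. All numbers below are hypothetical: a kinase assay with 20 positive genes, 10 of which belong to a 50-gene set, evaluated once against all 20,000 genes and once against the 500 kinases the assay can actually detect.

```python
from math import comb

def enrichment_pvalue(overlap, gene_set_size, n_positive, n_background):
    """P(overlap >= observed) when n_positive genes are drawn at random
    from n_background detectable genes (hypergeometric tail)."""
    total = comb(n_background, n_positive)
    upper = min(gene_set_size, n_positive)
    tail = sum(
        comb(gene_set_size, k) * comb(n_background - gene_set_size, n_positive - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

p_whole_genome = enrichment_pvalue(10, 50, 20, 20000)  # inappropriate background
p_kinases_only = enrichment_pvalue(10, 50, 20, 500)    # detectable genes only

print(p_whole_genome, p_kinases_only)
```

With the whole genome as background the overlap looks far more significant than it does against the detectable genes, which is the kind of bias the slide warns about.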
Gene-set	Enrichment	Analysis:
Multiple	Test	Correction	by	BH-FDR
• FDR (false discovery rate) is the expected proportion of false positives among the tests passing the significance threshold, i.e. tests that pass only because of random sampling
• Benjamini-Hochberg	(BH)	FDR:
for	a	given	FDR	q-value	threshold	alpha	(e.g.	25%),	
for	m total	tests	(e.g.	1,000	gene-sets),	
find	the	largest	k number	of	tests,	so	that:
P-value	(k)	<=	k	/	m	*	alpha
so	alpha	>=	P-value	(k)	*	m	/	k
(e.g.	0.0125	*	1,000	/	50	<=	0.25)
Gene-set Enrichment Analysis:
Multiple Test Correction by BH-FDR

Rank  Category                     P-value  P-value * m / k           FDR q-value
1     Transcriptional regulation   0.001    0.001  x 53/1  = 0.053    0.041
2     Transcription factor         0.002    0.002  x 53/2  = 0.053    0.041
3     Initiation of transcription  0.003    0.003  x 53/3  = 0.053    0.041
4     Nuclear localization         0.0031   0.0031 x 53/4  = 0.041    0.041
5     Chromatin modification       0.005    0.005  x 53/5  = 0.053    0.053
…
52    Cytoplasmic localization     0.985    0.985  x 53/52 = 1.004    0.99
53    Translation                  0.99     0.99   x 53/53 = 0.99     0.99

P-value threshold for FDR < 0.05: 0.0031 (rank 4). Ranks 1 to 4 are significant at FDR < 0.05 (green in the original slide); ranks 5 and beyond are non-significant (red).

In other words: (1) walk the list of tests from most significant, (2) estimate how many tests would pass at each p-value if they were random draws, (3) compute the fraction of false positives and transform it to a monotonic q-value, with 0 <= q-value <= 1
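The q-value computation can be sketched in a few lines. The function name and the five-value example are illustrative (the example uses m = 5 rather than the 53 tests of the slide); the monotonic step takes, for each rank, the minimum of P-value * m / k over that rank and all less significant ranks.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the least significant test so that each q-value is the
    # minimum of P-value * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

pvalues = [0.001, 0.002, 0.003, 0.0031, 0.005]
for p, q in zip(pvalues, bh_qvalues(pvalues)):
    print(p, q)
```

In this example the fourth test has the smallest P-value * m / k, so the three more significant tests inherit its q-value, mirroring what happens at rank 4 in the table.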
3.1.2.	Gene-set	types,	Gene	Ontology	&	
pathway	resources
Gene-set	Types
• Functions	(e.g.	Gene	Ontology)
• Pathways	(e.g.	KEGG,	Reactome)
• Genotype-phenotype/disease	association	(e.g.	HPO)
• Protein	Families	/	Domains	(e.g.	PFAM)
• Genomic	position	(e.g.	cytobands)
• Gene	expression	signatures	(e.g.	MSigDB Cancer	Hallmarks)
• Up/down	after	treatment	or	in	relation	to	disease
• Targets	of	regulators
• Transcription	factor	targets
• miRNA	targets
• Network-derived	modules,	e.g.	protein-protein	interactions
• Drug	targets
Gene	Ontology	(GO)	/	1
• Effort	to	standardize	functional	description	of	eukaryotic	gene	products
• Launched	in	1998
• Many	organism	species	supported
• Normal	function	(e.g.	cell	cycle),	not	disorder	/	disease	(e.g.	metastasis	formation)
• Ontology	defined	by	core	team	of	curators	who	receive	input	from	domain	experts
• Corpus	of	gene	annotations	based	on	expert	curation	of	the	literature	(>	140,000	published	
papers	in	2018),	review	of	high-throughput	data,	or	annotations	in	existing	databases;	
performed	by	curators	at	specific	organism	genome	databases	(human:	UniProtKB)
Gene	Ontology	(GO)	/	2
• Ontology,	intended	as	controlled	structured	vocabulary
• Terms	=	functional	concepts	(e.g.	cell	cycle,	proteasome)
• Three	main	ontologies:	molecular	function (i.e.	biochemical	activity),	cellular	component,	
biological	process (pathways	and	other	processes)
• Relations	between	terms:	is-a,	part-of	/	has-part,	regulates,	occurs-in	
→ DAG (directed acyclic graph), supports logical inference
• Most	of	the	relations	are	within	each	main	ontology,	ongoing	effort	to	link	processes	and	
molecular	functions	to	components	using	occurs-in
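[Figure: example GO terms linked by PARENT / CHILD relations, each with annotated genes (placeholder names: ABB1, ACAP3, TRAC1, LUC2, POF5, ZUMM, C5A75, DUCZ). Biological Process: DNA repair. Cellular Component: replication fork. Molecular Function: single-strand break-containing DNA binding.]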
Pathways
• Depict	mechanistic	details	of	metabolic,	signaling	and	other	biological	processes
• Can	be	computationally	exported	as	complex	graph,	but	often	just	analyzed	as	gene-sets
• Advantages:
• Curated,	accurate,	cause	and	effect	captured
• Human-interpretable	visualizations
• Disadvantages:
• More	sparse	coverage	of	genome	than	functional	sets
• More	complex	models	are	required	to	score	pathways
• Static	model	of	dynamic	systems
• Main	resources:	KEGG,	Reactome
[Figures: KEGG pathway map for Cell cycle; Reactome diagram for Cell cycle, G1/S transition]
Resources	to	Download	Gene-sets
BaderLab (University	of	Toronto)
http://baderlab.org/GeneSets
• Gene	Ontology;	Reactome,	Panther,	NetPath,	NCI,	MSigDB C2	(Biocarta,	...),	HumanCyc pathways;	MSigDB cancer	
hallmarks;	MSigDB C3	(miRNA	and	TF	targets)
• updated	on	a	monthly	basis
MSigDB (Broad	Institute)
https://software.broadinstitute.org/gsea/msigdb/
• Gene	Ontology;	KEGG,	Reactome,	Biocarta,	other	pathways;	cancer	hallmarks;	expression	signatures;	miRNA	and	TF	
targets;	interaction	modules;	Cytobands (positional)
• last	update	Oct	2017,	several	gene-set	collections	are	derived	from	old	research	works	(2004-2005)
Bioconductor	org.Hs.eg.db
http://bioconductor.org/packages/release/data/annotation/html/org.Hs.eg.db.html
• Gene	Ontology;	KEGG	pathways;	PFAM	(protein	domains);	Cytobands (positional)
• updated	every	4	months
Notes:	
• KEGG stopped being freely available in 2011, so freely available resources have largely outdated KEGG gene-sets
• Carefully	check	how	GO	annotations	are	exported	(e.g.	all	evidence	codes,	or	excluding	IEA)
3.1.3.	Gene-set	results	visualization:	
Cytoscape Enrichment	Map
GO.id GO.name p.value cover cover.rat Deg.mdn Deg.iqr
GO:0042330 taxis 2.18E-06 23 0.056930693 54.94499375 9.139238998
GO:0006935 chemotaxis 2.18E-06 23 0.060209424 54.94499375 9.139238998
GO:0002460 adaptive immune response based on somatic recombination 7.10E-05 25 0.111111111 57.32306955 16.97054864
GO:0002250 adaptive immune response 7.10E-05 25 0.111111111 57.32306955 16.97054864
GO:0002443 leukocyte mediated immunity 0.000419328 23 0.097046414 58.27890582 15.58333739
GO:0019724 B cell mediated immunity 0.000683758 20 0.114285714 57.84161096 15.03496347
GO:0030099 myeloid cell differentiation 0.000691589 24 0.089219331 62.22171598 10.35284833
GO:0002252 immune effector process 0.000775626 31 0.090116279 58.27890582 23.86214773
GO:0050764 regulation of phagocytosis 0.000792138 8 0.2 53.54786293 5.742849971
GO:0050766 positive regulation of phagocytosis 0.000792138 8 0.216216216 53.54786293 5.742849971
GO:0002449 lymphocyte mediated immunity 0.00087216 22 0.101851852 57.84161096 16.13171132
GO:0019838 growth factor binding 0.000913285 15 0.068181818 83.0405088 10.58734852
GO:0051258 protein polymerization 0.00108876 17 0.080952381 57.97543252 17.31639968
GO:0005789 endoplasmic reticulum membrane 0.001178198 18 0.036072144 64.02284752 12.05209158
GO:0016064 immunoglobulin mediated immune response 0.001444464 19 0.113095238 58.27890582 15.58333739
GO:0007507 heart development 0.001991562 26 0.052313883 84.02538284 18.60761304
GO:0009617 response to bacterium 0.002552999 10 0.027173913 52.75249873 23.23104637
GO:0030100 regulation of endocytosis 0.002658555 11 0.099099099 56.38041132 16.02486889
GO:0002526 acute inflammatory response 0.002660742 24 0.103004292 57.80098769 24.94311116
GO:0045807 positive regulation of endocytosis 0.002903401 9 0.147540984 54.94499375 6.769909171
GO:0002274 myeloid leukocyte activation 0.002969661 7 0.077777778 54.94499375 16.07042339
GO:0008652 amino acid biosynthetic process 0.003502921 7 0.017241379 45.19797271 31.18248579
GO:0050727 regulation of inflammatory response 0.004999055 7 0.084337349 54.94499375 7.737346076
GO:0002253 activation of immune response 0.00500146 23 0.116161616 60.29679989 18.41103376
GO:0002684 positive regulation of immune system process 0.006581245 27 0.111570248 60.29679989 22.05051447
GO:0050778 positive regulation of immune response 0.006581245 27 0.113924051 60.29679989 22.05051447
GO:0019882 antigen processing and presentation 0.007244488 7 0.029661017 54.94499375 16.58797889
GO:0002682 regulation of immune system process 0.007252134 29 0.099656357 61.05645008 22.65935206
GO:0050776 regulation of immune response 0.007252134 29 0.102112676 61.05645008 22.65935206
GO:0043086 negative regulation of enzyme activity 0.008017022 9 0.040723982 53.28031076 17.48904224
GO:0006909 phagocytosis 0.008106069 10 0.080645161 55.66270253 12.47536747
GO:0002573 myeloid leukocyte differentiation 0.008174948 10 0.092592593 62.86577216 9.401887596
GO:0006959 humoral immune response 0.008396095 16 0.044568245 55.05654091 18.94209565
GO:0046649 lymphocyte activation 0.009044401 29 0.059917355 61.92213317 21.03553355
GO:0030595 leukocyte chemotaxis 0.009707319 7 0.101449275 56.33116709 6.945510559
GO:0006469 negative regulation of protein kinase activity 0.010782155 7 0.046357616 52.22863516 12.58524145
GO:0051348 negative regulation of transferase activity 0.010782155 7 0.04516129 52.22863516 12.58524145
GO:0007179 transforming growth factor beta receptor signaling pathw 0.012630825 13 0.071038251 83.49440788 12.63256309
GO:0005520 insulin-like growth factor binding 0.012950071 9 0.097826087 81.41963394 7.528247832
GO:0042110 T cell activation 0.013410548 20 0.064516129 59.77891783 26.06174863
GO:0002455 humoral immune response mediated by circulating immunogl 0.016780163 10 0.125 54.70766244 14.2572143
GO:0005830 cytosolic ribosome (sensu Eukaryota) 0.016907351 8 0.01843318 61.68933284 7.814673781
GO:0006487 protein amino acid N-linked glycosylation 0.01791078 7 0.044585987 56.50635337 6.780726553
GO:0051240 positive regulation of multicellular organismal process 0.017931228 31 0.096573209 62.2953212 23.86214773
GO:0042379 chemokine receptor binding 0.018849666 12 0.095238095 55.13915015 19.08254406
GO:0008009 chemokine activity 0.018849666 12 0.096774194 55.13915015 19.08254406
GO:0016055 Wnt receptor signaling pathway 0.020088086 18 0.04400978 85.47935979 20.92435897
Need a visualization solution!
Visualization:	Cytoscape Enrichment	Map
• Visualization	framework	for	gene-set	
analysis	results
• Cytoscape network:	nodes	correspond	to	
gene-sets,	edges	correspond	to	gene-set	
overlaps	(i.e.	share	a	fraction	of	their	genes)
• Intuitive	clustering	of	gene-sets	that	
converge	on	the	same	functional	themes
• Determined	by	automatic	network	layout	
algorithm,	based	on	edge	weights
• Overlaps	<	threshold	are	pruned,	otherwise	
network	layout	would	work	poorly
• Important:	don’t	confuse	with	gene	
networks
• Nodes	do	not	represent	genes,	they	represent	
gene-sets/pathways
• Edges	do	not	represent	physical	interactions,	they	
represent	overlaps	between	gene-sets
[Figure: two gene-set nodes, A and B, connected by an edge; edges represent gene-set overlap]
Merico D, Isserlin R, Stueker O, Emili A, Bader GD. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One 2010. PMID: 21085593
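How edges between gene-sets might be derived can be sketched as follows. The overlap coefficient used here is one possible similarity measure and the 0.5 threshold is an arbitrary assumption; the gene-set contents are made up for illustration, and gene-sets are assumed to be non-empty.

```python
from itertools import combinations

def overlap_edges(gene_sets, threshold=0.5):
    """Return (set A, set B, overlap coefficient) for pairs at or above threshold."""
    edges = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(gene_sets.items(), 2):
        shared = len(genes_a & genes_b)
        # Overlap coefficient: shared genes relative to the smaller gene-set.
        coefficient = shared / min(len(genes_a), len(genes_b))
        if coefficient >= threshold:
            edges.append((name_a, name_b, coefficient))
    return edges

# Hypothetical gene-sets.
gene_sets = {
    "gs1": {"TP53", "NTRK1", "MAPK3", "PRKCA"},
    "gs2": {"MAPK3", "PRKCA", "PIK1"},
    "gs3": {"TAZ", "CAZ1"},
}

for edge in overlap_edges(gene_sets):
    print(edge)
```

Pairs below the threshold produce no edge, so unrelated gene-sets such as gs3 remain disconnected and the layout can pull overlapping sets into clusters.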
Visualization:	Cytoscape Enrichment	Map
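[Figure: genes (placeholder and real names: ABB1, ACAP3, TRAC1, LUC2, POF5, ZUMM, C5A75, DUCZ, TP53, NTRK1, MAPK3, ANAAT, PIK1, PRKCA, PIRL2, TAZ, CAZ1) grouped into overlapping gene-sets gs1 to gs5, shown next to the corresponding enrichment map in which gs1 to gs5 are the nodes]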
Example:	Differential	expression	after	estrogen	treatment	of	breast	cancer	cells,	GSEA	competitive	gene-set	analysis
• Using	the	native	Gene	Ontology	
relations	results	in	a	more	
disconnected	graph
3.1.4.	Types	of	gene-set	analysis	tests
Competitive	vs	Self-contained
[Figure: two workflows, each starting from CASES vs CONTROLS data.
COMPETITIVE (aka ENRICHMENT aka OVER-REPRESENTATION): GENE TEST → GENE SCORE (e.g. differential expression) → GENE-SETS ENRICHED IN SCORE (e.g. gene-sets enriched in up-regulated genes).
SELF-CONTAINED: GENE-SET TEST → GENE-SET SCORE (e.g. significant mutation burden difference) → SUPPORTING GENES.]
Nam D, Kim SY. Gene-set approach for expression pattern analysis. Brief Bioinform 2008. PMID: 18202032
• Competitive → gene-set genes “compete” with all other genes (for enrichment)
• Self-contained → gene-set scored independently of other genes
Competitive	Test	Types
• Threshold-dependent (e.g. FET, g:Profiler *): the enrichment test is applied to the genes selected as UP or DOWN
• More suitable for significantly mutated genes
• Threshold-independent (e.g. GSEA): the test uses the full ranking from UP to DOWN
• More suitable for differential gene expression
* g:Profiler also contains a “hybrid” approach that selects the optimal cutoff for gene-set analysis
3.1.5.	Competitive	tests:	
GSEA	for	gene	expression	data
Gene	Expression	Analysis	Workflow
1. Define the experimental design
2. Collect the biological samples
3. Generate the expression data
4. Identify the differential genes
5. Identify the functional groups
GSEA:	Gene-Set	Enrichment	Analysis
• Popular	threshold-free	gene-set	test
• Identifies	gene-sets	enriched	in	top- or	bottom-ranking	genes
• Typically used as a competitive test (see permutation settings), taking a ranked gene list as input
• Statistical	test:	empirical	test	based	on	permutations;	includes	permutation-
based	FDR
• The	NES	(normalized	enrichment	score)	is	a	particularly	valuable	measure	of	
enrichment	effect	size	for	visualization
GSEA:	Gene-Set	Enrichment	Analysis
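• ES score calculation: high ES score <--> high local enrichment
• Empirical p-value: the real ES score value is compared with the distribution of ES from N permutations (e.g. 2000)
• Example: randomized with ES ≥ real: 4 / 2000 ==> empirical p-value = 0.002
[Figure: histogram of permutation ES scores (x-axis: ES score; y-axis: number of instances), with the real ES score value marked]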
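The permutation logic can be illustrated with a simplified, unweighted enrichment score. This is a sketch rather than the GSEA implementation: GSEA by default weights each step by the gene's score and normalizes the result to an NES, neither of which is done here. The gene names, the seed and the function names are hypothetical, and only positive enrichment (ES ≥ real) is tested.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of an unweighted running sum."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up at gene-set members, step down at all other genes.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def empirical_pvalue(ranked_genes, gene_set, n_perm=2000, seed=0):
    """Gene-set permutation: compare the real ES with random same-size sets."""
    rng = random.Random(seed)
    real = enrichment_score(ranked_genes, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    as_extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if enrichment_score(ranked_genes, random_set) >= real:
            as_extreme += 1
    return real, as_extreme / n_perm

ranked_genes = [f"g{i}" for i in range(200)]   # most up-regulated first
top_set = set(ranked_genes[:10])               # gene-set concentrated at the top
es, p = empirical_pvalue(ranked_genes, top_set)
print(es, p)
```

Drawing random gene-sets of the same size corresponds to the gene-set permutation mode; phenotype permutation would instead reshuffle sample labels and recompute the ranking each time.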
GSEA	Permutation	Settings
• The	permutation	setting	completely	changes	the	nature	of	the	GSEA	test
• Gene-set permutations	(aka	pre-ranked)
• Takes a ranked gene list as input and permutes the genes in the gene-sets
• → competitive
• Recommended	in	presence	of	differential	gene	expression	data	for	small	or	medium-scale	
experiments	(2-4	biological	replicates	per	condition)	with	modest	expression	heterogeneity
• Phenotype permutation
• Permute	the	phenotype	labels	(e.g.	treated,	untreated),	then	repeat	gene	scoring;	gene	
scoring	is	performed	within	GSEA
• → competitive / self-contained hybrid
• Recommended	for	larger	scale	gene	expression	data	(>	10	biological	replicates	per	condition)	
with	high	expression	heterogeneity
• As	an	alternative,	consider	a	pure	self-contained	test,	or	a	self-contained	test	with	a	different	
competitive	correction
3.1.6.	Self-contained	tests:	
gene-set	somatic	burden
OICR	PanCuRx:	Dataset	Summary
• 200	primary	tumours	and	41	metastases	(pancreatic	cancer)
• Whole genome sequencing → detection of SNVs, indels, SVs, copy number gains and losses
• Mutation load outlier removal criterion: median + 2 IQR
→ Samples retained: 190/200 primaries and 41/41 metastases
[Figure: box plots of Log10(SNV count), Log10(indel count) and Log10(SV count) for metastases (Met) vs primaries (Pri)]
Unpublished	data
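The outlier criterion (median + 2 IQR) could be applied along these lines. The sketch makes several assumptions not stated on the slide: that the cut-off is computed on log10 mutation counts, that a single load value per sample is used, and that quartiles follow the default method of `statistics.quantiles`; the sample names and values are made up.

```python
from statistics import median, quantiles

def outlier_cutoff(values, n_iqr=2.0):
    """Upper cut-off: median plus n_iqr times the interquartile range."""
    q1, _, q3 = quantiles(values, n=4)
    return median(values) + n_iqr * (q3 - q1)

def retain_non_outliers(load_by_sample, n_iqr=2.0):
    """Keep samples whose mutation load does not exceed the cut-off."""
    cutoff = outlier_cutoff(list(load_by_sample.values()), n_iqr)
    return {s: v for s, v in load_by_sample.items() if v <= cutoff}

# Hypothetical log10(SNV count) per sample.
log10_snv = {"s1": 3.6, "s2": 3.7, "s3": 3.8, "s4": 3.9, "s5": 4.0, "s6": 5.9}
kept = retain_non_outliers(log10_snv)
print(sorted(kept))
```

Different quartile conventions give slightly different cut-offs, so the exact set of removed samples depends on that choice.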
OICR	PanCuRx:	Gene-set	Analysis	Strategy
1. Perform	gene-set	burden	test,	primaries	vs	metastases
• Logistic regression (metastases vs. primary), separating each variant type:
M0 = y ~ ns_tot + ms_tot + ss_tot + sv_tot + cL_tot + cG_tot
M1 = y ~ ns_tot + ms_tot + ss_tot + sv_tot + cL_tot + cG_tot + ns_gs + ms_gs + ss_gs + sv_gs + cL_gs + cG_gs
• Multiple	test	correction	by	BH-FDR	(significant	when	BH-FDR	<	27.5%)
2. For	significant	gene-sets,	categorize	driver	variant	type(s)	and	extract	genes	
more	often	mutated	in	metastases	for	such	variant	types	(“leading	edge”	gene)
3. Cluster	pathways	based	on	leading	gene	overlaps,	visualize	using	Cytoscape
enrichment	map	plugin
4. Overlay	key	genes	(even	more	stringent	filter:	mutation	rate	met/pri >	4.5)
5. Formulate hypotheses → correlation with other tumour properties
• RNA-seq based	proliferation	index	(CCP)	and	missense	mutations	in	cell	cycle	genes
Unpublished	results;	
Gallinger,	PanCuRx TRI,	Toronto
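The predictors named in M0 and M1 can be tabulated per sample roughly as follows. This is a sketch of the data preparation only, not of the regression itself: the input format, the function name and the example variants are hypothetical, the variant-type codes are copied from the model formulas, and whether the original analysis used counts or presence/absence indicators is not stated.

```python
from collections import Counter

VARIANT_TYPES = ("ns", "ms", "ss", "sv", "cL", "cG")

def burden_predictors(variants, gene_set):
    """variants: iterable of (sample, gene, variant_type) tuples.

    Returns {sample: {"ns_tot": ..., "ns_gs": ..., ...}} with per-sample
    counts over all genes (_tot) and over the gene-set only (_gs).
    Samples without any variant do not appear in the result.
    """
    rows = {}
    for sample, gene, vtype in variants:
        row = rows.setdefault(sample, Counter())
        row[f"{vtype}_tot"] += 1       # genome-wide count, used by M0 and M1
        if gene in gene_set:
            row[f"{vtype}_gs"] += 1    # gene-set count, used by M1 only
    columns = [f"{t}_{s}" for s in ("tot", "gs") for t in VARIANT_TYPES]
    return {s: {c: row[c] for c in columns} for s, row in rows.items()}

# Hypothetical input.
variants = [
    ("met1", "CDT1", "ms"),
    ("met1", "TP53", "ms"),
    ("pri1", "KRAS", "ns"),
]
table = burden_predictors(variants, gene_set={"CDT1", "MCM8"})
print(table["met1"])
```

Comparing the fit of M1 against M0 then asks whether the gene-set columns add predictive value beyond the total mutation load.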
[Figure: enrichment map of the significant gene-sets, with a legend of driver variant types]
Gene-sets:
REACT:TELOMERE MAINTENANCE
REACT:ION CHANNEL TRANSPORT
KEGG:BASE EXCISION REPAIR
REACT:RESOLUTION OF ABASIC SITES (AP SITES)
KEGG:MINERAL ABSORPTION
REACT:CHROMOSOME MAINTENANCE
REACT:BASE EXCISION REPAIR
REACT:TRANSMEMBRANE TRANSPORT OF SMALL MOLECULES
REACT:NUCLEOSOME ASSEMBLY
REACT:HDACS DEACETYLATE HISTONES
REACT:DEPOSITION OF NEW CENPA-CONTAINING NUCLEOSOMES AT THE CENTROMERE
REACT:DNA REPLICATION PRE-INITIATION
REACT:FORMATION OF THE BETA-CATENIN:TCF TRANSACTIVATING COMPLEX
REACT:G2/M CHECKPOINTS
KEGG:ECM-RECEPTOR INTERACTION
REACT:CELL CYCLE, MITOTIC
REACT:M/G1 TRANSITION
REACT:G1/S TRANSITION
REACT:MITOTIC METAPHASE AND ANAPHASE
REACT:TRANSCRIPTION-COUPLED NUCLEOTIDE EXCISION REPAIR (TC-NER)
REACT:GAP-FILLING DNA REPAIR SYNTHESIS AND LIGATION IN TC-NER
KEGG:SEROTONERGIC SYNAPSE
KEGG:GNRH SIGNALING PATHWAY
KEGG:CIRCADIAN ENTRAINMENT
Driver variants (legend):
• Missense (gain and loss of function?)
• Nonsense + missense (loss of function?)
• Nonsense
• Nonsense + copy number loss
• Copy number gain
• Missense + SV (loss and gain of function?)
• Other combination
For all clusters, only variants driving the corresponding gene-sets and with counts met >= pri are reported; considering the number of met and pri, this corresponds to an enrichment ratio > 4.5
Unpublished	results;	
Gallinger,	PanCuRx TRI,	Toronto
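The enrichment ratio quoted for this filter can be checked with simple arithmetic, using the 190 primaries and 41 metastases retained after outlier removal. The sketch assumes the ratio compares per-sample mutation frequencies (count divided by group size), which is an interpretation rather than something stated on the slide; the function name is illustrative.

```python
N_PRI, N_MET = 190, 41   # samples retained after outlier removal

def met_pri_rate_ratio(count_met, count_pri):
    """Ratio of per-sample mutation frequencies, metastases over primaries."""
    if count_pri == 0:
        return float("inf")
    return (count_met / N_MET) / (count_pri / N_PRI)

# Equal counts in the two groups give the smallest ratio passing met >= pri.
minimum_ratio = met_pri_rate_ratio(1, 1)
print(round(minimum_ratio, 2))
```

Because there are far fewer metastases than primaries, equal raw counts already correspond to a ratio of 190 / 41, which is above 4.5.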
Cell cycle (cell cycle progression and checkpoints), DNA replication (polymerase, replication initiation, replication fork complexes), chromosome maintenance and segregation (centromere components, centrosome components, spindle checkpoint) – missense, sometimes also sv [labelled]
CDT1 (4,0):	prevents	initiation	of	replication	when	DNA	replication	is	ongoing
POLA1 (1,0)	:	DNA	polymerases	[POLD1,	POLD3	and	other	DNA	polymerases	listed	only	in	repair	cluster]
MCM8 (2,0),	MCM3 (1,0),	MCM10 (1,1),	MCM7 (1,1):	replication	fork	complex – [MCM10 in	CCP]
CENPA (1,0), CENPL (1,0), CENPJ (1,1): centromere (chromosome segregation) – [CENPM, CENPF in CCP]
NCAPD3 (1,0),	NIPBL (1,1):	chromosome	condensation	and/or	segregation
CEP57 (2,0),	CEP152 (2,1),	CNTRL (1,1):	microtubule	centrosome	(chromosome	segregation)	– [CEP55 in	CCP]
ERCC6L (2,1):	spindle	checkpoint;	CKAP5 (2,2):	spindle	formation;	CASC5/KNL1 (sv 1,0):	kinetochore
E2F1 (1,0),	E2F4 (1,0),	TFDP1 (1,0;	sv 1,1):	TFs	regulating	cell	cycle	progression
ANAPC11 (1,0),	ANAPC2 (1,0):	anaphase	promoting	complex	(cell	cycle	progression);	FBXO5 (1,0;	sv 1,0):	
anaphase	promoting	complex	inhibitor
ATM	(sv 1,1), TP53BP1 (1,1):	TP53	pathway	and	DNA	damage	response;	HMG20B (1,0):	DNA	damage	response
[histone	and	histone	(de)acetylation	listed	for	the	separate	subcluster]
Other:	AHCTF1	(2,2;	sv 2,1),	B9D2	(1,0),	BARD1	(1,0),	GORASP1	(1,0),	LEMD2	(1,0),	NEDD1	(1,0),	NUP205	(1,0),	
NUP88	(1,1),	NUP133	(1,1),	PPP1R12A	(1,0),	PSMA3	(1,1),	PSMD1	(1,1),	SDCCAG8	(1,1),	SGOL2	(1,1),	TUBGCP5	
(1,0),	UBB	(1,0),	YWHAH	(1,0),	XPO1	(1,0),	WRAP53	(sv 1,0),	ZW10	(1,0)
DNA	base	excision	repair	– missense,	sv
PARP1 (sv 1,0),	PARP2 (ms 1,0),	PARP4 (ms 1,0),	POLD3 (sv 1,0),	MPG (ms 1,0),	RPA1 (sv 2,0),	
RPA2 (1,0),	TDG (ms 1,0)
Transcription-coupled	nucleotide	excision	repair – only	missense
COPS2 (ms 1,0),	EP300 (ms 2,0),	ERCC3 (ms 2,0),	POLK (ms 1,0),	UBB (ms 1,0)
Both	– missense,	sv
LIG3 (ms 1,0),	POLD1 (ms 1,1;	sv 1,0),	XRCC1 (ms 1,0;	sv 1,0)
Beta	catenin	pathway	– only	missense
CTNNB1 (2,2):	beta	catenin
TCF7L2 (2,0):	TF	that	partners	with	CTNNB1	and	
activates	target	genes
Extracellular	matrix–receptor	interactions	
– only	missense
LAMB4	(1,1),	LAMC1	(1,1),	LAMC2	(1,0)
COL4A2	(1,1),	COL6A3	(2,2),	COL9A2	(1,1),	COL6A5	
(4,2),	HSPG2	(2,0)
COMP	(1,0),	TNR	(1,1)
ITGA1	(2,0),	ITGB4	(2,0),	ITGB3	(2,0),	ITGA2B	(1,0),	
ITGA11	(1,1),	ITGAV	(1,1)
CD47	(1,0),	CD36	(1,0)
Histones	and	histone	(de)acetylation	
– only	missense
HIST1H2BB (2,1),	HIST1H2BD (1,0),	HIST1H2BL (1,0),	
HIST1H2BO (sv 1,0): transcriptional activation,
response	to	DNA	damage	and	other	processes
H2AFB1 (sv 2,1)
CHD4 (1,0):	nucleosome	remodeling	and	histone	
deacetylase complex
EP300 (2,0):	histone	acetyltransferase	recognizing	
enhancers,	involved	in	cell	cycle,		DNA	damage	
response,	…
KAT5 (1,0):	histone	acetyltransferase
ARID4B (1,1):	histone	deacetylase
WHSC1 (1,1;	sv 1,0):	histone	methyltransferase	
NCOR1 (1,1),	TBL1XR1 (1,1):	nuclear	receptor	
corepressor	(N-CoR)	and	histone	deacetylase	3	
(HDAC	3)	complexes
Misc.	signalling	
– only	cnGain
ITPR2	(2,0)
ALOX12	(1,0)
GNAS	(1,0)
MAP3K3	(1,0)
PRKCG	(1,0)
Misc.	signalling
– only	nonsense
ADCY2	(1,0)
ADCY10	(1,1)
GUCY1A3	(1,0)
RYR3	(2,1)
All	samples
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.42557 0.08620 4.937 2.39e-05 ***
# gsCC_ms_bin_stdz 0.14522 0.08739 1.662 0.1063
# gCDKN2ALOF_bin_stdz 0.16066 0.09449 1.700 0.0988 .
# vc_ms_tot_stdz 0.13934 0.08962 1.555 0.1298
Samples	with	<=	60	missense
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.31231 0.10020 3.117 0.00455 **
# gsCC_ms_bin_stdz 0.14051 0.09719 1.446 0.16068
# gCDKN2ALOF_bin_stdz 0.25673 0.11289 2.274 0.03181 *
# vc_ms_tot_stdz -0.09684 0.11187 -0.866 0.39489
Cell	cycle	missense	x	CDKN2A	LOF	(ns,	sv,	cL)
[Figure, two panels titled “Met/pri x CDKN2A y/n x Cell Cycle ms y/n: ccp”: box plots of ccpRNAindex for the groups Met_CDKN2Ay_CCMSy, Met_CDKN2Ay_CCMSn, Met_CDKN2An_CCMSy, Met_CDKN2An_CCMSn, Pri_CDKN2Ay_CCMSy, Pri_CDKN2Ay_CCMSn, Pri_CDKN2An_CCMSy, Pri_CDKN2An_CCMSn]
Unpublished	results;	
Gallinger,	PanCuRxTRI,	Toronto
General	Applications	of	Self-Contained	Tests
• Compare	different	tumour	subtypes
• Compare	tumours	by	survival	or	other	properties	(e.g.	clinical	grade,	response	to	
therapy)
• Important	to	address	systematic	differences	between	tumours	from	different	
groups	(mutation	load,	mutation	signatures,	etc.)
• Relatively	minor	differences	can	be	corrected	for,	whereas	large	differences	will	likely	prevent	
the	analysis	from	working	properly
• Correcting	for	total	number	of	variants	is	typically	recommended,	and	it	can	be	considered	a	
“competitive”	correction	of	the	self-contained	test	(i.e.	the	gene-set	is	more	predictive	of	the	
difference	between	sample	groups	than	all	genes)
3.1.7.	General	tips
General	Tips	for	Gene-set	Analysis	/	1
• Carefully	design	your	experiment
• Flaws	in	experimental	design,	like	presence	of	hidden	confounders	or	insufficient	number	of	
replicates,	will	result	in	confounded	or	negative	gene-set	results
• For	gene	expression	experiments,	perform	exploratory	analysis	(PCA,	MDS,	hierarchical	
clustering)	to	check	relations	among	samples	and	validate	the	experimental	design
• Choose	gene-set	types	and	filter	gene-sets	by	size
• Start	from	most	informative	gene-sets:	Gene	Ontology,	KEGG	and	Reactome pathways,	
MSigDB cancer	hallmarks
• Remove	small	gene-sets	to	improve	power	after	multiple	test	correction	(e.g.	<	15	genes	for	
competitive	tests	applied	to	differential	gene	expression)
• For	Gene	Ontology,	remove	large	gene-sets	(e.g.	>	500	genes)	as	they	tend	to	be	
uninformative
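The size filter above is simple to apply before testing. The sketch below assumes gene-sets are held as a mapping from name to gene identifiers; the function name, the optional restriction to measured genes, and the default thresholds (taken from the examples on this slide) are illustrative choices.

```python
def filter_gene_sets(gene_sets, measured_genes=None, min_size=15, max_size=500):
    """Keep gene-sets whose size lies within [min_size, max_size].

    If measured_genes is given, each gene-set is first restricted to the
    genes actually present in the experiment (an assumption of this sketch).
    """
    kept = {}
    for name, genes in gene_sets.items():
        genes = set(genes)
        if measured_genes is not None:
            genes &= set(measured_genes)
        if min_size <= len(genes) <= max_size:
            kept[name] = genes
    return kept


# Toy gene-sets with hypothetical gene identifiers.
example_sets = {
    "small_set": {f"G{i}" for i in range(10)},
    "medium_set": {f"G{i}" for i in range(50)},
    "large_set": {f"G{i}" for i in range(600)},
}
filtered = filter_gene_sets(example_sets)
print(sorted(filtered))
```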
General	Tips	for	Gene-set	Analysis	/	2
• Choose a competitive or a self-contained test
Competitive: 
• requires a meaningful gene selection or ranking → typically suitable for differential gene expression or genes with significant mutation burden
• if analyzing other -omics data, model the background distribution carefully; do not simply assume that Fisher's Exact Test or GSEA will be suitable (e.g. use GREAT for ChIP-seq)
Self-contained: 
• typically suitable for sparser mutations, when differences are significant at the gene-set level only
• ensure that the sample groups are comparable, and correct for confounders
• Proper	visualization	is	important	to	interpret	results	and	to	identify	issues
• Use a visualization solution such as Enrichment Map
• Visualize the full gene-set results; do not cherry-pick based on prior expectations
• Unexpected	results	can	suggest	issues	(e.g.	contamination,	statistical	bias)
• Do	not	forget	to	carefully	evaluate	genes	with	limited	or	no	gene-set	annotations	and	
network	interactions…!
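For the competitive case with a selected gene list, the one-sided Fisher's Exact Test reduces to a hypergeometric tail probability. The sketch below illustrates that calculation together with a Benjamini-Hochberg correction across gene-sets. It assumes every gene in the universe was equally likely to be selected, which is exactly the background assumption the slide warns may not hold for other -omics data. Function names and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from a universe containing n_in_set gene-set members."""
    denom = comb(n_universe, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 20000 genes, a 100-gene set, 200 selected genes, 8 in common.
p_example = enrichment_p_value(20000, 100, 200, 8)
q_example = benjamini_hochberg([0.01, 0.04, 0.03])
print(p_example, q_example)
```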
General	Tips	for	Gene-set	Analysis	/	3
[Figure: gene-set analysis workflow. (1) Extract a gene list (e.g. PIK3CA, TP53) or a ranked list from an 'omics experiment; (2) perform pathway enrichment analysis; (3) visualize and interpret the results with Cytoscape EnrichmentMap (with clusterMaker, WordCloud and AutoAnnotate). The example output is a table of pathways with P-values and Q-values (REGULATION OF INTERFERON-GAMMA-MEDIATED SIGNALING PATHWAY%GOBP%GO:0060334, POSITIVE REGULATION OF RHO PROTEIN SIGNAL TRANSDUCTION%GOBP%GO:0035025, POSITIVE REGULATION OF RAS PROTEIN SIGNAL TRANSDUCTION%GOBP%GO:0046579), shown in the enrichment map as clusters labelled "gtpase signal transduction" and "regulation interferon gamma".]
Outputs
• Published on bioRxiv Jan 2017, provisionally accepted by Nature Protocols
• General concepts and resources
• Step-by-step instructions for gene-set analysis of gene expression data
3.2.	Network	Analysis
3.2.1. Network visualization and gene network types
Network	Representation	and	Visualization
Merico	D,	Gfeller D,	Bader	GD.	How	to	visually	interpret	biological	data	
using	networks.	Nature	Biotechnology	2009.	PMID:	19816451
Network	Visualization:	Automatic	Layout
[Figure panels: before layout / after layout]
• Yeast proteins annotated to GO cellular component "chromosome"
• Colored	based	on	sub-component	(nucleosome,	kinetochore,	replication	fork)
• The	layout	(force	directed)	meaningfully	arranges	nodes	(genes/proteins)	and	edges	(interactions)
Merico	D,	Gfeller D,	Bader	GD.	How	to	visually	interpret	biological	data	
using	networks.	Nature	Biotechnology	2009.	PMID:	19816451
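To make the idea of a force-directed layout concrete, here is a minimal sketch in the spirit of the Fruchterman-Reingold scheme: all node pairs repel, connected nodes attract, and the step size shrinks over time. It is not Cytoscape's implementation; the constants (ideal edge length, starting temperature, cooling rate) and the toy graph are assumptions chosen for illustration.

```python
import math
import random


def force_directed_layout(nodes, edges, n_iter=200, seed=0):
    """Return a dict mapping each node to an (x, y) position."""
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # assumed ideal edge length
    temperature = 0.1                # maximum move per iteration
    for _ in range(n_iter):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction along edges.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node, capped by the current temperature, then cool down.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0.0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
        temperature *= 0.98
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Toy graph: two triangles joined by one edge (hypothetical node names).
toy_nodes = ["A", "B", "C", "D", "E", "F"]
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
layout = force_directed_layout(toy_nodes, toy_edges)
```

The all-pairs repulsion makes this quadratic in the number of nodes, which is acceptable for small examples only.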
Network	Visualization:	Cytoscape
• Rich	GUI	to	map	visual	markup	to	data
• Imports	tabular	data	(computational	biologist	friendly)
• Default	functions	for	visualization,	search,	layout
• Lots	of	“apps”	implementing	specific	algorithms	and	functionalities	(e.g.	Enrichment	Map)
Gene	Network	Types
• Protein-protein	(physical)	interactions
• Biochemical reaction adjacency (mainly shared output/input in metabolic pathways)
• Regulator-target	interactions (e.g.	TF/miRNA-target)
• Co-expression
• Genetic	interactions (e.g.	synthetic	lethality	in	double	KO)
• Semantic	similarity (e.g.	similarity	of	Gene	Ontology	annotations)
• Publication	co-citation
• Aggregate	functional	similarity (based	on	multi-omics)
Networks	vs	Pathways
Pathways
• Hand-curated → more accurate
• Represent biochemical reactions, molecular events, or regulatory relations among proteins, protein complexes, metabolites and other bio-entities
Networks
• Derived from experimental high-throughput methods or text mining → noisier
• Represent simple relations among genes (e.g. binds, is similar to, is co-expressed with, regulates)
• Cover a larger number of genes
Gene	Network	Resources
iRefWeb/iRefIndex wodaklab.org/iRefWeb
• Resource	integrating	different	databases
• Mainly	protein	interactions
• Useful	to	explore	specific	interactions,	or	bulk	download
GeneMANIA www.genemania.org
• Multiple	networks	available	(including	iRefIndex protein	interactions)
• Useful	to	construct,	visualize,	and	evaluate	networks	from	“seed”	genes	(network	propagation	
algorithm)
STRING string-db.org
• Integrated	network,	based	on	algorithm	for	function	prediction
• Protein interactions, pathway interactions, co-expression, etc.
Network	Analysis	Overview
Most	common	analysis	types:
• Subnetwork construction from seed genes → GeneMANIA
• Network clustering / module finding → ClusterMaker2 (MCODE, MCL, …)
• Enriched sub-network identification → Reactome FI, HyperModules, HotNet
Other types of analysis:
• Network inference from expression data → ARACNE
• Pathway/network activity inference → SPIA, PARADIGM
• Overall	analysis	of	network	topology
• Motif	identification,	motif	content	analysis
Gene-set	vs	Network	Analysis
• Gene-set	pros
• Better	coverage	of	genes	and	known	biological	processes	/	components
• Simple	algorithmics,	a	few	well-established	analysis	options
• Gene-set	cons
• Simple and flat structure, does not represent mechanistic details
• Pre-constructed	based	on	“general	biology”
• Network	pros
• More	structured,	more	insight	on	mechanistic	details
• Can	reveal	new	gene-gene	associations
• Network	cons
• More	limited	coverage	of	genes	and	known	biological	processes	/	components
• More complex algorithmics, more analysis options
3.2.2.	GeneMANIA
Component 1: Weighted network combination
• Gene Ontology prediction
• Input gene connectivity
Component 2: Label propagation algorithm
INPUT = Query gene list (e.g. DLG1, SHANK)
OUTPUT = Query genes + interaction neighbour network
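The propagation step can be illustrated with a simplified score-spreading scheme on a single weighted network. This is not GeneMANIA's actual algorithm (which also learns how to weight and combine several networks); it is a random-walk-with-restart style iteration swapped in for clarity. The damping factor, the iteration count, and every gene name other than DLG1 are hypothetical.

```python
def propagate_labels(adjacency, seeds, alpha=0.5, n_iter=50):
    """Spread seed scores over a weighted network.

    adjacency maps each node to a dict {neighbour: weight}; every neighbour
    must also appear as a key. At each step a node's score is a mix of its
    seed label and the weighted average score of its neighbours.
    """
    nodes = list(adjacency)
    label = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    score = dict(label)
    for _ in range(n_iter):
        updated = {}
        for n in nodes:
            neighbours = adjacency[n]
            total_weight = sum(neighbours.values())
            if total_weight:
                neighbour_avg = sum(
                    w * score[m] for m, w in neighbours.items()
                ) / total_weight
            else:
                neighbour_avg = 0.0  # isolated node receives nothing
            updated[n] = (1 - alpha) * label[n] + alpha * neighbour_avg
        score = updated
    return score


# Toy chain DLG1 - GENE_B - GENE_C - GENE_D, plus an isolated GENE_E.
toy_network = {
    "DLG1": {"GENE_B": 1.0},
    "GENE_B": {"DLG1": 1.0, "GENE_C": 1.0},
    "GENE_C": {"GENE_B": 1.0, "GENE_D": 1.0},
    "GENE_D": {"GENE_C": 1.0},
    "GENE_E": {},
}
scores = propagate_labels(toy_network, seeds={"DLG1"})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Genes closer to the query gene in the network receive higher scores, which is the intuition behind returning the query genes plus their highest-scoring neighbours.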
3.2.3.	Reactome FI
Reactome FIViz
Components:
• Functional	Interaction	(FI)	Network
• Uses experimental protein interactions in human, protein interactions in model organisms, and gene expression to predict “functional interactions”
• Positive	set:	pathway-based	interactions	from	Reactome
• Subnetwork	construction	algorithm
• Classical:	only	direct	connections,	or	additionally	linkers
• HotNet:	heat	kernel
• Clustering	Algorithm
• Edge-betweenness used	to	find	“local	interaction	communities”	in	the	sub-network
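Edge-betweenness clustering can be sketched as follows: compute how many shortest paths run through each edge, remove the busiest edge, and repeat until the sub-network splits. This is a Girvan-Newman style illustration for a small unweighted graph and performs a single split; it is an assumption about the general technique, not the FIViz code, and the node names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph.

    Each node pair is visited from both ends, so values are twice the usual
    definition; the ranking of edges is unaffected.
    """
    betweenness = {}
    for source in adj:
        dist = {source: 0}
        n_paths = {n: 0.0 for n in adj}
        n_paths[source] = 1.0
        preds = {n: [] for n in adj}
        order = []
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {n: 0.0 for n in adj}
        for w in reversed(order):  # accumulate from the farthest nodes back
            for v in preds[w]:
                credit = n_paths[v] / n_paths[w] * (1.0 + dependency[w])
                edge = frozenset((v, w))
                betweenness[edge] = betweenness.get(edge, 0.0) + credit
                dependency[v] += credit
    return betweenness


def components(adj):
    """Connected components as a list of node sets."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        found.append(comp)
    return found


def split_once(adj):
    """Remove highest-betweenness edges until one more component appears."""
    adj = {n: set(nbrs) for n, nbrs in adj.items()}  # work on a copy
    n_start = len(components(adj))
    while len(components(adj)) == n_start:
        betweenness = edge_betweenness(adj)
        if not betweenness:
            break  # no edges left to remove
        u, v = max(betweenness, key=betweenness.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph: two triangles joined by the bridge C-D (no self-loops assumed).
toy_graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
modules = sorted(sorted(m) for m in split_once(toy_graph))
print(modules)
```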
Glioblastoma Subnetwork
Input data:
a. DNA copy number detection for 206 glioblastomas
b. detection of somatic mutations in 601 selected genes for 91 matched tumor-normal pairs
[Figure: subnetwork modules labelled cell cycle checkpoints and DNA damage response; adhesion molecules; NOTCH pathway; growth factor signaling]
Wu	G,	Feng	X,	Stein	L.	A	human	functional	protein	
interaction	network	and	its	application	to	cancer	
data	analysis.	Genome	Biol 2010.	PMID:	20482850
GeneMANIA or	Reactome FIViz?	
• GeneMANIA: start from experimental genes and construct a larger network of related genes (without further using the same experimental data); it typically works well when the initial genes form one cluster, whereas when the genes are too diverse it tends to connect them through less specific hubs
• Reactome FIViz: start from experimental genes, inter-connect them using functional interactions (potentially including some linker genes), and cluster them into modules
For More Reading…
Nature Methods 2015
Thanks	for	your	attention!
Baked	by	Ruth	Isserlin

CDAC 2018 Merico making sense of cancer somatic snv