Chemicals in Context: from SuperTarget and Matador to STITCH
1. Chemicals in Context: from
SuperTarget and Matador
to STITCH
Michael Kuhn
Peer Bork lab, EMBL Heidelberg
mkuhn@embl.de
2. Drug-Target Databases
Published online 16 October 2007 Nucleic Acids Research, 2008, Vol. 36, Database issue D919–D922
doi:10.1093/nar/gkm862
SuperTarget and Matador: resources for exploring drug-target relationships
Stefan Günther1, Michael Kuhn2, Mathias Dunkel1, Monica Campillos2, Christian Senger1, Evangelia Petsalaki2, Jessica Ahmed1, Eduardo Garcia Urdiales2, Andreas Gewiess3, Lars Juhl Jensen2, Reinhard Schneider2, Roman Skoblo3, Robert B. Russell2, Philip E. Bourne4, Peer Bork2,5 and Robert Preissner1,*
1Structural Bioinformatics Group, Institute of Molecular Biology and Bioinformatics, Charité - University Medicine Berlin, Arnimallee 22, 14195 Berlin, 2EMBL - Biocomputing, Meyerhofstraße 1, 69117 Heidelberg, 3Institute for Laboratory Medicine, Windscheidstr. 18, 10627 Berlin, Germany, 4Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla CA 92093, USA and 5Max-Delbrück-Center for Molecular Medicine (MDC), 13092 Berlin-Buch, Germany
Received August 15, 2007; Revised September 26, 2007; Accepted September 27, 2007
ABSTRACT
The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical articles or textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget that integrates drug-related information about medical
INTRODUCTION
Within the past two decades our knowledge about drugs, their mechanisms of action and target proteins has increased rapidly. Nevertheless, knowledge on their molecular effects is far from complete. For some drugs even the primary targets are still unknown, for example, Diloxanide, Niclosamide and Ambroxol are administered successfully although their effect on human metabolism is
3. Manual Curation
• look for abstracts in PubMed/MEDLINE
that mention genes and drugs
• create candidate list
• annotate candidate list
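The curation steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the pipeline actually used for Matador: the function names, the record layout, the name lists and the example abstracts are all invented for this sketch, and real curation would need synonym dictionaries and far more careful name matching.

```python
import re

def find_mentions(text, names):
    """Return the names that occur in the text as whole words (case-insensitive)."""
    found = set()
    for name in names:
        if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
            found.add(name)
    return found

def candidate_list(abstracts, drugs, genes):
    """Pair every drug and gene mentioned together in the same abstract.

    Each candidate gets an empty 'annotation' field for a curator to fill in.
    """
    candidates = []
    for abstract_id, text in sorted(abstracts.items()):
        drugs_found = find_mentions(text, drugs)
        genes_found = find_mentions(text, genes)
        for drug in sorted(drugs_found):
            for gene in sorted(genes_found):
                candidates.append(
                    {"abstract": abstract_id, "drug": drug, "gene": gene, "annotation": None}
                )
    return candidates

# Hypothetical input: made-up sentences standing in for PubMed/MEDLINE abstracts.
abstracts = {
    "example-1": "Imatinib inhibits the ABL1 kinase in this made-up sentence.",
    "example-2": "EGFR expression was measured in tumour samples.",
    "example-3": "Aspirin was given to all participants.",
}
drugs = ["imatinib", "aspirin"]
genes = ["ABL1", "EGFR"]

candidates = candidate_list(abstracts, drugs, genes)
for row in candidates:
    print(row)
```

Only the first example abstract mentions both a drug and a gene, so it is the only one that reaches the candidate list; the annotation step itself stays manual.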
12. Chemicals in Context
D684–D688 Nucleic Acids Research, 2008, Vol. 36, Database issue Published online 15 December 2007
doi:10.1093/nar/gkm795
STITCH: interaction networks of chemicals and proteins
Michael Kuhn1, Christian von Mering2, Monica Campillos1, Lars Juhl Jensen1,* and Peer Bork1,3
1European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, 2University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland and 3Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
Received August 14, 2007; Revised September 14, 2007; Accepted September 17, 2007
ABSTRACT
The knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH (‘search tool for interactions of chemicals’) integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict

basis for the integration of knowledge about chemicals themselves, their biological interactions and their phenotypic effects. Thus, many problems in Chemical Biology are now becoming approachable by the academic research community.
Valuable information about the biological activity of chemicals is provided by large-scale experiments. Phenotypic effects of chemicals were first made available on a large scale by the US National Cancer Institute (NCI), which conducts anti-cancer drug screens on 60 human tumour cell lines (NCI60) (4). The patterns of growth inhibition in the different cell lines by small molecules can not only be used to judge the efficacy of individual compounds, but also to relate compounds by their
18. Yao and Rzhetsky
The connectivity of a node within a graph is simply the total number of incoming and outgoing arcs (direct molecular interactions, in our case). As has been previously established, the connectivity distributions for real molecular networks are so-called heavy-tail distributions resembling Zipf’s (Pareto’s or power-law) distribution (Fig. 2A; Barabasi and Bonabeau 2003). The successful drug targets occupy a rather narrow niche within this distribution: their connectivity is significantly higher than that of an average node within the network (in GeneWays it is ∼9.1, P-value = 0.0064 [Fig. 2A,B,F]; in HPRD1 and HPRD2, it is 10.9 and 11.5, P-values = 0 and 0.0001, respectively; the same comparison performed using the smaller Y2H and BIND networks revealed no significant difference [see Table 2]). However, the average connectivity of drug targets is relatively small compared to the maximum connectivity observed in the network (9.1 vs. a maximum of 346 in GeneWays). The most highly connected high-revenue drug targets in the GeneWays network (ABL1, androgen receptor [AR], BCHE, EGFR, INSR, NR3C1, TNF, and VEGFA; see Fig. 2G) are targeted by drugs intended to provide relief for the most life-threatening phenotypes, such as cancer and autoimmune disorders. The successful drugs targeting these highly connected genes and proteins are associated with terrible side effects (think of chemotherapy patients) that are tolerable only in life-or-death situations.
The betweenness of a network node is defined as the number of times this node appears in the shortest path between two other network nodes, summed over all node pairs in the network and divided by the total number of node pairs (e.g., Noh 2003). The clustering coefficient of a network node is the ratio of the actual number of direct connections between the immediate neighbors of the node to the maximum possible number of such direct arcs between its neighbors (e.g., Holme and Kim 2002). The clustering coefficient is zero if a node’s neighbors do not interact directly (e.g., a professor who interacts with many graduate students, but whose students avoid talking to one another). The highest clustering coefficient is attained in a complete graph where every node is connected to every other node. The betweenness values of the drug targets in the GeneWays, BIND, and Y2H networks are not significantly different from those of the rest of genes within the network, although the drug targets in the GeneWays network tend to have slightly higher betweenness values than average (P-value = 0.1943; Fig. 2C). The increased average betweenness of drug targets is most obvious in the HPRD1 and HPRD2 networks (P-values = 0.0004 and 0.004, respectively), suggesting that successful drug targets tend to bridge two or more clusters of relatively closely interacting molecules. The clustering coefficients of drug targets are similar to those of the rest of the network nodes in all five data sets (see Table 2; Fig. 2D).
We next asked if proteins that are successful drug targets are less polymorphic (considering only human, intraspecies variation) than human genes on average. To answer this question, we used a large set (16,462 genes) of known human single-nucleotide polymorphisms (SNPs) available at dbSNP (Sherry et al. 2001). To reduce any effects of SNP sampling bias (some genes enjoy more attention on the part of the scientific community than others), instead of studying the absolute number of reported SNPs for each gene, we used the ratio (Cratio) of nonsynonymous to synonymous SNPs (with an expected value of 1 for a perfectly neutral mode of SNP accumulation). The assumption underlying this analysis is that sampling bias for a gene affects synonymous and nonsynonymous SNPs equally.
Our analysis indicates (Fig. 2E,F) that Cratio for successful drug targets is significantly smaller than that for an average human gene (P-value = 0.0007). This result suggests that successful drug targets tend to be less nonsynonymously polymorphic at the human population level than are human genes on average. Furthermore, Cratio is significantly negatively correlated with gene connectivity (Spearman rank correlation coefficient 0.4841, P-value = 0.0000), consistent with the observation that more highly conserved proteins tend to have higher connectivities (Fraser et al. 2002). Another line of evidence shows that highly expressed genes tend to evolve more slowly than those whose expression is low (Drummond et al. 2005). Furthermore, some experimental techniques, such as yeast two-hybrid protein–protein interaction screening, may detect interactions of highly expressed proteins more readily (Bloom and Adami 2003). Hence, relationships between gene expression level, sequence conservation, and connectivity may involve data biases and should be interpreted with caution.
We interpret the results of our SNP analysis as follows: a drug designed to target a protein that is polymorphic among

Figure 1. Distribution of the number of human gene targets per successful drug. The plot is superimposed on a family classification of drug targets.

Table 1. Comparison of different human molecular interaction data sets

Data set    No. of genes/proteins    No. of interactions    No. of drug targets covered
Y2H         2936                     5722                   49
BIND        2886                     4964                   157
GeneWays    4458                     14,124                 197
HPRD1       7764                     28,149                 304
HPRD2       9462                     37,107                 318

Genome Research, www.genome.org
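The three node measures defined in this excerpt (connectivity, betweenness and clustering coefficient) can be tried out on a toy network. The sketch below is not the authors' code: the five-node graph is invented, interactions are treated as undirected, betweenness is computed with Brandes' algorithm (shortest paths that tie are counted fractionally), and it is normalised by the number of pairs of other nodes, which is one possible reading of "the total number of node pairs".

```python
from collections import deque
from itertools import combinations

def connectivity(graph):
    """Number of direct interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def clustering_coefficient(graph, node):
    """Actual links among a node's neighbors divided by the possible links."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)

def betweenness(graph):
    """Share of shortest paths between other node pairs that pass through each node."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search: count shortest paths (sigma) from the source.
        sigma = {v: 0 for v in graph}
        sigma[source] = 1
        dist = {source: 0}
        preds = {v: [] for v in graph}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    n = len(graph)
    pairs = (n - 1) * (n - 2) / 2  # pairs of *other* nodes (assumed normalisation)
    # Each undirected pair was visited from both ends, hence the division by 2.
    return {v: (s / 2) / pairs if pairs else 0.0 for v, s in score.items()}

# Invented toy network: "hub" links a tight pair (a, b) to a chain (c, d).
toy = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub", "d"},
    "d": {"c"},
}

degrees = connectivity(toy)
between = betweenness(toy)
for node in sorted(toy):
    print(node, degrees[node], round(between[node], 3),
          round(clustering_coefficient(toy, node), 3))
```

In this toy graph "hub" has the highest connectivity and betweenness because it bridges the a-b pair and the c-d chain, while "c" has a clustering coefficient of zero because its two neighbors do not interact, the same situation as the professor whose students do not talk to one another.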
28. Acknowledgements
• SuperTarget: Robert Preissner group
• Matador: Rob Russell / Peer Bork groups
• STITCH: Lars Juhl Jensen, Christian von
Mering and lab
• Data sources: PubChem, DrugBank, KEGG,
BindingDB, ...
29. Thank you for your
attention!
• SuperTarget:
http://insilico.charite.de/supertarget/
• Matador: http://matador.embl.de/
• STITCH: http://stitch.embl.de/