2. Background
• Primarily cheminformatics
– Data mining, algorithm development, software
– QSAR, diversity analysis, virtual screening,
fragments, polypharmacology, networks
– Work on a variety of Open Source projects
• Recently started moving into bioinformatics
– Supporting RNAi screens
• Integrate small molecule information &
biosystems – systems chemical biology
3. NIH Chemical Genomics Center
[Diagram: NCGC at the center, surrounded by its groups and screening activities]
• Biology: assay development and optimization
• Chemistry: compound optimization
• Informatics: SAR analysis, method & tool development
• ACOM: automation, compound management
• Small molecules; genome-wide RNAi
4. Outline
• Small molecule screening at NCGC
• The NCGC RNAi infrastructure
• Making connections
• RNAi challenges
6. Hunting for Leads
[Diagram: Target Identification → Lead Discovery → Lead Optimization → Clinical Development, with the HTS steps below]
• Assay Optimization: sensitivity, scaling
• Primary Screening: fluorescence, high content
• Cherry Picking: select subset to follow up, diversity, explore SAR
• Confirmation: counter screen
7. The qHTS Paradigm
• Traditional single-point screens can miss useful hits
• qHTS involves concentration response assays on a high-throughput scale
• The CRC allows us to categorize hits in a more fine-grained manner
Inglese, J et al, Proc. Natl. Acad. Sci., 2006, 103, 11473‐11478
8. Conc. Response Curves
• Heuristic assessment of the significance of a concentration response curve
• We aggregate certain curve classes into “active”, “inconclusive” and “inactive” categories
• Inconclusive is a “catch all” category (i.e., if it is not clearly ‘active’ or ‘inactive’)
[Figure: example concentration response curves labeled Inactive, Inconclusive, Active]
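To make the idea of a heuristic three-way call concrete, here is a minimal Python sketch. It assumes a four-parameter Hill model for the curve and uses made-up thresholds (`noise`, `min_efficacy`); it is not the NCGC curve class scheme, only an illustration of aggregating curve behavior into "active", "inconclusive" and "inactive".

```python
def hill(conc, bottom, top, ac50, slope):
    """Four-parameter Hill model: response at concentration `conc`."""
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** slope)


def categorize(responses, noise=10.0, min_efficacy=50.0):
    """Toy three-way call on a titration series.

    The thresholds are hypothetical and only illustrate the idea.
    """
    efficacy = max(abs(r) for r in responses)
    # Number of titration points clearly above the assumed noise band.
    n_signal = sum(1 for r in responses if abs(r) > 3 * noise)
    if efficacy >= min_efficacy and n_signal >= 2:
        return "active"
    if efficacy < 3 * noise:
        return "inactive"
    return "inconclusive"  # catch-all for everything in between


# Usage: simulate an inhibitor (AC50 = 100 nM) over a 5-point titration.
concs = [1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
curve = [hill(c, 0.0, -80.0, 1e-7, 1.0) for c in concs]
print(categorize(curve))
```

A real pipeline would fit the Hill parameters to the measured points and also weigh the fit quality; the sketch skips both.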
9. Annotations
• NCGC employs a variety of screening libraries
– MLSMR (~ 300K)
– LOPAC (~ 1300)
– Prestwick, Sytravon, …
– Beyond structures and vendor IDs, not a whole lot of annotation
– Annotation is a required step for integration with RNAi
– Obviously not possible for large diverse libraries
• Use target prediction models?
11. Trans-NIH RNAi Initiative - Mission
To establish a state of the art RNAi screening facility to perform
genome-wide RNAi screens with investigators in the intramural
NIH community.
• Gene function
• Pathway analysis
• Target ID
• Compound MoA
• Drug antagonist/agonist
14. RNAi Analysis Workflow
[Workflow diagram: Raw and Processed Data → QC → Normalization → Hit Selection → Hit Triage → Follow-up Hit List; GO annotations, Pathways and Interactions feed into triage]
• QC: summary statistics, corrections, background
• Normalization: median, quartile
• Hit Selection: thresholding, hypothesis testing, sum of ranks
• Hit Triage: GO semantic similarity, pathways, interactions
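As an illustration of the normalization and thresholding steps in this workflow, the following Python sketch divides each well by the plate median and then selects wells below a cutoff. The well names, signal values and the 0.5 cutoff are hypothetical, and the functions are not part of the ncgcrnai package.

```python
from statistics import median


def median_normalize(plate):
    """Divide every well signal by the plate median.

    `plate` maps well name -> raw signal; signals are assumed positive.
    """
    plate_median = median(plate.values())
    return {well: signal / plate_median for well, signal in plate.items()}


def select_hits(normalized, threshold=0.5):
    """Thresholding: wells whose normalized signal is at or below the cutoff."""
    return sorted(well for well, value in normalized.items() if value <= threshold)


# Usage with a toy five-well plate (hypothetical numbers).
plate = {"A1": 100.0, "A2": 110.0, "A3": 90.0, "A4": 20.0, "A5": 105.0}
print(select_hits(median_normalize(plate)))
```

Median-based normalization is less sensitive to a few strong hits on a plate than mean-based normalization, which is one reason it is a common default.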
16. Back End Services
• Currently all computational analysis performed on the backend
• R & Bioconductor code
• Custom R package (ncgcrnai) to support NCGC infrastructure
– Partly derived from cellHTS2
– Supports QC metrics, normalization, adjustments,
selections, triage, (static) visualization, reports
• Some Java tools for
– Data loading
– Library and plate registration
19. Deploying Data
• Small molecule HTS results are available via
PubChem
– RNAi data is also showing up in PubChem
• But what do we want to make available?
• How do we make it available?
– Standardized format (MIARE)
– cellHTS2 “format”
– Custom viewers
– Raw data? Calls?
20. Challenge ‐ RNAi & Small
Molecule Screens
What targets mediate activity of siRNA and compound?
• Pathway elucidation, identification of interactions
• Target ID and validation
• Link RNAi-generated pathway perturbations to small molecule
activities. Could provide insight into polypharmacology
• Reuse pre-existing MLI data
• Develop new annotated libraries
• Run parallel RNAi screen
[Graphic: example sequences CAGCATGAGTACTACAGGCCA, TACGGGAACTACCATAATTTA]
Goal: Develop systems level view of small molecule activity
21. HTS for NF‐κB Antagonists
• NF-κB controls DNA transcription
• Involved in cellular responses to stimuli
– Immune response, memory formation
– Inflammation, cancer, auto-immune diseases
http://www.genego.com
22. HTS for NF‐κB Antagonists
• ME‐180 cell line
• Stimulate cells using TNF, leading to NF-κB
activation, readout via a β-lactamase reporter
• Identify small molecules and siRNAs that
block the resultant activation
23. Small Molecule HTS Summary
• 2,899 FDA-approved compounds screened
• 55 compounds retested active
• Which components of the NF-κB pathway do they hit?
– 17 molecules have target/pathway information in GeneGO
– Literature searches list a few more
[Figure: Most Potent Actives; concentration response curves for Proscillaridin A, Trabectedin and Digoxin; x-axis: log Concentration (uM), y-axis: Activity]
Miller, S.C. et al, Biochem. Pharmacol., 2010, ASAP
24. RNAi HTS Summary
• Qiagen HDG library – 6886 genes, 4 siRNAs per gene
• A total of 567 genes were knocked down by 1 or more siRNAs
– We consider a gene with >= 2 active siRNAs a “reliable” hit
– 16 reliable hits
– Added in 66 genes for follow up via triage procedure
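The ">= 2 siRNAs" rule can be sketched in a few lines of Python. The input format (a list of gene/siRNA pairs for siRNAs already called active) and the gene names are assumptions for illustration only.

```python
from collections import Counter


def reliable_hits(active_sirnas, min_sirnas=2):
    """Genes supported by at least `min_sirnas` distinct active siRNAs.

    `active_sirnas` is an iterable of (gene, sirna_id) pairs; duplicates
    of the same pair are counted once.
    """
    counts = Counter(gene for gene, _ in set(active_sirnas))
    return sorted(gene for gene, n in counts.items() if n >= min_sirnas)


# Usage with hypothetical calls: GENE_B has only one active siRNA.
calls = [
    ("GENE_A", "A"), ("GENE_A", "B"),
    ("GENE_B", "A"),
    ("GENE_C", "A"), ("GENE_C", "C"), ("GENE_C", "D"),
]
print(reliable_hits(calls))
```

Requiring agreement between independent siRNAs for the same gene is a simple guard against off-target effects of any single siRNA.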
25. The Obvious Conclusion
• The active compounds target the 16 hits (at
least) from the RNAi screen
– Useful if the RNAi screen was small & focused
• But what if we’re investigating a larger system?
– Is there a way to get more specific?
– Can compound data suggest RNAi non-hits?
29. A “Dictionary” Based Approach
• Create a small-ish annotated library
– “Seed” compounds
• Use it in parallel small molecule/RNAi screens
• Use a similarity based approach to prioritize
larger collections, in terms of anticipated
targets
– Currently, we’d use structural similarity
– Diversity of prioritized structures is dependent on
the diversity of the annotated library
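A minimal Python sketch of the similarity-based prioritization follows. It assumes compounds are represented as sets of fingerprint "on" bits and uses the Tanimoto coefficient with a hypothetical 0.7 cutoff; the compound names, targets and data layout are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def prioritize(library, seeds, cutoff=0.7):
    """Assign each library compound the target of its most similar seed.

    `library` maps name -> fingerprint; `seeds` maps name ->
    {"fp": fingerprint, "target": annotation}. Compounds whose best
    similarity falls below `cutoff` are left out.
    """
    ranked = []
    for name, fp in library.items():
        best = max(seeds, key=lambda s: tanimoto(fp, seeds[s]["fp"]))
        score = tanimoto(fp, seeds[best]["fp"])
        if score >= cutoff:
            ranked.append((name, seeds[best]["target"], score))
    return sorted(ranked, key=lambda row: -row[2])


# Usage with toy fingerprints (hypothetical compounds and targets).
seeds = {
    "seed1": {"fp": {1, 2, 3, 4}, "target": "T1"},
    "seed2": {"fp": {10, 11, 12}, "target": "T2"},
}
library = {"c1": {1, 2, 3, 4, 5}, "c2": {10, 11, 12}, "c3": {20, 21}}
print(prioritize(library, seeds))
```

Compound `c3` shares no bits with any seed and is dropped, which mirrors the caveat above: coverage depends on the diversity of the annotated library.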
30. Compound Networks ‐ Targets
• Predict targets for the actives using SEA
• Target based compound network maps nearly
identically to the similarity based network
• But depending on the predicted target quality
we get poor (or no) mappings to the
RNAi targeted genes
Keiser, M.J. et al, Nat. Biotech., 2007, 25, 197‐206
31. Gene Networks ‐ Pathways
• Nodes are 1374 HDG
genes contained in the
NCI PID
• Edge indicates two
genes/proteins are
involved in the same
pathway
• “Good” hits tend to be
very highly connected
Wang, L. et al, BMC Genomics, 2009, 10, 220
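The network construction described on this slide (an edge whenever two genes share a pathway) can be sketched in Python as below. The pathway membership table is a made-up stand-in for NCI PID data, and the function names are illustrative.

```python
from itertools import combinations


def pathway_network(pathways):
    """Edges between every pair of genes that share at least one pathway.

    `pathways` maps pathway name -> iterable of gene symbols.
    """
    edges = set()
    for genes in pathways.values():
        for a, b in combinations(sorted(set(genes)), 2):
            edges.add((a, b))
    return edges


def degrees(edges):
    """Number of distinct neighbours per gene (its connectivity)."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Usage with two hypothetical pathways that overlap in genes B and C.
pathways = {"p1": ["A", "B", "C"], "p2": ["B", "C", "D"]}
edges = pathway_network(pathways)
print(sorted(edges), degrees(edges))
```

Genes that appear in several pathways accumulate neighbours quickly, so node degree is one simple way to quantify "highly connected".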
32. (Reduced) Gene Networks – Pathways
• Nodes are 526 genes
with >= 1 siRNA
showing knockdown
• Edge indicates two
genes/proteins are
involved in the same
pathway
35. Integration Caveats
• Biggest bottleneck is lack of resolution
• Currently, both small molecule and RNAi data
are 1-D
– Active or inactive, high/low signal
– CRCs for small molecules alleviate this a bit
• High content screens can provide significantly
more information and so better resolution
– Data size & feature selection are of concern
36. Integration Caveats
• Compound annotations are key
• More comprehensive pathway data will be
required
• RNAi and small molecule inhibition do not
always lead to the same phenotype
– Could be indicative of promiscuity
– Could indicate true biological differences
Weiss, W.A. et al, Nat. Chem. Biol., 2007, 12, 739-744
37. CPT Sensitization & “Central” Genes
Yves Pommier, Nat. Rev. Cancer, 2006.
TOP1 poisons prevent DNA religation resulting in replication-dependent double
strand breaks. Cell activates DNA damage response (e.g. ATR).
38. Screening Protocol
Screen conducted in the human breast cancer cell line MDA-MB-231.
Many variables to optimize including transfection conditions, cell seeding
density, assay conditions, and the selection of positive and negative
controls.
39. Hit Selection
[Figure: Screen #1 and Screen #2, sensitization ranked by log2 fold change]
[Figure: Follow-up dose response analysis, Viability (%) vs. CPT (log M); ATR panel: siNeg, siATR-A, siATR-B, siATR-C; MAP3K7IP2 panel: siNeg, siMAP3K7IP2-A, -B, -C, -D]
Multiple active siRNAs for ATR, MAP3K7IP2, and BCL2L1.
40. Are These Genes Relevant?
• Some are well known to be CPT-sensitizers
• Consider a HPRD PPI sub-network
corresponding to the Qiagen HDG gene set
• How “central” are these selected genes?
– Larger values of betweenness indicate that the node lies on
many shortest paths
– Makes sense - a number of them are stress-related
– But some of them have very low betweenness values
[Figure: histogram of log Betweenness (x-axis) vs. log Frequency (y-axis)]
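For readers unfamiliar with betweenness, here is a small self-contained Python sketch that computes it with Brandes' algorithm on an undirected, unweighted graph. The adjacency-dictionary input and the toy four-node path are assumptions for illustration; the original analysis may have used a different tool.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    `adj` maps every node to the set of its neighbours in an undirected,
    unweighted graph; each node must appear as a key.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was visited from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}


# Usage: on the path A-B-C-D the two inner nodes carry all shortest paths.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(betweenness(path))
```

On a PPI network the same function would take protein identifiers as keys and interaction partners as neighbour sets.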