Enabling Discoveries at High Throughput - Small molecule and RNAi HTS at the NCTT
1. Enabling Discoveries at High Throughput
Small molecule and RNAi HTS at the NCTT
Rajarshi Guha
NIH Center for Translational Therapeutics
May 3, 2011
2. Outline
• Informatics for small molecule & RNAi screening
• HCA & automated decision making
– Pretty pictures can lead to more efficient screens
• Large scale cheminformatics
– We can do it, but do we need to?
3. NIH Chemical Genomics Center
• Founded 2004 as part of NIH Roadmap Molecular Libraries Initiative
– NCGC staffed with 90+ scientists – biologists, chemists, informaticians, engineers
– Post-doc program
• Mission
– MLPCN (screening & chemical synthesis; compound repository; PubChem database; funding for assay, library and technology development)
– Develop new chemical probes for basic research and leads for therapeutic development, particularly for rare/neglected diseases
– New paradigms & applications of HTS for chemical biology / chemical genomics
• All NCGC projects are collaborations with a target or disease expert; currently >200 collaborations with investigators worldwide
7. qHTS: High Throughput Dose Response
• Assay concentration ranges over 4 logs (high: ~100 μM)
• 1536-well plates, inter-plate dilution series
• Assay volumes 2 – 5 μL
• Automated concentration-response data collection, ~1 CRC/sec
• Informatics pipeline: automated curve fitting and classification, 300K samples
8. Informatics Activities
• High throughput curve fitting
• Data integration, automated cherry picking
• SAR algorithms
– QSAR modeling
– Fragment based analysis
– Activity cliffs
• Tools – standardizer, tautomers, fragment activity browser, kinome browser and more
• RNAi hit selection, OTE analysis
• High content analysis
9. Kinome Navigator
• Browse kinase panel data
• Currently focused on the Abbott dataset
• View
– Fragments
– Target pairs
– Kinome overlay
http://tripod.nih.gov
10. Fragment Browser
• View activities on a fragment-wise basis
• Compare activity distributions by fragment
• Currently based around ChEMBL assays, but users can browse their own compounds & activities
http://tripod.nih.gov
11. Structure Activity Landscapes
• Rugged gorges or rolling hills?
– Small structural changes associated with large activity changes represent steep slopes in the landscape
– But traditionally, QSAR assumes gentle slopes
– We can characterize the landscape using SALI
Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535
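A minimal sketch of how a SALI value could be computed for one pair of molecules, assuming the usual definition SALI(i, j) = |A_i − A_j| / (1 − sim(i, j)) and assuming binary fingerprints compared by Tanimoto similarity. The class name, the `bits` helper and the toy fingerprints are hypothetical; a real workflow would take fingerprints from a toolkit such as the CDK.

```java
import java.util.BitSet;

/** Sketch of the Structure-Activity Landscape Index (SALI) for one molecule pair. */
final class SaliSketch {

    /** Tanimoto similarity between two binary fingerprints: |A and B| / |A or B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        // Assumption: two empty fingerprints are treated as identical.
        if (or.isEmpty()) return 1.0;
        return (double) and.cardinality() / or.cardinality();
    }

    /** SALI = |Ai - Aj| / (1 - sim). Identical fingerprints with different activity give infinity. */
    static double sali(double activityI, double activityJ, double similarity) {
        double diff = Math.abs(activityI - activityJ);
        if (similarity >= 1.0) {
            return diff == 0.0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return diff / (1.0 - similarity);
    }

    /** Hypothetical helper: build a toy fingerprint from the indices of its set bits. */
    static BitSet bits(int... on) {
        BitSet b = new BitSet();
        for (int i : on) b.set(i);
        return b;
    }
}
```

For example, two toy fingerprints sharing 3 of 5 set bits have a Tanimoto similarity of 0.6; with pIC50 values of 8.0 and 5.0, `SaliSketch.sali(8.0, 5.0, 0.6)` is roughly 7.5. The closer the pair and the larger the activity gap, the larger the value, which is what marks a cliff.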
12. What Can We Do With SALI’s?
• SALI characterizes cliffs & non-cliffs
• For a given molecular representation, SALI’s give us an idea of the smoothness of the SAR landscape
• Models try to encode this landscape
• Use the landscape to guide descriptor or model selection
Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
13. Predicting the Landscape
• Rather than predicting activity directly, we can try to predict the SAR landscape
• Implies that we attempt to directly predict cliffs
– Observations are now pairs of molecules
[Figure: three panels, in slide order: Original pIC50 (RMSE = 0.97); SALI, AbsDiff (RMSE = 1.10); SALI, GeoMean (RMSE = 1.04)]
Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
14. Data Integration
• It’s nice to simplify data, but we can still be faced with a multitude of data types
• We want to explore these data in a linked fashion
• How we explore and what we explore is generally influenced by the task at hand
• At one point, make inferences over all the data
15. Data Integration
[Figure: User’s Network linked to a Network of Public Data]
Content:
- Drugs
- Compounds
- Scaffolds
- Assays
- Genes
- Targets
- Pathways
- Diseases
- Clinical Trials
- Documents
Links:
- Manually curated
- Derived from algorithms
20. Going Beyond Exploration?
• Simply being able to explore data in an integrated manner is useful as an idea generator
• Can we integrate heterogeneous data types & sources to get a systems level view?
– Current research problem in genomics and systems biology
– Some attempts have been made to merge chemical data with other data types
Young, D.W. et al, Nat. Chem. Biol., 2008, 4, 59-68
21. RNAi Facility Mission
• Perform collaborative genome-wide RNAi screening-based projects with intramural investigators
• Advance the science of RNAi and miRNA screening and informatics via technology development to improve efficiency, reliability, and costs.
Range of Assays
• Simple Phenotypes (viability, cytotoxicity, oxidative stress, etc)
• Pathway (reporter assays, e.g. luciferase, β-lactamase)
• Complex Phenotypes (high-content imaging, cell cycle, translocation, etc)
22. RNAi Effectors
RNAi effectors provide an excellent way to conduct gene-specific loss of function studies.
23. Issues Using RNAi Effectors
• RNAi effectors give a knockdown, not a knockout (70% - 80% is considered good). Therefore, they may not silence enough to give a phenotype even if the target is involved in what you are assaying for.
• RNAi effectors induce off-target effects!
24. Examples of Current Projects
• Protein Quality Control
• Poxvirus
• DNA Re-replication
• Respiratory Viruses
• Base Excision Repair
• Lysosomal Storage Disorders
• DNA Damage – ELG1 stabilization
• Parkinson’s – Mitochondrial Quality Control
• Antioxidant Response
• Ewing’s Sarcoma
• Hypoxia
• Drug Modifiers, Pancreatic Cancer
• TNFα Response
• Drug Modifiers, TOP1 Clinical Agents
• Interferon Response
• iPS to RPE
• Immunotoxin-Mediated Cell Death
26. RNAi Libraries
• Ambion Human Genome-Wide Library, 21,585 genes, 3 unique siRNAs per gene.
• Ambion Mouse Genome-Wide Library, 17,582 genes, 3 unique siRNAs per gene.
• Dharmacon Human Duet Genome-Wide siRNA Libraries, 18,236 genes, siRNA pools.
• Human and Mouse miRNA Mimic Libraries & Human miRNA Inhibitor Library
• Qiagen Human Druggable Genome Library, > 7,000 genes, 4 unique siRNAs per gene.
• Kinome Libraries, purchased from a number of vendors.
• Smaller libraries (e.g. kinome and miRNA mimics) will enable high-impact screens in systems less amenable to high throughput applications.
• Considerations are being made for additional species and shRNA resources.
27. Druggable Genome Screening Campaign
• Over 7,000 genes, 4 unique siRNAs per gene (≈36,000 wells).
• 85 genes were selected for follow-up through a variety of threshold-based selection schemes.
• 27 genes were validated as confident hits using siRNAs from multiple vendors.
• Significant enrichment for core NF-kB components
[Figure: plate heat maps, pseudo-colored Blue/Green Ratio (normalized to plate median)]
[Figure: Percent Reduction in NF-kB Signal; y-axis: Average Inhibition (%); series: Qiagen siRNAs, Ambion siRNAs; genes: TNFα Receptor, IKKα, RELA, NEMO]
28. Druggable Genome Screening Campaign
Significant enrichment for proteins that form the 26S proteasome
[Figure: Percent Reduction in NF-kB Signal; y-axis: Average Inhibition (%); x-axis: PSM Gene (A1–A7, B2–B4, C4, C5, D2, D7, D14), grouped by PSM Protein: α core 20S, β core 20S, RPT 19S, RPN 19S; series: Qiagen, Ambion]
[Figure: proteasome schematic showing the 19S regulator particles (RPN, RPT) and the 20S proteasome (α1-7, ß1-7); Murata et al, Nature Reviews Mol. Cell Biol.]
An additional 34 genes remain inconclusive but noteworthy hits that require further study. Some of these tie into the core NF-kB pathway.
29. Seed Sequence Analysis
Other instances of the seeds incorporated within siRNAs targeting PSMA3 do not exhibit significant activity, adding to the likelihood of this being an on-target effect.
30. Seed Sequence Analysis
Other instances of the seeds within the active siRNAs targeting SLC24A1 tend to downregulate NF-kB reporter, adding to the likelihood of this being an off-target effect.
31. RNAi & Small Molecule Screens
Goal: Develop systems level view of small molecule activity
What targets mediate activity of siRNA and compound
Pathway elucidation, identification of interactions
• Reuse pre-existing MLI data
• Develop new annotated libraries
Target ID and validation
• Run parallel RNAi screen
Link RNAi generated pathway perturbations to small molecule activities. Could provide insight into polypharmacology
[Figure: siRNA sequences CAGCATGAGTACTACAGGCCA and TACGGGAACTACCATAATTTA]
33. Merging Screening Technologies
High throughput screening
• Lead identification
• Single (few) read outs
• High-throughput
• Moderate data volumes
High content screening
• Phenotypic profiling
• Multiple parameters
• Moderate throughput
• Very large data volumes
• We’d like to combine the technologies, to obtain rich high-resolution data at high speed
• Is this feasible? What are the trade-offs?
34. Merging Screening Technologies
• A simple solution is to run HTS & HCS as separate, primary & secondary screens
• Alternatively – Wells to Cells
– Integrate HTS & HCS in a single screen using a combined platform for robotics & real time automated HTS analytics
– Selective imaging of interesting wells
35. Wells to Cells Workflow
• Sequential qHTS using laser scanning cytometry followed by high-res microscopy
• Unit of work is a plate series
• The same aliquot is analyzed by both techniques
• A message based system
• The key is deciding which wells go through the workflow
36. Wells to Cells Assays
• Cell cycle, cell translocation, DNA rereplication
• All assays run against LOPAC1280
• Consistency between cytometry & microscopy is measured by the R² between log AC50’s
– Cell cycle, 0.94 – 0.96
– Cell translocation, 0.66 – 0.94
– DNA rereplication, still in progress
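The consistency measure above can be sketched as follows, assuming R² means the squared Pearson correlation between log10-transformed AC50 values from the two instruments (for a simple linear fit this equals the regression R²). The class and method names are hypothetical, and the slides do not state which R² variant was used.

```java
/** Sketch: agreement between two sets of AC50 values, compared on a log10 scale. */
final class Ac50Consistency {

    /** Squared Pearson correlation between log10(ac50A) and log10(ac50B). */
    static double rSquaredOfLogs(double[] ac50A, double[] ac50B) {
        if (ac50A.length != ac50B.length || ac50A.length < 2) {
            throw new IllegalArgumentException("need two equally sized arrays with at least 2 values");
        }
        int n = ac50A.length;
        double[] x = new double[n];
        double[] y = new double[n];
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            x[i] = Math.log10(ac50A[i]);
            y[i] = Math.log10(ac50B[i]);
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        // Accumulate the covariance and variance sums around the means.
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }
}
```

Working on the log scale means a constant fold-shift between instruments (say, microscopy AC50s that are always twice the cytometry values) still scores close to 1, since only the ranking and spacing of potencies matter.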
38. Informatics Platform
[Figure: platform diagram, including the InCell Layout File]
• Advanced correction and normalization methods
• Sophisticated curve fitting algorithm
• Good performance, allows parallelization of the entire workflow
39. Why Messaging?
• A messaging architecture allows for significant flexibility
– Persistent, can be kept for process tracking, reporting
– Asynchronous, allows individual components of the workflow to proceed at their own pace
– Modular, new components can be introduced at any time without redesigning the whole workflow
• We employ Oracle AQ, but any message queue can be employed
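The asynchronous hand-off can be sketched with an in-memory queue. This is only an illustration under stated assumptions: it uses `java.util.concurrent.LinkedBlockingQueue` rather than Oracle AQ, it is not persistent, and the class name, the `STOP` sentinel and the well identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: a well-selection step and an imaging step decoupled by a message queue. */
final class WellMessagingSketch {
    // Hypothetical sentinel message that tells the consumer to shut down.
    static final String STOP = "STOP";

    /** Publishes each selected well as a message and returns what the imaging step processed. */
    static List<String> run(List<String> selectedWells) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> imaged = Collections.synchronizedList(new ArrayList<>());

        // Consumer: the imaging component drains messages at its own pace.
        Thread imager = new Thread(() -> {
            try {
                for (String msg = queue.take(); !msg.equals(STOP); msg = queue.take()) {
                    imaged.add("imaged " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        imager.start();

        try {
            // Producer: the selection component publishes and moves on without waiting.
            for (String well : selectedWells) {
                queue.put(well);
            }
            queue.put(STOP);
            imager.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the workflow", e);
        }
        return imaged;
    }
}
```

A new component (for example, a reporting step) would be added as another consumer without touching the producer, which is the modularity argument made above.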
40. Handling Multiple Platforms
• Current examples employ InCell hardware
• We also use Molecular Devices hardware
• As a result we have two orthogonal image stores / databases
• Need to integrate them
– Support seamless data browsing across multiple screens irrespective of imaging platform used
– Support analytics external to vendor code
41. A Unified Interface
• A client sees a single, simple interface to screening image data
http://host/rest/protocol/plate/well/image
• Transparently extract image data via the MetaXpress database or via custom code
• Currently the interface addresses image serving
• Unified metadata interface in the works
42. Trade-offs & Opportunities
• Automation reduces the ability to handle unforeseen errors
– Dispense errors and other plate problems
– Well selection based on curve classes may need to be modified on the fly
• Well selection does not consider SAR
– Wells are selected independently of each other
– If we could model SAR on the fly (or from validation screens), we’d select multiple wells, to obtain positive and negative results
43. Cloud Computing & Cheminformatics
• Cloud computing is a hot topic
• A number of examples of computational chemistry / cheminformatics on the cloud
– MolPlex, hBar, Numerate, Wingu, Scilligence, Pfizer
• Many examples use the cloud for remote storage and remote (hosted) computations
• But providers such as Amazon allow us to run distributed computing applications on the cloud
44. Map/Reduce
• Map/Reduce is a programming model for efficient distributed computing
• M/R was made “famous” by Google, but the idea has been around for a long time
• It works like a Unix pipeline:
– cat input | grep | sort | uniq -c | cat > output
– Input | Map | Shuffle & Sort | Reduce | Output
• Efficiency from
– Streaming through data, reducing seeks
– Pipelining
Owen O’Malley, http://bit.ly/ecHPvB
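The Input | Map | Shuffle & Sort | Reduce | Output stages can be sketched in plain, single-process Java. This is an illustration of the programming model only, not Hadoop code; the class and method names are hypothetical, and the job shown (counting whitespace-separated tokens) mirrors the `sort | uniq -c` pipeline above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: the map / shuffle-and-sort / reduce stages, run in one process. */
final class MiniMapReduce {

    /** Map: one input line in, zero or more (token, 1) pairs out. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    /** Shuffle & sort: group all values by key, with keys in sorted order. */
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce: collapse all values seen for one key into a single count. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs the three stages in sequence over the input lines. */
    static Map<String, Integer> run(List<String> input) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : input) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            output.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return output;
    }
}
```

In a real framework the map calls run on many nodes and the shuffle moves data across the network; the point of the sketch is that the user only writes `map` and `reduce`, and both are ordinary serial functions.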
46. Hadoop & Cheminformatics
• Hadoop is an Open Source implementation of the map/reduce paradigm
• Hadoop is a framework for scalable, distributed computing
– Hadoop, HDFS, Hive, PIG
• Importantly, you can play with all this on your laptop and just copy files to the big cluster when you’re ready for production
47. Why Hadoop?
• Simple way to make use of large clusters without MPI etc
• AWS supports Hadoop, so easy to scale up to 100’s or 1000’s of cores
• Great for Java code, but non-Java code can also make use of Hadoop
• M/R can be applied to a lot of problems, but one of the simplest is to use it as a “chunker”
48. Cheminformatics in Parallel
• Many cheminformatics problems are data parallel
– Chunk the data and apply the same technique over each chunk
• This makes many problems amenable to M/R
– Substructure / pharmacophore search
– Descriptor calculations, virtual screening
– Model development (?)
• In general, each chunk is processed on a distinct node – so the code itself can be non-parallel
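The “chunk the data, run the same serial code on each chunk” idea can be sketched without Hadoop, using threads in place of nodes. Everything here is hypothetical: the class name, the chunk size, and in particular `countUppercase`, which is a stand-in for a real per-molecule calculation and has no chemical meaning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch: data-parallel processing by chunking; the per-chunk worker is plain serial code. */
final class ChunkerSketch {

    /** Splits the data into consecutive chunks of at most chunkSize items. */
    static <T> List<List<T>> chunk(List<T> data, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < data.size(); i += chunkSize) {
            chunks.add(new ArrayList<>(data.subList(i, Math.min(i + chunkSize, data.size()))));
        }
        return chunks;
    }

    /** Applies the same worker to every chunk in parallel; results keep the input order. */
    static <T, R> List<R> processChunks(List<T> data, int chunkSize,
                                        Function<List<T>, List<R>> worker) {
        return chunk(data, chunkSize).parallelStream()
                .map(worker)
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    /** Placeholder worker: counts uppercase letters in each SMILES string of a chunk. */
    static List<Integer> countUppercase(List<String> smilesChunk) {
        List<Integer> counts = new ArrayList<>();
        for (String smiles : smilesChunk) {
            int n = 0;
            for (char c : smiles.toCharArray()) {
                if (Character.isUpperCase(c)) n++;
            }
            counts.add(n);
        }
        return counts;
    }
}
```

A call such as `ChunkerSketch.processChunks(smilesList, 1000, ChunkerSketch::countUppercase)` would process 1000-molecule chunks independently. With Hadoop the framework does the chunking and scheduling, so only the worker remains to be written.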
50. Substructure Searching
• Substructure searching is a trivial extension of atom counting
• If a structure matches, emit (name,1)
• Otherwise (name,0)
• Reducer simply outputs tuples of the form (name,1)

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.openscience.cdk.CDKConstants;
    import org.openscience.cdk.exception.CDKException;
    import org.openscience.cdk.interfaces.IAtomContainer;

    public class SubSearch {
        …
        // sp, sqt, one and zero are used below; their declarations are in the part elided above
        public static class MoleculeMapper extends Mapper<Object, Text, Text, IntWritable> {
            private Text matches = new Text();
            private String pattern;

            // Read the SMARTS pattern from the job configuration
            public void setup(Context context) {
                pattern = context.getConfiguration().get("net.rguha.dc.data.pattern");
            }

            // Emit (name, 1) if the molecule matches the pattern, (name, 0) otherwise
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                try {
                    IAtomContainer molecule = sp.parseSmiles(value.toString());
                    sqt.setSmarts(pattern);
                    boolean matched = sqt.matches(molecule);
                    matches.set((String) molecule.getProperty(CDKConstants.TITLE));
                    if (matched) context.write(matches, one);
                    else context.write(matches, zero);
                } catch (CDKException e) {
                    e.printStackTrace();
                }
            }
        }

        public static class SMARTSMatchReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            // Pass through only the molecules that matched
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                for (IntWritable val : values) {
                    if (val.compareTo(one) == 0) {
                        result.set(1);
                        context.write(key, result);
                    }
                }
            }
        }
        …
    }
51. Running on AWS
• All the code was debugged on my laptop with relatively small files
• To test the scalability, I shifted everything to AWS
– Pharmacophore search
– 136K structures, single conformer, 560MB
– Created a single JAR file with CDK & application code
– Uploaded data files to S3
• Total cost of experiments was ~ $10
52. But I Don’t Want to Write Programs
• All these examples require us to write full-fledged Java classes
• An easier way is to use Pig & Pig Latin – a platform and query language built on top of Hadoop
• Lets us write SQL-like queries that make use of Hadoop underneath
• Flexible due to user defined functions (UDF’s)
– UDF’s encapsulate the cheminformatics
53. Cheminformatics & Pig

    A = load 'medium.smi' as (smiles:chararray);
    B = filter A by net.rguha.dc.pig.SMATCH(smiles, 'NC(=O)C(=O)N');
    store B into 'output.txt';

• Identify molecules in medium.smi that match the SMARTS pattern and dump to output.txt
• The complexity is now hidden in the UDF
• Many toolkit functions could be wrapped as UDF’s, allowing flexible queries with much simpler code
• See http://blog.rguha.net/?p=748 for the code
54. Latency
• Hadoop is suited for batch processing
• Significant network I/O involved in distributing data to compute nodes
• Not good for
– Random ad hoc processing of small subsets
– Small volume data
– Real time (low latency) work
• But latency issues can be addressed somewhat by HBase, Hive and other technologies
55. More than Chunking?
• But all the examples so far could have been done via PBS/Condor or any other job scheduler
– (With Hadoop we don’t have to worry about explicit chunking of the input data)
• But are there cheminformatics algorithms that can be reworked into the M/R paradigm?
– Predictive modeling?
– Graph algorithms?
56. More than Chunking?
• Both predictive & graph algorithms are increasingly supported in Hadoop
– Mahout for M/L algorithms on massive datasets
– Cloud9 for graph algorithms
• A number of bioinformatics applications make use of M/R at the algorithmic level
• They are all big applications
– Crossbow aligns 3 billion paired/unpaired reads
• Cheminformatics datasets are not very big
57. Summary
• HTS data is an ample playground for interesting analytics; multiple data types make it more fun
• A major challenge in our informatics infrastructure is dealing with proprietary vendor interfaces
• Hadoop and M/R provide great opportunities for handling large data in a flexible manner
• But can cheminformatics really make use of it?
58. Acknowledgments
Informatics
• Ajit Jadhav
• Trung Nguyen
• Noel Southall
• Ruili Huang
• Min Shen
• Hongmao Sun
• Xin Hu
• Tongan Zhao
RNAi & Small Molecule
• Scott Martin
• Pinar Tuzmen
• Yu-Chi Chen
• Carleen Klump
• Craig Thomas
• Jim Inglese
• Ron Johnson
• Sam Michael
• Jennifer Wichterman
60. Counting Atoms
• The canonical Hadoop program is to count the frequency of words in a text file
– Mapper reads a line, outputs a tuple – (word, 1)
– Reducer will receive tuples, keyed on word
• Summing up the 1’s gives us the frequency of word
• By default, Hadoop works on a line-by-line basis
• For cheminformatics problems, SMILES files satisfy this requirement – one line, one molecule
61. Counting Atoms
• Uses the CDK to parse SMILES
• For each molecule loop over atoms
– Emit (symbol,1)
• Reducer simply sums the 1’s for each symbol

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.openscience.cdk.DefaultChemObjectBuilder;
    import org.openscience.cdk.exception.InvalidSmilesException;
    import org.openscience.cdk.interfaces.IAtom;
    import org.openscience.cdk.interfaces.IAtomContainer;
    import org.openscience.cdk.smiles.SmilesParser;

    public class HeavyAtomCount {
        static SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());

        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            // Parse one SMILES line and emit (symbol, 1) for every atom
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                try {
                    IAtomContainer molecule = sp.parseSmiles(value.toString());
                    for (IAtom atom : molecule.atoms()) {
                        word.set(atom.getSymbol());
                        context.write(word, one);
                    }
                } catch (InvalidSmilesException e) {
                    // do nothing for now
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            // Sum the 1's emitted for each element symbol
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
        …
    }
62. Multiline Records
• Lots of cheminformatics applications require 3D – SMILES won’t do. Need to support SDF
• We implement a custom RecordReader to process SD files
• We’re now ready to tackle pretty much most cheminformatics tasks
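The record-boundary logic such a reader needs can be sketched in plain Java, relying on the fact that each record in an SD file ends with a line containing `$$$$`. This is not the custom Hadoop RecordReader mentioned above and does not deal with input splits; the class name is hypothetical, and dropping a trailing incomplete record is an assumption of the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: group the lines of an SD file into one multi-line record per molecule. */
final class SdfRecordSplitter {
    // Every SD file record is terminated by a line holding only this marker.
    static final String DELIMITER = "$$$$";

    /** Returns complete records, each including its terminating delimiter line. */
    static List<String> split(String sdfText) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(sdfText))) {
            String line;
            while ((line = reader.readLine()) != null) {
                current.append(line).append('\n');
                if (line.trim().equals(DELIMITER)) {
                    records.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: text after the last delimiter is an incomplete record and is dropped.
        return records;
    }
}
```

Each returned string can then be handed to a mapper as a single value, in the same way a SMILES line is in the single-line case.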
63. Why Hadoop?
• Java and C++ APIs
– In Java use Objects, while in C++ bytes
• Each task can process data sets larger than RAM
• Automatic re-execution on failure
– In a large cluster, some nodes are always slow or flaky
– Framework re-executes failed tasks
• Locality optimizations
– M/R queries HDFS for locations of input data
– Map tasks are scheduled close to the inputs when possible
Owen O’Malley, http://bit.ly/ecHPvB