Robots, Small Molecules & R
Ingredients for Exploring and Predicting Biological Effects
Rajarshi Guha
September 13, 2014
http://blog.rguha.net/
Hunting for Leads
Target Identification → Lead Discovery → Lead Optimization → Clinical Development
HTS:
• Assay Optimization
– Sensitivity
– Scaling
• Primary Screening
– Fluorescence
– High Content
• Cherry Picking
– Select subset to follow up
– Diversity
• Confirmation
– Counter screen
– Explore SAR
High Throughput Screening
• Test thousands to hundreds of thousands of compounds in one or more assays
• Employs a robotic platform
• Rapidly identify novel modulators of biological systems
– Infectious agents
– Cellular basis of diseases
Robots for Screening
HTS Workflow
• Rapidly screen large compound collections
• Efficiently identify real actives
– Test them in slower, accurate, expensive screens
• Use the data to learn what types of compounds tend to be active
• Use the model to suggest more compounds to screen
[Figure: Number of Molecules by stage; stage labels: HTS, Cherry Picks; values: 300K, 1000, 300]
Data Science Problems
• Predictive models for highly imbalanced datasets
• Global versus local models?
• Feature selection – data driven? Domain driven?
• Clustering & enrichment
• Similarity – definition, computation, performance
• Integration – chemical structures, numerical data, text (papers, patents), images
The Roles of R
• Data Access: ROracle, RMySQL, RPostgreSQL, rpubchem, chemblr
• Chemistry: rcdk, ChemmineR, fingerprint
• HTS QC: displayHTS, spdep
• Imaging: EBImage, rflowcyt, ripa, raster
• Visualization: grid, ggplot2, Shiny, ggvis, igraph
• Data Analysis: drc, igraph, randomForest, svm, ...
Also see the ChemPhys CRAN Task View
HTS Data Types – Single Point
[Figure: Response (0 to 100) versus Concentration (axis ticks 9.50 to 10.50)]
HTS Data Types – Dose Response
[Figure: Response (0 to 120) versus log10 Concentration (axis ticks 0.01 and 1.00)]
$$ y = S_0 + \frac{S_{\mathrm{inf}} - S_0}{1 + 10^{(\log \mathrm{AC}_{50} - x)\,H}} $$
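As a minimal sketch (not the author's code), the four-parameter curve on this slide, y = S0 + (Sinf − S0) / (1 + 10^((log AC50 − x) H)), can be written as a plain R function and fitted with base R's `nls`. The function name `hill`, the simulated concentrations, the noise level and the starting values are assumptions made purely for illustration; the deck itself lists `drc` for this kind of analysis.

```r
# Four-parameter Hill curve from the slide; x is log10 concentration.
hill <- function(x, s0, sinf, logac50, h) {
  s0 + (sinf - s0) / (1 + 10^((logac50 - x) * h))
}

# Simulated titration (assumed values, for illustration only)
set.seed(1)
x <- seq(-9, -4, by = 0.5)                  # log10 molar concentration
truth <- c(s0 = 0, sinf = 100, logac50 = -6.5, h = 1.2)
y <- hill(x, truth[["s0"]], truth[["sinf"]], truth[["logac50"]], truth[["h"]]) +
  rnorm(length(x), sd = 2)                  # additive measurement noise

# Fit with nls; the starting values are rough guesses read off the data
fit <- nls(y ~ hill(x, s0, sinf, logac50, h),
           start = list(s0 = min(y), sinf = max(y), logac50 = -6, h = 1))
est <- coef(fit)
print(round(est, 2))
```

At x = log AC50 the exponent is zero, so the curve passes through the midpoint between S0 and Sinf, which is a quick sanity check on any implementation.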
HTS Data Types – Multiple Readouts
(and have this at multiple doses!)
HTS Data Types - Combinations
Activity = f(Independent Variable(s))
Features, Features, Features
• How do we “quantify” a chemical structure?
– Charges
– Dipole moments
– Topological invariants
– Surface properties
– Bit strings such as 1 0 1 1 0 0 0 1 0
Working with Molecules in R
• A number of OSS libraries are available
• ChemmineR and rcdk are the main packages that allow you to manipulate molecules in R
• They use rJava to interface with JOELib and CDK, respectively
rcdk
• Idiomatic R interface to the CDK library
– I/O support for chemical file formats
– Manipulation of atoms, bonds, molecules
– Generate molecular descriptors, fingerprints
library(rcdk)
mol <- parse.smiles('CCCC')[[1]]
mols <- load.molecules('http://www.rguha.net/mipe100.smi')
rcdk
• rcdk works with references to Java objects
– Can’t save them in a workspace (trivially)
> mol 
[1] "Java-Object{AtomContainer(2040919865, #A:4, Atom(2131361171, S:C, H:3, 
AtomType(2131361171, FC:0, Isotope(2131361171, Element(2131361171, S:C, AN:6)))), 
Atom(1759969037, S:C, H:2, AtomType(1759969037, FC:0, Isotope(1759969037, 
Element(1759969037, S:C, AN:6)))), Atom(359851081, S:C, H:2, AtomType(359851081, FC:0, 
Isotope(359851081, Element(359851081, S:C, AN:6)))), Atom(703168415, S:C, H:3, 
AtomType(703168415, FC:0, Isotope(703168415, Element(703168415, S:C, AN:6)))), #B:3, 
Bond(549041464, #O:SINGLE, #S:NONE, #A:2, Atom(2131361171, S:C, H:3, 
AtomType(2131361171, FC:0, Isotope(2131361171, Element(2131361171, S:C, AN:6)))), 
Atom(1759969037, S:C, H:2, AtomType(1759969037, FC:0, Isotope(1759969037, 
Element(1759969037, S:C, AN:6)))), ElectronContainer(549041464EC:2)), Bond(2654289, 
#O:SINGLE, #S:NONE, #A:2, Atom(1759969037, S:C, H:2, AtomType(1759969037, FC:0, 
Isotope(1759969037, Element(1759969037, S:C, AN:6)))), Atom(359851081, S:C, H:2, 
AtomType(359851081, FC:0, Isotope(359851081, Element(359851081, S:C, AN:6)))), 
ElectronContainer(2654289EC:2)), Bond(1660962283, #O:SINGLE, #S:NONE, #A:2, 
Atom(359851081, S:C, H:2, AtomType(359851081, FC:0, Isotope(359851081, 
Element(359851081, S:C, AN:6)))), Atom(703168415, S:C, H:3, AtomType(703168415, FC:0, 
Isotope(703168415, Element(703168415, S:C, AN:6)))), ElectronContainer(1660962283EC: 
2)))}" 
>
Calculating Molecular Features
• Evaluate a matrix of numerical features
mols <- load.molecules("mipe100.smi") 
dnames <- get.desc.names('topological') 
descs <- eval.desc(mols, dnames) 
• End up with a rectangular data.frame
> str(descs) 
'data.frame': 99 obs. of 195 variables: 
$ nRings7 : num 1 0 1 0 0 0 0 0 0 0 ... 
$ nRings8 : num 0 0 0 0 0 0 0 0 0 0 ... 
$ nRings9 : num 0 0 0 0 0 0 0 0 0 0 ... 
$ tpsaEfficiency : num 0.1856 0.2035 0.0118 0.0602 ...
Calculating Fingerprints
• Binary string representation of molecular structure
– Objectively defined, fast to calculate
– Good for searching, clustering, prediction
library(fingerprint) 
fps <- lapply(mols, get.fingerprint) 
• The fingerprint package is used to represent them as S4 objects
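Binary fingerprints like these are commonly compared with the Tanimoto coefficient (bits shared by both fingerprints divided by bits set in either). The following is a base-R sketch only, assuming each fingerprint is stored as a vector of "on" bit positions; the helper name `tanimoto` and the toy bit positions are hypothetical and are not part of the fingerprint package.

```r
# Tanimoto similarity between two fingerprints given as vectors of "on" bit positions
tanimoto <- function(a, b) {
  common <- length(intersect(a, b))   # bits set in both
  total  <- length(union(a, b))       # bits set in either
  if (total == 0) return(NA_real_)    # undefined when both fingerprints are empty
  common / total
}

# Toy fingerprints (made-up bit positions)
fps_toy <- list(fp1 = c(15, 18, 45, 73, 77),
                fp2 = c(15, 18, 45, 96, 107, 109),
                fp3 = c(200, 301))

# All-against-all similarity matrix
sim <- sapply(fps_toy, function(a) sapply(fps_toy, function(b) tanimoto(a, b)))
print(round(sim, 3))
```

Here fp1 and fp2 share 3 of 8 distinct bits, so their similarity is 0.375, while fp3 shares nothing with either and scores 0.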
Calculating Fingerprints
• Methods to compute similarities, generate summaries & manipulate fingerprints
> fps[[1]] 
Fingerprint object 
name = 
length = 1024 
folded = FALSE 
source = CDK 
bits on = 15 18 45 73 77 78 79 85 87 96 107 109 129 139 149 159 
162 166 172 179 194 209 214 223 225 227 239 254 266 272 301 312 327 
335 350 354 359 392 393 395 397 415 435 455 486 491 492 499 534 535 
541 543 544 545 546 559 575 600 605 618 621 622 626 635 638 644 645 
647 690 723 728 742 743 753 754 800 819 831 832 889 893 913 922 930 
936 954 985 988 1005 1008 1016 
>
Use Case - SAR
• Cluster molecules by structure and examine whether clusters are enriched in activity
library(chemblr); library(rcdk); library(fingerprint)
## retrieve the activities for an assay, then the record for each compound
d <- get.activity(chembl.id='CHEMBL857155', type='assay')
cmpds <- lapply(d$ingredient_cmpd_chemblid, get.compound,
                type='chemblid')
cmpds <- do.call(rbind,
                 lapply(cmpds, function(x)
                   data.frame(x$chemblId, x$smiles,
                              stringsAsFactors=FALSE)))
## fingerprints -> similarity matrix -> distance matrix -> hierarchical clustering
mols <- parse.smiles(cmpds$x.smiles)
fps <- lapply(mols, get.fingerprint)
sm <- fp.sim.matrix(fps)
rownames(sm) <- cmpds$x.chemblId
dm <- as.dist(1-sm)
clus <- hclust(dm)
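The slide does not say how enrichment is assessed. One possible approach, sketched here on simulated data, is to cut the tree into clusters with `cutree` and run a one-sided Fisher's exact test per cluster. The toy points (standing in for a structural distance matrix), the activity labels and the choice of k = 3 are all assumptions for illustration.

```r
set.seed(7)
# Toy stand-in for structural distances: three well-separated groups of 10 points
pts <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 6, sd = 0.3), ncol = 2))
clus_toy <- hclust(dist(pts))
groups <- cutree(clus_toy, k = 3)

# Made-up activity labels: actives concentrated in the first group of points
active <- c(rep(TRUE, 9), FALSE, rep(FALSE, 20))

# One-sided Fisher's exact test per cluster: are actives over-represented?
cluster_ids <- sort(unique(groups))
enrich <- sapply(cluster_ids, function(g) {
  tab <- table(factor(groups == g, levels = c(TRUE, FALSE)),
               factor(active, levels = c(TRUE, FALSE)))
  fisher.test(tab, alternative = "greater")$p.value
})
names(enrich) <- paste0("cluster", cluster_ids)
print(signif(enrich, 3))
```

With many clusters the p-values would need a multiple-testing correction, for example via `p.adjust`.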
[Figure: dendrogram from the hierarchical clustering, with 75 leaves labeled by ChEMBL ID (CHEMBL331502, CHEMBL328164, CHEMBL52551 and so on)]
Use Case - Bit Spectrum
• Vector summary of the fingerprints for a dataset
• Defined as the fraction of times a bit position is set to 1, for each bit position
Example: four fingerprints (rows) and their bit spectrum (last row)
0 0 1 ...
0 1 0 ...
1 1 1 ...
1 0 1 ...
0.5 0.5 0.75 ...
[Figure: bit spectrum for ~10K molecules; Normalized Frequency (0 to 1) versus Bit Position (axis ticks 0 to 750)]
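The definition above amounts to a column mean over a 0/1 matrix. The sketch below reproduces the slide's toy example in base R; the helper name `bit_spectrum` and the second dataset are hypothetical (the fingerprint package ships its own `bit.spectrum`, which works on fingerprint objects instead).

```r
# Rows are molecules, columns are bit positions (the toy example from the slide)
fp_mat <- rbind(c(0, 0, 1),
                c(0, 1, 0),
                c(1, 1, 1),
                c(1, 0, 1))

# Bit spectrum: fraction of molecules that have each bit set
bit_spectrum <- function(m) colMeans(m)

bs <- bit_spectrum(fp_mat)
print(bs)   # 0.50 0.50 0.75

# A second, made-up dataset and the difference between the two spectra
other <- rbind(c(1, 1, 0),
               c(1, 0, 0))
delta <- bs - bit_spectrum(other)
print(delta)
```

Each dataset is reduced to one vector regardless of its size, so comparing two datasets no longer needs any pairwise molecule comparisons.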
Use Case - Bit Spectrum
• Comparison of two datasets is now O(n)
• Simply take the difference of the two bit spectra
• e.g.: Compare ~800 solubles with >30k insolubles
[Figure: Δ Normalized Frequency (−1.0 to 1.0) versus Bit Position (axis ticks 0 to 150)]
library(fingerprint); library(ggplot2)
## assumes fps (a list of fingerprints) and sol (a data.frame with a
## label column, one row per fingerprint) have already been loaded
## make two subsets and generate bit spectra
sol.idx <- which(sol$label == 'high')
insol.idx <- which(sol$label != 'high')
sol.bs <- bit.spectrum(fps[sol.idx])
insol.bs <- bit.spectrum(fps[insol.idx])
## display a difference plot
bsdiff <- sol.bs - insol.bs
d <- data.frame(x=1:length(sol.bs), y=bsdiff)
ggplot(d, aes(x=x, y=y)) + geom_line() +
  xlab('Bit Position') +
  ylab('Normalized Frequency') +
  ylim(c(-1,1))
PREDICTIVE MODELS - CAVEATS
Building Models is the Easy Part
• Given a descriptor data.frame or fingerprint list we’re ready to build models
– caret, caretEnsemble
• Question is whether the model(s) can generalize
• Applicability is a key consideration when predicting bioactivity
– Has economic & safety ramifications in regulatory environments
Domain Applicability
• How dissimilar to the training set do you have to be before the prediction is meaningless?
– Distance to training set? Inside/outside convex hull
– Comparison of bit spectra
[Figure: Training Set and Test Set]
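The "distance to training set" idea can be sketched as a nearest-neighbour check. This is one common heuristic rather than the method used in the talk; the simulated descriptors, the helper name `nn_dist` and the 95th-percentile cutoff are all assumptions chosen for illustration.

```r
set.seed(3)
train <- matrix(rnorm(200), ncol = 2)        # 100 training points, 2 descriptors
query <- rbind(train[1, ] + 0.01,            # almost identical to a training point
               c(8, 8))                      # far outside the training cloud

# Distance from each row of `x` to its nearest neighbour in `ref`
nn_dist <- function(x, ref, skip_self = FALSE) {
  apply(x, 1, function(p) {
    d <- sqrt(colSums((t(ref) - p)^2))       # Euclidean distance to every ref row
    if (skip_self) d <- d[d > 0]             # ignore the zero distance to itself
    min(d)
  })
}

# Cutoff: 95th percentile of nearest-neighbour distances within the training set
cutoff <- quantile(nn_dist(train, train, skip_self = TRUE), 0.95)
in_domain <- nn_dist(query, train) <= cutoff
print(in_domain)
```

Predictions for queries flagged as outside the domain would then be reported with a warning or withheld; for fingerprints, a Tanimoto-based distance could replace the Euclidean one.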
Global vs Local Models
• Bioassay data is not really big data
• Can big data be too big?
• AID 1996
– 57K measurements of aqueous solubility
• Do we build one model?
• Or multiple local models?
[Figure: PCA of 166 Binary Features]
RESPONSE SURFACES
Screening Drug Combinations
• Increased efficacy
• Delay resistance
• Attenuate toxicity
• Inform signaling pathway connectivity
• Identify synthetic lethality
• Polypharmacology
[Slide labels: Translational Interest; Basic Interest]
How to Test Combinations
• Many procedures described in the literature
– Fixed dose ratio (aka ray)
– Ray contour
– Checkerboard
– Genetic algorithm
[Figure: example layout of combination wells, row by row]
C5,D5 C5
C4,D4 C4
C3,D3 C3
C2,D2 C2
C1,D5 C1,D4 C1,D3 C1,D2 C1,D1 C1
D5 D4 D3 D2 D1 0
[Figure: panels labeled Vargatef, DCC-2036, PD-166285, GDC-0941, PI-103, GDC-0980, Bardoxolone methyl, AT-7519, SNS-032, NCGC00188382-01, Lestaurtinib, CNF-2024, ISOX, Belinostat, PF-477736, AZD-7762]
Why Similarity?
• Vargatef exhibited anomalous matrix response compared to other VEGFR inhibitors
[Figure: panels labeled Vargatef, Linifanib, Axitinib, Sorafenib, Vatalanib, Motesanib, Tivozanib, Brivanib, Telatinib, Cabozantinib, Cediranib, BMS-794833, Lenvatinib, OSI-632, Foretinib, Regorafenib]
When are Combinations Similar?
• Differences and their aggregates such as RMSD can lead to degeneracy
• Instead we’re interested in the shape of the surface
• How to characterize shape?
– Parametrized fits
– Distribution of responses
[Figure: three density plots of responses (x axis 0 to 100), annotated with D and p value]
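The slide does not name the statistic behind "D, p value". Assuming it refers to a two-sample Kolmogorov-Smirnov comparison of the response distributions, a base-R sketch looks as follows; the toy 10 x 10 matrices and the variable names are made up for illustration.

```r
set.seed(42)
a    <- matrix(runif(100, 0, 100), nrow = 10)  # toy 10 x 10 response matrix
b    <- a + 200                                # every response shifted upwards
perm <- matrix(sample(a), nrow = 10)           # same values as `a`, shuffled in space

# Compare the 1D distributions of responses, ignoring where each value sits
shifted  <- ks.test(as.vector(a), as.vector(b))
shuffled <- suppressWarnings(                  # identical values produce ties
  ks.test(as.vector(a), as.vector(perm)))

cat("shifted:  D =", shifted$statistic,  " p =", shifted$p.value,  "\n")
cat("shuffled: D =", shuffled$statistic, " p =", shuffled$p.value, "\n")
```

The shifted surface gives D = 1, while the spatially shuffled surface gives D = 0 even though its shape is completely different, which shows how much a purely 1D comparison can miss.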
Similarity via the Syrjala Test
• Syrjala test used to compare population distributions over a spatial grid
– Invariant to grid orientation
– Provides an empirical p-value
• Less degenerate than just considering 1D distributions
[Figure: density of D (x axis 0.00 to 0.75)]
Syrjala, S.E., “A Statistical Test for a Difference between the Spatial Distributions of Two Populations”, Ecology, 1996, 77(1), 75-80
Clustering Response Surfaces
[Figure: dendrogram (axis 0.0 to 0.8) with four clusters: C1 (24), C2 (47), C3 (35), C4 (24)]
Working in “Combination Space”
• Each cell line is represented as a vector of response matrices
• “Distance” between two cell lines is a function of the distance between component response matrices
• F can be min, max, mean, …
[Figure: cell lines L1 and L2 as rows of response matrices, with component distances d1, d2, d3, d4, d5]
$$ D(L_1, L_2) = F(\{d_1, d_2, \ldots, d_n\}) $$
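The relation D(L1, L2) = F({d1, d2, …, dn}) can be sketched in base R as below. The component distance used here (Euclidean distance between matching matrices) is an assumption, as are the simulated 6 x 6 response matrices and the helper names `make_line`, `mat_dist` and `line_dist`; the slide leaves the choice of component distance open.

```r
set.seed(11)
# Each cell line: a list of 5 response matrices (6 x 6 dose grids), one per combination
make_line <- function(n = 5) {
  replicate(n, matrix(runif(36, 0, 100), 6, 6), simplify = FALSE)
}
L1 <- make_line()
L2 <- make_line()

# Component distance d_i between matching response matrices (Euclidean here)
mat_dist <- function(a, b) sqrt(sum((a - b)^2))

# D(L1, L2) = F({d_1, ..., d_n}) for a chosen aggregate function `agg`
line_dist <- function(l1, l2, agg = mean) {
  d <- mapply(mat_dist, l1, l2)
  agg(d)
}

res <- sapply(list(min = min, max = max, mean = mean, sum = sum),
              function(f) line_dist(L1, L2, agg = f))
print(round(res, 1))
```

Different aggregates can rank the same pairs of cell lines differently, so the choice of F is itself a modelling decision.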
Many Choices to Make
[Figure: four dendrograms of the same 20 cell lines, in panels labeled sum, max, min and euc; the leaf order differs between panels. Cell lines: KMS-34, INA-6, L363, OPM-1, XG-2, FR4, AMO-1, XG-6, MOLP-8, ANBL-6, KMS-20, XG-7, OCI-MY1, XG-1, 8226, EJM, U266, KMS-11LB, SKMM-1, MM-MM1]
NETWORKS
Networks & Integration
• Network models of molecules and targets are common
– Allows for the incorporation of lots of associated information
– Diseases, pathways, OTEs
• When linked with clinical data & outcomes, we can generate massive networks
– Adverse events (FDA AERS)
– Analysis by Cloudera considered > 10E6 drug-drug-reaction triples
Yildirim, M.A. et al
Networks & Integration
• SAR data can be viewed in a network form
– SALI, SARI based networks
– Usually requires pairwise calculations of the metric
• Current studies have focused on small datasets (< 1000 molecules)
• Hadoop + Giraph could let us apply this to HTS-scale datasets
Peltason, L et al
http://sali.rguha.net/
Networks & Integration
• When we apply a network view we can consider many interesting applications & make use of cloud scale infrastructure
– Network based similarity
– Community detection (aka clustering)
– PageRank style ranking (of targets, compounds, …)
– Generate network metrics, which can be used as input to predictive models (for interactions, effects, …)
Bauer-Mehren et al
Combinations as Networks
Combination screens lend themselves naturally to network representations
[Figure: two combination networks colored by Δ Bliss+ (scales 0.0 to −4.3 and 0.0 to −3.4), annotated with: immune system process; apoptotic process; transcription from RNA polymerase II promoter; protein phosphorylation; cell communication; immune response]
Combinations as Networks
• Things get more interesting when we have n × m screens
• Can be simplified using a variety of methods
– Neighborhoods
– Minimum Spanning Tree
[Figure: network of combinations]
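Of the simplifications named on this slide, the minimum spanning tree keeps only n − 1 edges of a dense network. A base-R sketch of Prim's algorithm follows; the function name `mst_edges` and the made-up edge weights are hypothetical, and in practice a graph package such as igraph (listed earlier in the deck) would normally be used.

```r
# Prim's algorithm on a symmetric weight matrix; returns the n - 1 tree edges
mst_edges <- function(w) {
  n <- nrow(w)
  in_tree <- c(TRUE, rep(FALSE, n - 1))          # start from node 1
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (k in seq_len(n - 1)) {
    cand <- w[in_tree, !in_tree, drop = FALSE]   # edges leaving the current tree
    idx  <- which(cand == min(cand), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[idx[1]]
    to   <- which(!in_tree)[idx[2]]
    edges[k, ] <- c(from, to)
    in_tree[to] <- TRUE                          # grow the tree by one node
  }
  edges
}

# Toy network: 5 compounds with made-up dissimilarities between them
w <- matrix(c(0, 2, 9,  4,  7,
              2, 0, 3,  8,  6,
              9, 3, 0,  1,  5,
              4, 8, 1,  0, 10,
              7, 6, 5, 10,  0), nrow = 5, byrow = TRUE)
tree  <- mst_edges(w)
total <- sum(w[tree])                            # total weight of the tree edges
print(tree)
print(total)
```

For this toy matrix the tree is 1-2, 2-3, 3-4, 3-5 with total weight 11, four edges instead of the ten in the full network.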
Comparing Neighborhoods
Combinations that have DBSumNeg < 1st quartile value for that strain
[Figure: networks for strains 3D7, DD2 and HB3]
Identifying the Most Synergistic Pairs
[Figure: combination network]
Summary
• The HTS workflow presents multiple data science problems involving (unique) data types
• R can play a role at several stages, but model building is straightforward
• Representation is key and guides the types and nature of analyses

Más contenido relacionado

La actualidad más candente

Pjb Probes 2009
Pjb Probes 2009Pjb Probes 2009
Pjb Probes 2009toluene
 
Perspective on QSAR modeling of transport
Perspective on QSAR modeling of transportPerspective on QSAR modeling of transport
Perspective on QSAR modeling of transportSean Ekins
 
Applying computational models for transporters to predict toxicity
Applying computational models for transporters to predict toxicityApplying computational models for transporters to predict toxicity
Applying computational models for transporters to predict toxicitySean Ekins
 
Ph.D. Defense 2021.pptx
Ph.D. Defense 2021.pptxPh.D. Defense 2021.pptx
Ph.D. Defense 2021.pptxHinaKhalid38
 
AAPS 2015_W3081_Biomarker Screening Poster_Russell
AAPS 2015_W3081_Biomarker Screening Poster_RussellAAPS 2015_W3081_Biomarker Screening Poster_Russell
AAPS 2015_W3081_Biomarker Screening Poster_RussellLawrence Hwang
 
SF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInalSF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInalSteve Flynn
 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Prof. Wim Van Criekinge
 
Pp 90505-lc-ms-serum-profiling-biomarker-msacleu2019-pp90505-en
Pp 90505-lc-ms-serum-profiling-biomarker-msacleu2019-pp90505-enPp 90505-lc-ms-serum-profiling-biomarker-msacleu2019-pp90505-en
Pp 90505-lc-ms-serum-profiling-biomarker-msacleu2019-pp90505-enAlexander Boichenko
 
Oncodesign aacr 2018 development of a high throughput in vitro screening pl...
Oncodesign aacr 2018   development of a high throughput in vitro screening pl...Oncodesign aacr 2018   development of a high throughput in vitro screening pl...
Oncodesign aacr 2018 development of a high throughput in vitro screening pl...Florence Fombertasse
 
Host Cell Protein Analysis - Measuring the Forest, or Counting theTrees
Host Cell Protein Analysis - Measuring the Forest, or Counting theTreesHost Cell Protein Analysis - Measuring the Forest, or Counting theTrees
Host Cell Protein Analysis - Measuring the Forest, or Counting theTreesTed Kocot
 
Rna editing as a drug target identification of inhibitors of rel 1 bsp 210
Rna editing as a drug target identification of inhibitors of rel 1 bsp 210Rna editing as a drug target identification of inhibitors of rel 1 bsp 210
Rna editing as a drug target identification of inhibitors of rel 1 bsp 210Laurence Dawkins-Hall
 
High-throughput capillary-flow LC-MS proteomics with maximum MS utilisation
High-throughput capillary-flow LC-MS proteomics with maximum MS utilisationHigh-throughput capillary-flow LC-MS proteomics with maximum MS utilisation
High-throughput capillary-flow LC-MS proteomics with maximum MS utilisationAlexander Boichenko
 
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...Sean Ekins
 
ASMS 2014 poster final
ASMS 2014 poster finalASMS 2014 poster final
ASMS 2014 poster finalNathan Bond
 
Poster_Kamber_XenoScreen YES YAS
Poster_Kamber_XenoScreen YES YASPoster_Kamber_XenoScreen YES YAS
Poster_Kamber_XenoScreen YES YASNicolas NICAISE
 

La actualidad más candente (18)

Pjb Probes 2009
Pjb Probes 2009Pjb Probes 2009
Pjb Probes 2009
 
Introduction to DILIsym Services, Inc.
Introduction to DILIsym Services, Inc. Introduction to DILIsym Services, Inc.
Introduction to DILIsym Services, Inc.
 
Perspective on QSAR modeling of transport
Perspective on QSAR modeling of transportPerspective on QSAR modeling of transport
Perspective on QSAR modeling of transport
 
Applying computational models for transporters to predict toxicity
Applying computational models for transporters to predict toxicityApplying computational models for transporters to predict toxicity
Applying computational models for transporters to predict toxicity
 
Ph.D. Defense 2021.pptx
Ph.D. Defense 2021.pptxPh.D. Defense 2021.pptx
Ph.D. Defense 2021.pptx
 
AAPS 2015_W3081_Biomarker Screening Poster_Russell
AAPS 2015_W3081_Biomarker Screening Poster_RussellAAPS 2015_W3081_Biomarker Screening Poster_Russell
AAPS 2015_W3081_Biomarker Screening Poster_Russell
 
SF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInalSF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInal
 
BAMBA_conference_poster
BAMBA_conference_posterBAMBA_conference_poster
BAMBA_conference_poster
 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
 
Pp 90505-lc-ms-serum-profiling-biomarker-msacleu2019-pp90505-en
Pp 90505-lc-ms-serum-profiling-biomarker-msacleu2019-pp90505-enPp 90505-lc-ms-serum-profiling-biomarker-msacleu2019-pp90505-en
Pp 90505-lc-ms-serum-profiling-biomarker-msacleu2019-pp90505-en
 
Oncodesign aacr 2018 development of a high throughput in vitro screening pl...
Oncodesign aacr 2018   development of a high throughput in vitro screening pl...Oncodesign aacr 2018   development of a high throughput in vitro screening pl...
Oncodesign aacr 2018 development of a high throughput in vitro screening pl...
 
Host Cell Protein Analysis - Measuring the Forest, or Counting theTrees
Host Cell Protein Analysis - Measuring the Forest, or Counting theTreesHost Cell Protein Analysis - Measuring the Forest, or Counting theTrees
Host Cell Protein Analysis - Measuring the Forest, or Counting theTrees
 
Rna editing as a drug target identification of inhibitors of rel 1 bsp 210
Rna editing as a drug target identification of inhibitors of rel 1 bsp 210Rna editing as a drug target identification of inhibitors of rel 1 bsp 210
Rna editing as a drug target identification of inhibitors of rel 1 bsp 210
 
High-throughput capillary-flow LC-MS proteomics with maximum MS utilisation
High-throughput capillary-flow LC-MS proteomics with maximum MS utilisationHigh-throughput capillary-flow LC-MS proteomics with maximum MS utilisation
High-throughput capillary-flow LC-MS proteomics with maximum MS utilisation
 
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
 
ASMS 2014 poster final
ASMS 2014 poster finalASMS 2014 poster final
ASMS 2014 poster final
 
2015 PBL Assay Solutions
2015 PBL Assay Solutions2015 PBL Assay Solutions
2015 PBL Assay Solutions
 
Poster_Kamber_XenoScreen YES YAS
Poster_Kamber_XenoScreen YES YASPoster_Kamber_XenoScreen YES YAS
Poster_Kamber_XenoScreen YES YAS
 

Destacado

The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research DatabaseRajarshi Guha
 
The smaller sukhavati vyuha
The smaller sukhavati vyuhaThe smaller sukhavati vyuha
The smaller sukhavati vyuhaLin Zhang Sheng
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Rajarshi Guha
 
R & CDK: A Sturdy Platform in the Oceans of Chemical Data}
R & CDK: A Sturdy Platform in the Oceans of Chemical Data}R & CDK: A Sturdy Platform in the Oceans of Chemical Data}
R & CDK: A Sturdy Platform in the Oceans of Chemical Data}Rajarshi Guha
 
The Trans-NIH RNAi Initiative : Informatics
The Trans-NIH RNAi Initiative: InformaticsThe Trans-NIH RNAi Initiative: Informatics
The Trans-NIH RNAi Initiative : InformaticsRajarshi Guha
 
6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysisDmitry Grapov
 
4 partial least squares modeling
4  partial least squares modeling4  partial least squares modeling
4 partial least squares modelingDmitry Grapov
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysisDmitry Grapov
 
5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case studyDmitry Grapov
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in RRajarshi Guha
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysisDmitry Grapov
 

Destacado (14)

The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research Database
 
The smaller sukhavati vyuha
The smaller sukhavati vyuhaThe smaller sukhavati vyuha
The smaller sukhavati vyuha
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...
 
R & CDK: A Sturdy Platform in the Oceans of Chemical Data}
R & CDK: A Sturdy Platform in the Oceans of Chemical Data}R & CDK: A Sturdy Platform in the Oceans of Chemical Data}
R & CDK: A Sturdy Platform in the Oceans of Chemical Data}
 
The Trans-NIH RNAi Initiative : Informatics
The Trans-NIH RNAi Initiative: InformaticsThe Trans-NIH RNAi Initiative: Informatics
The Trans-NIH RNAi Initiative : Informatics
 
6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysis
 
0 introduction
0  introduction0  introduction
0 introduction
 
7 network mapping i
7  network mapping i7  network mapping i
7 network mapping i
 
4 partial least squares modeling
4  partial least squares modeling4  partial least squares modeling
4 partial least squares modeling
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysis
 
5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case study
 
2 cluster analysis
2  cluster analysis2  cluster analysis
2 cluster analysis
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in R
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysis
 

Similar a Robots, Small Molecules & R

⭐⭐⭐⭐⭐ Finding a Dynamical Model of a Social Norm Physical Activity Intervention
⭐⭐⭐⭐⭐ Finding a Dynamical Model of a Social Norm Physical Activity Intervention⭐⭐⭐⭐⭐ Finding a Dynamical Model of a Social Norm Physical Activity Intervention
⭐⭐⭐⭐⭐ Finding a Dynamical Model of a Social Norm Physical Activity InterventionVictor Asanza
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer ChemistryPreferred Networks
 
Open Science Data Repository - Dataledger
Open Science Data Repository - DataledgerOpen Science Data Repository - Dataledger
Open Science Data Repository - DataledgerAlexandru Korotcov
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical StructuresRajarshi Guha
 
7 qc toools LEARN and KNOW how to BUILD IN EXCEL
7 qc toools LEARN and KNOW how to BUILD IN EXCEL7 qc toools LEARN and KNOW how to BUILD IN EXCEL
7 qc toools LEARN and KNOW how to BUILD IN EXCELrajesh1655
 
Simple rules for building robust machine learning models
Robots, Small Molecules & R

  • 1. Robots, Small Molecules & R: Ingredients for Exploring and Predicting Biological Effects
      Rajarshi Guha, September 13, 2014
      http://blog.rguha.net/
  • 2. HTS: Hunting for Leads
      Pipeline: Target Identification → Lead Discovery → Lead Optimization → Clinical Development
      • Assay Optimization: sensitivity, scaling
      • Primary Screening: fluorescence, high content
      • Cherry Picking: select subset to follow up, diversity
      • Confirmation: counter screen, explore SAR
  • 3. High Throughput Screening
      • Test thousands to hundreds of thousands of compounds in one or more assays
      • Employs a robotic platform
      • Rapidly identify novel modulators of biological systems
          – Infectious agents
          – Cellular basis of diseases
  • 6. HTS Workflow
      • Rapidly screen large compound collections
      • Efficiently identify real actives
          – Test them in slower, accurate, expensive screens
      • Use the data to learn what types of compounds tend to be active
      • Use the model to suggest more compounds to screen
      [Figure: number of molecules at successive stages (300K, 1000, 300); labels: HTS, Cherry Picks]
  • 7. Data Science Problems
      • Predictive models for highly imbalanced datasets
      • Global versus local models?
      • Feature selection: data driven? Domain driven?
      • Clustering & enrichment
      • Similarity: definition, computation, performance
      • Integration: chemical structures, numerical data, text (papers, patents), images
  • 8. The Roles of R
      • Data Access: ROracle, RMySQL, RPostgreSQL, rpubchem, chemblr
      • Chemistry: rcdk, ChemmineR, fingerprint
      • HTS QC: displayHTS, spdep
      • Imaging: EBImage, rflowcyt, ripa, raster
      • Visualization: grid, ggplot, Shiny, ggvis, igraph
      • Data Analysis: drc, igraph, randomForest, svm, ...
      Also see the ChemPhys CRAN Task View
  • 9. HTS Data Types – Single Point
      [Figure: response (0 to 100) vs. concentration]
  • 10. HTS Data Types – Dose Response
      [Figure: response vs. log10 concentration]
      $y = S_0 + \dfrac{S_{inf} - S_0}{1 + 10^{(\log AC_{50} - x) H}}$
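The dose-response equation on this slide can be evaluated directly in base R. The following is a minimal sketch rather than code from the talk: the function name `hill` and all parameter values are made up for illustration, and `x` is assumed to be the log10 concentration.

```r
# Hill-type dose-response curve as written on the slide:
#   y = S0 + (Sinf - S0) / (1 + 10^((logAC50 - x) * H))
# x is assumed to be log10 concentration; S0 and Sinf are the responses
# at zero and at infinite concentration, and H is the Hill slope.
hill <- function(x, S0, Sinf, logAC50, H) {
  S0 + (Sinf - S0) / (1 + 10^((logAC50 - x) * H))
}

# Hypothetical dose series and parameters, for illustration only
log_conc <- seq(-2, 0, by = 0.5)
resp <- hill(log_conc, S0 = 0, Sinf = 100, logAC50 = -1, H = 1.5)
print(round(resp, 1))
```

At `x = logAC50` the denominator equals 2, so the response sits halfway between `S0` and `Sinf`, which is what makes AC50 a convenient potency summary. Packages such as drc (listed on slide 8) fit curves of this kind to measured data.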
  • 11. HTS Data Types – Multiple Readouts (and have this at multiple doses!)
  • 12. HTS Data Types - Combinations
  • 14. Features, Features, Features
      • How do we “quantify” a chemical structure?
  • 15. Features, Features, Features
      • Charges
      • Dipole moments
      • Topological invariants
      • Surface properties
      [Figure: example binary feature vector 1 0 1 1 0 0 0 1 0]
  • 16. Working with Molecules in R
      • A number of OSS libraries are available
      • ChemmineR and rcdk are the main packages that allow you to manipulate molecules in R
      • Uses rJava to interface with JOELib and CDK respectively
  • 17. rcdk
      • Idiomatic R interface to the CDK library
          – I/O support for chemical file formats
          – Manipulation of atoms, bonds, molecules
          – Generate molecular descriptors, fingerprints

        library(rcdk)
        mol <- parse.smiles('CCCC')[[1]]
        mols <- load.molecules('http://www.rguha.net/mipe100.smi')
  • 18. rcdk
      • rcdk works with references to Java objects
          – Can’t save them in a workspace (trivially)

        > mol
        [1] "Java-Object{AtomContainer(2040919865, #A:4, Atom(2131361171, S:C, H:3, AtomType(2131361171, FC:0, Isotope(2131361171, Element(2131361171, S:C, AN:6)))), Atom(1759969037, S:C, H:2, AtomType(1759969037, FC:0, Isotope(1759969037, Element(1759969037, S:C, AN:6)))), Atom(359851081, S:C, H:2, AtomType(359851081, FC:0, Isotope(359851081, Element(359851081, S:C, AN:6)))), Atom(703168415, S:C, H:3, AtomType(703168415, FC:0, Isotope(703168415, Element(703168415, S:C, AN:6)))), #B:3, Bond(549041464, #O:SINGLE, #S:NONE, #A:2, Atom(2131361171, S:C, H:3, AtomType(2131361171, FC:0, Isotope(2131361171, Element(2131361171, S:C, AN:6)))), Atom(1759969037, S:C, H:2, AtomType(1759969037, FC:0, Isotope(1759969037, Element(1759969037, S:C, AN:6)))), ElectronContainer(549041464EC:2)), Bond(2654289, #O:SINGLE, #S:NONE, #A:2, Atom(1759969037, S:C, H:2, AtomType(1759969037, FC:0, Isotope(1759969037, Element(1759969037, S:C, AN:6)))), Atom(359851081, S:C, H:2, AtomType(359851081, FC:0, Isotope(359851081, Element(359851081, S:C, AN:6)))), ElectronContainer(2654289EC:2)), Bond(1660962283, #O:SINGLE, #S:NONE, #A:2, Atom(359851081, S:C, H:2, AtomType(359851081, FC:0, Isotope(359851081, Element(359851081, S:C, AN:6)))), Atom(703168415, S:C, H:3, AtomType(703168415, FC:0, Isotope(703168415, Element(703168415, S:C, AN:6)))), ElectronContainer(1660962283EC:2)))}"
        >
  • 19. Calculating Molecular Features
      • Evaluate a matrix of numerical features

        mols <- load.molecules("mipe100.smi")
        dnames <- get.desc.names('topological')
        descs <- eval.desc(mols, dnames)

      • End up with a rectangular data.frame

        > str(descs)
        'data.frame': 99 obs. of 195 variables:
         $ nRings7        : num 1 0 1 0 0 0 0 0 0 0 ...
         $ nRings8        : num 0 0 0 0 0 0 0 0 0 0 ...
         $ nRings9        : num 0 0 0 0 0 0 0 0 0 0 ...
         $ tpsaEfficiency : num 0.1856 0.2035 0.0118 0.0602 ...
  • 20. Calculating Fingerprints
      • Binary string representation of molecular structure
          – Objectively defined, fast to calculate
          – Good for searching, clustering, prediction

        library(fingerprint)
        fps <- lapply(mols, get.fingerprint)

      • The fingerprint package is used to represent them as S4 objects
  • 21. Calculating Fingerprints
      • Methods to compute similarities, generate summaries & manipulate fingerprints

        > fps[[1]]
        Fingerprint object
         name =
         length = 1024
         folded = FALSE
         source = CDK
         bits on = 15 18 45 73 77 78 79 85 87 96 107 109 129 139 149 159 162 166 172 179 194 209 214 223 225 227 239 254 266 272 301 312 327 335 350 354 359 392 393 395 397 415 435 455 486 491 492 499 534 535 541 543 544 545 546 559 575 600 605 618 621 622 626 635 638 644 645 647 690 723 728 742 743 753 754 800 819 831 832 889 893 913 922 930 936 954 985 988 1005 1008 1016
        >
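To make "compute similarities" concrete, here is a minimal base R sketch of the Tanimoto coefficient, a common fingerprint similarity measure. It is an illustration under stated assumptions, not the fingerprint package's implementation: the helper name `tanimoto` is hypothetical, each fingerprint is assumed to be a vector of unique "on" bit positions (as in the printout above), and the second fingerprint is invented.

```r
# Tanimoto similarity between two fingerprints, each given as a vector of
# unique "on" bit positions: |A intersect B| / |A union B|.
tanimoto <- function(a, b) {
  shared <- length(intersect(a, b))
  shared / (length(a) + length(b) - shared)
}

fp1 <- c(15, 18, 45, 73, 77)   # first few "on" bits from the slide output
fp2 <- c(15, 18, 46, 73, 90)   # hypothetical second fingerprint

print(tanimoto(fp1, fp2))      # three shared bits out of seven distinct bits
print(tanimoto(fp1, fp1))      # identical fingerprints give 1
```

Working on bit positions rather than full 1024-bit vectors keeps the calculation cheap for sparse fingerprints, which is one reason fingerprints are described in the talk as good for searching and clustering.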
  • 22. Use Case - SAR
      • Cluster molecules by structure and examine whether clusters are enriched in activity

        library(chemblr); library(rcdk)

        ## retrieve the activities for an assay, then each compound's record
        d <- get.activity(chembl.id='CHEMBL857155', type='assay')
        cmpds <- lapply(d$ingredient_cmpd_chemblid, get.compound, type='chemblid')
        cmpds <- do.call(rbind, lapply(cmpds, function(x)
          data.frame(x$chemblId, x$smiles, stringsAsFactors=FALSE)))

        ## parse structures and compute fingerprints
        mols <- parse.smiles(cmpds$x.smiles)
        fps <- lapply(mols, get.fingerprint)

        ## similarity matrix -> distance matrix -> hierarchical clustering
        sm <- fp.sim.matrix(fps)
        rownames(sm) <- cmpds$x.chemblId
        dm <- as.dist(1-sm)
        clus <- hclust(dm)
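The slide's code stops once the tree is built; the enrichment check it mentions could look roughly like the sketch below. This is an assumption about one reasonable approach, not the author's method: the similarity matrix and activity labels are random stand-ins, the choice of four clusters is arbitrary, and a one-sided Fisher exact test is used as the enrichment statistic.

```r
set.seed(1)

# Toy stand-ins for the real inputs: a symmetric similarity matrix with
# values in [0, 1] and a logical activity label per compound (both made up).
n <- 30
sm <- matrix(runif(n * n), n, n)
sm <- (sm + t(sm)) / 2
diag(sm) <- 1
active <- rep(c(TRUE, FALSE), times = c(10, 20))

# Cluster on 1 - similarity, as in the slide, then cut into k groups
clus <- hclust(as.dist(1 - sm))
membership <- cutree(clus, k = 4)   # k = 4 is an arbitrary choice

# For each cluster, test whether actives are over-represented inside it
enrichment <- sapply(sort(unique(membership)), function(k) {
  in_cluster <- membership == k
  tab <- table(factor(in_cluster, levels = c(TRUE, FALSE)),
               factor(active, levels = c(TRUE, FALSE)))
  fisher.test(tab, alternative = "greater")$p.value
})
print(enrichment)
```

With real data, small p-values would flag clusters of structurally similar compounds that contain more actives than expected by chance; with many clusters a multiple-testing correction such as `p.adjust` would be advisable.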
  • 23. Use Case - SAR
      [Figure: dendrogram from the hierarchical clustering of the assay compounds; leaves are labeled with ChEMBL IDs (CHEMBL331502, CHEMBL328164, CHEMBL52551, ...)]
  • 24. Use Case - Bit Spectrum
      • Vector summary of the fingerprints for a dataset
      • Defined as the fraction of times a bit position is set to 1, for each bit position
      [Figure: fingerprint bit matrix for ~10K molecules reduced to a bit spectrum (0.5, 0.5, 0.75, ...); plot of normalized frequency (0 to 1) vs. bit position]
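The definition above reduces to a column mean over a molecules-by-bits matrix. The sketch below uses base R only and is not the fingerprint package's `bit.spectrum`; the 4 x 3 matrix is one reading of the digits that survive from the slide graphic (an assumption), chosen because it reproduces the 0.5, 0.5, 0.75 values shown there.

```r
# Rows are molecules, columns are bit positions (assumed reading of the slide)
fp_bits <- rbind(c(0, 0, 1),
                 c(0, 1, 0),
                 c(1, 1, 1),
                 c(1, 0, 1))

# Bit spectrum: fraction of molecules with each bit set
bit_spectrum <- colMeans(fp_bits)
print(bit_spectrum)   # 0.50 0.50 0.75
```

Because the spectrum has one entry per bit position regardless of how many molecules went into it, two datasets of very different sizes can be compared by subtracting their spectra.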
  • 25. Use Case - Bit Spectrum
      • Comparison of two datasets is now O(n)
      • Simply take the difference of the two bit spectra
      • e.g.: Compare ~800 solubles with >30k insolubles
      [Figure: Δ normalized frequency (-1 to 1) vs. bit position]

        library(fingerprint)  # provides bit.spectrum()
        library(ggplot2)

        ## make two subsets and generate bit spectra
        sol.idx <- which(sol$label == 'high')
        insol.idx <- which(sol$label != 'high')
        sol.bs <- bit.spectrum(fps[sol.idx])
        insol.bs <- bit.spectrum(fps[insol.idx])

        ## display a difference plot
        bsdiff <- sol.bs - insol.bs
        d <- data.frame(x=1:length(sol.bs), y=bsdiff)
        ggplot(d, aes(x=x,y=y))+geom_line()+
          xlab('Bit Position')+
          ylab('Normalized Frequency')+
          ylim(c(-1,1))
  • 27. Building Models is the Easy Part
      • Given a descriptor data.frame or fingerprint list we’re ready to build models
          – caret, caretEnsemble
      • Question is whether the model(s) can generalize
      • Applicability is a key consideration when predicting bioactivity
          – Has economic & safety ramifications in regulatory environments
  • 28. Domain Applicability
      • How dissimilar to the training set do you have to be before the prediction is meaningless?
          – Distance to training set? Inside/outside convex hull
          – Comparison of bit spectra
      [Figure: Training Set and Test Set]
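The "distance to training set" idea can be sketched in base R as a nearest-neighbour check. Everything here is an illustrative assumption rather than the talk's procedure: the descriptor matrices are random, the helper name `nn_distance` is hypothetical, Euclidean distance is used, and the 95th-percentile cutoff is an arbitrary choice.

```r
# Euclidean distance from each query row to its nearest training row
nn_distance <- function(train, query) {
  apply(query, 1, function(q) {
    min(sqrt(rowSums(sweep(train, 2, q)^2)))
  })
}

set.seed(42)
train <- matrix(rnorm(200), ncol = 2)    # hypothetical descriptor matrix
query <- rbind(c(0, 0), c(10, 10))       # one central point, one far away

# Cutoff: 95th percentile of nearest-neighbour distances within the
# training set itself (an arbitrary convention for this sketch)
d_train <- as.matrix(dist(train))
diag(d_train) <- Inf
cutoff <- unname(quantile(apply(d_train, 1, min), 0.95))

d_query <- nn_distance(train, query)
in_domain <- d_query <= cutoff
print(data.frame(distance = d_query, in_domain = in_domain))
```

A prediction for a query flagged as outside the domain would be treated with suspicion rather than trusted at face value; in practice descriptors would usually be scaled first so that no single feature dominates the distance.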
  • 29. Global vs Local Models
      • Bioassay data is not really big data
      • Can big data be too big?
      • AID 1996: 57K measurements of aqueous solubility
      • Do we build one model?
      • Or multiple local models?
      [Figure: PCA of 166 Binary Features]
  • 31. Screening Drug Combinations
      • Increased efficacy
      • Delay resistance
      • Attenuate toxicity
      • Inform signaling pathway connectivity
      • Identify synthetic lethality
      • Polypharmacology
      [Figure labels: Translational Interest, Basic Interest]
  • 32. How to Test Combinations
      • Many procedures described in the literature
          – Fixed dose ratio (aka ray)
          – Ray contour
          – Checkerboard
          – Genetic algorithm
      [Figure: dose layout over concentrations C1 to C5 and D1 to D5, showing paired wells (C1,D1) through (C5,D5) and the row (C1,D1) through (C1,D5)]
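Two of the layouts named above are easy to enumerate in base R. The sketch below is illustrative only: the dose values are invented, and the data frame names `checkerboard` and `ray` are hypothetical.

```r
# Hypothetical dose series for two compounds (arbitrary units; 0 = no compound)
conc_c <- c(0, 1, 2, 4, 8)
conc_d <- c(0, 1, 2, 4, 8)

# Checkerboard: every dose of C is tested against every dose of D
checkerboard <- expand.grid(C = conc_c, D = conc_d)

# Fixed dose ratio (ray): both doses rise together at a constant C:D ratio
ray <- data.frame(C = conc_c[-1], D = conc_d[-1])

print(nrow(checkerboard))   # number of wells in the full matrix
print(ray)
```

The checkerboard costs the product of the two series lengths in wells per pair, while a ray costs only as many wells as there are dose levels, which is the basic trade-off between coverage of the response surface and screening throughput.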
  • 33. How to Test Combinations
      • Many procedures described in the literature
          – Fixed dose ratio (aka ray)
          – Ray contour
          – Checkerboard
          – Genetic algorithm
      [Figure labels: Vargatef, DCC-2036, PD-166285, GDC-0941, PI-103, GDC-0980, Bardoxolone methyl, AT-7519, SNS-032, NCGC00188382-01, Lestaurtinib, CNF-2024, ISOX, Belinostat, PF-477736, AZD-7762]
  • 34. Why Similarity? • Vargatef exhibited anomalous matrix response compared to other VEGFR inhibitors [Figure: response matrices for Vargatef, Linifanib, Axitinib, Sorafenib, Vatalanib, Motesanib, Tivozanib, Brivanib, Telatinib, Cabozantinib, Cediranib, BMS-794833, Lenvatinib, OSI-632, Foretinib, Regorafenib]
  • 35. When are Combinations Similar? • Differences and their aggregates such as RMSD can lead to degeneracy • Instead we’re interested in the shape of the surface • How to characterize shape? – Parametrized fits – Distribution of responses [Figure: response distributions over 0 to 100, annotated with D and p value]
  • 36. Similarity via the Syrjala Test • Syrjala test used to compare population distributions over a spatial grid – Invariant to grid orientation – Provides an empirical p-value • Less degenerate than just considering 1D distributions [Figure: density of D] Syrjala, S.E., “A Statistical Test for a Difference between the Spatial Distributions of Two Populations”, Ecology, 1996, 77(1), 75-80
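The Syrjala statistic can be sketched from scratch in base R for two response matrices on the same dose grid. This follows a reading of the 1996 paper and is not the speaker's implementation. Each matrix is normalized to sum to 1, cumulative sums are taken from each of the four corners, the squared differences are summed and averaged over corners, and the p-value comes from randomly swapping the two normalized values at each grid point. The names `cum2d`, `syrjala_stat` and `syrjala_test` and the toy matrices are hypothetical. Inputs are assumed non-negative, with a positive total and at least two rows and columns.

```r
## 2D cumulative sum: running total over rows, then over columns
cum2d <- function(m) t(apply(apply(m, 2, cumsum), 1, cumsum))

## statistic on two normalised matrices, averaged over the four corner origins
syrjala_stat <- function(g1, g2) {
  d <- g1 - g2                      # cumulative sums are linear, so difference first
  nr <- nrow(d); nc <- ncol(d)
  orientations <- list(d, d[nr:1, ], d[, nc:1], d[nr:1, nc:1])
  mean(sapply(orientations, function(o) sum(cum2d(o)^2)))
}

## permutation test: swap the pair of normalised values at random grid points
syrjala_test <- function(m1, m2, n.perm = 999) {
  g1 <- m1 / sum(m1)
  g2 <- m2 / sum(m2)
  observed <- syrjala_stat(g1, g2)
  permuted <- replicate(n.perm, {
    swap <- matrix(runif(length(g1)) < 0.5, nrow = nrow(g1))
    syrjala_stat(ifelse(swap, g2, g1), ifelse(swap, g1, g2))
  })
  list(statistic = observed,
       p.value = (1 + sum(permuted >= observed)) / (n.perm + 1))
}

## toy example: same values, but concentrated in opposite corners
set.seed(42)
a <- outer(1:6, 1:6)
b <- a[6:1, 6:1]
res <- syrjala_test(a, b, n.perm = 199)
```

Matrices `a` and `b` hold the same values, so a comparison of their 1D distributions sees no difference, while the spatial statistic does. This is the degeneracy the slide refers to.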
  • 37. Clustering Response Surfaces [Figure: dendrogram of response surfaces (height 0.0 to 0.8) with four clusters: C1 (24), C2 (47), C3 (35), C4 (24)]
  • 38. Working in “Combination Space” • Each cell line is represented as a vector of response matrices • “Distance” between two cell lines is a function of the distance between component response matrices: D(L1, L2) = F({d1, d2, …, dn}) • F can be min, max, mean, … [Figure: cell lines L1 and L2 as rows of response matrices, with pairwise distances d1 to d5]
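The definition D(L1, L2) = F({d1, d2, …, dn}) translates almost directly into base R. In the sketch below each cell line is a named list of response matrices, one per drug combination. The Euclidean distance between matched matrices is an assumption, since the slide does not fix the component distance. `matrix_dist`, `cell_line_dist` and the toy values are hypothetical.

```r
## distance between two response matrices (Euclidean here; an assumption)
matrix_dist <- function(m1, m2) sqrt(sum((m1 - m2)^2))

## D(L1, L2) = F({d1, ..., dn}) over matched combinations
## l1, l2: lists of response matrices in the same combination order
## agg: the aggregating function F (min, max, mean, sum, ...)
cell_line_dist <- function(l1, l2, agg = mean) {
  d <- mapply(matrix_dist, l1, l2)
  agg(d)
}

## two hypothetical cell lines screened against the same two combinations
l1 <- list(ab = matrix(0, 2, 2), ac = matrix(1, 2, 2))
l2 <- list(ab = matrix(0, 2, 2), ac = matrix(3, 2, 2))

d.max  <- cell_line_dist(l1, l2, max)
d.min  <- cell_line_dist(l1, l2, min)
d.mean <- cell_line_dist(l1, l2, mean)
```

Here the two cell lines agree on one combination and differ on the other, so `min` reports them as identical while `max` reports them as far apart. The choice of F therefore changes any clustering built on D.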
  • 39. Many Choices to Make [Figure: four dendrograms of the same 20 cell lines, one per choice of F (panels: sum, max, min, euc); cell lines: KMS-34, INA-6, L363, OPM-1, XG-2, FR4, AMO-1, XG-6, MOLP-8, ANBL-6, KMS-20, XG-7, OCI-MY1, XG-1, 8226, EJM, U266, KMS-11LB, SKMM-1, MM-MM1]
  • 41. Networks & Integration • Network models of molecules and targets are common – Allows for the incorporation of lots of associated information – Diseases, pathways, OTEs • When linked with clinical data & outcomes, we can generate massive networks – Adverse events (FDA AERS) – Analysis by Cloudera considered > 10E6 drug-drug-reaction triples Yildirim, M.A. et al
  • 42. Networks & Integration • SAR data can be viewed in a network form – SALI, SARI based networks – Usually requires pairwise calculations of the metric • Current studies have focused on small datasets (< 1000 molecules) • Hadoop + Giraph could let us apply this to HTS-scale datasets Peltason, L et al http://sali.rguha.net/
  • 43. Networks & Integration • When we apply a network view we can consider many interesting applications & make use of cloud scale infrastructure – Network based similarity – Community detection (aka clustering) – PageRank style ranking (of targets, compounds, …) – Generate network metrics, which can be used as input to predictive models (for interactions, effects, …) Bauer-Mehren et al
  • 44. Combinations as Networks • Combination screens lend themselves naturally to network representations [Figure: two network views of combination screens, colored by Δ Bliss+ (scales 0.0 to −4.3 and 0.0 to −3.4); legend labels: immune system process, apoptotic process, transcription from RNA polymerase II promoter, protein phosphorylation, cell communication, immune response]
  • 45. Combinations as Networks • Things get more interesting when we have n × m screens • Can be simplified using a variety of methods – Neighborhoods – Minimum Spanning Tree [Figure: network of an n × m combination screen]
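As an illustration of the minimum spanning tree simplification, here is a small base-R sketch of Prim's algorithm on a symmetric weight matrix. It assumes a complete graph with finite weights. In a combination screen the weights would come from some pairwise score, but the numbers below are hypothetical and `prim_mst` is an illustrative name, not a library function.

```r
## Prim's algorithm: grow a tree from node 1, always adding the cheapest
## edge that connects a node inside the tree to a node outside it
## w: symmetric matrix of finite edge weights (complete graph assumed)
prim_mst <- function(w) {
  n <- nrow(w)
  in.tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (i in seq_len(n - 1)) {
    cand <- w
    cand[!in.tree, ] <- Inf   # edges must start inside the tree
    cand[, in.tree] <- Inf    # and end outside it
    best <- which(cand == min(cand), arr.ind = TRUE)[1, ]
    edges[i, ] <- best
    in.tree[best[2]] <- TRUE
  }
  edges
}

## hypothetical weights between four compounds
w <- matrix(c(0, 1, 4, 3,
              1, 0, 2, 5,
              4, 2, 0, 6,
              3, 5, 6, 0), nrow = 4, byrow = TRUE)

mst.edges <- prim_mst(w)        # one row per retained edge (from, to)
mst.weight <- sum(w[mst.edges]) # total weight of the tree
```

The tree keeps n − 1 of the n(n − 1)/2 possible edges, which is what makes a dense combination network readable.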
  • 46. Comparing Neighborhoods • Combinations that have DBSumNeg < 1st quartile value for that strain [Figure: neighborhood networks for strains 3D7, DD2, HB3]
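The quartile filter on this slide can be sketched in base R. The slide does not define DBSumNeg. The sketch assumes it is the sum of the negative Δ Bliss values in a combination's dose matrix, and that assumption should be checked against the original analysis. The matrices are random toy values and all names are hypothetical.

```r
set.seed(7)

## hypothetical delta-Bliss matrices, one 6x6 dose matrix per drug pair
db <- replicate(8, matrix(rnorm(36, sd = 0.1), 6, 6), simplify = FALSE)
names(db) <- paste0("pair", 1:8)

## ASSUMED definition of DBSumNeg: sum of the negative entries
db_sum_neg <- function(m) sum(m[m < 0])

scores <- sapply(db, db_sum_neg)

## keep the combinations whose score is below the first quartile
selected <- names(scores)[scores < quantile(scores, 0.25)]
```

Applied per strain, a filter like this would yield one set of selected combinations for each strain, which could then be compared across strains.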
  • 47. Identifying the Most Synergistic Pairs [Figure: network of drug pairs]
  • 48. Summary • The HTS workflow presents multiple data science problems involving (unique) data types • R can play a role at several stages, but model building is straightforward • Representation is key and guides the types and nature of analyses