Link Mining for Kernel-based
Compound-Protein Interaction Predictions
Using a Chemogenomics Approach
Masahito Ohue Takuro Yamazaki Tomohiro Ban Yutaka Akiyama
Department of Computer Science, School of Computing,
Tokyo Institute of Technology, Japan.
Thirteenth International Conference on Intelligent Computing (ICIC2017)
R13: Protein and Gene Bioinformatics: Analysis, Algorithms and Applications, Aug 9, 2017.
Background
• Drug discovery and development
– Takes more than 10 years and more than 2 billion US dollars
– Computational approaches may reduce these costs
• molecular docking, QSAR/QSPR modeling, toxicity prediction, compound-protein interaction prediction
Aug 9th, 2017 ICIC2017 Masahito Ohue
Paul SM, et al. Nat Rev Drug Discov. 2010, 9(3):203.
Compound-Protein Interaction Prediction
• Compound-Protein Interaction (CPI) Prediction (2008-)
– a.k.a. Drug-Target Interaction Prediction
– Predict whether a query compound and protein interact (1) or not (0) by using machine learning (ML) with compound, protein, and interaction information.
• Called the “chemogenomics approach”
– It can lead to finding new targets, side effects, toxicity, etc.
[Figure: bipartite network linking compounds (c1-c6) and proteins (p1-p5); a missing link is an unknown interaction (negative label, 0)]
CPI Prediction Problem
Notation
• Interaction Matrix (rows: compounds c1-c6, columns: proteins p1-p5)
     p1  p2  p3  p4  p5
c1    1   1   0   0   0
c2    0   0   1   0   0
c3    0   1   1   0   0
c4    0   0   1   0   0
c5    0   0   0   0   0
c6    0   0   0   0   0
• Feature Vector of Compounds: e.g. MACCS Key, PubChem fingerprint, etc.
• Feature Vector of Proteins: e.g. PFAM fingerprint, amino acid k-mer, etc.
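The interaction matrix on this slide can be built from a list of known interacting pairs. The sketch below is illustrative only: the names `known_pairs` and `build_interaction_matrix` are hypothetical, and the pairs are simply read off the example matrix.

```python
# Known interactions as (compound, protein) pairs, read off the example matrix.
compounds = ["c1", "c2", "c3", "c4", "c5", "c6"]
proteins = ["p1", "p2", "p3", "p4", "p5"]
known_pairs = [("c1", "p1"), ("c1", "p2"), ("c2", "p3"),
               ("c3", "p2"), ("c3", "p3"), ("c4", "p3")]


def build_interaction_matrix(compounds, proteins, known_pairs):
    """Return a |compounds| x |proteins| 0/1 matrix as a list of rows."""
    row = {c: i for i, c in enumerate(compounds)}
    col = {p: j for j, p in enumerate(proteins)}
    matrix = [[0] * len(proteins) for _ in compounds]
    for c, p in known_pairs:
        matrix[row[c]][col[p]] = 1  # 1 = experimentally known interaction
    return matrix


Y = build_interaction_matrix(compounds, proteins, known_pairs)
c1_profile = Y[0]                   # interaction profile of compound c1 (a row)
p3_profile = [r[2] for r in Y]      # interaction profile of protein p3 (a column)
```

Every entry that is not a known interaction stays 0, which is why a 0 here means "unknown" rather than "confirmed non-interaction".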
Major Approaches for CPI Prediction
• Basic Concept
“Similar compounds/proteins have similar interactions”
a) Kernel-based Machine Learning
b) Matrix Factorization
[Figure: the interaction matrix (c1-c6 × p1-p5) annotated with "decomposition", "feature vectors", and "kernel functions"]
→ Kernel-based machine learning
Pairwise Kernel Method (PKM)
Pairwise Kernel Method (PKM) (Jacob & Vert. Bioinformatics 2008)
[Figure: workflow. The compound-protein interaction network is written as an interaction matrix (training data); the compound kernel (similarity matrix) and the protein kernel (similarity matrix) are combined into a pairwise kernel (kernel trick); learning (SVM, etc.) yields the prediction model.]
Gaussian Interaction Profile (GIP)
Gaussian Interaction Profile (GIP) (van Laarhoven, et al. Bioinformatics 2011)
• Similar interaction patterns → Have similar interactions
[Figure: workflow. As in PKM, the interaction matrix (training data), the compound kernel (similarity matrix), and the protein kernel (similarity matrix) feed learning (SVM, etc.) to give the prediction model, with an added integration step (multiple kernel scheme).]
Gaussian Interaction Profile (GIP)
[Figure: interaction profiles (binary vectors) are taken from the interaction matrix; the compound GIP kernel and the protein GIP kernel are Gaussian kernels computed on these profiles.]
GIP method
• More accurate than using only compound/protein similarities
Problem = ‘0’ and ‘1’ are treated almost the same
• A pair of all-‘0’ vectors obtains the maximum kernel value, the same as a pair of all-‘1’ vectors.
• ‘1’ is experimentally determined, but ‘0’ is an unknown interaction.
• ‘1’-information should be considered more reliable than ‘0’.
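The problem stated above can be checked numerically. The following is a minimal sketch, assuming the commonly used GIP form exp(-γ‖y_i - y_j‖²) with γ normalized by the mean squared norm of the profiles; the function name `gip_kernel`, the parameter `gamma_prime`, and the toy profiles are illustrative assumptions, not taken from the slides.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian kernel on binary interaction profiles (one profile per row)."""
    n = len(profiles)
    # Bandwidth is normalized by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all profiles are zero; bandwidth is undefined")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(profiles[i], profiles[j]))
             for j in range(n)] for i in range(n)]


# Toy profiles: two with no known interaction, two all-'1', one mixed.
profiles = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1], [1, 0, 0]]
K = gip_kernel(profiles)
zero_pair = K[0][1]   # similarity between the two all-'0' profiles
ones_pair = K[2][3]   # similarity between the two all-'1' profiles
```

Because the Gaussian kernel depends only on the distance between profiles, `zero_pair` and `ones_pair` both reach the maximum value, even though the all-‘0’ pair carries no experimental evidence.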
Idea from Link Mining
• Idea: ‘1’-information is more important
→ Network theory, graph mining, link mining
(data mining on the World Wide Web, social networks, biological networks, etc.)
• Link indicators used in the field of link mining were applied to the CPI bipartite network.
[Figure: a general graph (network) made of nodes and links (edges), and a bipartite graph (network) whose links connect Group A and Group B]
Link Indicators
[Figure: interaction profiles are taken from the interaction matrix of the bipartite network (CPIs) and used to calculate a link indicator]
Three link indicators were used in this study*:
• Jaccard index
• Cosine similarity
• LHN
* These were chosen because they become positive definite kernels when used as kernels.
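The three link indicators can be sketched on binary interaction profiles as follows. This is an illustration under stated assumptions: LHN is taken to be the Leicht-Holme-Newman index (common neighbours divided by the product of the two degrees), and a pair involving an all-‘0’ profile is assigned 0 to avoid division by zero. The function names are hypothetical.

```python
import math


def _common(a, b):
    """Number of shared '1' entries (inner product of binary vectors)."""
    return sum(x * y for x, y in zip(a, b))


def jaccard(a, b):
    union = sum(a) + sum(b) - _common(a, b)
    return _common(a, b) / union if union else 0.0


def cosine(a, b):
    # For binary vectors, sum(a) equals the squared norm (the node degree).
    denom = math.sqrt(sum(a) * sum(b))
    return _common(a, b) / denom if denom else 0.0


def lhn(a, b):
    denom = sum(a) * sum(b)
    return _common(a, b) / denom if denom else 0.0


def link_indicator_kernel(profiles, indicator):
    """Kernel matrix whose (i, j) entry is indicator(profile_i, profile_j)."""
    return [[indicator(p, q) for q in profiles] for p in profiles]


# Compound profiles c1..c6 from the example interaction matrix.
Y = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 1, 0, 0],
     [0, 0, 1, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
K_jac = link_indicator_kernel(Y, jaccard)
K_cos = link_indicator_kernel(Y, cosine)
K_lhn = link_indicator_kernel(Y, lhn)
```

With these definitions, c1 and c3 share one protein out of three in total, so their Jaccard value is 1/3, while the pair c5, c6 (no known interactions) scores 0 under all three indicators.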
Proposed Method: Link Indicator Kernel (LIK)
[Figure: interaction profiles (binary vectors) are taken from the interaction matrix; the compound Link Indicator Kernel and the protein Link Indicator Kernel are computed on these profiles.]
Link Indicator Kernels (LIKs)
• A pair of all-‘0’ vectors obtains the minimum value.
• ‘1’-information is considered more important than ‘0’.
CPI Prediction Method Summary
[Figure: workflow. The compound-protein interaction network is written as an interaction matrix, from which interaction profiles are taken; the compound and protein kernels (similarity matrices) are used together with the compound and protein kernels computed from the profiles (GIP/LIK); learning (SVM, etc.) yields the prediction model.]
Benchmarking
• Dataset
– General benchmark dataset by Yamanishi et al. (Yamanishi, et al. Bioinformatics, 2008)
http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/
– Contains 4 CPI networks
– Similarity matrices (similarity kernels) are precomputed

                 Nuclear Receptor   GPCR     Ion Channel   Enzyme
#compounds       54                 223      210           445
#proteins        26                 95       204           664
#interactions    90                 635      1476          2926
Density          6.41%              3.00%    3.45%         0.99%

• Prediction Methods
– PKM w/ similarity kernels + GIP (conventional)
– PKM w/ similarity kernels + LIK (Jac/cos/LHN) (proposed)
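The density row of the table follows from the other three rows as #interactions / (#compounds × #proteins). A short check, using only the counts listed on this slide (the variable names are illustrative):

```python
# (n_compounds, n_proteins, n_interactions) per dataset, from the table.
datasets = {
    "Nuclear Receptor": (54, 26, 90),
    "GPCR": (223, 95, 635),
    "Ion Channel": (210, 204, 1476),
    "Enzyme": (445, 664, 2926),
}


def density(n_compounds, n_proteins, n_interactions):
    """Fraction of compound-protein pairs with a known interaction."""
    return n_interactions / (n_compounds * n_proteins)


densities = {name: density(*counts) for name, counts in datasets.items()}
mean_density = sum(densities.values()) / len(densities)

for name, d in densities.items():
    print(f"{name}: {100 * d:.2f}%")
print(f"mean density: {mean_density:.3f}")
```

The mean density is what a random predictor is expected to score in AUPR, which makes it a useful baseline when reading the results.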
Evaluation (According to Ding’s benchmarking)
(Ding, et al. Brief Bioinform 2014)
• 3 Types of Cross-Validations (CVs) (‘?’ marks a held-out test entry)

compound-wise CV          protein-wise CV           pairwise CV
    p1 p2 p3 p4 p5 p6         p1 p2 p3 p4 p5 p6         p1 p2 p3 p4 p5 p6
c1   1  0  0  1  0  0     c1   1  0  0  1  ?  ?     c1   ?  0  ?  1  0  ?
c2   0  1  0  0  1  1     c2   0  1  0  0  ?  ?     c2   0  ?  0  ?  1  1
c3   1  0  0  0  0  0     c3   1  0  0  0  ?  ?     c3   1  0  0  0  0  ?
c4   1  0  1  1  0  0     c4   1  0  1  1  ?  ?     c4   1  0  ?  ?  0  0
c5   ?  ?  ?  ?  ?  ?     c5   0  1  0  0  ?  ?     c5   ?  1  0  0  0  1
c6   ?  ?  ?  ?  ?  ?     c6   0  0  1  1  ?  ?     c6   ?  0  ?  1  ?  0

• AUROC and AUPR
– AUROC: area under the ROC curve (TP rate vs. FP rate). Perfect prediction → AUROC = 1; random prediction → AUROC = 0.5 (diagonal line).
– AUPR: area under the precision-recall curve. Perfect prediction → AUPR = 1; random prediction → AUPR = density (avg. AUPR ≈ 0.035).
• CPIs have far fewer positives than negatives, so FPs should be weighed more heavily. AUPR punishes FPs more than AUROC does. → AUPR is more important than AUROC.
* 10-fold CV was repeated 5 times with random splits, and the accuracies (AUROC, AUPR) were averaged.
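The two metrics can be sketched without any library. The version below is an assumption-laden illustration: AUROC is computed as the fraction of positive-negative pairs ranked correctly (ties count one half), and AUPR is approximated by average precision, which is one common estimator of the area under the precision-recall curve and assumes untied scores. The function names and the toy data are hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aupr(labels, scores):
    """Average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example: 2 positives among 8 pairs, one false positive ranked second.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
roc_value = auroc(labels, scores)
pr_value = aupr(labels, scores)
```

In this toy case a single highly ranked false positive lowers the AUPR estimate more than it lowers AUROC, which mirrors the argument on the slide for preferring AUPR on sparse interaction data.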
Prediction Accuracy (Cross-Validations)
[Figure: two bar charts, AUPR and AUROC (each averaged over the 4 datasets; higher is better), for compound-wise, protein-wise, and pairwise CV, comparing the LIKs (Jaccard index, cosine similarity, LHN) with GIP and random prediction]
→ LIKs are better than GIP, especially in compound-wise and protein-wise CVs.
Computational Time
• Computational complexity
– #compound= , #protein=
– PKM Learning:
– Calc. LIKs:
– Total:
• Experimental computational time (datasets ordered by size, small to large)

                            Nuclear Receptor   GPCR    Ion Channel   Enzyme
Conventional (PKM) [sec]    0.0680             4.86    24.1          232
Proposed (LIK) [sec]        0.0850             5.17    24.8          239
Increase rate               25%                6.4%    2.9%          3.3%

→ Almost the same calculation time as PKM
Conclusion
• We proposed the Link Indicator Kernel (LIK) for CPI predictions.
• Compared with GIP, the calculation time was almost the same and the accuracy was improved.
– In particular, predictions for novel compounds and novel proteins were good.
– Overall, LIK with cosine similarity was the most accurate.
– The different handling of ‘0’ and ‘1’ in LIK may be the reason for the improvement.
• LIK can also be applied to methods derived from GIP, such as WNNGIP, KronRLS-MKL, etc.
• Future Work
– Hyperparameter search becomes a bottleneck in the CPI problem, but it can be accelerated by applying Bayesian optimization*.
– It may be better to treat an unknown interaction as an unknown label. Exploring the applicability of positive-unlabeled learning** is of interest.
*Ban, Ohue, Akiyama. (submitted)
**Lan, et al. Predicting drug-target interaction using positive-unlabeled learning. Neurocomputing 206, 2016.
Acknowledgements
This work was partially supported by the Japan Society for the Promotion of Science (JSPS)
KAKENHI (grant nos. 24240044 and 15K16081), and Core Research for Evolutional Science and
Technology (CREST) “Extreme Big Data” (grant no. JPMJCR1303) from the Japan Science and
Technology Agency (JST).
Akiyama Lab. members, Tokyo Tech
Takuro Yamazaki (former student of our lab; currently a graduate student at the University of Tokyo, Japan)
Supplements
LIKs are Positive Definite Kernels
• Proof: Cosine similarity & LHN
Use the following property of positive definite kernels:
Let k(x, y) be a positive definite kernel and f be an arbitrary real-valued function.
Then, the kernel f(x) k(x, y) f(y) is also positive definite.
The inner product is positive definite.
• Proof: Jaccard index
– Previously proved to be positive definite by Bouchard et al.
Bouchard, et al. A proof for the positive definiteness of the Jaccard index matrix.
Int. J. Approx. Reason. 54: 615-626, 2013.
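The argument for cosine similarity and LHN can be written out as a short derivation. This is a sketch under assumptions: x and y are binary interaction profiles, LHN is taken to be the Leicht-Holme-Newman index (so the degree of a node equals the squared norm of its profile), and the symbols f and g are illustrative rather than taken from the slide. The block assumes the amsmath package.

```latex
\begin{align*}
k_{\cos}(x, y) &= \frac{\langle x, y\rangle}{\|x\|\,\|y\|}
  = f(x)\,\langle x, y\rangle\,f(y),
  & f(x) &= \frac{1}{\|x\|},\\
k_{\mathrm{LHN}}(x, y) &= \frac{\langle x, y\rangle}{\|x\|^{2}\,\|y\|^{2}}
  = g(x)\,\langle x, y\rangle\,g(y),
  & g(x) &= \frac{1}{\|x\|^{2}}.
\end{align*}
```

Both indicators are the inner product rescaled on each side by a function of a single argument, so the scaling property applies directly. For an all-‘0’ profile, f and g can be set to 0, which gives the kernel value 0 for any pair involving that profile.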
Pairwise Kernel
• Normally, a feature vector Φ(c, p) for a compound-protein pair (c, p) is required for the ML scheme.
• In the PKM, Φ(c, p) is defined as the tensor product of the feature maps of the compound, Φ_c(c), and the protein, Φ_p(p):
Φ(c, p) = Φ_c(c) ⊗ Φ_p(p)
• The pairwise kernel between two compound-protein pairs (c, p) and (c', p') is then
K((c, p), (c', p')) = K_c(c, c') × K_p(p, p')
→ Only the similarity matrices (kernels) K_c and K_p are used; the feature vector Φ(c, p) is never computed.
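A minimal sketch of the pairwise kernel, assuming it is the product of the compound kernel value and the protein kernel value for the two pairs (which is what a tensor-product feature map implies). The function name `pairwise_kernel` and the toy kernel matrices are illustrative.

```python
def pairwise_kernel(K_c, K_p, pairs_a, pairs_b):
    """Kernel between compound-protein pairs given as (compound, protein) indices.

    Entry [a][b] is K_c[c][c2] * K_p[p][p2] for pairs_a[a] = (c, p)
    and pairs_b[b] = (c2, p2).
    """
    return [[K_c[c][c2] * K_p[p][p2] for (c2, p2) in pairs_b]
            for (c, p) in pairs_a]


# Toy similarity kernels for 2 compounds and 3 proteins.
K_c = [[1.0, 0.5],
       [0.5, 1.0]]
K_p = [[1.0, 0.2, 0.0],
       [0.2, 1.0, 0.4],
       [0.0, 0.4, 1.0]]

# Enumerate every compound-protein pair; the result is a 6 x 6 kernel matrix.
all_pairs = [(c, p) for c in range(2) for p in range(3)]
K = pairwise_kernel(K_c, K_p, all_pairs, all_pairs)
```

With the pairs enumerated compound-major as above, `K` equals the Kronecker product of `K_c` and `K_p`. Its size grows with the number of pairs, so kernel matrices for large datasets become expensive to store and to train on.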
Observed Distribution of Link Indicator Frequency
[Figure: observed frequency distributions of the link indicators; one panel is labeled "moderate distribution" and another "immoderate distribution"]
Overall Prediction Accuracy
Overall prediction accuracy for each CPI prediction method in 10-fold CV tests. The AUPR and AUROC values are averages over the 3 CVs and 4 datasets (12 AUPR/AUROC values in total).
wk : multiple kernel weight
• Cosine similarity with wk = 0.5 showed the best performance.
• Overall, LIK was more accurate than GIP.
Settings for Learning
• SVM
– Cost parameter C = {0.1, 1, 10, 100}
– Multiple kernel weight wk = {0.1, 0.3, 0.5, 1}
• Cross validation (CV)
– 3 types: compound-wise, protein-wise, pairwise
– 10-fold CVs
– The random division of the dataset was repeated 5 times
– Evaluated by AUROC and AUPR
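The three CV types differ only in the unit that is held out: whole compounds, whole proteins, or individual pairs. The sketch below shows one way to generate such folds; the function name `cv_folds`, its arguments, and the use of integer seeds for the 5 repetitions are assumptions for illustration, not the authors' implementation.

```python
import random


def cv_folds(n_compounds, n_proteins, mode, n_folds=10, seed=0):
    """Return n_folds lists of held-out (compound, protein) index pairs.

    mode: "compound" holds out whole rows, "protein" whole columns,
    and "pair" individual entries of the interaction matrix.
    """
    pairs = [(c, p) for c in range(n_compounds) for p in range(n_proteins)]
    if mode == "compound":
        units = list(range(n_compounds))
    elif mode == "protein":
        units = list(range(n_proteins))
    elif mode == "pair":
        units = list(pairs)
    else:
        raise ValueError(f"unknown mode: {mode}")

    random.Random(seed).shuffle(units)
    fold_of = {unit: k % n_folds for k, unit in enumerate(units)}

    folds = [[] for _ in range(n_folds)]
    for pair in pairs:
        if mode == "pair":
            unit = pair
        else:
            unit = pair[0] if mode == "compound" else pair[1]
        folds[fold_of[unit]].append(pair)
    return folds


# 5 random repetitions of 10-fold compound-wise CV on a 54 x 26 matrix
# (the size of the Nuclear Receptor dataset).
repeats = [cv_folds(54, 26, "compound", n_folds=10, seed=s) for s in range(5)]
```

In compound-wise and protein-wise CV, no interaction of a held-out compound or protein is visible during training, which is why these settings correspond to predicting for novel compounds and novel proteins.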
History of the CPI Prediction Methods
[Figure: timeline from 2008 to 2017; each method is marked as kernel-based or matrix factorization-based]
KRM (Yamanishi et al., 2008)
PKM (Jacob & Vert, 2008)
BLM (Bleakley et al., 2008)
LapRLS (Xia et al., 2010)
GIP (van Laarhoven et al., 2011)
KBMF2K (Gonen et al., 2012)
WNNGIP (van Laarhoven et al., 2013)
BLMNII (Mei et al., 2013)
MSCMF (Zheng et al., 2013)
REMAP (Lim et al., 2016)
KronRLS-MKL (Nascimento et al., 2016)
NRLMF (Liu et al., 2016)

Más contenido relacionado

Similar a Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
Philip Cheung
 
Knowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learningKnowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learning
jaumebp
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
butest
 

Similar a Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach (20)

Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
 
Article
ArticleArticle
Article
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...
In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...
In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...
 
Path2 ppi
Path2 ppiPath2 ppi
Path2 ppi
 
Cadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.PharmCadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.Pharm
 
Protein structure prediction by means
Protein structure prediction by meansProtein structure prediction by means
Protein structure prediction by means
 
Proteomics resources at the EBI & ExPASy
Proteomics resources at the EBI & ExPASyProteomics resources at the EBI & ExPASy
Proteomics resources at the EBI & ExPASy
 
PPT
PPTPPT
PPT
 
Resume
ResumeResume
Resume
 
Knowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learningKnowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learning
 
BCSRCv1.3
BCSRCv1.3BCSRCv1.3
BCSRCv1.3
 
ICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick ProvartICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick Provart
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRON
 
C044041723
C044041723C044041723
C044041723
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
MULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues
MULISA : A New Strategy for Discovery of Protein Functional Motifs and ResiduesMULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues
MULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues
 
Intro to databases
Intro to databasesIntro to databases
Intro to databases
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomics
 
OpenMS: Quantitative proteomics at large scale
OpenMS: Quantitative proteomics at large scaleOpenMS: Quantitative proteomics at large scale
OpenMS: Quantitative proteomics at large scale
 

Más de Masahito Ohue

Más de Masahito Ohue (20)

学振特別研究員になるために~2024年度申請版
 学振特別研究員になるために~2024年度申請版 学振特別研究員になるために~2024年度申請版
学振特別研究員になるために~2024年度申請版
 
学振特別研究員になるために~2023年度申請版
学振特別研究員になるために~2023年度申請版学振特別研究員になるために~2023年度申請版
学振特別研究員になるために~2023年度申請版
 
学振特別研究員になるために~2022年度申請版
学振特別研究員になるために~2022年度申請版学振特別研究員になるために~2022年度申請版
学振特別研究員になるために~2022年度申請版
 
第43回分子生物学会年会フォーラム2F-11「インシリコ創薬を支える最先端情報科学」から抜粋したAlphaFold2の話
第43回分子生物学会年会フォーラム2F-11「インシリコ創薬を支える最先端情報科学」から抜粋したAlphaFold2の話第43回分子生物学会年会フォーラム2F-11「インシリコ創薬を支える最先端情報科学」から抜粋したAlphaFold2の話
第43回分子生物学会年会フォーラム2F-11「インシリコ創薬を支える最先端情報科学」から抜粋したAlphaFold2の話
 
Learning-to-rank for ligand-based virtual screening
Learning-to-rank for ligand-based virtual screeningLearning-to-rank for ligand-based virtual screening
Learning-to-rank for ligand-based virtual screening
 
Parallelized pipeline for whole genome shotgun metagenomics with GHOSTZ-GPU a...
Parallelized pipeline for whole genome shotgun metagenomics with GHOSTZ-GPU a...Parallelized pipeline for whole genome shotgun metagenomics with GHOSTZ-GPU a...
Parallelized pipeline for whole genome shotgun metagenomics with GHOSTZ-GPU a...
 
Molecular Activity Prediction Using Graph Convolutional Deep Neural Network C...
Molecular Activity Prediction Using Graph Convolutional Deep Neural Network C...Molecular Activity Prediction Using Graph Convolutional Deep Neural Network C...
Molecular Activity Prediction Using Graph Convolutional Deep Neural Network C...
 
学振特別研究員になるために~2020年度申請版
学振特別研究員になるために~2020年度申請版学振特別研究員になるために~2020年度申請版
学振特別研究員になるために~2020年度申請版
 
出会い系タンパク質を探す旅
出会い系タンパク質を探す旅出会い系タンパク質を探す旅
出会い系タンパク質を探す旅
 
学振特別研究員になるために~2019年度申請版
学振特別研究員になるために~2019年度申請版学振特別研究員になるために~2019年度申請版
学振特別研究員になるために~2019年度申請版
 
目バーチャルスクリーニング
目バーチャルスクリーニング目バーチャルスクリーニング
目バーチャルスクリーニング
 
Microsoft Azure上でのタンパク質間相互作用予測システムの並列計算と性能評価
Microsoft Azure上でのタンパク質間相互作用予測システムの並列計算と性能評価Microsoft Azure上でのタンパク質間相互作用予測システムの並列計算と性能評価
Microsoft Azure上でのタンパク質間相互作用予測システムの並列計算と性能評価
 
学振特別研究員になるために~2018年度申請版
学振特別研究員になるために~2018年度申請版学振特別研究員になるために~2018年度申請版
学振特別研究員になるために~2018年度申請版
 
計算で明らかにするタンパク質の出会いとネットワーク(FIT2016 助教が吼えるセッション)
計算で明らかにするタンパク質の出会いとネットワーク(FIT2016 助教が吼えるセッション)計算で明らかにするタンパク質の出会いとネットワーク(FIT2016 助教が吼えるセッション)
計算で明らかにするタンパク質の出会いとネットワーク(FIT2016 助教が吼えるセッション)
 
Finding correct protein–protein docking models using ProQDock (ISMB2016読み会, 大上)
Finding correct protein–protein docking models using ProQDock (ISMB2016読み会, 大上)Finding correct protein–protein docking models using ProQDock (ISMB2016読み会, 大上)
Finding correct protein–protein docking models using ProQDock (ISMB2016読み会, 大上)
 
学振特別研究員になるために~知っておくべき10のTips~[平成29年度申請版]
学振特別研究員になるために~知っておくべき10のTips~[平成29年度申請版]学振特別研究員になるために~知っておくべき10のTips~[平成29年度申請版]
学振特別研究員になるために~知っておくべき10のTips~[平成29年度申請版]
 
ISMB/ECCB2015読み会:大上
ISMB/ECCB2015読み会:大上ISMB/ECCB2015読み会:大上
ISMB/ECCB2015読み会:大上
 
学振特別研究員になるために~知っておくべき10のTips~[平成28年度申請版]
学振特別研究員になるために~知っておくべき10のTips~[平成28年度申請版]学振特別研究員になるために~知っておくべき10のTips~[平成28年度申請版]
学振特別研究員になるために~知っておくべき10のTips~[平成28年度申請版]
 
IIBMP2014 Lightning Talk - MEGADOCK 4.0
IIBMP2014 Lightning Talk - MEGADOCK 4.0IIBMP2014 Lightning Talk - MEGADOCK 4.0
IIBMP2014 Lightning Talk - MEGADOCK 4.0
 
PrePPI: structure-based protein-protein interaction prediction
PrePPI: structure-based protein-protein interaction predictionPrePPI: structure-based protein-protein interaction prediction
PrePPI: structure-based protein-protein interaction prediction
 

Último

Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 

Último (20)

Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 

Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

  • 1. Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach Masahito Ohue Takuro Yamazaki Tomohiro Ban Yutaka Akiyama Department of Computer Science, School of Computing, Tokyo Institute of Technology, Japan. Thirteenth International Conference on Intelligent Computing (ICIC2017) R13: Protein and Gene Bioinformatics: Analysis, Algorithms and Applications, Aug 9, 2017.
  • 2. • Drug discovery and development – >10 years time and >2 billion US dollars – Possibility to reduce costs by computational approaches • molecular docking, QSAR/QSPR modeling, toxicity prediction, compound-protein interaction prediction 2 Background Aug 9th, 2017 ICIC2017 Masahito Ohue Paul SM, et al. Nat Rev Drug Discov. 2010, 9(3):203.
  • 3. • Compound-Protein Interaction (CPI) Prediction (2008-) – a.k.a. Drug-Target Interaction Prediction – Recognize that query compound and protein are interact (1) or not (0) by using machine learning (ML) with compounds, proteins, and interaction information. • Called “chemogenomics approach” – It leads to find new targets, side effects, toxicity, etc. 3 Compound-Protein Interaction Prediction Aug 9th, 2017 ICIC2017 Masahito Ohue compounds proteins Unknown interaction (negative label, 0) c1 c2 c3 c4 c5 c6 p1 p2 p3 p4 p5
  • 4. p 1 p 2 p 3 p 4 p 5 c 1 1 1 0 0 0 c 2 0 0 1 0 0 c 3 0 1 1 0 0 c 4 0 0 1 0 0 c 5 0 0 0 0 0 c 6 0 0 0 0 0 4 CPI Prediction Problem Aug 9th, 2017 ICIC2017 Masahito Ohue Notation Interaction Matrix Feature Vector of Compounds Feature Vector of Proteins e.g. MACCS Key, PubChem fingerprint, etc. e.g. PFAM fingerprint, amino acid k-mer, etc.
  • 5. • Basic Concept “Similar compounds/proteins have similar interactions” a) Kernel-based Machine Learning b) Matrix Factorization p 1 p 2 p 3 p 4 p 5 c 1 1 1 0 0 0 c 2 0 0 1 0 0 c 3 0 1 1 0 0 c 4 0 0 1 0 0 c 5 0 0 0 0 0 c 6 0 0 0 0 0 5 Major Approaches for CPI Prediction Aug 9th, 2017 ICIC2017 Masahito Ohue →Kernel-based machine learning c 1 c 2 c 3 c 4 c 5 c 6 p 1 p 2 p 3 p 4 p 5 decomposition feature vectors kernel functions
  • 6. 6 Pairwise Kernel Method (PKM) Aug 9th, 2017 ICIC2017 Masahito Ohue 1 1 0 1 0 0 … 0 1 1 ︙ compound-protein interaction network ︙ interaction matrix compound kernel (similarity matrix) protein kernel (similarity matrix) Learning (SVM, etc.) Pairwise Kernel (kernel trick) Training data Pairwise Kernel Method (PKM) (Jacob & Vert. Bioinformatics 2008) Prediction Model
  • 7. 7 Gaussian Interaction Profile (GIP) Aug 9th, 2017 ICIC2017 Masahito Ohue 1 1 0 1 0 0 … 0 1 1 ︙ compound-protein interaction network ︙ interaction matrix compound kernel (similarity matrix) protein kernel (similarity matrix) Prediction Model Training data Gaussian Interaction Profile (GIP) (van Laarhoven, et al. Bioinformatics 2011) Learning (SVM, etc.) Integration (Multiple kernel scheme) Similar interaction patterns → Have similar interactions
  • 8. 8 Gaussian Interaction Profile (GIP) Aug 9th, 2017 ICIC2017 Masahito Ohue 1 1 0 1 0 0 … 0 1 1 ︙ 1 0 0 ︙ 0 0 1 ︙ interaction matrix interaction profile compound GIP kernel protein GIP kernel GIP kernels (Gaussian kernel) GIP method • More accurate than using only compound/protein similarities Problem = ‘0’ and ‘1’ are almost same • All ‘0’ vectors obtained maximum value, same as all ‘1’ vectors . • ‘1’ is experimentally determined, but ‘0’ is unknown interaction. • ‘1’-information should be considered more reliable than ‘0’.
  • 9. • Idea: ‘1’-information is more important → Network theory, graph mining, link mining Data mining on world wide web, social network, biological networks, etc. • Link indicators used in the field of link mining were applied to CPI bipartite network. 9 Idea from Link Mining Aug 9th, 2017 ICIC2017 Masahito Ohue Bipartite graph (network)General graph (network) node link (edge) Group A Group B
  • 10. 10 Link Indicators Aug 9th, 2017 ICIC2017 Masahito Ohue 1 1 0 1 0 0 … 0 1 1 ︙ 0 0 1 ︙ interaction matrix 1 0 0 ︙ Bipartite network (CPIs) 3 link indicators were used in this study Calculate link indicator Jaccard index Cosine similarity LHN because these link indicators become positive definite kernels when used as kernels. *
  • 11. 11 Proposed Method: Link Indicator Kernel (LIK) Aug 9th, 2017 ICIC2017 Masahito Ohue 1 1 0 1 0 0 … 0 1 1 ︙ 1 0 0 ︙ 0 0 1 ︙ interaction matrix interaction profile compound Link Indicator Kernel protein Link Indicator Kernel Link Indicator Kernels (LIKs) • All ‘0’ vectors obtained minimum value • ‘1’-information are considered more important than ‘0’. *
  • 12. 12 CPI Prediction Method Summary Aug 9th, 2017 ICIC2017 Masahito Ohue 1 1 0 1 0 0 … 0 1 1 ︙ compound-protein interaction network ︙ 1 0 0 ︙ 0 0 1 ︙ interaction matrix interaction profile compound kernel (similarity matrix) protein kernel (similarity matrix) compound kernel (GIP/LIK) protein kernel (GIP/LIK) Prediction Model Learning (SVM, etc.)
  • 13. • Dataset – General benchmark dataset by Yamanishi et al. – Contains 4 CPI networks – Similarity matrices (similarity kernels) are precomputed • Prediction Methods – PKM w/similarity kernels + GIP (conventional) – PKM w/similarity kernels + LIK (Jac/cos/LHN) (proposed) Nuclear Receptor GPCR Ion Channel Enzyme #compounds 54 223 210 445 #proteins 26 95 204 664 #interactions 90 635 1476 2926 Density 6.41% 3.00% 3.45% 0.99% 13 Benchmarking Aug 9th, 2017 ICIC2017 Masahito Ohue (Yamanishi, et al. Bioinformatics, 2008) http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/
  • 14. 14 Evaluation (According to Ding’s benchmarking) c1 c2 c3 c4 c5 c6 c1 c2 c3 c4 c5 c6 c1 c2 c3 c4 c5 c6 p1 p2 p3 p4 p5 p6 p1 p2 p3 p4 p5 p6 p1 p2 p3 p4 p5 p6 compound-wise CV protein-wise CV pairwise CV 1 0 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0 1 0 1 1 0 0 ? ? ? ? ? ? ? ? ? ? ? ? 1 0 0 1 ? ? 0 1 0 0 ? ? 1 0 0 0 ? ? 1 0 1 1 ? ? 0 1 0 0 ? ? 0 0 1 1 ? ? ? 0 ? 1 0 ? 0 ? 0 ? 1 1 1 0 0 0 0 ? 1 0 ? ? 0 0 ? 1 0 0 0 1 ? 0 ? 1 ? 0 • 3 Types of Cross-Validations (CVs) • AUROC and AUPR Precision Recall TP rate FP rate AUPR AUROC Perfect prediction →AUROC = 1 Random prediction →AUROC = 0.5 (diagonal line) Perfect prediction →AUPR = 1 Random prediction →AUPR = density (avg. AUPR≒0.035) * 10-fold CV was randomly tried 5 times and the accuracy (AUROC, AUPR) were averaged. (Ding, et al. Brief Bioinform 2014) CPIs have much fewer positives than negatives, and FPs should be weighed more. AUPR punishes FPs more than AUROC. →AUPR is more important than AUROC
  • 15. Prediction Accuracy (Cross-Validations) (figure: bar charts of AUPR and AUROC, each averaged over the 4 datasets, for compound-wise CV, protein-wise CV and pairwise CV; methods compared: LIK with Jaccard index, LIK with cosine similarity, LIK with LHN, GIP, and random; higher is better) → LIKs are better than GIP, especially in compound-wise and protein-wise CVs.
  • 16. Computational Time • Computational complexity – #compounds = $n_c$, #proteins = $n_p$ – PKM learning: $O(n_c^3 n_p^3)$ – Calc. LIKs: $O(n_c n_p (n_c + n_p))$ – Total: $O(n_c^3 n_p^3)$ • Experimental computational time in seconds (Conventional (PKM) / Proposed (LIK) / increase rate), datasets ordered from small to large: Nuclear Receptor 0.0680 / 0.0850 / 25%; GPCR 4.86 / 5.17 / 6.4%; Ion Channel 24.1 / 24.8 / 2.9%; Enzyme 232 / 239 / 3.3% → Almost the same calculation time as PKM
  • 17. Conclusion • We proposed the Link Indicator Kernel (LIK) for CPI predictions. • Compared with GIP, the calculation time was almost the same and the accuracy was improved. – In particular, predictions for novel compounds and novel proteins were good. – Overall, LIK with cosine similarity was the most accurate. – LIK’s different handling of 0 and 1 may be the reason for the improvement. • LIK can also be applied to methods derived from GIP, such as WNNGIP and KronRLS-MKL. • Future Work – Hyperparameter search becomes a bottleneck in the CPI problem, but it can be accelerated by applying Bayesian optimization*. – It may be better to treat an unknown interaction as an unknown label. Exploring the applicability of positive-unlabeled learning** is of interest. *Ban, Ohue, Akiyama. (submitted) **Lan, et al. Predicting drug-target interaction using positive-unlabeled learning. Neurocomputing 206, 2016.
  • 18. Acknowledgements. This work was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (grant nos. 24240044 and 15K16081), and Core Research for Evolutional Science and Technology (CREST) “Extreme Big Data” (grant no. JPMJCR1303) from the Japan Science and Technology Agency (JST). Akiyama Lab. members, Tokyo Tech. Takuro Yamazaki: former student of our lab, currently a graduate student at the University of Tokyo, Japan.
  • 20. LIKs are Positive Definite Kernels • Proof: Cosine similarity & LHN – Use the following property of positive definite kernels: let $k(x, y)$ be a positive definite kernel and $f$ be an arbitrary real-valued function. Then, the kernel $k'(x, y) = f(x)\,k(x, y)\,f(y)$ is also positive definite. – The inner product is positive definite. • Proof: Jaccard index – Previously proved to be positive definite by Bouchard et al. Bouchard, et al. A proof for the positive definiteness of the Jaccard index matrix. Int. J. Approx. Reason. 54: 615-626, 2013.
  • 21. Pairwise Kernel • Normally, a feature vector $\Phi(c, p)$ for a compound-protein pair is required for the ML scheme. • In the PKM, $\Phi(c, p)$ is defined as the tensor product of the feature maps of the compound and the protein: $\Phi(c, p) = \Phi_c(c) \otimes \Phi_p(p)$. • The pairwise kernel between two compound-protein pairs $(c, p)$ and $(c', p')$ is defined as $k((c, p), (c', p')) = k_c(c, c')\,k_p(p, p')$. → Only the similarity matrices (kernels) are used; the feature vector $\Phi(c, p)$ itself is not.
  • 22. Observed Distribution of Link Indicator Frequency (figure; panels annotated “moderate distribution” and “immoderate distribution”)
  • 23. Overall Prediction Accuracy. Overall prediction accuracy for each CPI prediction method in 10-fold CV tests. The AUPR and AUROC values are averages over the 3 CVs and 4 datasets (12 AUPR/AUROC values in total). w_k: multiple kernel weight • Cosine similarity with w_k = 0.5 showed the best performance. • Compared with GIP, LIK showed higher accuracy overall.
  • 24. Settings for Learning • SVM – Cost parameter C = {0.1, 1, 10, 100} – Multiple kernel weight w_k = {0.1, 0.3, 0.5, 1} • Cross-validation (CV) – 3 types: compound-wise, protein-wise, pairwise – 10-fold CVs – Random division of the dataset was repeated 5 times – AUROC and AUPR
  • 25. History of the CPI Prediction Methods (timeline from 2008 to 2017; methods are classed as kernel-based or matrix factorization-based): KRM (Yamanishi et al., 2008), PKM (Jacob & Vert, 2008), BLM (Bleakley et al., 2008), LapRLS (Xia et al., 2010), GIP (van Laarhoven et al., 2011), KBMF2K (Gonen et al., 2012), WNNGIP (van Laarhoven et al., 2013), BLMNII (Mei et al., 2013), MSCMF (Zheng et al., 2013), REMAP (Lim et al., 2016), KronRLS-MKL (Nascimento, et al., 2016), NRLMF (Liu et al., 2016)
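
The kernels described on slides 11, 20 and 21 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the three link indicators use their usual set-based definitions on binary profiles (the slide formulas did not survive in this transcript), all-zero profiles are assigned kernel value 0 as slide 11 states, and the GIP bandwidth is normalised by the mean squared profile norm, which is an assumption here. The function names `link_indicator_kernel` and `gip_kernel` are hypothetical, and the toy matrix is the 6 x 5 example from the slides.

```python
import numpy as np

def link_indicator_kernel(Y, kind="cosine"):
    """Kernel between the rows of a binary interaction matrix Y (assumed definitions)."""
    Y = np.asarray(Y, dtype=float)
    common = Y @ Y.T               # number of shared interaction partners
    degree = Y.sum(axis=1)         # number of partners of each row
    if kind == "jaccard":
        denom = degree[:, None] + degree[None, :] - common
    elif kind == "cosine":
        denom = np.sqrt(np.outer(degree, degree))
    elif kind == "lhn":
        denom = np.outer(degree, degree)
    else:
        raise ValueError(f"unknown link indicator: {kind}")
    K = np.zeros_like(common)
    # Entries whose denominator is 0 (all-zero profiles) stay at 0.
    np.divide(common, denom, out=K, where=denom > 0)
    return K

def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile kernel; the bandwidth normalisation is an assumption."""
    Y = np.asarray(Y, dtype=float)
    sq_norms = (Y ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    gamma = gamma_prime / sq_norms.mean()
    return np.exp(-gamma * sq_dist)

# Toy interaction matrix from the slides: rows c1..c6, columns p1..p5.
Y = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
])

K_cos = link_indicator_kernel(Y, "cosine")    # compound LIK
K_gip = gip_kernel(Y)                         # compound GIP
K_p = link_indicator_kernel(Y.T, "cosine")    # protein LIK

# c5 and c6 have no known interactions: GIP calls them maximally similar, LIK does not.
print("GIP(c5, c6) =", K_gip[4, 5], " LIK(c5, c6) =", K_cos[4, 5])

# Pairwise kernel: k((c,p),(c',p')) = k_c(c,c') * k_p(p,p'), with pair (i, j) at index i * n_p + j.
K_pair = np.kron(K_cos, K_p)
```

In practice the pairwise kernel matrix would be passed to a kernel learner such as an SVM with a precomputed kernel; how the LIKs are combined with the similarity kernels through the weight w_k is not reconstructed here.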

Editor's notes

  1. (Note: first, extend the frame to the left and upward. Set the cursor at the lower left to the laser pointer.) Thank you for the introduction, Professor Gromiha. My name is Masahito Ohue, from Tokyo Tech, Japan. Today, I will talk about “Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach”.
  2. This study is related to drug discovery. The costs of drug discovery and development have been increasing recently: it requires over 10 years and over 2 billion US dollars. Computational approaches that support drug discovery are expected to reduce these costs. Examples include molecular docking, QSAR modeling, toxicity prediction, and compound-protein interaction prediction, which is the main topic of my talk.
  3. Compound-protein interaction prediction, or CPI prediction, is the task of discriminating whether a query compound and protein interact or not, using machine learning techniques. It is called the chemogenomics approach because it uses information on multiple compounds and multiple proteins. A known interaction is represented as 1, the positive label, and an unknown interaction is represented as 0, the negative label.
  4. The CPIs are represented as a binary interaction matrix Y, where 1 means an interaction and 0 means an unknown interaction. The feature vectors of compounds and proteins can be calculated in several ways, for example from compound fingerprints, protein domain information, and amino acid sequence information. These feature vectors and the interaction matrix are used in the machine learning scheme.
  5. Compound-protein interaction prediction rests on a basic concept: similar compounds or similar proteins have similar interactions. Thus machine learning approaches are mainly used for this problem. There are two major approaches: one is the kernel-based machine learning method, and the other is the matrix factorization-based method. In this talk, we focus on the kernel-based machine learning methods.
  6. One of the excellent methods is the pairwise kernel method, PKM. PKM uses the interaction matrix as training data, together with kernel matrices between compounds and between proteins. Kernel-based learning is then performed using a support vector machine or another kernel method, and a prediction model for compound-protein pairs is generated. A key point is that PKM does not calculate the kernel of a compound-protein pair directly. Instead, it is defined by multiplying the compound kernel and the protein kernel. By doing this, calculations in the high-dimensional space are avoided.
  7. The other one is the Gaussian interaction profile method, GIP. GIP follows roughly the same scheme as PKM, except that it also uses the information in the interaction matrix. The idea is that if compounds or proteins have similar interaction patterns, then they have similar interactions.
  8. GIP treats the interaction matrix as a set of interaction profile vectors and calculates interaction profile-based kernel matrices using Gaussian kernels. GIP can predict more accurately than using only compound and protein similarities. However, there is one problem: the Gaussian kernel treats a match between 0s and a match between 1s in the same way. For example, the kernel function k_GIP takes its maximum value of 1 both for two all-0 vectors and for two all-1 vectors. A ‘1’ means an experimentally determined interaction, but a ‘0’ means the interaction is unknown, so ‘1’ information should be considered more reliable than ‘0’.
  9. An idea for solving this problem exists in the field of link mining, which is data mining on the World Wide Web, social networks, biological networks, and so on. Link mining predicts the possibility of a new link in a network; that is, it focuses on ‘1’. In this study, link indicators used in link mining were applied to the CPI bipartite network.
  10. Link indicators are calculated from the interaction matrix. In this study, three link indicators were used: the Jaccard index, cosine similarity, and LHN. These link indicators are positive definite kernels, so they can be applied in kernel-based machine learning methods.
  11. This slide shows our proposed method, called LIK. It is almost the same as GIP, but the kernel matrices are calculated by link indicators on the interaction profiles. The kernel function k_LIK takes its maximum value of 1 for identical vectors. If all-0 vectors are input, the kernel value is 0. LIK makes ‘1’ information more important than ‘0’.
  12. This is the whole picture of the proposed method and the conventional method. We used LIKs, link indicator kernels, instead of GIPs.
  13. To evaluate the prediction methods, we used the general benchmark dataset created by Yamanishi. This dataset contains four CPI networks: nuclear receptor, GPCR, ion channel, and enzyme. The similarity matrices were precomputed by Yamanishi and are available at this website. We used GIP and three kinds of LIK methods in the computational experiment.
  14. The prediction methods were evaluated by cross-validation. Following the previous benchmarking work, we used compound-wise, protein-wise, and pairwise cross-validations. Compound-wise CV verifies whether the interactions of a novel compound can be predicted. Similarly, protein-wise CV verifies the interactions of a novel protein. We used two accuracy measures, AUROC and AUPR. However, CPIs have far fewer positives than negatives, and FPs should be weighed more. AUPR punishes FPs more than AUROC does, so AUPR is more important in the CPI prediction problem.
  15. This figure shows the prediction accuracy results. In AUROC, we can see that LIKs are comparable to GIP. In AUPR, on the other hand, LIKs obtained better predictions than GIP. In particular, LIK works well in the compound-wise and protein-wise cross-validations.
  16. These are the computational time results. The computational complexity of PKM is of order nc cubed times np cubed, and that of the LIKs is of order nc np times nc plus np. Thus the total computational complexity is the same as that of PKM. The actual calculation time, on the other hand, depends on the size of the dataset, but the additional time over PKM remained small.
  17. (If short on time → say “This slide is concluding remarks in my talk.” and go to the next slide.) OK, let me summarize my talk. We proposed the link indicator kernel, LIK, for compound-protein interaction predictions. Our proposed method was compared with GIP: the calculation time was almost the same and the prediction accuracy was improved. The different handling of 0 and 1 taken from link mining may be the reason for this improvement. As for future work, hyperparameter search becomes a bottleneck in the CPI prediction problem. It can be accelerated by applying Bayesian optimization, which is our ongoing work. It may also be better to treat an unknown interaction as an unknown label, so exploring the applicability of positive-unlabeled learning is of interest.
  18. OK, that’s all. Thank you very much for your attention.
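
The three cross-validation schemes from note 14 differ only in what is held out: whole compounds, whole proteins, or individual compound-protein pairs. The following is a minimal sketch of how such test masks could be generated, assuming a simple random partition into folds; the function name `cv_test_masks` and the seeding scheme are hypothetical and not taken from the talk. The matrix size is that of the Nuclear Receptor dataset (54 compounds, 26 proteins).

```python
import numpy as np

def cv_test_masks(n_compounds, n_proteins, scheme, n_folds=10, seed=0):
    """Return one boolean (n_compounds x n_proteins) test mask per fold."""
    rng = np.random.default_rng(seed)
    if scheme == "compound":
        n_items = n_compounds                 # hold out whole compounds
    elif scheme == "protein":
        n_items = n_proteins                  # hold out whole proteins
    elif scheme == "pairwise":
        n_items = n_compounds * n_proteins    # hold out individual pairs
    else:
        raise ValueError(f"unknown CV scheme: {scheme}")
    folds = np.array_split(rng.permutation(n_items), n_folds)
    masks = []
    for fold in folds:
        mask = np.zeros((n_compounds, n_proteins), dtype=bool)
        if scheme == "compound":
            mask[fold, :] = True
        elif scheme == "protein":
            mask[:, fold] = True
        else:
            mask.flat[fold] = True
        masks.append(mask)
    return masks

# Nuclear Receptor dataset size from the benchmarking slide.
compound_masks = cv_test_masks(54, 26, "compound")
protein_masks = cv_test_masks(54, 26, "protein")
pair_masks = cv_test_masks(54, 26, "pairwise")
print(len(compound_masks), compound_masks[0].sum(), pair_masks[0].sum())
```

Calling the function with five different `seed` values would correspond to the five random repetitions mentioned in the settings, with AUROC and AUPR averaged over all folds and repetitions.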