Presentation by Ola Spjuth at the 8th Symposium on Conformal and Probabilistic Prediction with Applications, Varna, Bulgaria, on 9 September 2019.
ABSTRACT
Learning Under Privileged Information (LUPI) enables the inclusion of additional (privileged) information when training machine learning models, data that is not available when making predictions. The methodology has been successfully applied to a diverse set of problems from various fields. SVM+ was the first realization of the LUPI paradigm which showed fast convergence but did not scale well. To address the scalability issue, knowledge transfer approaches were proposed to estimate privileged information from standard features in order to construct improved decision rules. Most available knowledge transfer methods use regression techniques and the same data for approximating the privileged features as for learning the transfer function. Inspired by the cross-validation approach, we propose to partition the training data into K folds and use each fold for learning a transfer function and the remaining folds for approximations of privileged features—we refer to this as split knowledge transfer. We evaluate the method using four different experimental setups comprising one synthetic and three real datasets. The results indicate that our approach leads to improved accuracy as compared to LUPI with standard knowledge transfer.
REFERENCE:
Gauraha, N., Söderdahl, F. and Spjuth, O.
Split Knowledge Transfer in Learning Under Privileged Information Framework.
Proceedings of Machine Learning Research (PMLR), 105, 43-52 (2019).
URL: https://proceedings.mlr.press/v105/gauraha19a.html
1. Split Knowledge Transfer in Learning Under Privileged Information Framework
Niharika Gauraha (a), Fabian Söderdahl (b) and Ola Spjuth (a)
Varna, Bulgaria, Sep 9, 2019
(a) Department of Pharmaceutical Biosciences, Uppsala University, Sweden
(b) Statisticon AB, Uppsala, Sweden
2. Our domain
We study life science problems, and in particular effects of drugs, from various aspects.
Some data can be very expensive to generate, other data cheaper.
3. Our domain
Another example: Imaging in multiple resolutions
Can we improve predictions using data from multiple resolutions at training time?
4. Learning Under Privileged Information
• The classical machine learning paradigm assumes training examples in the form of iid pairs:
  (x_1, y_1), ..., (x_l, y_l), x_i ∈ X, y_i ∈ {−1, +1}
• The LUPI paradigm assumes training examples in the form of iid triplets:
  (x_1, x*_1, y_1), ..., (x_l, x*_l, y_l), x_i ∈ X, x*_i ∈ X*, y_i ∈ {−1, +1}
• x*_i ∈ X* is Privileged Information (PI): additional data available for each training example, but not available for test examples.
5. LUPI contd...
[Diagram: training data in feature space X, together with training data in the PI space X*, is used to train a model; the resulting predictive rules are applied to a new object in feature space X to produce a prediction.]
6. LUPI and CP
We introduced CP in LUPI at COPA 2018 (1).
We used the SVM+ implementation of LUPI. It gave only minor improvements in efficiency, and it does not scale well to larger problems.
(1) Niharika Gauraha, Lars Carlsson, Ola Spjuth. "Conformal prediction in learning under privileged information paradigm with applications in drug discovery". Proceedings of the Seventh Workshop on Conformal and Probabilistic Prediction and Applications, PMLR 91:147-156, 2018.
7. Knowledge Transfer LUPI
For knowledge transfer from space X* to X, a regression function is learned for each privileged feature, using the p decision features in X as explanatory variables and the privileged feature vectors X*(i), for i = 1, ..., m, as response variables. In total, m regression functions φ_j(x), j = 1, ..., m, are learned.
8. Knowledge Transfer LUPI
We can augment the training data set as follows:
y_1  x_1  φ_1(x_1)  ...  φ_m(x_1)
...  ...  ...       ...  ...
y_l  x_l  φ_1(x_l)  ...  φ_m(x_l)
Then we apply some standard algorithm to this modified training data and construct a (p + m)-dimensional decision rule f(x, Φ(x)), where
Φ(x) = (φ_1(x), φ_2(x), ..., φ_m(x))
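As a concrete illustration of this augmentation step, here is a minimal NumPy sketch (the function names and the ordinary-least-squares choice are ours, not from the paper's code): one linear regression per privileged feature, whose predictions Φ(x) are appended to the standard features.

```python
import numpy as np

def learn_transfer(X, X_star):
    """Fit m regression functions phi_j mapping X to the j-th privileged
    feature, by ordinary least squares with an intercept term."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])    # add intercept column
    W, *_ = np.linalg.lstsq(X1, X_star, rcond=None)  # (p + 1, m) coefficients
    return W

def augment(X, W):
    """Append the approximated privileged features Phi(x) to X."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.hstack([X, X1 @ W])                    # (l, p + m) design matrix

# Toy data: m = 2 privileged features that are noisy linear maps of X
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                         # l = 50 examples, p = 4
X_star = X @ rng.normal(size=(4, 2)) + 0.1 * rng.normal(size=(50, 2))

W = learn_transfer(X, X_star)
X_aug = augment(X, W)   # train any standard classifier on (X_aug, y)
```

Any standard learner (an SVM in the paper) is then trained on the augmented matrix together with the labels.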
9. Knowledge Transfer LUPI: Contd...
Figure adapted from: Izmailov et al., "Feature Selection in Learning Using Privileged Information". ICDM 2017.
10. Knowledge Transfer LUPI: Contd...
Various regression techniques can be used for approximating privileged features, for example multiple linear regression or kernel ridge regression (KRR) with an RBF kernel.
In the LUPI with knowledge transfer approach, the same data is used to learn the m regression functions φ_j(x), j = 1, ..., m, as for creating the final decision rule.
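The KRR option can be sketched from scratch in a few lines; the gamma and alpha values below are illustrative, not ones tuned in the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, gamma=0.5, alpha=1e-3):
    """Solve (K + alpha I) dual = y for the dual coefficients."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + alpha * np.eye(len(X)), y)

def krr_predict(X_train, dual, X_new, gamma=0.5):
    return rbf_kernel(X_new, X_train, gamma) @ dual

# Approximate one nonlinear privileged feature, y = sin(x1 + x2)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sin(X.sum(axis=1))
dual = krr_fit(X, y)
y_hat = krr_predict(X, dual, X)
```

With m privileged features, one such regression is fitted per feature, exactly as in the linear case.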
11. Split Knowledge Transfer LUPI: Training
• Inspired by the cross-validation approach, we propose LUPI with Split Knowledge Transfer (SKT-LUPI).
• The training set is split into K folds of equal (or almost equal) size, and then the following two steps are repeated for k = 1, ..., K:
step 1: The kth fold (X_k, X*_k) is used for learning the m transfer functions Φ_k = (φ_k1, ..., φ_km)^T, and the rest of the training set is augmented with its predicted values from the m regressions: X̄_k = (X_{−k}, Φ_k(X_{−k})).
step 2: Using this augmented design matrix X̄_k and the corresponding labels, a decision rule F_k is constructed.
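The two training steps can be sketched as follows. The transfer model (least squares) and the decision rule (a ridge scorer standing in for the paper's SVM-based rule) are stand-in assumptions to keep the sketch self-contained.

```python
import numpy as np

def fit_skt(X, X_star, y, K=5, alpha=1.0, seed=0):
    """Split knowledge transfer: fold k learns the transfer functions,
    the remaining folds (augmented) learn the decision rule F_k."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    models = []
    for k in range(K):
        rest = np.concatenate([folds[j] for j in range(K) if j != k])
        # step 1: learn transfer functions Phi_k on fold k (least squares)
        Xk1 = np.hstack([np.ones((len(folds[k]), 1)), X[folds[k]]])
        W, *_ = np.linalg.lstsq(Xk1, X_star[folds[k]], rcond=None)
        # augment the remaining folds with predicted privileged features
        Xr1 = np.hstack([np.ones((len(rest), 1)), X[rest]])
        A = np.hstack([Xr1, Xr1 @ W])   # intercept, X_{-k}, Phi_k(X_{-k})
        # step 2: fit decision rule F_k on the augmented design matrix
        beta = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]),
                               A.T @ y[rest])
        models.append((W, beta))
    return models
```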
12. Split Knowledge Transfer LUPI: Prediction
• For a new test object x ∈ X, we compute regression estimates x̂*_1, ..., x̂*_K using the regression functions learned previously.
• Predictions are made for each combination (x, x̂*_k), for k = 1, ..., K, in the (m + p)-dimensional space using the corresponding decision rule F_k.
• We use the synergy method (2) to combine the results from the K decision rules.
(2) Vapnik, Vladimir, and Rauf Izmailov. "Synergy of monotonic rules." The Journal of Machine Learning Research 17.1 (2016): 4722-4754.
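Putting training and prediction together, here is a self-contained end-to-end sketch. Everything not fixed by the slides is a stand-in: least-squares transfer functions, ridge scorers in place of the SVM-based rules F_k, and plain score averaging substituted for the synergy method.

```python
import numpy as np

rng = np.random.default_rng(1)
l, p, m, K = 200, 4, 2, 5
B, w, v = rng.normal(size=(p, m)), rng.normal(size=p), rng.normal(size=m)

def make_data(n):
    X = rng.normal(size=(n, p))
    X_star = X @ B + 0.1 * rng.normal(size=(n, m))  # privileged features
    y = np.sign(X @ w + 0.5 * (X_star @ v))         # labels use PI too
    return X, X_star, y

X, X_star, y = make_data(l)

# training: one (transfer, rule) pair per fold
folds = np.array_split(rng.permutation(l), K)
models = []
for k in range(K):
    tr = np.concatenate([folds[j] for j in range(K) if j != k])
    Xk1 = np.hstack([np.ones((len(folds[k]), 1)), X[folds[k]]])
    W, *_ = np.linalg.lstsq(Xk1, X_star[folds[k]], rcond=None)
    Xr1 = np.hstack([np.ones((len(tr), 1)), X[tr]])
    A = np.hstack([Xr1, Xr1 @ W])                   # augmented design matrix
    beta = np.linalg.solve(A.T @ A + np.eye(A.shape[1]), A.T @ y[tr])
    models.append((W, beta))

# prediction: estimate x*_k, score with F_k, average the K scores
X_t, _, y_t = make_data(100)
Xt1 = np.hstack([np.ones((len(X_t), 1)), X_t])
scores = np.mean([np.hstack([Xt1, Xt1 @ W]) @ beta for W, beta in models],
                 axis=0)
accuracy = np.mean(np.sign(scores) == y_t)
```

Averaging is a deliberate simplification; the paper combines the K monotonic rules with the synergy method of Vapnik and Izmailov instead.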
13. Experiment 1: Synthetic Dataset
Table 1: Comparison of error rates (in %) on the synthetic dataset (3) between the four classification scenarios considered: SVM on standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with split knowledge transfer (SKT-LUPI) with 5 folds, and SVM on privileged features. Multiple linear regression is used for knowledge transfer.
Training Size SVM KT-LUPI SKT-LUPI SVM on PI
25 9.86 7.68 6.10 3.39
35 7.26 5.42 4.80 3.19
45 7.21 4.69 4.22 2.98
55 6.88 5.04 4.84 2.51
(3) Vapnik and Izmailov, "Learning with Intelligent Teacher", COPA 2016, Proceedings of the 5th International Symposium on Conformal and Probabilistic Prediction with Applications, vol 9653, pp 3-19.
14. Experiment 2: Knowledge Transfer using Linear Transformation
Table 2: Comparison of error rates (in %) on modified real datasets (4) between the four classification scenarios considered: SVM on standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with split knowledge transfer (SKT-LUPI) with 10 folds, and SVM on all features (standard and privileged). Multiple linear regression is used for knowledge transfer.
Dataset SVM KT-LUPI SKT-LUPI SVM on all features
Parkinsons 10.25 8.58 7.05 6.53
Ionosphere 5.49 4.92 4.29 4.85
kc2 17.28 17.09 16.99 16.80
(4) Izmailov et al. "Feature selection in learning using privileged information". In 2017 IEEE International Conference on Data Mining Workshops, pp 957-963.
15. Experiment 3: Knowledge Transfer using non-Linear Transformation
Table 3: Comparison of error rates (in %) on modified real datasets (5) between the four classification scenarios considered: SVM on standard features, LUPI with knowledge transfer (KT-LUPI), LUPI with split knowledge transfer (SKT-LUPI) with 10 folds, and SVM on all features (standard and privileged). Kernel ridge regression is used for knowledge transfer.
Dataset SVM KT-LUPI SKT-LUPI SVM on all features
Parkinsons 10.25 8.97 7.82 6.53
Ionosphere 5.49 5.07 4.57 4.85
kc2 17.28 17.23 16.95 16.80
(5) Izmailov et al. "Feature selection in learning using privileged information". In 2017 IEEE International Conference on Data Mining Workshops, pp 957-963.
17. Conclusions
SKT-LUPI shows improved performance over KT-LUPI for the studied datasets.
Limitations: only a few (generated) LUPI datasets.
Future work:
• SKT-LUPI with conformal prediction
• Explore feature selection / dimension reduction
• Application in image classification; compare with deep learning
• Generating LUPI datasets
18. Acknowledgements
This project received financial support from the Swedish Foundation for Strategic Research (SSF) as part of the HASTE project under the call 'Big Data and Computational Science'.
The computations were performed on resources provided by SNIC through the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under project SNIC 2019/8-15.