We adapt the idea of random projections applied to the output space, so as to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting the computational complexity and accuracy of predictions. We also show, over a broad panel of benchmark problems, that random output space projections may be used to reach different bias-variance tradeoffs, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.
Link to the paper: http://orbi.ulg.ac.be/handle/2268/172146
Source code available at https://github.com/arjoly/random-output-trees
Random forests with random projections of the output space for high dimensional multi-label classification
1. Random forests with random projections of the output space for high dimensional multi-label classification
Arnaud Joly, Pierre Geurts, Louis Wehenkel
2. Multi-label classification tasks
Many supervised learning applications in text, biology or image processing associate each sample with a set of labels.
Input X: an 800 × 600 pixel image. Output Y: a set of labels such as driver, mountain, road, car, tree, rock, line, human, ...
If each label corresponds to a Wikipedia article, then we have around 4 million labels.
3. Random forest
Randomized trees are built on a bootstrap copy of the input-output pairs $((x_i, y_i) \in \mathcal{X} \times \mathcal{Y})_{i=1}^{n}$ by recursively maximizing the reduction of impurity, here the variance Var. At each node, the best split is selected among k randomly selected features.
[Figure: a sample S of 20 points is split on feature $X_k$ at threshold $t_k$ into $S_L$ ($X_k \leq t_k$, 12 points) and $S_R$ ($X_k > t_k$, 8 points).]
$$\mathrm{Var}(S) = 0.24, \qquad \mathrm{Var}(S_L) = 0.014, \qquad \mathrm{Var}(S_R) = 0.1875$$
$$\Delta \mathrm{Var}(S) = \mathrm{Var}(S) - \frac{12}{20}\,\mathrm{Var}(S_L) - \frac{8}{20}\,\mathrm{Var}(S_R) \approx 0.16$$
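As a toy illustration (our own sketch, not the authors' implementation), the multi-output variance impurity and the reduction achieved by a candidate split can be computed as follows:

```python
import numpy as np

def variance(Y):
    # Multi-output impurity: mean squared distance to the centroid,
    # i.e. the sum over the outputs of the per-output variances.
    return np.mean(np.sum((Y - Y.mean(axis=0)) ** 2, axis=1))

def variance_reduction(Y, left):
    # Impurity reduction for a split of the sample into Y[left] / Y[~left],
    # with each child weighted by its fraction of the samples.
    n, n_left = len(Y), np.sum(left)
    return (variance(Y)
            - (n_left / n) * variance(Y[left])
            - ((n - n_left) / n) * variance(Y[~left]))

# A perfect split on a single binary output removes all of the variance:
Y = np.array([[0.0], [0.0], [1.0], [1.0]])
left = np.array([True, True, False, False])
delta = variance_reduction(Y, left)  # Var(S) = 0.25, Var(S_L) = Var(S_R) = 0
```

The best split at a node is the (feature, threshold) pair maximizing this reduction among the k randomly drawn candidate features.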
4. When Y is very high dimensional, variance computation is the main bottleneck of the random tree ensemble
The multi-output tree algorithm must compute the sum of the variances over the label space at each tree node, for each candidate split.
5. Multi-output regression trees in a randomly projected output space
We propose to approximate the variance computation by using a random projection of the output space.
6. Multi-output regression trees in a randomly projected output space
We propose to approximate the variance computation by using a random projection of the output space.
Theorem
Given $\epsilon > 0$, a sample $(y_i)_{i=1}^{n}$ of $n$ points $y \in \mathbb{R}^d$, and a projection matrix $\Phi \in \mathbb{R}^{m \times d}$ such that the Johnson-Lindenstrauss lemma holds for all pairs of points, we also have
$$(1 - \epsilon)\,\mathrm{Var}\!\left((y_i)_{i=1}^{n}\right) \;\leq\; \mathrm{Var}\!\left((\Phi y_i)_{i=1}^{n}\right) \;\leq\; (1 + \epsilon)\,\mathrm{Var}\!\left((y_i)_{i=1}^{n}\right).$$
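A quick numerical check of this variance-preservation property (our sketch, using a plain Gaussian projection scaled by $1/\sqrt{m}$; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 1000, 200  # n points in dimension d, projected down to m

def total_variance(Y):
    # Sum over outputs of per-output variances
    # (mean squared distance to the centroid).
    return np.mean(np.sum((Y - Y.mean(axis=0)) ** 2, axis=1))

Y = rng.normal(size=(n, d))
Phi = rng.normal(size=(m, d)) / np.sqrt(m)  # Gaussian random projection

# The variance computed in the projected space stays close to the original.
ratio = total_variance(Y @ Phi.T) / total_variance(Y)
```

With m = 200 the ratio concentrates near 1, which is what makes the projected variance a usable surrogate split score.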
7. Multi-output regression trees in a randomly projected output space
1. Randomly project the output space: $y'_i = \Phi y_i$.
2. Grow the tree on the projected outputs $(x_i, \Phi y_i)_{i=1}^{n}$.
3. Label the leaves using the original outputs $(y_i)_{i=1}^{n}$.
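The three steps can be sketched with a toy depth-1 "tree" (our simplification; the real algorithm grows full randomized trees and scores every candidate split on the projected outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d, m = 100, 5, 40, 4
X = rng.normal(size=(n, p))
Y = (rng.random((n, d)) < 0.2).astype(float)  # sparse binary label matrix

# Step 1: randomly project the output space.
Phi = rng.normal(size=(m, d)) / np.sqrt(m)
Z = Y @ Phi.T  # projected outputs, shape (n, m)

# Step 2: grow the tree on the projected outputs. As a toy stand-in for
# tree growing, split the sample at the median of one input feature; in
# the actual algorithm only the low-dimensional Z is used to score splits.
feature = 0
left = X[:, feature] <= np.median(X[:, feature])

# Step 3: label the leaves with the mean of the ORIGINAL outputs, so that
# predictions live in the original d-dimensional label space.
pred_left = Y[left].mean(axis=0)
pred_right = Y[~left].mean(axis=0)
```

Note that the projection only affects split scoring during learning; predictions are still full d-dimensional label vectors.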
8. Ensemble of randomized trees
Shared subspace (Algo 1): every tree is grown on the same projected outputs $(x_i, \Phi y_i)_{i=1}^{n}$.
Individual subspaces (Algo 2): each tree is grown on its own projection, e.g. $(x_i, \Phi_1 y_i)_{i=1}^{n}$, $(x_i, \Phi_2 y_i)_{i=1}^{n}$, ...
9. Bias-variance analysis
Averaging over the learning set LS, the algorithm randomization $\epsilon$ and the output subspace randomization $\Phi$, the squared error Err of an ensemble of $t$ multi-output tree models can be decomposed into:
Single shared subspace (Algo 1)
$$E_{LS,\Phi,\epsilon}\{\mathrm{Err}(f_1(x; LS, \Phi, \epsilon))\} = \sigma^2_R(x) + B^2(x) + V_{LS}(x) + \frac{V_{Algo}(x)}{t} + V_{Proj}(x).$$
Individual subspaces (Algo 2)
$$E_{LS,\Phi_t,\epsilon_t}\{\mathrm{Err}(f_2(x; LS, \Phi_t, \epsilon_t))\} = \underbrace{\sigma^2_R(x)}_{\text{residual error}} + \underbrace{B^2(x)}_{\text{bias}} + \underbrace{V_{LS}(x) + \frac{V_{Algo}(x) + V_{Proj}(x)}{t}}_{\text{variance}}.$$
Individual subspaces should always be preferred to a single shared subspace: the projection variance $V_{Proj}(x)$ is then averaged out over the $t$ trees.
10. Label ranking average precision to assess performance
$$\mathrm{LRAP}(\hat{f}) = \frac{1}{|TS|} \sum_{i \in TS} \frac{1}{|y_i|} \sum_{j \in \{k \,:\, y_{ik} = 1\}} \frac{|\mathcal{L}_i^j(y_i)|}{|\mathcal{L}_i^j(1_d)|}, \qquad \mathcal{L}_i^j(q) = \left\{ k : q_k = 1 \text{ and } \hat{f}(x_i)_k \geq \hat{f}(x_i)_j \right\},$$
where $\hat{f}(x_i)_j$ is the probability (or score) associated to label $j$ by the learnt model $\hat{f}$ applied to $x_i$, and $1_d$ is a $d$-dimensional row vector of ones.
The score is higher when true labels are assigned higher probabilities (scores) than false labels.
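The metric can be implemented directly from the definition (our sketch; scikit-learn's `label_ranking_average_precision_score` computes the same quantity):

```python
import numpy as np

def lrap(Y_true, scores):
    # Y_true: (n, d) binary label matrix; scores: (n, d) predicted scores.
    values = []
    for y, s in zip(Y_true, scores):
        true_labels = np.flatnonzero(y == 1)
        precisions = []
        for j in true_labels:
            ranked_above = s >= s[j]  # labels scored at least as high as j
            precisions.append(
                np.sum(ranked_above & (y == 1)) / np.sum(ranked_above))
        values.append(np.mean(precisions))
    return float(np.mean(values))

Y = np.array([[1, 0, 0], [0, 1, 1]])
perfect = np.array([[0.9, 0.2, 0.1], [0.1, 0.8, 0.9]])
# Ranking every true label above every false one gives LRAP = 1.0.
```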
11. Decision tree performance converges with m = 200 Gaussian random output projections
[Plot: LRAP (from 0.180 to 0.205) as a function of m (0 to 1000) for a decision tree and a decision tree on a Gaussian subspace (Algo 2); Delicious dataset (983 labels).]
12. Faster convergence with an ensemble of randomized trees
[Plot: LRAP (from 0.366 to 0.378) as a function of m for a random forest, a random forest on a Gaussian subspace (Algo 2) and a random forest on a fixed Gaussian subspace (Algo 1); Delicious dataset (983 labels, k = √p, t = 100, n_min = 1).]
Randomly projecting the output space reduces computing time from 3458 seconds (no projection) to 311 seconds (m = 25, individual subspaces) without accuracy degradation.
13. Systematic analysis on 24 datasets
Increasing m leads to convergence in LRAP.
[Plot: ratio LRAP(random forest on Gaussian output subspace) / LRAP(random forest), roughly between 0.8 and 1.2, for m = 1, m = log(d) and m = d on 24 datasets: EUR-Lex (subj.), bibtex, tmc2017, reuters, emotions, Expression-GO, WIPO, yeast, medical, Yeast-GO, mediamill, enron, SCOP-GO, EUR-Lex (dir.), genbase, scene, drug-interaction, EUR-Lex (desc.), protein-interaction, diatoms, corel5k, CAL500, delicious, bookmarks. (k = √p, t = 100, n_min = 1, averaged over 10 repetitions.)]
14. Output randomization can be more effective than input randomization
[Plot: LRAP (from 0.360 to 0.400) as a function of k (0 to 600) for a random forest and random forests on a Gaussian subspace (Algo 2) with m = 1, m = 7 and m = 14; Drug-interaction dataset (1554 labels, t = 100, n_min = 1).]
15. Alternative random output subspaces
[Plot: LRAP (from 0.33 to 0.39) as a function of m for a random forest and random forests (Algo 2) on a random output subspace, a sparse Rademacher (s = √d) subspace and a sparse Rademacher (s = 3) subspace; Delicious dataset (983 labels, k = √p, t = 100, n_min = 1).]
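A sparse Rademacher projection matrix of the kind compared above can be generated as follows (our sketch, following the Achlioptas / Li et al. construction, where s controls sparsity):

```python
import numpy as np

def sparse_rademacher(m, d, s, rng):
    # Each entry is +sqrt(s/m) or -sqrt(s/m) with probability 1/(2s) each,
    # and 0 with probability 1 - 1/s, so that E[Phi_ij^2] = 1/m.
    signs = rng.choice([-1.0, 1.0], size=(m, d))
    nonzero = rng.random((m, d)) < 1.0 / s
    return np.sqrt(s / m) * signs * nonzero

rng = np.random.default_rng(0)
Phi = sparse_rademacher(m=25, d=983, s=3, rng=rng)  # s=3: ~2/3 of entries zero
```

Larger s gives a sparser matrix and thus cheaper projections, at the cost of a coarser approximation.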
16. Random forests with random projections of the output space for high dimensional multi-label classification
Conclusions
- Lower computing time, without affecting accuracy.
- Optimizing input and output randomization could improve prediction performance.
Future work
Efficient techniques to adjust the random output space parameters so as to reach the best accuracy / computing time trade-off.
Source code is available at github.com/arjoly/random-output-trees.