Semi-supervised learning reduces the cost of labeling the training data of a supervised learning algorithm by using unlabeled data together with labeled data to improve performance. Co-Training is a popular semi-supervised learning algorithm that requires multiple redundant and independent sets of features (views). In many real-world application domains, this requirement cannot be satisfied. In this paper, a single-view variant of Co-Training, CoBC (Co-Training by Committee), is proposed, which requires an ensemble of diverse classifiers instead of redundant and independent views. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and committee-based active learning. An empirical study on handwritten digit recognition is conducted in which the random subspace method (RSM) is used to create ensembles of diverse C4.5 decision trees. Experiments show that these two combinations outperform the other, non-committee-based ones.
Outline: Overview, Semi-Supervised Learning (SSL), Single-View CoBC, Experimental Results, Conclusion, Future Work
Combining Committee-based
Semi-supervised and Active Learning and
Its Application to Handwritten Digits
Recognition
Mohamed Farouk Abdel Hady, Friedhelm Schwenker
Institute of Neural Information Processing
University of Ulm, Germany
{mohamed.abdel-hady|friedhelm.schwenker}@uni-ulm.de
April 8, 2010
Semi-Supervised Learning
In many domains, a large amount of training examples is available, but most of them are unlabeled.
The data labeling process is often tedious, expensive, and time-consuming because it requires the effort of human experts.
Research directions of SSL
Semi-Supervised Clustering
Semi-Supervised Classification
Semi-Supervised Regression
Semi-Supervised Dimensionality Reduction
Semi-Supervised Learning
Description: SSL algorithms
Single-view, single-learner, single classifier: EM (Nigam and Ghani, 2000); Self-Training (Nigam and Ghani, 2000)
Multi-view, single-learner, multiple classifiers: Co-EM (Nigam and Ghani, 2000); Co-Training (Blum and Mitchell, COLT'98)
Single-view, multi-learner, multiple classifiers: Statistical Co-Learning (Goldman et al., 2000); Democratic Co-Learning (Y. Zhou et al., 2004)
Single-view, single-learner, multiple classifiers: Tri-Training (Z.-H. Zhou, TKDE'05); Co-Forest (Li and Z.-H. Zhou, TSMC'07); Co-Training by Committee
Z.-H. Zhou and M. Li, Semi-supervised learning by disagreement, Knowledge and
Information Systems, in press.
Self-Training
A single classifier iteratively labels the unlabeled examples on which it is most confident and adds them to its own training set. But the most confident examples often lie far from the target decision boundary and are therefore non-informative. Hence, in many cases this process does not create representative training sets, because it keeps selecting non-informative examples.
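To make the procedure concrete, here is a minimal Self-Training loop in Python. It is an illustrative sketch, not code from the paper: the tiny nearest-centroid classifier stands in for the base learner, and all names are invented for this example.

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in classifier: predicts via distance to class means."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict_proba(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        inv = 1.0 / (d + 1e-9)            # closer centroid -> higher "probability"
        return inv / inv.sum(axis=1, keepdims=True)
    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]

def self_train(clf, X_l, y_l, X_u, n_per_iter=1, max_iter=5):
    """Self-Training: repeatedly self-label the most confident unlabeled
    examples, add them to the labeled set, and retrain."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        clf.fit(X_l, y_l)
        proba = clf.predict_proba(X_u)
        conf = proba.max(axis=1)              # confidence = max class probability
        idx = np.argsort(conf)[-n_per_iter:]  # the most confident examples
        X_l = np.vstack([X_l, X_u[idx]])
        y_l = np.concatenate([y_l, clf.classes_[proba[idx].argmax(axis=1)]])
        X_u = np.delete(X_u, idx, axis=0)
    return clf.fit(X_l, y_l)
```

Note that the selection step is exactly the weakness described above: the highest-confidence examples are the ones farthest from the boundary.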
Multi-View Co-Training
Blum and Mitchell (1998)
As any multi-view learning algorithm, it requires that each training example be represented by multiple sufficient and redundant views, i.e., two or more sets of features that are conditionally independent given the class label, each of which is sufficient for learning.
For web page classification: (1) the text appearing on the page itself, and (2) the text attached to hyperlinks pointing to this page from other pages.
Single-View Co-Training by Committee
Contribution
A single-view variant of Co-Training is proposed for application domains in which redundant and independent views are not available.
Two learning frameworks that combine the merits of active learning and semi-supervised learning.
Motivation
For many real-world applications, the requirement of two sufficient and independent views cannot be fulfilled.
Co-Training does not work well without an appropriate feature split (Nigam and Ghani, 2000).
Measuring the labeling confidence is not a straightforward task.
How to measure confidence
Inaccurate confidence estimation
→ selecting and adding mislabeled examples to the training set
→ degraded classification accuracy.
Class probability estimates (CPE) provided by the companion committee:

Confidence(x_u, H_i^(t-1)) = max_{1 ≤ c ≤ C} H_i^(t-1)(x_u, ω_c)

Unfortunately, in many cases the classifier does not provide accurate CPEs. For instance, a decision tree provides piecewise-constant probability estimates: all unlabeled examples x_u that fall into the same leaf receive the same CPE, because the exact value of x_u is not used in determining it.
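The CPE-based confidence can be sketched in a few lines of Python. One assumption here, not stated on the slides: the committee's CPE is taken as the plain average of its members' estimates.

```python
import numpy as np

def committee_confidence(member_probas):
    """Confidence(x_u, H) = max_c H(x_u, omega_c), where the committee
    CPE H(x_u, omega_c) is assumed here to be the unweighted average of
    the members' class probability estimates.
    member_probas: shape (n_members, n_examples, n_classes)."""
    cpe = np.mean(member_probas, axis=0)   # committee CPE per example and class
    return cpe.max(axis=1)                 # confidence = highest class probability
```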
Improving the CPE of Decision Trees
Laplace correction, Probability Estimation Tree (PET) (Provost, Machine Learning 2003):

P(ω_c | x_u) = (n_c + 1) / (N + C)

where n_c is the number of training examples of class ω_c at the leaf containing x_u, N is the total number of training examples at that leaf, and C is the number of classes.
Bagging of PETs
Retrofitting Decision Tree Classifiers Using Kernel Density Estimation (Fayyad, ICML'95)
Improving Decision Trees for Probability-Based Ranking by Lazy Learners (Liang, ICTAI'06)
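The Laplace correction above is simple enough to show directly (an illustrative sketch; the function name is invented):

```python
def laplace_cpe(class_counts):
    """Laplace-corrected class probabilities at a decision-tree leaf:
    P(omega_c | x_u) = (n_c + 1) / (N + C), where n_c is the count of
    class c at the leaf, N the total count, and C the number of classes."""
    N, C = sum(class_counts), len(class_counts)
    return [(n_c + 1) / (N + C) for n_c in class_counts]
```

A pure leaf with counts [3, 0] yields [0.8, 0.2] instead of the raw estimate [1.0, 0.0], so small leaves no longer produce overconfident probabilities.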
Estimating local competence
The local competence of an unlabeled example x_u given H_i^(t-1) is defined as follows:

Comp(x_u, H_i^(t-1)) = Σ_{x_n ∈ N(x_u), x_n ∈ ω_pred} H_i^(t-1)(x_n, ω_pred) / (||x_n − x_u||² + ε)

where ω_pred is the class label assigned to x_u by H_i^(t-1); H_i^(t-1)(x_n, ω_pred) is the probability given by H_i^(t-1) that neighbor x_n belongs to class ω_pred; N(x_u) is the set of nearest labeled neighbors of x_u; and ε is a constant added to avoid a zero denominator.
It is inspired by the decision-dependent, distance-based k-NN competence estimate proposed for dynamic classifier selection (Woods, PAMI'97).
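Under the definitions above, the competence computation might be sketched as follows (function and parameter names are illustrative; eps plays the role of ε):

```python
import numpy as np

def local_competence(x_u, neighbors, neighbor_labels, cpe_pred, omega_pred, eps=1e-6):
    """Comp(x_u, H) = sum, over the nearest labeled neighbors x_n whose
    label equals omega_pred, of H(x_n, omega_pred) / (||x_n - x_u||^2 + eps).
    cpe_pred[n] is the committee's probability that neighbor n belongs
    to the predicted class omega_pred; eps avoids a zero denominator."""
    comp = 0.0
    for x_n, y_n, p in zip(neighbors, neighbor_labels, cpe_pred):
        if y_n == omega_pred:
            comp += p / (np.sum((np.asarray(x_n) - np.asarray(x_u)) ** 2) + eps)
    return comp
```

Unlike the piecewise-constant CPE of a decision tree, this score does depend on the exact position of x_u: nearby, confidently classified neighbors of the predicted class raise the competence.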
Estimating local competence
(Figure: estimating the local competence of an unlabeled example given the companion committee.)
Handwritten Digits Recognition
The handwritten digits are described by four sets of features and are publicly available at the UCI Repository. The digits were extracted from a collection of Dutch utility maps. A total of 2,000 patterns (200 patterns per class) have been digitized in binary images.
Name: Description
mfeat-pix: 240 pixel averages in 2×3 windows
mfeat-kar: 64 Karhunen-Loève coefficients
mfeat-fac: 216 profile correlations
mfeat-fou: 76 Fourier coefficients of the character shapes
Experimental Setup
WEKA
4 runs of 10-fold cross-validation.
For SSL, 10% of the training examples (180 patterns) are randomly selected as the initial labeled set L, while the remaining examples are used as the unlabeled set U.
The Random Subspace Method constructs an ensemble of ten pruned C4.5 decision trees (with Laplace correction), where each tree uses only 50% of the features.
We set the pool size u = 100, the sample size n = 1, and the number of nearest neighbors used to estimate local competence k = 10.
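The subspace sampling behind this setup might look as follows. This is a sketch only, the experiments use WEKA's C4.5 and RSM rather than this code, and the function name is invented:

```python
import numpy as np

def random_subspaces(n_features, n_members=10, subspace_frac=0.5, seed=0):
    """Random Subspace Method: draw, for each committee member, a random
    subset of the features (here 50%, matching the setup above). Each
    member is then trained only on its own feature subspace, which
    enforces diversity and reduces dimensionality."""
    rng = np.random.default_rng(seed)
    k = max(1, round(subspace_frac * n_features))
    return [np.sort(rng.choice(n_features, size=k, replace=False))
            for _ in range(n_members)]
```

For the mfeat-pix view (240 features), each of the ten trees would see a distinct random set of 120 pixel-average features.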
Experimental Results
Comparison between forests and individual trees.
Comparison between CoBC and Self-Training.
Comparison between CPE and local competence
confidence measures.
Comparison between CoBC and Co-Forest.
Experimental Results
(Results table omitted; • marks a statistically significant difference according to the corrected paired t-test implemented in WEKA at the 0.05 significance level.)
Combining QBC and CoBC
Both semi-supervised learning and active learning tackle the
same problem but from different directions.
QBC-then-CoBC: QBC provides CoBC with a better starting point than a randomly selected initial labeled set.
QBC-with-CoBC: In QBC-then-CoBC, QBC does not benefit from CoBC; in QBC-with-CoBC, both algorithms benefit from each other.
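To make the QBC side concrete, here is a sketch of one standard committee disagreement measure, vote entropy; the slides do not state which disagreement measure is used, so this is an illustrative choice:

```python
import numpy as np

def vote_entropy(member_preds, n_classes):
    """Query-by-Committee disagreement via vote entropy: the entropy of
    the committee members' vote distribution over the classes. QBC asks
    the human expert to label the examples with the highest disagreement.
    member_preds: shape (n_members, n_examples), integer class labels."""
    ent = np.zeros(member_preds.shape[1])
    for c in range(n_classes):
        v = (member_preds == c).mean(axis=0)   # fraction of votes for class c
        mask = v > 0
        ent[mask] -= v[mask] * np.log(v[mask])
    return ent
```

In QBC-then-CoBC, examples maximizing this score would be queried first; the labeled set handed to CoBC then concentrates on the regions where the committee disagrees most.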
Experimental Results
(Results table omitted; • marks a statistically significant difference according to the corrected paired t-test implemented in WEKA at the 0.05 significance level.)
Conclusion
A new single-view committee-based semi-supervised learning framework is proposed.
An ensemble of diverse and accurate classifiers can effectively exploit the unlabeled data to improve recognition accuracy.
The random subspace method not only enforces diversity but also reduces dimensionality, which is desirable when the training set is small.
CoBC outperforms Self-Training.
The local competence estimate is an effective confidence measure that outperforms the class probability estimate for sample selection.
Future Work
Influence of the ensemble size and the random subspace size.
Different ensemble learners and base learners, such as SVM or kNN.
CoBC depends only on the companion committee H_j^(t-1) constructed at the previous iteration to measure confidence. We will study the influence of depending on all previous versions (H_j^(t'), t' = t−1, t−2, ..., 0).