2. Supervised Learning Supervised learning is the typical machine learning setting, where labeled examples are used as training examples. A learner (decision trees, neural networks, support vector machines, etc.) is trained on the labeled training data, and the trained model is then used to predict the label of unseen data, e.g., (Jeff, Professor, 7, ?) whose label is unknown.
3. Labeled vs. Unlabeled In many practical applications, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain, because labeling the unlabeled examples requires human effort. For example, there is an (almost) infinite number of web pages on the Internet, but only a few carry a label such as class = "war".
5. SSL: Why can unlabeled data be helpful? [D.J. Miller & H.S. Uyar, NIPS'96] Suppose the data is well-modeled by a mixture density: f(x | θ) = Σ_{l=1}^{L} α_l f(x | θ_l), where θ = {α_l, θ_l}. The class labels are viewed as random quantities and are assumed chosen conditioned on the selected mixture component m_i ∈ {1, 2, …, L} and possibly on the feature value, i.e., according to the probabilities P[c_i | x_i, m_i]. Thus, the optimal classification rule for this model is the MAP rule: S(x) = argmax_k Σ_j P[c_i = k | m_i = j, x_i] P[m_i = j | x_i]. Unlabeled examples can be used to help estimate the mixture term P[m_i = j | x_i].
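A minimal numerical sketch of the MAP rule above, using a made-up two-component 1-D Gaussian mixture with made-up component-conditional class probabilities P[c | m] (all parameter values are illustrative, not from the paper):

```python
import numpy as np

# Toy instance of the mixture setup: two 1-D Gaussian components, each with
# its own class-membership probabilities P[c | m].  In practice the mixture
# parameters below are exactly what unlabeled data helps to estimate.
alphas = np.array([0.5, 0.5])          # mixing proportions alpha_l
means  = np.array([-2.0, 2.0])         # component means
sigma  = 1.0                           # shared standard deviation
p_c_given_m = np.array([[0.9, 0.1],    # P[c | m = 0]
                        [0.2, 0.8]])   # P[c | m = 1]

def gaussian(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def map_classify(x):
    # Posterior over mixture components: P[m | x]
    joint = alphas * gaussian(x, means, sigma)
    p_m_given_x = joint / joint.sum()
    # MAP rule: argmax_c  sum_m  P[c | m] * P[m | x]
    p_c_given_x = p_m_given_x @ p_c_given_m
    return int(np.argmax(p_c_given_x))
```

A point deep in component 0's region, e.g. x = -3, is assigned class 0, while x = 3 is assigned class 1.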
6. Transductive SVM A transductive SVM takes a particular test set into account and tries to minimize misclassifications of just those particular examples [T. Joachims, ICML'99]. Concretely, it uses the unlabeled examples to help identify the maximum-margin hyperplane. (Figure reprinted from [T. Joachims, ICML'99].)
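The effect can be illustrated with a toy 1-D sketch (this is not Joachims' optimization procedure; the data and the brute-force threshold search are purely illustrative): the labeled points alone leave the boundary location ambiguous, while the low-density gap in the test points pins it down.

```python
import numpy as np

# Hypothetical 1-D data: two labeled points far apart, and unlabeled test
# points forming two clusters with a gap between them.
X_lab = np.array([-3.0, 3.0])
y_lab = np.array([-1, 1])
X_unl = np.array([-2.5, -2.0, 1.0, 1.5])       # unlabeled test points

def margin(theta, pts):
    # distance from threshold theta to the nearest point
    return np.min(np.abs(pts - theta))

thresholds = np.linspace(-2.0, 1.0, 301)

# Inductive choice: maximize the margin over labeled points only.
ind = max(thresholds, key=lambda t: margin(t, X_lab))

# Transductive choice: maximize the margin over labeled AND test points,
# i.e., route the boundary through the low-density gap of the test set.
trn = max(thresholds, key=lambda t: margin(t, np.concatenate([X_lab, X_unl])))
```

Here the inductive boundary lands at 0.0 (midway between the labeled points), while the transductive boundary shifts to -0.5, the middle of the gap in the test data.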
7. Active Learning: Getting More from Queries The labels of the training examples are obtained by querying the oracle. Thus, for the same number of queries, more helpful information can be obtained by actively selecting which unlabeled examples to query. Key: select the unlabeled examples whose labels will convey the most helpful information for the learner.
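One common selection criterion is uncertainty sampling; the sketch below (toy model and pool, assumed for illustration) queries the unlabeled example whose predicted class probability is closest to 0.5 under the current model:

```python
import numpy as np

# Uncertainty sampling: ask the oracle about the example the current model
# is least sure about.  The logistic model and the pool are made up.
w, b = 1.0, 0.0                               # current (toy) model weights
X_pool = np.array([-4.0, -0.1, 0.05, 3.0])    # unlabeled pool

def p_pos(x):
    # predicted probability of the positive class
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

uncertainty = np.abs(p_pos(X_pool) - 0.5)     # 0 = maximally uncertain
query_idx = int(np.argmin(uncertainty))       # index of the example to query
```

In this pool, the point at 0.05 sits closest to the decision boundary, so it is the one selected for labeling.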
12. Co-training (con't) [A. Blum & T. Mitchell, COLT'98] Two learners are trained from the labeled training examples: learner 1 on the X1 view and learner 2 on the X2 view. Each learner then labels some unlabeled training examples, and these newly labeled examples are added to the pool of labeled training examples.
20. The Yarowsky Algorithm (Yarowsky 1995) Iteration 0: train a classifier by supervised learning on the initial labeled data. Each subsequent iteration (1, 2, …): choose the instances labeled with high confidence, add them to the pool of current labeled training data, and retrain.
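The loop above can be sketched as follows, with a nearest-centroid classifier standing in for the supervised learner and a made-up distance-margin confidence threshold (data and threshold are illustrative, not from the paper):

```python
import numpy as np

# Yarowsky-style self-training: repeatedly move confidently-labeled
# instances from the unlabeled pool into the labeled training set.
X_lab = np.array([-3.0, 3.0])                 # seed labeled data
y_lab = np.array([0, 1])
X_unl = [-2.6, -2.2, -0.3, 2.1, 2.4]          # unlabeled pool

CONF = 1.0    # confidence threshold on the centroid-distance margin

for it in range(3):
    # "classifier" = nearest centroid per class, retrained each iteration
    c0 = X_lab[y_lab == 0].mean()
    c1 = X_lab[y_lab == 1].mean()
    still_unlabeled = []
    for x in X_unl:
        d0, d1 = abs(x - c0), abs(x - c1)
        if abs(d0 - d1) >= CONF:              # labeled with high confidence?
            X_lab = np.append(X_lab, x)
            y_lab = np.append(y_lab, 0 if d0 < d1 else 1)
        else:
            still_unlabeled.append(x)         # stays in the pool for now
    X_unl = still_unlabeled
```

The four points near the class centroids are absorbed into the labeled pool, while the ambiguous point at -0.3 never clears the confidence threshold and remains unlabeled.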
25. Co-Training C1: a classifier trained on view 1. C2: a classifier trained on view 2. At each iteration t: allow C1 to label some instances and allow C2 to label some instances; add the self-labeled instances to the pool of training data, then retrain for iteration t+1, and so on.
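A minimal sketch of this loop, where the two coordinates of each point serve as the two views and each view trains its own nearest-centroid classifier (the data, the views, and the one-instance-per-round labeling schedule are all illustrative):

```python
import numpy as np

# Co-training with two views: view 0 and view 1 are the two coordinates.
# Each view's classifier labels one instance per round for the shared pool.
X = np.array([[-2.0, -1.8], [2.0, 1.9],                      # labeled seeds
              [-1.5, -1.6], [1.4, 1.7], [-1.2, -1.1], [1.1, 1.3]])
y = np.array([0, 1, -1, -1, -1, -1])                         # -1 = unlabeled

def centroid_clf(view):
    # train a 1-D nearest-centroid classifier on one view of the labeled data
    lab = y != -1
    c0 = X[lab & (y == 0), view].mean()
    c1 = X[lab & (y == 1), view].mean()
    return lambda v: 0 if abs(v - c0) <= abs(v - c1) else 1

for t in range(2):                    # a few co-training rounds
    for view in (0, 1):               # C1 (view 0), then C2 (view 1)
        clf = centroid_clf(view)
        unl = np.where(y == -1)[0]
        if unl.size == 0:
            break
        # label the instance this view is most confident about: here,
        # the unlabeled point farthest from the origin in this view
        i = unl[np.argmax(np.abs(X[unl, view]))]
        y[i] = clf(X[i, view])        # self-labeled, added to the pool
```

After two rounds every instance has been labeled, alternating between the two classifiers, and each newly labeled point improves the centroids both views train on next.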