2. Approach 1/2: Create basics
   - Number of documents for testing (500/1000/2000/4000/8000/16000/full)
   - Define min & max document frequency
   - “How to select key attributes/features? Best predictor?”
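To make the min/max document frequency step concrete, here is a minimal sketch in plain Python. It assumes documents are already tokenized; the names `select_vocabulary`, `min_df` and `max_df_ratio`, the thresholds, and the toy documents are all hypothetical and not taken from the project.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents it occurs in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df

def select_vocabulary(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms whose document frequency lies in [min_df, max_df_ratio * N]."""
    df = document_frequency(docs)
    max_df = max_df_ratio * len(docs)
    return sorted(term for term, n in df.items() if min_df <= n <= max_df)

# Toy corpus (assumed data, for illustration only)
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]
vocab = select_vocabulary(docs, min_df=2, max_df_ratio=0.75)
print(vocab)  # ['cat', 'sat']
```

Terms that appear in too few documents ("dog") or in nearly all of them ("the") are dropped, which is one common way to narrow down candidate features before asking which one predicts best.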
3. Approach 2/2: Testing
   - Using cross-validation
   - Data:
     - F-measure (precision / recall)
     - ROC (receiver operating characteristic): TP vs FP
     - Classifier errors
     - Correctly vs incorrectly classified instances
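The evaluation measures listed above can be sketched in plain Python as follows. This is an illustration under assumptions, not the tooling used in the project: the function names, the contiguous fold split, and the toy labels are all hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification result."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """F-measure is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy labels (assumed data): 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
correct, incorrect = tp + tn, fp + fn           # correctly vs incorrectly classified
tpr = recall                                    # one ROC point: TP rate ...
fpr = fp / (fp + tn) if fp + tn else 0.0        # ... against FP rate
folds = list(k_fold_indices(len(y_true), 4))    # 4 folds of 2 documents each
```

In a real run, each fold's test indices would be classified by a model trained on the matching train indices, and the counts would be summed or averaged across folds. A full ROC curve needs classifier scores at several thresholds; the single (FPR, TPR) pair here is only one point on it.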