Weka tutorial
Speaker: 楊明翰
What is Weka?
A collection of machine learning algorithms for data mining tasks.
Weka contains tools for
• data pre-processing,
• classification, regression,
• clustering,
• association rules, and
• visualization.
Suggested version: 3.5.8
How can it help with hw1?
• Visualization
• Data analysis
• Easy to try different classifiers
But…
If you want better performance, you still have to implement many things yourself, such as cross-validation, parameter selection, and clustering.
P.S. You are free to use anything to complete the homework.
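The slide lists parameter selection as something you still have to do yourself. The following is a minimal sketch, assuming the coarse exponential grid suggested in the libsvm practical guide (C = 2^-5 … 2^15, gamma = 2^-15 … 2^3) and the LibSVM wrapper's flag meanings (-S 0 = C-SVC, -K 2 = RBF kernel, -C = cost, -G = gamma, -B = probability estimates). It only enumerates candidate option strings; each one could then be passed to weka.core.Utils.splitOptions and scored with cross-validation. The class name SvmGrid is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: enumerate LibSVM option strings for a coarse (C, gamma) grid search. */
public class SvmGrid {
    // Assumed grid: exponents step by 2, i.e. each value is 4x the previous one.
    static List<String> optionGrid() {
        List<String> grid = new ArrayList<>();
        for (int logC = -5; logC <= 15; logC += 2) {
            for (int logG = -15; logG <= 3; logG += 2) {
                double c = Math.pow(2, logC);
                double gamma = Math.pow(2, logG);
                // -S 0: C-SVC, -K 2: RBF kernel, -B: probability estimates (assumed flag meanings)
                grid.add("-S 0 -K 2 -C " + c + " -G " + gamma + " -B");
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        List<String> grid = optionGrid();
        System.out.println(grid.size() + " candidate settings"); // prints 110 candidate settings
        System.out.println(grid.get(0));
    }
}
```

A coarse grid like this is usually followed by a finer grid around the best (C, gamma) pair.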
Explorer
Classifier
Black: built in
Red: supported, but must be downloaded by the user
Installation guide for libsvm: http://www.cs.iastate.edu/~yasser/wlsvm/
Use Weka in your Java code
The most common components you might want to use are:
– Instances: your data
– Filter: for pre-processing the data
– Classifier/Clusterer: built on the processed data
– Evaluating: how good is the classifier/clusterer?
– Attribute selection: removing irrelevant attributes from your data
ARFF format
@relation KDDCUP
@attribute Ground-Truth {-1.0,1.0}
@attribute Image-Finding-ID numeric
@attribute Study-Finding-ID numeric
@attribute Image-ID numeric
@attribute Study-ID numeric
@attribute LeftBreast {0.0,1.0}
@attribute MLO {0.0,1.0}
@attribute X-location numeric
@attribute Y-location numeric
@attribute X-nipple-location numeric
@attribute Y-nipple-location numeric
@attribute att1 numeric
@attribute att2 numeric
…
@attribute att117 numeric
@attribute serialNumber numeric
@data
-1.0,0.0,0.0,0,150,0.0,0.0,1732.0,2380.0,1356.0,2106.0,-1.196111E-1,4.764423E-2,2.27225E-1,2.511147E-1,-6.94537E-2,-7.478557E-2,5.444844E-1,8.050464E-1,4.708327E-2,1.310514E0,-1.871811E-1,-4.098435E-1,-2.669971E-1,2.50289E-1,-2.438625E-1,8.022098E-2,8.098504E-1,9.880441E-2,3.374689E-4,-6.384426E-1,1.108627E0,1.043443E0,-1.612419E0,-5.633943E-1,-4.357306E-1,-4.572176E-1,8.236916E-2,5.218327E-1,1.922271E-1,4.565068E-1,-8.969028E-1,-4.403602E-1,1.41807E-1,-2.252249E-1,2.34936E-1,6.527024E-1,-5.750284E-1,-5.676962E-1,-5.344064E-1,-1.513411E-1,7.280352E-1,7.21983E-1,6.978422E-1,5.667439E-1,3.273161E-3,-6.958107E-2,7.912039E-1,1.659563E0,1.192391E0,1.173782E0,1.145927E0,1.645195E0,-5.52926E-1,-1.424765E-1,-1.416166E-1,-1.396449E-1,-1.374919E-1,-5.500465E-1,-3.0028E-2,2.788235E-1,1.178261E0,2.937468E-1,3.483202E-1,3.941773E-1,4.250069E-1,3.226059E-1,2.569432E-1,5.522287E-1,1.811639E0,1.844379E0,1.188755E0,1.86738E0,-1.05269E0,1.434895E-2,5.235738E-3,-4.779273E-3,-9.884836E-2,-9.526174E-1,-3.106309E-1,1.434759E0,1.486669E0,3.402836E-1,5.323643E-1,-3.38767E-1,-3.644332E-1,7.650664E-3,3.811143E-2,5.595391E-2,-3.589534E-1,-6.765502E-1,-6.669187E-1,-6.591878E-1,-2.893004E-1,1.048242E0,-7.317548E-1,-1.985699E-1,4.513422E-1,1.06145E0,4.777854E-1,1.267896E0,1.350758E0,1.337705E0,1.385917E0,1.091785E0,1.289325E0,5.511991E-1,-8.125907E-1,1.050196E0,-4.338815E-1,-4.664211E-1,6.203229E-1,-6.020947E-1,5.299978E-1,2.989034E-1,-7.676021E-2,1.5216E-1,-3.001498E-1,0
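ARFF attributes are numbered in header order, so Ground-Truth is attribute 1 and Study-ID is attribute 5 (index 4 in 0-based Java code). The following is a minimal sketch that maps each attribute name to its 1-based position (the numbering used by Remove's -R option). It assumes one @attribute declaration per line and names without spaces or quotes; ArffHeader and attributeIndices are hypothetical names. In real Weka code, data.attribute("Study-ID").index() gives the 0-based index directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: map ARFF attribute names to 1-based positions (header order). */
public class ArffHeader {
    static Map<String, Integer> attributeIndices(String arffText) {
        Map<String, Integer> indices = new LinkedHashMap<>();
        int next = 1;
        for (String raw : arffText.split("\\R")) {
            String line = raw.trim();
            String lower = line.toLowerCase();
            if (lower.startsWith("@data")) break; // the header ends at @data
            if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": assumes the name has no spaces or quotes
                String name = line.split("\\s+")[1];
                indices.put(name, next++);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        String header = String.join("\n",
                "@relation KDDCUP",
                "@attribute Ground-Truth {-1.0,1.0}",
                "@attribute Image-Finding-ID numeric",
                "@attribute Study-Finding-ID numeric",
                "@attribute Image-ID numeric",
                "@attribute Study-ID numeric",
                "@data");
        int studyId = attributeIndices(header).get("Study-ID");
        // prints "Study-ID: -R index 5, instance.value(4)"
        System.out.println("Study-ID: -R index " + studyId + ", instance.value(" + (studyId - 1) + ")");
    }
}
```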
Instances
import weka.core.Instances;
import java.io.BufferedReader;
import java.io.FileReader;
...
Instances data = new Instances( new BufferedReader( new
FileReader("/some/where/data.arff")));
// setting class attribute
data.setClassIndex(data.numAttributes() - 1);
// The class index indicates the target attribute used for classification.
Filters
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
...
String[] options = new String[2];
options[0] = "-R"; // "range"
options[1] = "1"; // first attribute
Remove remove = new Remove(); // new instance of filter
remove.setOptions(options); // set options
remove.setInputFormat(data); // inform filter about dataset AFTER setting options
Instances newData = Filter.useFilter(data, remove); // apply filter
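The -R option takes a 1-based range, while setClassIndex and Instance.value(int) are 0-based. The following is a minimal sketch of how such a range string expands. It assumes only comma-separated numbers, a-b spans, and the keywords first/last; Weka's own range parser may accept more syntax. RangeSketch and toZeroBased are hypothetical names.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/** Sketch: expand a 1-based range string ("1", "2-11,129", "first-last") to 0-based indices. */
public class RangeSketch {
    static SortedSet<Integer> toZeroBased(String range, int numAttributes) {
        SortedSet<Integer> result = new TreeSet<>();
        for (String part : range.split(",")) {
            String[] ends = part.trim().split("-");
            int lo = parseEnd(ends[0], numAttributes);
            int hi = ends.length > 1 ? parseEnd(ends[1], numAttributes) : lo;
            for (int i = lo; i <= hi; i++) {
                result.add(i - 1); // convert 1-based position to 0-based index
            }
        }
        return result;
    }

    private static int parseEnd(String s, int numAttributes) {
        if (s.equalsIgnoreCase("first")) return 1;
        if (s.equalsIgnoreCase("last")) return numAttributes;
        return Integer.parseInt(s);
    }

    public static void main(String[] args) {
        // 129 attributes as in the KDDCUP header: removing 2-11 and 129 keeps the class + att1..att117
        SortedSet<Integer> removed = toZeroBased("2-11,129", 129);
        System.out.println(removed.size() + " removed, " + (129 - removed.size()) + " kept"); // prints "11 removed, 118 kept"
    }
}
```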
Classifier
import weka.classifiers.functions.LibSVM;
...
String[] options = weka.core.Utils.splitOptions(
    "-S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 40.0 -C 1.0 -E 0.0010 -P 0.1 -B");
LibSVM classifier = new LibSVM(); // new instance of the SVM classifier
classifier.setOptions(options); // set the options
classifier.buildClassifier(data); // build (train) the classifier
Classifying instances
Instances unlabeled=…//load from somewhere
…
for (int i = 0; i < unlabeled.numInstances(); i++) {
    Instance ins = unlabeled.instance(i); // requires import weka.core.Instance
    double clsLabel = classifier.classifyInstance(ins); // index of the predicted class value
    double[] prob_array = classifier.distributionForInstance(ins); // probability for each class value
}
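For a nominal class, classifyInstance returns the index of the predicted class value as a double, and data.classAttribute().value((int) clsLabel) turns it back into the label text. For many probabilistic classifiers the prediction is the most probable entry of distributionForInstance, but that is not guaranteed for every classifier (LibSVM's own prediction may disagree with its probability estimates). The following is a minimal argmax sketch under that assumption; ArgmaxSketch is a hypothetical name and the distribution is made up.

```java
/** Sketch: pick the predicted class as the most probable entry of a distribution. */
public class ArgmaxSketch {
    static int argmax(double[] dist) {
        int best = 0;
        for (int k = 1; k < dist.length; k++) {
            if (dist[k] > dist[best]) best = k; // strict '>' keeps the lower index on ties
        }
        return best;
    }

    public static void main(String[] args) {
        String[] classValues = {"-1.0", "1.0"}; // nominal class values, in header order
        double[] dist = {0.3, 0.7};              // hypothetical distributionForInstance output
        System.out.println(classValues[argmax(dist)]); // prints 1.0
    }
}
```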
Example: Weka + LibSVM + 5-fold CV
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.Random;
import weka.classifiers.functions.LibSVM;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.filters.unsupervised.attribute.Remove;

public class CVExample { // class name is illustrative
public static void main(String[] args) throws Exception {
    PrintWriter pw_score = new PrintWriter(new FileOutputStream("c:\\temp\\score.txt"));
    PrintWriter pw_label = new PrintWriter(new FileOutputStream("c:\\temp\\label.txt"));
    PrintWriter pw_pid = new PrintWriter(new FileOutputStream("c:\\temp\\pid.txt"));
    Instances data = new Instances(
        new BufferedReader(
            new FileReader("C:\\temp\\TrainSet_sn.arff")));
    Remove remove = new Remove(); // new instance of filter
    // drop attributes 2-11 (Image-Finding-ID .. Y-nipple-location) and 129 (serialNumber)
    remove.setOptions(weka.core.Utils.splitOptions("-R 2-11,129")); // set options
    remove.setInputFormat(data); // inform filter about dataset AFTER setting options
    int seed = 2; // the seed for randomizing the data
    int folds = 5; // the number of folds to generate, >=2
    data.setClassIndex(0); // first attribute is the ground truth
    Random rand = new Random(seed); // create seeded number generator
    Instances randData = new Instances(data); // create copy of original data
    randData.randomize(rand); // randomize data with number generator
    for (int n = 0; n < folds; n++) {
        Instances train = randData.trainCV(folds, n);
        Instances test = randData.testCV(folds, n);
        System.out.println("Fold " + n + ": train " + train.numInstances() + ", test " + test.numInstances());
        String[] options = weka.core.Utils.splitOptions("-S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 40.0 -C 1.0 -E 0.0010 -P 0.1 -B");
        LibSVM classifier = new LibSVM();
        classifier.setOptions(options);
        FilteredClassifier fc = new FilteredClassifier(); // applies the Remove filter before the SVM
        fc.setFilter(remove);
        fc.setClassifier(classifier);
        fc.buildClassifier(train);
        for (int i = 0; i < test.numInstances(); i++) {
            double[] tmp = fc.distributionForInstance(test.instance(i));
            // tmp[0]: prob of negative (-1.0)
            // tmp[1]: prob of positive (1.0)
            pw_label.println(test.instance(i).attribute(0).value((int) test.instance(i).value(0))); // ground truth
            pw_score.println(tmp[1]); // predicted value
            pw_pid.println((int) test.instance(i).value(4)); // Study-ID
        }
    }
    // close the writers so the buffered output is flushed to disk
    pw_score.close();
    pw_label.close();
    pw_pid.close();
}
}
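trainCV/testCV split the shuffled data into folds. The following is a plain-Java sketch of the same idea: after a seeded shuffle, fold n's test set is one contiguous block and the training set is everything else. It assumes that the first (N mod k) folds each get one extra instance, which we believe matches Weka's behaviour but treat as an assumption. It is not stratified (Weka offers Instances.stratify for that), and Collections.shuffle will not reproduce Weka's randomize order. FoldSketch and testBlock are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch: k-fold split into contiguous test blocks after a seeded shuffle (no stratification). */
public class FoldSketch {
    /** Returns {start, end} (end exclusive) of fold n's test block; earlier folds absorb the remainder. */
    static int[] testBlock(int numInstances, int folds, int n) {
        int base = numInstances / folds;
        int extra = numInstances % folds;
        int size = base + (n < extra ? 1 : 0);
        int start = n * base + Math.min(n, extra);
        return new int[] {start, start + size};
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 11; i++) ids.add(i);
        Collections.shuffle(ids, new Random(2)); // seeded shuffle, analogous to randData.randomize(rand)
        int folds = 5;
        for (int n = 0; n < folds; n++) {
            int[] b = testBlock(ids.size(), folds, n);
            List<Integer> test = ids.subList(b[0], b[1]);
            List<Integer> train = new ArrayList<>(ids.subList(0, b[0]));
            train.addAll(ids.subList(b[1], ids.size())); // everything outside the test block
            System.out.println("Fold " + n + ": train " + train.size() + ", test " + test.size());
        }
    }
}
```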
FROC
Algorithm:
1. Load "predicted score", "ground truth", and "patient id".
2. Initialize: x = 0, y = 0, Detected_patients = [ ]
   Sort the rows in descending order by "predicted score", breaking ties by "ground truth" and then by "patient id".
3. For each row (in sorted order):
   If the ground truth is negative, x += 1
   Else // a positive point
      If the patient is not in Detected_patients // a newly detected positive patient
         y += 1 and add patient_id to Detected_patients
      Else // the patient was already detected
         do nothing
   Record the point (x, y).
4. Normalize:
   x => 0 ~ average false alarms per image, i.e. x is divided by the total number of images
   y => 0 ~ 1, i.e. y is divided by the number of patients
5. Calculate the area under the curve.
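The steps above can be sketched in Java as follows. This is a minimal sketch, not the mslab ROC tool. It assumes labels are encoded as +1/-1, the number of images is passed in separately (the score/label/pid files do not contain it), and y is divided by the number of patients with at least one positive point so that it ends at 1. Step 5 uses the trapezoid rule; because each row moves either x or y but never both, this equals the area under the step curve. FrocSketch, curve, and area are hypothetical names, and the data in main is a toy example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch of the FROC steps: returns normalized curve points {xs, ys}, starting at (0, 0). */
public class FrocSketch {
    static double[][] curve(double[] score, int[] label, int[] pid, int numImages) {
        Integer[] order = new Integer[score.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Step 2: sort descending by score, then by ground truth, then by patient id
        Arrays.sort(order, (a, b) -> {
            int c = Double.compare(score[b], score[a]);
            if (c != 0) return c;
            c = Integer.compare(label[b], label[a]);
            return c != 0 ? c : Integer.compare(pid[b], pid[a]);
        });
        // Assumption: y is normalized by the number of patients with at least one positive point
        Set<Integer> positivePatients = new HashSet<>();
        for (int i = 0; i < pid.length; i++) if (label[i] > 0) positivePatients.add(pid[i]);

        double[] xs = new double[order.length + 1]; // index 0 holds the origin (0, 0)
        double[] ys = new double[order.length + 1];
        Set<Integer> detected = new HashSet<>();
        int x = 0, y = 0;
        for (int k = 0; k < order.length; k++) { // Step 3
            int i = order[k];
            if (label[i] <= 0) x++;              // negative point: one more false alarm
            else if (detected.add(pid[i])) y++;  // first positive point of a new patient
            // Step 4: normalize and record the point
            xs[k + 1] = (double) x / numImages;
            ys[k + 1] = positivePatients.isEmpty() ? 0.0 : (double) y / positivePatients.size();
        }
        return new double[][] {xs, ys};
    }

    /** Step 5: trapezoid rule over the whole curve. */
    static double area(double[] xs, double[] ys) {
        double a = 0.0;
        for (int k = 1; k < xs.length; k++) a += (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0;
        return a;
    }

    public static void main(String[] args) {
        double[] score = {0.9, 0.8, 0.7, 0.6};
        int[] label = {1, -1, 1, -1};
        int[] pid = {1, 1, 2, 2};
        double[][] c = curve(score, label, pid, 2); // toy data: 2 images
        System.out.println(area(c[0], c[1])); // prints 0.75
    }
}
```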
FROC tools-JAVA
java -cp bin mslab.kddcup2008.roc.ROC score.txt label.txt pid.txt
score.txt: predicted score for each point, i.e. the probability of being positive
label.txt: ground truth for each point
pid.txt: patient ID for each point
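The tool reads three line-aligned files, one value per point. The following is a minimal loader sketch for step 1 of the algorithm. It assumes label.txt holds the class value text written by the CV example (such as -1.0 or 1.0) and pid.txt holds integers. FrocInput is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Sketch: load score.txt, label.txt and pid.txt (one value per line, same order). */
public class FrocInput {
    final double[] score;
    final int[] label; // +1 for positive, -1 for negative
    final int[] pid;

    FrocInput(Path scoreFile, Path labelFile, Path pidFile) throws IOException {
        List<String> s = Files.readAllLines(scoreFile);
        List<String> l = Files.readAllLines(labelFile);
        List<String> p = Files.readAllLines(pidFile);
        if (s.size() != l.size() || s.size() != p.size()) {
            throw new IllegalArgumentException("files must have one line per point");
        }
        score = new double[s.size()];
        label = new int[s.size()];
        pid = new int[s.size()];
        for (int i = 0; i < s.size(); i++) {
            score[i] = Double.parseDouble(s.get(i).trim());
            // label.txt holds the class value text, e.g. "-1.0" or "1.0"
            label[i] = Double.parseDouble(l.get(i).trim()) > 0 ? 1 : -1;
            pid[i] = Integer.parseInt(p.get(i).trim());
        }
    }
}
```

Usage: new FrocInput(Paths.get("score.txt"), Paths.get("label.txt"), Paths.get("pid.txt")), with java.nio.file.Paths imported.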
FROC tools-Matlab
• MATLAB function
– [Pd_patient_wise,FA_per_image,AUC] = get_ROC_KDD(p,Y,PID,fa_low,fa_high)
• Pd_patient_wise
– The y location of each point on the curve.
• FA_per_image
– The x location of each point on the curve.
• AUC
– The area under the curve.
• p – Predicted score (probability of being positive)
• Y – Ground truth
• PID – Patient ID
• To plot the curve: plot(FA_per_image,Pd_patient_wise);
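The slide does not document fa_low and fa_high. One plausible reading, which is purely an assumption here, is that they restrict the area to false-alarm rates between fa_low and fa_high. The following Java sketch computes such a partial area for a piecewise-linear curve with non-decreasing x. It clips each segment to the window and applies the trapezoid rule. PartialArea and areaBetween are hypothetical names; this is not the get_ROC_KDD implementation.

```java
/** Sketch: area under a piecewise-linear curve, restricted to faLow <= x <= faHigh. */
public class PartialArea {
    static double areaBetween(double[] xs, double[] ys, double faLow, double faHigh) {
        double area = 0.0;
        for (int k = 1; k < xs.length; k++) {
            double x0 = xs[k - 1], x1 = xs[k];
            double lo = Math.max(x0, faLow), hi = Math.min(x1, faHigh);
            if (hi <= lo) continue; // segment is vertical or lies outside the window
            double yLo = interp(x0, ys[k - 1], x1, ys[k], lo);
            double yHi = interp(x0, ys[k - 1], x1, ys[k], hi);
            area += (hi - lo) * (yLo + yHi) / 2.0; // trapezoid on the clipped segment
        }
        return area;
    }

    /** Linear interpolation on (x0, y0)-(x1, y1); callers guarantee x1 > x0. */
    private static double interp(double x0, double y0, double x1, double y1, double x) {
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0);
    }

    public static void main(String[] args) {
        double[] xs = {0.0, 0.0, 0.5, 0.5, 1.0};
        double[] ys = {0.0, 0.5, 0.5, 1.0, 1.0};
        System.out.println(areaBetween(xs, ys, 0.0, 1.0));   // prints 0.75
        System.out.println(areaBetween(xs, ys, 0.25, 0.75)); // prints 0.375
    }
}
```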
FROC curve example
The result of the example above:
• AUC = 0.0782
Point-wise measurements:
• TP = 237
• FN = 386
• FP = 108
• TN = 101563
• precision = 0.6870
• recall = 0.3804
• F-score = 0.4897
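The point-wise measurements follow from the confusion counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-score is their harmonic mean. TN does not enter any of them. The following is a quick Java check that reproduces the reported values; PointMetrics is a hypothetical name.

```java
import java.util.Locale;

/** Sketch: point-wise precision, recall and F-score from confusion counts. */
public class PointMetrics {
    static double precision(int tp, int fp) { return (double) tp / (tp + fp); }

    static double recall(int tp, int fn) { return (double) tp / (tp + fn); }

    static double fScore(int tp, int fp, int fn) {
        double p = precision(tp, fp), r = recall(tp, fn);
        return 2 * p * r / (p + r); // harmonic mean of precision and recall
    }

    public static void main(String[] args) {
        int tp = 237, fn = 386, fp = 108; // counts from the example; TN is not used here
        System.out.printf(Locale.ROOT, "precision = %.4f%n", precision(tp, fp)); // prints precision = 0.6870
        System.out.printf(Locale.ROOT, "recall = %.4f%n", recall(tp, fn));       // prints recall = 0.3804
        System.out.printf(Locale.ROOT, "F-score = %.4f%n", fScore(tp, fp, fn));  // prints F-score = 0.4897
    }
}
```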
References:
• Use Weka in your Java code
• Generating cross-validation folds
Downloads:
• Example code
• Java ROC code
• MATLAB ROC code
