1. Introduction to Machine Learning
Lecture 4
Slides based on Francisco Herrera's course on Data Mining
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lecture 3
Typically, ML techniques are divided into different paradigms:
Inductive learning
Explanation-based learning
Analogy-based learning
Evolutionary learning
Connectionist Learning
3. Recap of Lecture 3
Problems that we’ll study
1. Data classification: C4.5, kNN, Naïve Bayes …
2. Statistical learning: SVM
3. Association analysis: A-priori
4. Link mining: Page Rank
5. Clustering: k-means
6. Reinforcement learning: Q-learning, XCS
7. Regression
8. Genetic Fuzzy Systems
4. Today’s Agenda
Situation: Where Are We?
Classification
Prediction
Clustering
Association
Data Mining Systems
5. Situation: Where Are We?
The input consists of examples, each described by a set of characteristics (attributes)
6. Situation: Where Are We?
What can we do with a bunch of examples?
It depends on the type of examples we have:
Classification: Find the class to which a new instance belongs
E.g.: Find whether a new patient has cancer or not
Numeric prediction: A variation of classification in which the output is a numeric value instead of a class
E.g.: Find the frequency of cancerous cells found
Regression: Find a function that fits your examples
E.g.: Find a function that controls your chain process
Association: Find associations among your problem attributes or variables
E.g.: Find relations such as: a patient with high blood pressure is more likely to have a heart attack
Clustering: Process of grouping the instances into classes (clusters)
E.g.: Group clients whose purchases are similar
7. Data Classification
[Figure: data classification workflow. Labels: information based on experience, dataset, training set, test set, learner, knowledge extraction, model, new instance, predicted output]
8. Example of Data Classification
[Figure labels: Data Set, Classification Model, How]
The classification model can be implemented in several ways:
• Rules
• Decision trees
• Mathematical formulae
9. Classification as a Two-Step Process
Model usage: to classify future or unknown objects
Estimate the accuracy of the model
The known label of test samples is compared with the label
predicted by the system
The accuracy rate is the proportion of test examples that are correctly classified by the model
The test set is independent of the training set
If the experts think that the model is acceptable
Then, use the model to predict unknown examples
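The accuracy estimate described above can be sketched in a few lines. This is only an illustration: the function name `accuracy` and the two label lists are hypothetical and do not come from the lecture.

```python
def accuracy(true_labels, predicted_labels):
    """Proportion of test examples whose predicted label matches the known label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical test set: known labels vs. labels predicted by some model.
known = ["cancer", "healthy", "healthy", "cancer", "healthy"]
predicted = ["cancer", "healthy", "cancer", "cancer", "healthy"]

# 4 of the 5 test examples are classified correctly.
print(accuracy(known, predicted))
```

The test labels must come from examples that were not used for training, otherwise the estimate is optimistic.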
10. Going to Real World
Definition: Given a collection of annotated data (in this case, katydids and grasshoppers), decide what type of insect the following one is
[Figure: katydids and grasshoppers]
11. Going to Real World
How can I put a katydid or a grasshopper into my computer?
12. Going to Real World
Thus, the classification problem has been reduced to
Insect ID   Abdomen Length   Antennae Length   Insect Class
1           2.7              5.5               Grasshopper
2           8.0              9.1               Katydid
3           0.9              4.7               Grasshopper
4           1.1              3.1               Grasshopper
5           5.4              8.5               Katydid
6           2.9              1.9               Grasshopper
7           6.1              6.6               Katydid
8           0.5              1.0               Grasshopper
9           8.3              6.6               Katydid
10          8.1              4.7               Katydid
We have a new observation with abdomen length 5.1 and antennae length 7. Which class does it belong to?
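One way to answer this question is the instance-based classifier kNN listed among the methods of this course. The sketch below is a minimal illustration, not the course's implementation: the function name `knn_classify`, the choice k=3, and the use of Euclidean distance are all assumptions.

```python
import math
from collections import Counter

# Training data from the table: (abdomen length, antennae length, class).
TRAINING = [
    (2.7, 5.5, "Grasshopper"),
    (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"),
    (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),
    (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),
    (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),
    (8.1, 4.7, "Katydid"),
]


def knn_classify(abdomen, antennae, k=3):
    """Return the majority class among the k nearest training insects."""
    # Sort the training insects by Euclidean distance to the new observation.
    by_distance = sorted(
        TRAINING,
        key=lambda row: math.hypot(row[0] - abdomen, row[1] - antennae),
    )
    # Let the k closest insects vote for their class.
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_classify(5.1, 7.0))
```

With k=3 the nearest neighbours of (5.1, 7.0) are insects 7, 5 and 1, so the majority vote is Katydid.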
13. Going to Real World
Actually, we could write that
How do I classify this domain?
14. How to Create Classification Models
We will study some of these methods:
The decision tree C4.5
The instance-based classifier kNN
The probabilistic classifier Naïve Bayes
15. Regression or Prediction
Prediction vs data classification
Similarities: Both learn from a data set
Difference:
In classification, each example has an associated class
In prediction, each example has an associated numerical value
16. How to Extract a Model?
Prediction works analogously to data classification
Use an algorithm to build a model
Use this model to predict the new unknown example
Types of regression
Linear and multiple regression
Non-linear regression
Two of the most-used approaches to regression
Neural networks
Fuzzy rule-based systems
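The two steps above (build a model, then use it to predict) can be illustrated with the simplest case, linear regression on one input variable. This is a sketch under stated assumptions: the function name `fit_line` and the data are hypothetical, and ordinary least squares is used as the fitting algorithm.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    # Co-variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical examples: each one has a numerical value associated.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Step 1: build the model.
slope, intercept = fit_line(xs, ys)

# Step 2: use the model to predict a new, unknown example.
new_x = 5.0
print(slope * new_x + intercept)
```

Non-linear approaches such as neural networks follow the same two steps; only the form of the model changes.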
17. Clustering
The clustering problem
Given a database D = {t1, t2, …, tn} of transactions and an integer value k, the clustering problem is to define a mapping f: D → {1, …, k} where each ti is assigned to one cluster Kj, 1 <= j <= k
Main difference with classification
In classification, each example is labeled with a class
In clustering, examples are not labeled
Examples of clustering
Segment a customer database based on similar buying patterns
Group houses in a town into neighborhoods based on similar features
Identify new plant species
Identify similar web usage patterns
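The mapping f from the definition above can be made concrete with k-means, the clustering method listed in the recap. The sketch below is a simplified illustration only: the function names, the data, and the initialisation (the first k points become the initial centroids) are assumptions, and clusters are numbered from 0 to k-1 rather than 1 to k.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100):
    """Return a list f where f[i] in {0, ..., k-1} is the cluster of points[i]."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplistic initialisation (an assumption): the first k points.
    centroids = [points[i] for i in range(k)]
    assignment = None
    for _ in range(iterations):
        # Assign every point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Move every centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment


# Hypothetical, unlabeled examples forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.3)]
print(k_means(points, 2))
```

The result of k-means depends on the initial centroids; with a poor initialisation it can converge to a grouping that does not match the visible groups, which is why practical implementations usually try several initialisations.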
18. Example of Clustering
Put these people in different clusters
Which are the keys?
Define what’s similar
Group similar things into the same cluster
Size of the clusters?
Which type of clustering do I want?
Hierarchical clustering?
Partition-based clustering?
20. How to Group the Elements?
21. Which Type of Clustering?
Many types of clustering
Hierarchical: Nested set of clusters
Partition-based: One set of clusters
Incremental: Elements handled one at a time
Simultaneous: All elements handled together
Overlapping/non-overlapping
[Figure: hierarchical clustering vs. partition-based clustering]
22. Association Rules
Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn}, where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I
The association rule problem is to identify all the rules of the form
X → Y
with minimum support and confidence
Support: Fraction of transactions which contain both X and Y
Confidence: Measure of how often items in Y appear in transactions that contain X
Slide 22
Artificial Intelligence Machine Learning
23. Example Association Rules
I = {Beer, Bread, Jelly, Milk, PeanutButter}
Support of {Bread, PeanutButter} is 60%
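Support and confidence can be computed directly from their definitions. The sketch below is illustrative only: the five transactions are hypothetical (the transaction table of this slide is not available in the text) and were chosen so that the support of {Bread, PeanutButter} is 60%, as stated above. The function names are also assumptions.

```python
# Hypothetical transactions over the item set I (an assumption, see above).
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Of the transactions that contain X, the fraction that also contain Y."""
    count_x = count(x, transactions)
    if count_x == 0:
        raise ValueError("no transaction contains X")
    return count(set(x) | set(y), transactions) / count_x


# Support of the itemset {Bread, PeanutButter}: 3 of 5 transactions.
print(support({"Bread", "PeanutButter"}, TRANSACTIONS))

# Confidence of the rule Bread -> PeanutButter: 3 of the 4 Bread transactions.
print(confidence({"Bread"}, {"PeanutButter"}, TRANSACTIONS))
```

A rule is reported only if both values reach the minimum thresholds chosen by the user.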
25. Before Finishing…
Some environments that contain algorithms to perform data classification, regression, clustering and association rule mining
KEEL: http://www.keel.es
Weka: http://www.cs.waikato.ac.nz/ml/weka/
Rapid Miner: http://rapid-i.com/content/blogcategory/38/69/
26. Next Class
Start with data classification
C4.5