Learning The Membership Function Contexts For Mining Fuzzy Association Rules By Using Genetic Algorithms
1. Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms
Jesús Alcalá-Fdez, Rafael Alcalá, María José Gacto, Francisco Herrera
Fuzzy Sets and Systems (2008), article in press
Presenter: Chia-Ming Wang
3. Before we go
• T. Hong, C. Chen, Y. Wu, Y. Lee, Using divide-and-conquer GA strategy in fuzzy data mining, in: IEEE Symp. on Fuzzy Systems, Budapest, Hungary, 2004, pp. 116–121.
• T. Hong, C. Kuo, S. Chi, Trade-off between time complexity and number of rules for fuzzy mining from quantitative data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9 (5) (2001) 587–604.
Thanks to Prof. Hong, who provided me with the second paper today.
16. Linguistic terms
age: Low, Middle, High
weight: Low, Middle, High
if age is Middle then weight is High
18. The 2-tuples linguistic representation
if age is Middle then weight is High
if age is (Middle, 0.3) then weight is (High, -0.1)
$(s_i, \alpha_i),\ s_i \in S,\ \alpha_i \in [-0.5, 0.5)$
F. Herrera, L. Martínez, A 2-tuple fuzzy linguistic representation model for computing with words, IEEE Trans.
Fuzzy Systems 8 (6) (2000) 746–752.
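The 2-tuple notation can be illustrated with a minimal Python sketch. It assumes the mapping of the cited Herrera and Martínez model, in which a value β in [0, g] is represented by the closest label s_i (i = round(β)) and the displacement α = β − i. The label list and the function names `to_two_tuple` and `from_two_tuple` are illustrative and do not come from the slides.

```python
import math

LABELS = ["Low", "Middle", "High"]  # assumed label set S = {s_0, s_1, s_2}

def to_two_tuple(beta):
    """Map beta in [0, len(LABELS) - 1] to (label, alpha), alpha in [-0.5, 0.5)."""
    i = math.floor(beta + 0.5)  # round half up so that alpha never reaches 0.5
    return LABELS[i], beta - i

def from_two_tuple(label, alpha):
    """Inverse mapping: (s_i, alpha) -> i + alpha."""
    return LABELS.index(label) + alpha

# (Middle, 0.3) sits 30% of the way from Middle towards High;
# (High, -0.1) sits 10% of the way from High back towards Middle.
print(to_two_tuple(1.3), to_two_tuple(1.9))
```

Read this way, the rule "if age is (Middle, 0.3) then weight is (High, -0.1)" uses labels that are slightly shifted from their original positions.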
32. Traditional GA
Population (chromosomes) → parents → Evaluation (fitness)
Evaluation (fitness) → Reproduction (selection) → Mating pool
Mating pool → Mates → Genetic operators (recombination): crossover, mutation
Genetic operators → offspring → Population (chromosomes)
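The cycle above can be sketched as a small generational GA in Python. This is only an illustration: the bit-string encoding, binary tournament selection, one-point crossover, bit-flip mutation, and all parameter values are assumptions chosen for brevity, since the slide names only the stages.

```python
import random

def traditional_ga(fitness, n_bits, pop_size=20, generations=50, p_mut=0.02, seed=0):
    """Generational GA on bit strings; pop_size is assumed to be even."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Evaluation (fitness) of the current population
        scores = [fitness(c) for c in pop]

        # Reproduction (selection): binary tournaments fill the mating pool
        def pick():
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        pool = [pick() for _ in range(pop_size)]
        # Genetic operators: one-point crossover, then bit-flip mutation
        offspring = []
        for a, b in zip(pool[::2], pool[1::2]):
            cut = rng.randrange(1, n_bits)
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                offspring.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = offspring  # the offspring replace the whole population
        best = max(pop + [best], key=fitness)  # remember the best seen so far
    return best

print(sum(traditional_ga(sum, 16)))  # number of ones in the best string found
```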
33. GA Used in this paper
• CHC genetic model
• MFs codification and initial gene pool
• Chromosome evaluation
• Crossover operator
38. Scheme of CHC model
Initialize population and THRESHOLD → Crossover of N parents
Incest prevention: two parents are crossed only if (1/2) × Hamming distance > L
L = (#Genes × BITSGENE) / 4
44. Scheme of CHC model
1. Initialize population and THRESHOLD
2. Crossover of N parents
3. Evaluation of the new individuals
4. Selection of the best N individuals between parents and offspring
5. If NO new individual, decrement THRESHOLD
6. THRESHOLD < 0?
   no: return to step 2
   yes: restart the population and THRESHOLD, then return to step 2
L. Eshelman, The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic
recombination, Foundations of Genetic Algorithms, Vol. 1, Morgan Kaufmann, Los Altos, CA, 1991, pp. 265–283.
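The CHC scheme can be sketched in Python as follows. This is a simplified illustration on bit strings, not the implementation used in the paper. It assumes Eshelman's half-uniform crossover (HUX) in place of the real-coded crossover presented later, a restart that keeps the best individual and flips about 35% of its bits to rebuild the rest, and a threshold that is simply reset to its initial value. The function names and parameter values are hypothetical.

```python
import random

def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def hux(a, b, rng):
    """Half-uniform crossover: exchange half of the differing bits."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    rng.shuffle(diff)
    c1, c2 = a[:], b[:]
    for i in diff[: len(diff) // 2]:
        c1[i], c2[i] = b[i], a[i]
    return c1, c2

def chc(fitness, n_bits, pop_size=20, max_evals=2000, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    threshold = n_bits / 4  # L = (#Genes * BITSGENE) / 4, one bit per gene here
    evals = pop_size  # bookkeeping of how many individuals were created
    while evals < max_evals:
        parents = pop[:]
        rng.shuffle(parents)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if hamming(a, b) / 2 > threshold:  # incest prevention
                children.extend(hux(a, b, rng))
        evals += len(children)
        # Elitist selection: best N among parents and offspring
        # (sorted is stable, so ties favour the parents, which are listed first).
        survivors = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        if not any(s is c for s in survivors for c in children):
            threshold -= 1  # no new individual entered the population
        pop = survivors
        if threshold < 0:  # restart the population and the threshold
            best = pop[0]
            pop = [best] + [[1 - g if rng.random() < 0.35 else g for g in best]
                            for _ in range(pop_size - 1)]
            threshold = n_bits / 4
            evals += pop_size - 1
    return max(pop, key=fitness)

print(sum(chc(sum, 20)))  # number of ones in the best string found
```

There is no mutation step inside the loop: diversity comes only from the incest-prevention test and from the restart, which is the main difference from the traditional cycle.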
48. Initial Gene Pool
chromosome: (c11, ..., c1m, c21, ..., c2m, ..., cn1, ..., cnm)
(c11, ..., c1m): 1 item with m MFs
• initial MFs obtained from expert knowledge
• individuals generated at random in [-0.5, 0.5)
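A minimal sketch of this codification in Python, assuming that each gene is the lateral displacement of one MF and that all genes are drawn uniformly from [-0.5, 0.5). The function names and the example sizes are hypothetical.

```python
import random

def initial_gene_pool(n_items, m_mfs, pop_size, seed=0):
    """Return pop_size chromosomes, each with n_items * m_mfs genes in [-0.5, 0.5)."""
    rng = random.Random(seed)
    length = n_items * m_mfs
    # rng.random() lies in [0, 1), so rng.random() - 0.5 lies in [-0.5, 0.5)
    return [[rng.random() - 0.5 for _ in range(length)] for _ in range(pop_size)]

def gene(chromosome, j, k, m_mfs):
    """Displacement c_jk of the k-th MF of the j-th item (both 0-based)."""
    return chromosome[j * m_mfs + k]

pool = initial_gene_pool(n_items=10, m_mfs=3, pop_size=50)
print(len(pool), len(pool[0]), gene(pool[0], 2, 1, 3))
```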
74. Fuzzy Support (count)
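DB: $T$ transactions over $n$ items, with $m$ MFs per item (example item: bread)
The value $v_j^{(i)}$ of item $j$ in the $i$-th transaction is transferred to the fuzzy set
$$f_j^{(i)} = \frac{f_{j1}^{(i)}}{R_{j1}} + \cdots + \frac{f_{jm}^{(i)}}{R_{jm}}$$
where $f_{jk}^{(i)}$ is the membership degree of $v_j^{(i)}$ in the $k$-th MF $R_{jk}$ of item $j$.
$$count_{jk} = \sum_{i=1}^{T} f_{jk}^{(i)}$$
(for example, bread.low.count)
$$L_1 = \{ R_{jk} \mid count_{jk} \ge \alpha,\ 1 \le j \le n \text{ and } 1 \le k \le m \}$$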
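The count and the set L1 can be sketched in Python as follows. The triangular shape of the MFs, the data layout (one dict per transaction), and the example numbers are assumptions made for illustration only.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular MF with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_counts(transactions, mfs):
    """transactions: list of {item: value}; mfs: {item: {region: (a, b, c)}}."""
    counts = {}
    for item, regions in mfs.items():
        for region, (a, b, c) in regions.items():
            # count_jk: sum of the membership degrees over all transactions
            counts[(item, region)] = sum(
                triangular(t[item], a, b, c) for t in transactions if item in t)
    return counts

def large_1_itemsets(counts, alpha):
    """L1: the fuzzy regions whose count reaches the minimum support alpha."""
    return {key for key, count in counts.items() if count >= alpha}

mfs = {"bread": {"low": (-5, 0, 5), "middle": (0, 5, 10), "high": (5, 10, 15)}}
transactions = [{"bread": 2}, {"bread": 4}, {"bread": 8}]
counts = fuzzy_counts(transactions, mfs)
print(counts[("bread", "low")])          # bread.low.count, about 0.8
print(large_1_itemsets(counts, 1.0))     # only ("bread", "middle") reaches 1.0
```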
79. Fuzzy Support
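$$fitness(C_q) = \frac{\sum_{x \in L_1} fuzzy\_support(x)}{suitability(C_q)}$$
where $L_1$ is the set of large 1-itemsets and $fuzzy\_support(x) = count_x / T$, with $T$ the number of transactions.
$$suitability(C_q) = \sum_{k=1}^{n} \left[ overlap\_factor(C_{qk}) + 1 \right]$$
where $n$ is the number of items.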
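As a small worked sketch in Python, the fitness can be computed from the counts of the large 1-itemsets and one overlap factor per item. The overlap factors are taken as given inputs because the slides do not define how they are computed, and all numbers below are made up for illustration.

```python
def fuzzy_support(count, n_transactions):
    """Fuzzy support of a 1-itemset: its count divided by the number of transactions T."""
    return count / n_transactions

def suitability(overlap_factors):
    """Sum over the n items of [overlap_factor + 1]."""
    return sum(o + 1 for o in overlap_factors)

def fitness(l1_counts, n_transactions, overlap_factors):
    """Total fuzzy support of the large 1-itemsets divided by the suitability."""
    total = sum(fuzzy_support(c, n_transactions) for c in l1_counts)
    return total / suitability(overlap_factors)

# Two large 1-itemsets with counts 5 and 7 over T = 10 transactions,
# two items with overlap factors 0.0 and 0.2: (0.5 + 0.7) / (1.0 + 1.2)
print(fitness([5, 7], 10, [0.0, 0.2]))
```

Because the suitability grows with the overlap between MFs, a chromosome cannot raise its fitness simply by stretching every MF to cover more transactions.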
82. PCBLX Crossover
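$X = (x_1 \cdots x_n)$, $Y = (y_1 \cdots y_n)$, with $x_i, y_i \in [a_i, b_i] \subset \mathbb{R}$, $i = 1 \cdots n$
$O_1 = (o_{11} \cdots o_{1n})$, $o_{1i} \in [l_i^1, u_i^1]$, $l_i^1 = \max\{a_i, x_i - I_i \cdot \alpha\}$, $u_i^1 = \min\{b_i, x_i + I_i \cdot \alpha\}$
$O_2 = (o_{21} \cdots o_{2n})$, $o_{2i} \in [l_i^2, u_i^2]$, $l_i^2 = \max\{a_i, y_i - I_i \cdot \alpha\}$, $u_i^2 = \min\{b_i, y_i + I_i \cdot \alpha\}$
$I_i = |x_i - y_i|$
[Figure: sampling intervals of PCBLX and BLX on the axis $a_i$, $x_i$, $y_i$, $b_i$]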
F. Herrera, M. Lozano, A.M. Sánchez, A taxonomy for the crossover operator for real-coded genetic
algorithms: An experimental study. Int. J. Intell. Syst. 18 (2003) 309-338.
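A minimal Python sketch of the PCBLX operator as defined by these formulas. It assumes that each offspring gene is drawn uniformly from its interval. The value of the crossover parameter α (unrelated to the displacement α of the 2-tuples) and the function signature are hypothetical.

```python
import random

def pcblx(x, y, bounds, alpha=0.3, rng=random):
    """Parent-centric BLX: each offspring is sampled around one of the two parents."""
    o1, o2 = [], []
    for xi, yi, (ai, bi) in zip(x, y, bounds):
        spread = abs(xi - yi) * alpha  # I_i * alpha
        # o_1i is centred on x_i, o_2i on y_i, both clipped to [a_i, b_i]
        o1.append(rng.uniform(max(ai, xi - spread), min(bi, xi + spread)))
        o2.append(rng.uniform(max(ai, yi - spread), min(bi, yi + spread)))
    return o1, o2

x, y = [0.1, -0.2], [0.4, -0.4]
bounds = [(-0.5, 0.5)] * 2
print(pcblx(x, y, bounds, rng=random.Random(0)))
```

The closer the two parents are, the narrower the sampling intervals become, so the search concentrates around the parents as the population converges.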
89. Conceptual Flowchart
Stage 1: Learning Membership Function
  Inputs: Predefined MFs, Transaction Database
  Learning Process ⇄ Evaluation Module (Fitness), exchanging MFs
  Output: Definitive MFs
Stage 2: Mining Fuzzy Association Rules
  Inputs: Definitive MFs, Transaction Database
  Fuzzy mining
  Output: Fuzzy Association Rules
90. Procedures
Stage 1
1. initialization
2. evaluate the initial chromosomes
   1. for all items in each transaction, transfer the quantitative values to fuzzy sets
   2. calculate count and fuzzy support
   3. calculate fitness
3. set threshold L
4. generate the next population
5. CHC procedure
6. if the number of runs has not been reached, go to step 4
Stage 2
Mine fuzzy association rules with the method of Hong et al. (2001)
93. Data Set
Bureau of the Census, FAM95
63,756 instances
23 attributes (10 used in the experiments)
This data set was obtained from the Statistics Data Sets Archive website http://www.stat.ucla.edu/data/fpp.
97. Results obtained in the genetic process
Hong et al.’s approach with the 2-tuples
Support  Fitness  Fsup   Suit   #1-Itemsets
With three linguistic terms
0.2      0.97     10.90  11.18  20
0.5      0.89     11.36  12.64  18
0.7      0.59     6.20   10.33  7
0.9      0.26     2.79   10.52  3
With five linguistic terms
0.2      0.93     10.18  10.93  22
0.5      0.64     7.39   11.80  11
0.7      0.41     4.76   11.60  6
0.9      0.08     0.91   10.92  1
98. Fitness vs Function Evaluation
[Figure: average fitness values (0 to 1) vs. number of evaluations (0 to 10000). Series: The Proposed Approach, Hong et al.'s Approach.]
99. Frequent 1-itemsets vs minsup
[Figure: number of large 1-itemsets (0 to 20) vs. minimum support (0.10 to 0.90). Series: The Proposed Approach, Hong et al.'s Approach, Uniform Fuzzy Partition.]
106. Time vs #Transaction
[Figure: runtime in minutes (0 to 30) vs. number of transactions (10% to 100%). Series: Proposed Approach, Hong et al.'s Approach.]
107. Time vs #Attribute
[Figure: runtime in minutes (0 to 30) vs. number of attributes (2 to 10). Series: Proposed Approach, Hong et al.'s Approach.]
108. Time vs #Linguistic terms
[Figure: runtime in minutes (20 to 70) vs. number of linguistic terms (3 to 7). Series: Proposed Approach, Hong et al.'s Approach.]
109. Example of Rules
Classic Fuzzy Association Rule:
If number of children is Low and hours head worked last week is Low then head’s personal income is Low (Factor of confidence 0.87)
Rule with 2-Tuples Representation:
If number of children is (Low, -0.16) and hours head worked last week is (Low, -0.06) then head’s personal income is (Low, 0.1) (Factor of confidence 0.99)
113. T. Hong, C. Chen, Y. Wu, Y. Lee, Using divide-and-conquer GA strategy in fuzzy data mining, IEEE Symp. on Fuzzy Systems, Budapest, Hungary, 2004, pp. 116–121.
120. Reference
• L. Eshelman, The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination, in: G. Rawlins (Ed.), Foundations of Genetic Algorithms, Vol. 1, Morgan Kaufmann, Los Altos, CA, 1991, pp. 265–283.
• F. Herrera, L. Martínez, A 2-tuple fuzzy linguistic representation model for computing with words, IEEE Trans. Fuzzy Systems 8 (6) (2000) 746–752.
• F. Herrera, M. Lozano, A.M. Sánchez, A taxonomy for the crossover operator for real-coded genetic algorithms: An experimental study, Int. J. Intell. Syst. 18 (2003) 309–338.
• T. Hong, C. Chen, Y. Wu, Y. Lee, Using divide-and-conquer GA strategy in fuzzy data mining, in: IEEE Symp. on Fuzzy Systems, Budapest, Hungary, 2004, pp. 116–121.
• T. Hong, C. Chen, Y. Wu, Y. Lee, Genetic-fuzzy data mining with divide-and-conquer strategy, IEEE Transactions on Evolutionary Computation 12 (2) 252–265.
• T. Hong, C. Kuo, S. Chi, Trade-off between time complexity and number of rules for fuzzy mining from quantitative data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9 (5) (2001) 587–604.
• H. Ishibuchi, T. Nakashima, T. Yamamoto, Fuzzy association rules for handling continuous attributes, in: IEEE Internat. Symp. on Industrial Electronics Proceedings, Pusan, Korea, 2001, pp. 118–121.
• P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, May 2005.