SlideShare una empresa de Scribd logo
1 de 32
Data Mining
Spring 2007
• Frequent-Pattern Tree Approach
Towards ARM
Lecture 11-12
2
In this lecture
The lecture is based on
• Jiawei Han, Jian Pei, Yiwen Yin And Runying Mao Data,
“Mining Frequent Patterns without Candidate
Generation: A Frequent-Pattern Tree Approach”,
Mining and Knowledge Discovery, Kluwer Academic
Publishers, 2004
• Jiawei Han, Jian Pei, Yiwen Yin, “Mining Frequent
Patterns without Candidate Generation”, In Proc. 2000
ACM SIGMOD Int. Conf. Management of Data (SIGMOD’00), Dallas,
TX, pp. 1–12.
Some slides are adapted from official text book slides of
• Jiawei Han and Micheline Kamber, “Data Mining: Concepts and
Techniques”, Morgan Kaufmann Publishers, August 2000
3
Is Apriori Fast Enough? — Performance
Bottlenecks
• The core of the Apriori algorithm:
– Use frequent (k – 1)-itemsets to generate candidate frequent k-
itemsets
– Use database scan and pattern matching to collect counts for the
candidate itemsets
• The bottleneck of Apriori: candidate generation
– Huge candidate sets:
• 104
frequent 1-itemset will generate 107
candidate 2-itemsets
• To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100},
one needs to generate 2100
≈ 1030
candidates.
– Multiple scans of database:
• Needs (n +1 ) scans, n is the length of the longest pattern
4
Mining Frequent Patterns Without
Candidate Generation
• Steps
1. Compress a large database into a compact,
Frequent-Pattern tree (FP-tree) structure
1. highly condensed, but complete for frequent pattern mining
2. avoid costly database scans
2. Develop an efficient, FP-tree-based frequent pattern
mining method
1. A divide-and-conquer methodology: decompose mining
tasks into smaller ones
2. Avoid candidate generation: sub-database test only!
5
FP-tree Construction
Item frequency head
f 4
c 4
a 3
b 3
m 3
p 3
TID Items bought (ordered) frequent items
100 {f, a, c, d, g, i, m, p} {f, c, a, m, p}
200 {a, b, c, f, l, m, o} {f, c, a, b, m}
300 {b, f, h, j, o} {f, b}
400 {b, c, k, s, p} {c, b, p}
500 {a, f, c, e, l, p, m, n} {f, c, a, m, p}
Steps:
1. Scan DB once, find frequent 1-
itemset (single item pattern)
2. Order frequent items in frequency
descending order
3. Scan DB again, construct FP-tree
6
• Steps Contd. (Example)
– Scan of the first transaction leads to the
construction of the first branch of the tree
listing
{}
f:1
c:1
a:1
m:1
p:1
FP-tree Construction (contd.)
(ordered) frequent items
{f, c, a, m, p}
{f, c, a, b, m}
{f, b}
{c, b, p}
{f, c, a, m, p}
7
{}
f:2
c:2
a:2
b:1m:1
p:1 m:1
FP-tree Construction (contd.)
(ordered) frequent items
{f, c, a, m, p}
{f, c, a, b, m}
{f, b}
{c, b, p}
{f, c, a, m, p}
• Steps Contd. (Example)
– Scan of the first transaction leads to the
construction of the first branch of the tree
listing
– Second transaction shares a common prefix
with the existing path the count of each node
along the prefix is incremented by 1
– Two new nodes are created and linked as
children of (a:2) and (b:1) respec.
8
• Steps Contd. (Example)
– Scan of the first transaction leads to the
construction of the first branch of the tree
listing
– Second transaction shares a common prefix
with the existing path the count of each node
along the prefix is incremented by 1
– Two new nodes are created and linked as
children of (a:2) and (b:1) respec.
– Similarly for the third transaction
{}
f:3
b:1c:2
a:2
b:1m:1
p:1 m:1
FP-tree Construction (contd.)
(ordered) frequent items
{f, c, a, m, p}
{f, c, a, b, m}
{f, b}
{c, b, p}
{f, c, a, m, p}
9
• Steps Contd. (Example)
– Scan of the first transaction leads to the
construction of the first branch of the tree
listing
– Second transaction shares a common prefix
with the existing path the count of each node
along the prefix is incremented by 1
– Two new nodes are created and linked as
children of (a:2) and (b:1) respec.
– Similarly for the third transaction
– The scan of the fourth transaction leads to the
construction of the second branch of the tree,
(c:1), (b:1), (p:1).
{}
f:3 c:1
b:1
p:1
b:1c:2
a:2
b:1m:1
p:1 m:1
FP-tree Construction (contd.)
(ordered) frequent items
{f, c, a, m, p}
{f, c, a, b, m}
{f, b}
{c, b, p}
{f, c, a, m, p}
10
• Steps Contd. (Example)
– Scan of the first transaction leads to the
construction of the first branch of the tree
listing
– Second transaction shares a common prefix
with the existing path the count of each node
along the prefix is incremented by 1
– Two new nodes are created and linked as
children of (a:2) and (b:1) respec.
– Similarly for the third transaction
– The scan of the fourth transaction leads to the
construction of the second branch of the tree,
(c:1), (b:1), (p:1).
– For the last transaction, since its frequent item
list is identical to the first one, the path is
shared.
{}
f:4 c:1
b:1
p:1
b:1c:3
a:3
b:1m:2
p:2 m:1
FP-tree Construction (contd.)
(ordered) frequent items
{f, c, a, m, p}
{f, c, a, b, m}
{f, b}
{c, b, p}
{f, c, a, m, p}
11
• Create a Header
table
– Each entry in the
frequent-item-header
table consists of two
fields,
(1) item-name
(2) head of node-link
(a pointer pointing to
the first node in the
FP-tree carrying the
item-name).
FP-tree Construction (contd.)
{}
f:4 c:1
b:1
p:1
b:1c:3
a:3
b:1m:2
p:2 m:1
Header Table
Item frequency head
f 4
c 4
a 3
b 3
m 3
p 3
12
Mining frequent patterns using FP-tree
• Mining frequent patterns out of FP-tree is based
upon following Node-link property
– For any frequent item ai , all the possible patterns
containing only frequent items and ai can be obtained by
following ai ’s node-links, starting from ai ’s head in the
FP-tree header.
• Lets go through an example to understand the full
implication of this property in the mining process.
13
• For node p, its immediate frequent
pattern is (p:3), and it has two paths
in the FP-tree: (f :4, c:3,
a:3,m:2,p:2) and (c:1, b:1, p:1)
• These two prefix paths of p,
“{( f cam:2), (cb:1)}”, form p’s
conditional pattern base
• Now, we build an FP- tree on P’s
conditional pattern base.
• Leads to an FP tree with one
branch only i.e. C:3 hence the
frequent patter n associated with P
is just CP
{}
f:4 c:1
b:1
p:1
b:1c:3
a:3
b:1m:2
p:2 m:1
Header
Table
Item head
f
c
a
b
m
p
Mining frequent patterns of p
14
Mining frequent patterns of m
• Constructing an FP-tree on m, we derive m’s conditional
FP-tree, f :3, c:3, a:3, a single frequent pattern path.
• This conditional FP-tree is then mined recursively.
m-conditional
pattern base:
fca:2, fcab:1
{}
f:3
c:3
a:3
m-conditional FP-tree
All frequent patterns
concerning m
m,
fm, cm, am,
fcm, fam, cam,
fcam


{}
f:4 c:1
b:1
p:1
b:1c:3
a:3
b:1m:2
p:2 m:1
Header Table
Item frequency head
f 4
c 4
a 3
b 3
m 3
p 3
15
Mining frequent patterns of m
{}
f:3
c:3
a:3
m-conditional FP-tree
Cond. pattern base of “am”: (fc:3)
{}
f:3
c:3
am-conditional FP-tree
Cond. pattern base of “cm”: (f:3)
{}
f:3
cm-conditional FP-tree
Cond. pattern base of “cam”: (f:3)
{}
f:3
cam-conditional FP-tree
16
Mining Frequent Patterns by Creating
Conditional Pattern-Bases
EmptyEmptyf
{(f:3)}|c{(f:3)}c
{(f:3, c:3)}|a{(fc:3)}a
Empty{(fca:1), (f:1), (c:1)}b
{(f:3, c:3, a:3)}|m{(fca:2), (fcab:1)}m
{(c:3)}|p{(fcam:2), (cb:1)}p
Conditional FP-treeConditional pattern-baseItem
17
Single FP-tree Path Generation
• Suppose an FP-tree T has a single path P
• The complete set of frequent pattern of T can be
generated by enumeration of all the combinations of the
sub-paths of P
{}
f:3
c:3
a:3
m-conditional FP-tree
All frequent patterns
concerning m
m,
fm, cm, am,
fcm, fam, cam,
fcam

18
Why Is Frequent Pattern Growth
Fast?
• Our performance study shows
– FP-growth is an order of magnitude faster than Apriori, and is
also faster than tree-projection
• Reasoning
– No candidate generation, no candidate test
– Use compact data structure
– Eliminate repeated database scan
– Basic operation is counting and FP-tree building
19
FP-Growth vs. Apriori: Scalability With the Support
Threshold
0
10
20
30
40
50
60
70
80
90
100
0 0.5 1 1.5 2 2.5 3
Support threshold(%)
Runtime(sec.)
D1 FP-grow th runtime
D1 Apriori runtime
Data set T25I20D10K
#Transactions Items Average Transaction Length
250,000 1000 12
20
null
A:7
B:5
B:3
C:3
D:1
C:1
D:1
C:3
D:1
D:1
E:1
E:1
TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
Pointers are used to assist
frequent itemset generation
D:1
E:1
Transaction
Database
Item Pointer
A
B
C
D
E
Header table
Frequent Itemset Using FP-Growth
(Example)
21
null
A:7
B:5
B:3
C:3
D:1
C:1
D:1
C:3
D:1
E:1
D:1
E:1
Build conditional pattern
base for E:
P = {(A:1,C:1,D:1),
(A:1,D:1),
(B:1,C:1)}
Recursively apply FP-
growth on P
E:1
D:1
FP Growth Algorithm: FP Tree Mining
Frequent Itemset Using FP-Growth
(Example)
22
AA BB CC DD EE
AB AC AD AE BC BD BE CD CE DEAB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCDABCD ABCEABCE ABDEABDE ACDE BCDEACDE BCDE
ABCDEABCDE
Frequent Itemset Using FP-Growth
(Example)
23
null
A:2 B:1
C:1
C:1
D:1
D:1
E:1
E:1
Conditional Pattern base
for E:
P = {(A:1,C:1,D:1,E:1),
(A:1,D:1,E:1),
(B:1,C:1,E:1)}
Count for E is 3: {E} is
frequent itemset
Recursively apply FP-
growth on P (Conditional
tree for D within
conditional tree for E)
E:1
Conditional tree for E:
FP Growth Algorithm: FP Tree Mining
Frequent Itemset Using FP-Growth
(Example)
24
AA BB CC DD EE
AB AC AD AE BC BD BE CD CE DEAB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCDABCD ABCEABCE ABDEABDE ACDE BCDEACDE BCDE
ABCDEABCDE
Frequent Itemset Using FP-Growth
(Example)
25
Conditional pattern base
for D within conditional
base for E:
P = {(A:1,C:1,D:1),
(A:1,D:1)}
Count for D is 2: {D,E} is
frequent itemset
Recursively apply FP-
growth on P (Conditional
tree for C within
conditional tree D within
conditional tree for E)
Conditional tree for D
within conditional tree
for E:
null
A:2
C:1
D:1
D:1
FP Growth Algorithm: FP Tree Mining
Frequent Itemset Using FP-Growth
(Example)
26
Conditional pattern base
for C within D within E:
P = {(A:1,C:1)}
Count for C is 1: {C,D,E}
is NOT frequent itemset
Recursively apply FP-
growth on P
(Conditional tree for A
within conditional tree D
within conditional tree
for E)
Conditional tree for C
within D within E:
null
A:1
C:1
FP Growth Algorithm: FP Tree Mining
Frequent Itemset Using FP-Growth
(Example)
27
Count for A is 2: {A,D,E}
is frequent itemset
Next step:
Construct conditional tree
C within conditional tree
E
Conditional tree for A
within D within E:
null
A:2
FP Growth Algorithm: FP Tree Mining
Frequent Itemset Using FP-Growth
(Example)
28
null
A:2 B:1
C:1
C:1
D:1
D:1
E:1
E:1
Recursively apply FP-
growth on P (Conditional
tree for C within
conditional tree for E)
E:1
Conditional tree for E:
FP Growth Algorithm: FP Tree Mining
Frequent Itemset Using FP-Growth
(Example)
29
null
A:1 B:1
C:1
C:1
E:1 E:1
FP Growth Algorithm: FP Tree Mining
Conditional pattern base
for C within conditional
base for E:
P = {(B:1,C:1),
(A:1,C:1)}
Count for C is 2: {C,E} is
frequent itemset
Recursively apply FP-
growth on P (Conditional
tree for B within
conditional tree C within
conditional tree for E)Conditional tree for C within conditional
tree for E:
Frequent Itemset Using FP-Growth
(Example)
30
null
A:7
B:5
B:3
C:3
D:1
C:1
D:1
C:3
D:1
D:1
E:1
E:1
TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
D:1
E:1
Transaction
Database
Item Pointer
A
B
C
D
E
Header table
FP Growth Algorithm: FP Tree Mining
Frequent Itemset Using FP-Growth
(Example)
31
AA BB CC DD EE
AB AC AD AE BC BD BE CD CE DEAB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCDABCD ABCEABCE ABDEABDE ACDE BCDEACDE BCDE
ABCDEABCDE
Frequent Itemset Using FP-Growth
(Example)
32
AA BB CC DD EE
AB AC AD AE BC BD BE CD CE DEAB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCDABCD ABCEABCE ABDEABDE ACDE BCDEACDE BCDE
ABCDEABCDE
Frequent Itemset Using FP-Growth
(Example)

Más contenido relacionado

La actualidad más candente

12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
koolkampus
 

La actualidad más candente (20)

Disjoint sets
Disjoint setsDisjoint sets
Disjoint sets
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
AVL Tree in Data Structure
AVL Tree in Data Structure AVL Tree in Data Structure
AVL Tree in Data Structure
 
Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algo
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Binary search tree(bst)
Binary search tree(bst)Binary search tree(bst)
Binary search tree(bst)
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 
Data Structure and Algorithms Binary Search Tree
Data Structure and Algorithms Binary Search TreeData Structure and Algorithms Binary Search Tree
Data Structure and Algorithms Binary Search Tree
 
Unit I - Evaluation of expression
Unit I - Evaluation of expressionUnit I - Evaluation of expression
Unit I - Evaluation of expression
 
10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil
 
Data Structure and Algorithms AVL Trees
Data Structure and Algorithms AVL TreesData Structure and Algorithms AVL Trees
Data Structure and Algorithms AVL Trees
 
Kmp
KmpKmp
Kmp
 
Kadane's continuous subarray algorithm
Kadane's continuous subarray algorithmKadane's continuous subarray algorithm
Kadane's continuous subarray algorithm
 
Polynomial reppresentation using Linkedlist-Application of LL.pptx
Polynomial reppresentation using Linkedlist-Application of LL.pptxPolynomial reppresentation using Linkedlist-Application of LL.pptx
Polynomial reppresentation using Linkedlist-Application of LL.pptx
 
16 Fibonacci Heaps
16 Fibonacci Heaps16 Fibonacci Heaps
16 Fibonacci Heaps
 
Expression trees
Expression treesExpression trees
Expression trees
 
AVL Tree Data Structure
AVL Tree Data StructureAVL Tree Data Structure
AVL Tree Data Structure
 
Rabin karp string matching algorithm
Rabin karp string matching algorithmRabin karp string matching algorithm
Rabin karp string matching algorithm
 
Lec18
Lec18Lec18
Lec18
 
Binary Heap Tree, Data Structure
Binary Heap Tree, Data Structure Binary Heap Tree, Data Structure
Binary Heap Tree, Data Structure
 

Destacado

Survey on Frequent Pattern Mining on Graph Data - Slides
Survey on Frequent Pattern Mining on Graph Data - SlidesSurvey on Frequent Pattern Mining on Graph Data - Slides
Survey on Frequent Pattern Mining on Graph Data - Slides
Kasun Gajasinghe
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
Slideshare
 

Destacado (20)

Improved Frequent Pattern Mining Algorithm using Divide and Conquer Technique...
Improved Frequent Pattern Mining Algorithm using Divide and Conquer Technique...Improved Frequent Pattern Mining Algorithm using Divide and Conquer Technique...
Improved Frequent Pattern Mining Algorithm using Divide and Conquer Technique...
 
Frequent Itemset Mining(FIM) on BigData
Frequent Itemset Mining(FIM) on BigDataFrequent Itemset Mining(FIM) on BigData
Frequent Itemset Mining(FIM) on BigData
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Fp growth
Fp growthFp growth
Fp growth
 
Efficient frequent pattern mining in distributed system
Efficient frequent pattern mining in distributed systemEfficient frequent pattern mining in distributed system
Efficient frequent pattern mining in distributed system
 
Temporal Pattern Mining
Temporal Pattern MiningTemporal Pattern Mining
Temporal Pattern Mining
 
Data preprocessing ng
Data preprocessing   ngData preprocessing   ng
Data preprocessing ng
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining Techniques
 
Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Frequent Pattern Mining - Krishna Sridhar, Feb 2016Frequent Pattern Mining - Krishna Sridhar, Feb 2016
Frequent Pattern Mining - Krishna Sridhar, Feb 2016
 
Hadoop implementation for algorithms apriori, pcy, son
Hadoop implementation for algorithms apriori, pcy, sonHadoop implementation for algorithms apriori, pcy, son
Hadoop implementation for algorithms apriori, pcy, son
 
Fp growth
Fp growthFp growth
Fp growth
 
A vertical representation in frequent item set mining
A vertical representation in frequent item set miningA vertical representation in frequent item set mining
A vertical representation in frequent item set mining
 
Survey on Frequent Pattern Mining on Graph Data - Slides
Survey on Frequent Pattern Mining on Graph Data - SlidesSurvey on Frequent Pattern Mining on Graph Data - Slides
Survey on Frequent Pattern Mining on Graph Data - Slides
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithm
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
Data Mining: Applying data mining
Data Mining: Applying data miningData Mining: Applying data mining
Data Mining: Applying data mining
 

Similar a Frequent itemset mining using pattern growth method

ARM_03_FPtreefrequency pattern data warehousing .ppt
ARM_03_FPtreefrequency pattern data warehousing .pptARM_03_FPtreefrequency pattern data warehousing .ppt
ARM_03_FPtreefrequency pattern data warehousing .ppt
ChellamuthuHaripriya
 
FP growth algorithm, data mining, data analystics
FP growth algorithm, data mining, data analysticsFP growth algorithm, data mining, data analystics
FP growth algorithm, data mining, data analystics
AlketaAlia
 
Sequential pattern mining
Sequential pattern miningSequential pattern mining
Sequential pattern mining
kiran said
 
Associations1
Associations1Associations1
Associations1
mancnilu
 
Wireless sensor network Apriori an N-RMP
Wireless sensor network Apriori an N-RMP Wireless sensor network Apriori an N-RMP
Wireless sensor network Apriori an N-RMP
Amrit Khandelwal
 

Similar a Frequent itemset mining using pattern growth method (20)

ARM_03_FPtreefrequency pattern data warehousing .ppt
ARM_03_FPtreefrequency pattern data warehousing .pptARM_03_FPtreefrequency pattern data warehousing .ppt
ARM_03_FPtreefrequency pattern data warehousing .ppt
 
My6asso
My6assoMy6asso
My6asso
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatterns
 
FP growth algorithm, data mining, data analystics
FP growth algorithm, data mining, data analysticsFP growth algorithm, data mining, data analystics
FP growth algorithm, data mining, data analystics
 
Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...
Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...
Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
FP-growth.pptx
FP-growth.pptxFP-growth.pptx
FP-growth.pptx
 
Associations.ppt
Associations.pptAssociations.ppt
Associations.ppt
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
Sequential pattern mining
Sequential pattern miningSequential pattern mining
Sequential pattern mining
 
Associations1
Associations1Associations1
Associations1
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Interval intersection
Interval intersectionInterval intersection
Interval intersection
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
 
1.10.association mining 2
1.10.association mining 21.10.association mining 2
1.10.association mining 2
 
Graph mining ppt
Graph mining pptGraph mining ppt
Graph mining ppt
 
Wireless sensor network Apriori an N-RMP
Wireless sensor network Apriori an N-RMP Wireless sensor network Apriori an N-RMP
Wireless sensor network Apriori an N-RMP
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
06FPBasic02.pdf
06FPBasic02.pdf06FPBasic02.pdf
06FPBasic02.pdf
 
6asso
6asso6asso
6asso
 

Más de Shani729 (20)

Python tutorialfeb152012
Python tutorialfeb152012Python tutorialfeb152012
Python tutorialfeb152012
 
Python tutorial
Python tutorialPython tutorial
Python tutorial
 
Interaction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interactionInteraction design _beyond_human_computer_interaction
Interaction design _beyond_human_computer_interaction
 
Fm lecturer 13(final)
Fm lecturer 13(final)Fm lecturer 13(final)
Fm lecturer 13(final)
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15
 
Dwh lecture slides-week15
Dwh lecture slides-week15Dwh lecture slides-week15
Dwh lecture slides-week15
 
Dwh lecture slides-week10
Dwh lecture slides-week10Dwh lecture slides-week10
Dwh lecture slides-week10
 
Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8
 
Dwh lecture slides-week5&6
Dwh lecture slides-week5&6Dwh lecture slides-week5&6
Dwh lecture slides-week5&6
 
Dwh lecture slides-week3&4
Dwh lecture slides-week3&4Dwh lecture slides-week3&4
Dwh lecture slides-week3&4
 
Dwh lecture slides-week2
Dwh lecture slides-week2Dwh lecture slides-week2
Dwh lecture slides-week2
 
Dwh lecture slides-week1
Dwh lecture slides-week1Dwh lecture slides-week1
Dwh lecture slides-week1
 
Dwh lecture slides-week 13
Dwh lecture slides-week 13Dwh lecture slides-week 13
Dwh lecture slides-week 13
 
Dwh lecture slides-week 12&13
Dwh lecture slides-week 12&13Dwh lecture slides-week 12&13
Dwh lecture slides-week 12&13
 
Data warehousing and mining furc
Data warehousing and mining furcData warehousing and mining furc
Data warehousing and mining furc
 
Lecture 40
Lecture 40Lecture 40
Lecture 40
 
Lecture 39
Lecture 39Lecture 39
Lecture 39
 
Lecture 38
Lecture 38Lecture 38
Lecture 38
 
Lecture 37
Lecture 37Lecture 37
Lecture 37
 
Lecture 35
Lecture 35Lecture 35
Lecture 35
 

Último

Online crime reporting system project.pdf
Online crime reporting system project.pdfOnline crime reporting system project.pdf
Online crime reporting system project.pdf
Kamal Acharya
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manual
BalamuruganV28
 
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
MohammadAliNayeem
 

Último (20)

Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility Applications
 
Online crime reporting system project.pdf
Online crime reporting system project.pdfOnline crime reporting system project.pdf
Online crime reporting system project.pdf
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)
 
E-Commerce Shopping using MERN Stack where different modules are present
E-Commerce Shopping using MERN Stack where different modules are presentE-Commerce Shopping using MERN Stack where different modules are present
E-Commerce Shopping using MERN Stack where different modules are present
 
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGBRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
 
Piping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdfPiping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdf
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdf
 
Artificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian ReasoningArtificial Intelligence Bayesian Reasoning
Artificial Intelligence Bayesian Reasoning
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdf
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
Theory for How to calculation capacitor bank
Theory for How to calculation capacitor bankTheory for How to calculation capacitor bank
Theory for How to calculation capacitor bank
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manual
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptx
 
Intelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent ActsIntelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent Acts
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docx
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
 
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
 
ChatGPT Prompt Engineering for project managers.pdf
ChatGPT Prompt Engineering for project managers.pdfChatGPT Prompt Engineering for project managers.pdf
ChatGPT Prompt Engineering for project managers.pdf
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2
 

Frequent itemset mining using pattern growth method

  • 1. Data Mining Spring 2007 • Frequent-Pattern Tree Approach Towards ARM Lecture 11-12
  • 2. 2 In this lecture The lecture is based on • Jiawei Han, Jian Pei, Yiwen Yin And Runying Mao Data, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach”, Mining and Knowledge Discovery, Kluwer Academic Publishers, 2004 • Jiawei Han, Jian Pei, Yiwen Yin, “Mining Frequent Patterns without Candidate Generation”, In Proc. 2000 ACM SIGMOD Int. Conf. Management of Data (SIGMOD’00), Dallas, TX, pp. 1–12. Some slides are adapted from official text book slides of • Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, August 2000
  • 3. 3 Is Apriori Fast Enough? — Performance Bottlenecks • The core of the Apriori algorithm: – Use frequent (k – 1)-itemsets to generate candidate frequent k- itemsets – Use database scan and pattern matching to collect counts for the candidate itemsets • The bottleneck of Apriori: candidate generation – Huge candidate sets: • 104 frequent 1-itemset will generate 107 candidate 2-itemsets • To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2100 ≈ 1030 candidates. – Multiple scans of database: • Needs (n +1 ) scans, n is the length of the longest pattern
  • 4. 4 Mining Frequent Patterns Without Candidate Generation • Steps 1. Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure 1. highly condensed, but complete for frequent pattern mining 2. avoid costly database scans 2. Develop an efficient, FP-tree-based frequent pattern mining method 1. A divide-and-conquer methodology: decompose mining tasks into smaller ones 2. Avoid candidate generation: sub-database test only!
  • 5. 5 FP-tree Construction Item frequency head f 4 c 4 a 3 b 3 m 3 p 3 TID Items bought (ordered) frequent items 100 {f, a, c, d, g, i, m, p} {f, c, a, m, p} 200 {a, b, c, f, l, m, o} {f, c, a, b, m} 300 {b, f, h, j, o} {f, b} 400 {b, c, k, s, p} {c, b, p} 500 {a, f, c, e, l, p, m, n} {f, c, a, m, p} Steps: 1. Scan DB once, find frequent 1- itemset (single item pattern) 2. Order frequent items in frequency descending order 3. Scan DB again, construct FP-tree
  • 6. 6 • Steps Contd. (Example) – Scan of the first transaction leads to the construction of the first branch of the tree listing {} f:1 c:1 a:1 m:1 p:1 FP-tree Construction (contd.) (ordered) frequent items {f, c, a, m, p} {f, c, a, b, m} {f, b} {c, b, p} {f, c, a, m, p}
  • 7. 7 {} f:2 c:2 a:2 b:1m:1 p:1 m:1 FP-tree Construction (contd.) (ordered) frequent items {f, c, a, m, p} {f, c, a, b, m} {f, b} {c, b, p} {f, c, a, m, p} • Steps Contd. (Example) – Scan of the first transaction leads to the construction of the first branch of the tree listing – Second transaction shares a common prefix with the existing path the count of each node along the prefix is incremented by 1 – Two new nodes are created and linked as children of (a:2) and (b:1) respec.
  • 8. 8 • Steps Contd. (Example) – Scan of the first transaction leads to the construction of the first branch of the tree listing – Second transaction shares a common prefix with the existing path the count of each node along the prefix is incremented by 1 – Two new nodes are created and linked as children of (a:2) and (b:1) respec. – Similarly for the third transaction {} f:3 b:1c:2 a:2 b:1m:1 p:1 m:1 FP-tree Construction (contd.) (ordered) frequent items {f, c, a, m, p} {f, c, a, b, m} {f, b} {c, b, p} {f, c, a, m, p}
  • 9. 9 • Steps Contd. (Example) – Scan of the first transaction leads to the construction of the first branch of the tree listing – Second transaction shares a common prefix with the existing path the count of each node along the prefix is incremented by 1 – Two new nodes are created and linked as children of (a:2) and (b:1) respec. – Similarly for the third transaction – The scan of the fourth transaction leads to the construction of the second branch of the tree, (c:1), (b:1), (p:1). {} f:3 c:1 b:1 p:1 b:1c:2 a:2 b:1m:1 p:1 m:1 FP-tree Construction (contd.) (ordered) frequent items {f, c, a, m, p} {f, c, a, b, m} {f, b} {c, b, p} {f, c, a, m, p}
  • 10. 10 • Steps Contd. (Example) – Scan of the first transaction leads to the construction of the first branch of the tree listing – Second transaction shares a common prefix with the existing path the count of each node along the prefix is incremented by 1 – Two new nodes are created and linked as children of (a:2) and (b:1) respec. – Similarly for the third transaction – The scan of the fourth transaction leads to the construction of the second branch of the tree, (c:1), (b:1), (p:1). – For the last transaction, since its frequent item list is identical to the first one, the path is shared. {} f:4 c:1 b:1 p:1 b:1c:3 a:3 b:1m:2 p:2 m:1 FP-tree Construction (contd.) (ordered) frequent items {f, c, a, m, p} {f, c, a, b, m} {f, b} {c, b, p} {f, c, a, m, p}
  • 11. 11 • Create a Header table – Each entry in the frequent-item-header table consists of two fields, (1) item-name (2) head of node-link (a pointer pointing to the first node in the FP-tree carrying the item-name). FP-tree Construction (contd.) {} f:4 c:1 b:1 p:1 b:1c:3 a:3 b:1m:2 p:2 m:1 Header Table Item frequency head f 4 c 4 a 3 b 3 m 3 p 3
  • 12. 12 Mining frequent patterns using FP-tree • Mining frequent patterns out of FP-tree is based upon following Node-link property – For any frequent item ai , all the possible patterns containing only frequent items and ai can be obtained by following ai ’s node-links, starting from ai ’s head in the FP-tree header. • Lets go through an example to understand the full implication of this property in the mining process.
  • 13. 13 • For node p, its immediate frequent pattern is (p:3), and it has two paths in the FP-tree: (f :4, c:3, a:3,m:2,p:2) and (c:1, b:1, p:1) • These two prefix paths of p, “{( f cam:2), (cb:1)}”, form p’s conditional pattern base • Now, we build an FP- tree on P’s conditional pattern base. • Leads to an FP tree with one branch only i.e. C:3 hence the frequent patter n associated with P is just CP {} f:4 c:1 b:1 p:1 b:1c:3 a:3 b:1m:2 p:2 m:1 Header Table Item head f c a b m p Mining frequent patterns of p
  • 14. 14 Mining frequent patterns of m • Constructing an FP-tree on m, we derive m’s conditional FP-tree, f :3, c:3, a:3, a single frequent pattern path. • This conditional FP-tree is then mined recursively. m-conditional pattern base: fca:2, fcab:1 {} f:3 c:3 a:3 m-conditional FP-tree All frequent patterns concerning m m, fm, cm, am, fcm, fam, cam, fcam   {} f:4 c:1 b:1 p:1 b:1c:3 a:3 b:1m:2 p:2 m:1 Header Table Item frequency head f 4 c 4 a 3 b 3 m 3 p 3
  • 15. 15 Mining frequent patterns of m {} f:3 c:3 a:3 m-conditional FP-tree Cond. pattern base of “am”: (fc:3) {} f:3 c:3 am-conditional FP-tree Cond. pattern base of “cm”: (f:3) {} f:3 cm-conditional FP-tree Cond. pattern base of “cam”: (f:3) {} f:3 cam-conditional FP-tree
  • 16. 16 Mining Frequent Patterns by Creating Conditional Pattern-Bases EmptyEmptyf {(f:3)}|c{(f:3)}c {(f:3, c:3)}|a{(fc:3)}a Empty{(fca:1), (f:1), (c:1)}b {(f:3, c:3, a:3)}|m{(fca:2), (fcab:1)}m {(c:3)}|p{(fcam:2), (cb:1)}p Conditional FP-treeConditional pattern-baseItem
  • 17. 17 Single FP-tree Path Generation • Suppose an FP-tree T has a single path P • The complete set of frequent pattern of T can be generated by enumeration of all the combinations of the sub-paths of P {} f:3 c:3 a:3 m-conditional FP-tree All frequent patterns concerning m m, fm, cm, am, fcm, fam, cam, fcam 
  • 18. 18 Why Is Frequent Pattern Growth Fast? • Our performance study shows – FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection • Reasoning – No candidate generation, no candidate test – Use compact data structure – Eliminate repeated database scan – Basic operation is counting and FP-tree building
  • 19. 19 FP-Growth vs. Apriori: Scalability With the Support Threshold 0 10 20 30 40 50 60 70 80 90 100 0 0.5 1 1.5 2 2.5 3 Support threshold(%) Runtime(sec.) D1 FP-grow th runtime D1 Apriori runtime Data set T25I20D10K #Transactions Items Average Transaction Length 250,000 1000 12
  • 20. 20 null A:7 B:5 B:3 C:3 D:1 C:1 D:1 C:3 D:1 D:1 E:1 E:1 TID Items 1 {A,B} 2 {B,C,D} 3 {A,C,D,E} 4 {A,D,E} 5 {A,B,C} 6 {A,B,C,D} 7 {B,C} 8 {A,B,C} 9 {A,B,D} 10 {B,C,E} Pointers are used to assist frequent itemset generation D:1 E:1 Transaction Database Item Pointer A B C D E Header table Frequent Itemset Using FP-Growth (Example)
  • 21. 21 null A:7 B:5 B:3 C:3 D:1 C:1 D:1 C:3 D:1 E:1 D:1 E:1 Build conditional pattern base for E: P = {(A:1,C:1,D:1), (A:1,D:1), (B:1,C:1)} Recursively apply FP- growth on P E:1 D:1 FP Growth Algorithm: FP Tree Mining Frequent Itemset Using FP-Growth (Example)
  • 22. 22 AA BB CC DD EE AB AC AD AE BC BD BE CD CE DEAB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCDABCD ABCEABCE ABDEABDE ACDE BCDEACDE BCDE ABCDEABCDE Frequent Itemset Using FP-Growth (Example)
  • 23. 23 null A:2 B:1 C:1 C:1 D:1 D:1 E:1 E:1 Conditional Pattern base for E: P = {(A:1,C:1,D:1,E:1), (A:1,D:1,E:1), (B:1,C:1,E:1)} Count for E is 3: {E} is frequent itemset Recursively apply FP- growth on P (Conditional tree for D within conditional tree for E) E:1 Conditional tree for E: FP Growth Algorithm: FP Tree Mining Frequent Itemset Using FP-Growth (Example)
  • 24. 24 AA BB CC DD EE AB AC AD AE BC BD BE CD CE DEAB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCDABCD ABCEABCE ABDEABDE ACDE BCDEACDE BCDE ABCDEABCDE Frequent Itemset Using FP-Growth (Example)
  • 25. 25 Conditional pattern base for D within conditional base for E: P = {(A:1,C:1,D:1), (A:1,D:1)} Count for D is 2: {D,E} is frequent itemset Recursively apply FP- growth on P (Conditional tree for C within conditional tree D within conditional tree for E) Conditional tree for D within conditional tree for E: null A:2 C:1 D:1 D:1 FP Growth Algorithm: FP Tree Mining Frequent Itemset Using FP-Growth (Example)
  • 26. 26 Conditional pattern base for C within D within E: P = {(A:1,C:1)} Count for C is 1: {C,D,E} is NOT frequent itemset Recursively apply FP- growth on P (Conditional tree for A within conditional tree D within conditional tree for E) Conditional tree for C within D within E: null A:1 C:1 FP Growth Algorithm: FP Tree Mining Frequent Itemset Using FP-Growth (Example)
  • 27. 27 Count for A is 2: {A,D,E} is frequent itemset Next step: Construct conditional tree C within conditional tree E Conditional tree for A within D within E: null A:2 FP Growth Algorithm: FP Tree Mining Frequent Itemset Using FP-Growth (Example)
  • 28. 28 null A:2 B:1 C:1 C:1 D:1 D:1 E:1 E:1 Recursively apply FP- growth on P (Conditional tree for C within conditional tree for E) E:1 Conditional tree for E: FP Growth Algorithm: FP Tree Mining Frequent Itemset Using FP-Growth (Example)
  • 29. 29 null A:1 B:1 C:1 C:1 E:1 E:1 FP Growth Algorithm: FP Tree Mining Conditional pattern base for C within conditional base for E: P = {(B:1,C:1), (A:1,C:1)} Count for C is 2: {C,E} is frequent itemset Recursively apply FP- growth on P (Conditional tree for B within conditional tree C within conditional tree for E)Conditional tree for C within conditional tree for E: Frequent Itemset Using FP-Growth (Example)
  • 30. 30 null A:7 B:5 B:3 C:3 D:1 C:1 D:1 C:3 D:1 D:1 E:1 E:1 TID Items 1 {A,B} 2 {B,C,D} 3 {A,C,D,E} 4 {A,D,E} 5 {A,B,C} 6 {A,B,C,D} 7 {B,C} 8 {A,B,C} 9 {A,B,D} 10 {B,C,E} D:1 E:1 Transaction Database Item Pointer A B C D E Header table FP Growth Algorithm: FP Tree Mining Frequent Itemset Using FP-Growth (Example)
  • 31. 31 AA BB CC DD EE AB AC AD AE BC BD BE CD CE DEAB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCDABCD ABCEABCE ABDEABDE ACDE BCDEACDE BCDE ABCDEABCDE Frequent Itemset Using FP-Growth (Example)
  • 32. 32 AA BB CC DD EE AB AC AD AE BC BD BE CD CE DEAB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDEABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCDABCD ABCEABCE ABDEABDE ACDE BCDEACDE BCDE ABCDEABCDE Frequent Itemset Using FP-Growth (Example)