2. Table of Contents
Introduction
  • Data Mining
  • Process
  • Classification
  • Association
Motivation
Literature Survey
Problem Formulation
Objectives
Methodology
Facilities Required
References
3. Data Mining
Data mining is the computational process of finding patterns
in large data sets, using methods at the intersection
of machine learning, artificial intelligence, statistics
and database systems. The main goal of the data mining
process is to extract information from the data and
convert it into an understandable structure for
further use.
5. Classification
Classification is the problem of identifying to which of
a set of categories a new observation belongs, on the
basis of a training set of data containing observations
(or instances) whose category membership is known.
6. Association
Association rule learning is a method for discovering
interesting relations between variables in large databases.
It is intended to identify strong rules discovered in
databases using different measures of interestingness,
such as support and confidence.
For example, the rule:
{onions, potatoes} => {burger}.
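Two common measures of interestingness are support (how often the whole rule occurs) and confidence (how often the consequent occurs when the antecedent does). The Python sketch below computes both for the rule above. The transaction list and the function names `support` and `confidence` are invented for illustration and do not come from the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = support(set(antecedent) | set(consequent), transactions)
    ante = support(antecedent, transactions)
    return both / ante if ante else 0.0


# Hypothetical market-basket data, made up for this example only.
transactions = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "burger"},
]

rule_support = support({"onions", "potatoes", "burger"}, transactions)
rule_confidence = confidence({"onions", "potatoes"}, {"burger"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this made-up data the rule holds in 2 of 5 transactions, and 2 of the 3 transactions containing onions and potatoes also contain burger.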
7. Example: The Weather Problem
ID  outlook   temperature  humidity  windy  play
1   sunny     hot          high      false  no
2   sunny     hot          high      true   no
3   overcast  hot          high      false  yes
4   rainy     mild         high      false  yes
5   rainy     cool         normal    false  yes
6   rainy     cool         normal    true   no
7   overcast  cool         normal    true   yes
8   sunny     mild         high      false  no
9   sunny     cool         normal    false  yes
10  rainy     mild         normal    false  yes
11  sunny     mild         normal    true   yes
12  overcast  mild         high      true   yes
13  overcast  hot          normal    false  yes
14  rainy     mild         high      true   no
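The weather data can serve as a training set in the sense of the Classification slide: 14 observations whose category (play) is known. As a minimal sketch, the Python below learns a one-rule (1R) classifier from it, picking the single attribute whose majority-class rule makes the fewest training errors. 1R is used here only because it is short; the slides do not prescribe a particular classifier, and the names `ROWS`, `ATTRIBUTES`, and `one_r` are illustrative.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# (outlook, temperature, humidity, windy, play) for IDs 1..14
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def one_r(rows, attributes):
    """Return (attribute, rule, errors) for the attribute with the fewest training errors."""
    best = None
    for i, name in enumerate(attributes):
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[i]][row[-1]] += 1
        # Predict the majority class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        # Strict comparison: on ties, the earlier attribute is kept.
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


best_attribute, best_rule, best_errors = one_r(ROWS, ATTRIBUTES)
print(best_attribute, best_rule, best_errors)
```

On this table the sketch selects outlook (sunny: no, overcast: yes, rainy: yes) with 4 training errors out of 14. Humidity ties at 4 errors but comes later in the attribute order.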
10. Literature Survey
• Liao et al. [8] review data mining techniques and applications through a
survey of the literature from 2000 to 2011. The paper surveys three areas of
data mining research: knowledge types, analysis types, and architecture types.
Its discussion addresses future progress in social science and engineering
methodologies that implement data mining techniques, and the development of
problem-oriented applications.
• One of the earliest and best-known association rule mining algorithms is the
Apriori algorithm [3], developed by Agrawal and Srikant. In each pass, Apriori
generates the candidate itemsets using only the itemsets found to have large
support in the previous pass, without considering the transactions in the
database.
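The candidate-generation step described above can be sketched in a few lines of Python. This is a simplified illustration, not the exact procedure from [3]: the join here unions any two frequent (k-1)-itemsets that produce k items, where the original joins on a shared lexicographic prefix. After pruning, both yield the same candidates. The function name `apriori_gen` and the example itemsets are assumptions made for this sketch.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Build size-k candidates from frequent (k-1)-itemsets without reading transactions."""
    prev = {frozenset(s) for s in frequent_prev}
    if not prev:
        return set()
    k = len(next(iter(prev))) + 1
    # Join step: union pairs of (k-1)-itemsets that give exactly k items.
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
    return {
        c for c in candidates
        if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
    }


# Hypothetical frequent 2-itemsets from a previous pass.
frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
candidates_3 = apriori_gen(frequent_2)
print(candidates_3)
```

With these inputs only {A, B, C} survives: {A, B, D} and {B, C, D} are pruned because {A, D} and {C, D} are not frequent. Counting the support of the surviving candidates still requires a pass over the database.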
11. Literature Survey (continued)
• Kwon et al. [9] evaluated which data set features most affect the
performance of classification algorithms. Determining which algorithm is
most effective for which data set is a complex problem. Their research
experimentally examines how data set characteristics affect algorithm
performance in terms of elapsed time and accuracy.
• B. Liu et al. [2] presented associative classification, which integrates
classification rule mining and association rule mining. The integration is
done by mining a special subset of association rules whose consequents are
restricted to the class labels, called Class Association Rules (CARs).
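To make the idea of a CAR concrete, the Python sketch below enumerates rules of the form "attribute conditions => play" over the weather data from the earlier example and keeps those that meet minimum support and confidence thresholds. It is a brute-force illustration, not the algorithm of [2]; the function name `mine_cars`, the thresholds, and the limit of two conditions per rule are assumptions chosen for brevity.

```python
from collections import Counter
from itertools import combinations

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# (outlook, temperature, humidity, windy, play) for IDs 1..14
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def mine_cars(rows, attributes, min_support, min_confidence, max_len=2):
    """Return (conditions, class_label, support, confidence) for every qualifying rule."""
    n = len(rows)
    cars = []
    for size in range(1, max_len + 1):
        for idxs in combinations(range(len(attributes)), size):
            cond_counts = Counter()   # how often each condition set occurs
            rule_counts = Counter()   # how often it occurs with each class label
            for row in rows:
                cond = tuple((attributes[i], row[i]) for i in idxs)
                cond_counts[cond] += 1
                rule_counts[(cond, row[-1])] += 1
            for (cond, label), count in rule_counts.items():
                sup = count / n
                conf = count / cond_counts[cond]
                if sup >= min_support and conf >= min_confidence:
                    cars.append((cond, label, sup, conf))
    return cars


cars = mine_cars(ROWS, ATTRIBUTES, min_support=0.25, min_confidence=0.9)
for cond, label, sup, conf in cars:
    lhs = ", ".join(f"{a}={v}" for a, v in cond)
    print(f"{{{lhs}}} => play={label}  support={sup:.2f} confidence={conf:.2f}")
```

With these thresholds two rules qualify, each covering 4 of the 14 rows with confidence 1.0: {outlook=overcast} => play=yes and {humidity=normal, windy=false} => play=yes. Lowering either threshold quickly increases the number of rules.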
12. Problem Formulation
• Associative classification suffers from inefficiency because association
rule mining often generates a very large number of rules. Many of these
rules are insignificant, while good rules with relatively low support are
not produced at all. Selecting high-quality rules from among them takes
effort.
• Most associative classification algorithms adopt the exhaustive search
method of the well-known Apriori algorithm to discover the rules, which
requires multiple passes over the database. Furthermore, they find frequent
itemsets in one phase and generate the rules in a separate phase, consuming
more resources such as storage and processing time.
13. Objectives
• Propose a framework that can generate
Classification Association Rules (CARs) efficiently.
• Evaluate the proposed approach.
• Compare the proposed algorithm with
other state-of-the-art techniques.
14. Methodology
• Review the classification and association rule
generation methods.
• Understand the existing associative
classification model.
• Implement a classification system based on
association rules and compare the performance of
several model construction methods or algorithms in
the Weka environment.
• Compare the proposed approach with existing
methods.
16. References
[1] Tom M. Mitchell, “Machine Learning”, 1st ed. U.K.: McGraw-Hill, 1997.
[2] Bing Liu, Wynne Hsu, and Yiming Ma, “Integrating classification and association rule
mining”, In Knowledge Discovery and Data Mining, New York, vol. 2, pp. 80–86,
1998.
[3] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules”, In VLDB,
pp. 487–499, Santiago, Chile, September 12-15, 1994.
[4] Wenmin Li, Jiawei Han, and Jian Pei, “CMAR: Accurate and efficient classification
based on multiple class-association rules”, In ICDM'01 Proc. of the 2001 IEEE
International Conference on Data Mining, pp. 369–376, IEEE Computer Society,
Washington, DC, USA, 2001.
[5] X. Yin and J. Han, “CPAR: Classification based on Predictive Association Rules”, Proc.
SIAM Int. Conf. on Data Mining, pp. 331–335, San Francisco, CA, May 2003.
[6] Fadi Abdeljaber Thabtah, “A review of associative classification mining”, Knowledge
Engineering Review, vol. 1, pp. 37–65, 2007.
17. References (continued)
[7] T. V. Mahendra, N. Deepika, and N. Keasava Rao, “Data Mining for High Performance
Data Cloud using Association Rule Mining”, International Journal of Advanced
Research in Computer Science and Software Engineering, vol. 2, Issue 1, 2012.
[8] S. H. Liao, P. H. Chu, and P. Y. Hsiao, “Data mining techniques and applications – A
decade review from 2000 to 2011”, Elsevier Expert Systems with Applications, vol.
39, pp. 11303–11311, 2012.
[9] Ohbyung Kwon and Jae Mun Sim, “Effects of data set features on the performances
of classification algorithms”, Expert Systems with Applications, vol. 40, pp. 1847–1857,
2013.
[10] http://www.infovis-wiki.net/index.php?title=File:Fayyad96kdd-process.png