1) The document discusses using data mining techniques like naive Bayesian classification to help enterprises better understand customer behavior and improve customer relationship management.
2) It proposes using a naive Bayesian classifier to classify customers into groups based on attributes like sex, age, income from a customer database to predict whether they will propose a credit card.
3) An example is provided applying the naive Bayesian classification algorithm to sample customer data to predict whether an unknown customer record would propose a credit card or not.
2. services and management. In this study, we suggest a depicting n measurements made on the sample from
customer classification and prediction model in commercial n attributes, respectively, A1 , A2 ,..., An .
bank that uses collected information of customers as inputs to 2. Suppose that there are m classes,
make a prediction for credit card proposing. In particular, we
C1 , C 2 ,..., C m . Given an unknown data sample, X (having
chose Naive Bayesian classifier from the various data mining
methods because it is easy to apply and maintain. no class label) ,the classifier will predict that X belongs to
The paper is organized as follows. Section 2 provides a the class having the highest posterior probability, conditioned
brief review of previous research and the next section on X . That is, the naïve Bayesian classifier assigns an
describes our proposed model, the Naive Bayesian classifier
unknown sample X to the class C i if and only if
approach. In Section 4, the research of the example is given to
apply the algorithm to customer classification and prediction
P(C i X ) > P(C j X ) for 1 ≤ j ≤ m, j ≠ i .
in commercial bank. The final section presents the
contributions and future researches of this study.
Thus we maximize P(C i X ) . The class C i for which
II. LITERATURE REVIEW
P(C i X ) is maximized is called the maximum posteriori
In recent years, data mining has gained widespread
attention and increasing popularity in the commercial world, hypothesis. By Bayes theorem,
as in [3],[4]. According to the professional and trade literature,
P( X C i ) P(C i )
more companies are using data mining as the foundation for P(C i X ) =
P( X )
strategies that help them outsmart competitors, identify new
customers and lower costs, as in [5]. In particular, data mining
3. As P( X ) is constant for all classes, only
is widely used in marketing, risk management and fraud
control, as in [6].
P( X C i ) P(C i ) need be maximized. If the class prior
Although recent surveys found that data mining had
grown in usage and effectiveness, data mining applications in probabilities are not known, then it is commonly assumed that
the commercial world have not been widely. Literature about the classes are equally likely, that is,
data mining applications in the fields about economic and
P(C1) = P(C 2) = ... = P(C m) and we would therefore
management are still few. With the realization of importance
of business intelligence, we need to strengthen the research on
maximize P( X C i ) . Otherwise, we maximize
data mining applications in the commercial world.
P( X C i ) P(C i ) . Note that the class prior probabilities may
III. NAIVE BAYESIAN CLASSIFIER ALGORITHM
Bayesian classification is based on Bayes theorem. Studies si
be estimated by P(C i ) = , where si is the number of
comparing classification algorithms have found a simple s
Bayesian classifier known as the naive Bayesian classifier to training samples of class C i , and s is the total number of
be comparable in performance with decision and neural training samples.
network classifiers. Bayesian classifiers have also exhibited 4. Given data sets with many attributes, it would be
high accuracy and speed when applied to large databases.
extremely computationally expensive to compute P( X C i ) .
The naive Bayesian classifier works as follows, as in[7]:
1. Each data sample is represented by an
In order to reduce computation in evaluating P( X C i ) , the
n-dimensional feature vector X = ( x1 , x2 ,..., x n) ,
naive assumption of class conditional independence is made.
3. This presumes that the values of the attributes are Suppose commercial banks hope to increase the customers
conditionally independent of one another, given the class label who will propose credit card. There is a large number of
of the sample, that is, there are no dependence relationships valuable customer information in huge amounts of data
among the attributes. Thus, accumulated by commercial banks, which is used to identify
n
customers and provide decision support. We wish to predict
P( X C i = ∏ P ( x k C i ) . the class label of an unknown sample using naive Bayesian
k =1
classification, given the training data as Table 1. The data
samples are described by the attributes: sex, age, student and
The probabilities P( x1 C i ) , P( x2 C i ) …,
income. The class label attribute, creditcard_proposing has
two distinct values(namely,{yes, no}).
P( x n C i ) can be estimated from the training samples, where
Table 1 Training data from the customer database
RID sex age Student income credit card_
sik , where
(a) If Ak is categorical, then P ( x k C i) = proposing
si 1 male >45 no high yes
sik Ci
is the number of training sample of class having the 2 female 31~45 no high yes
value x k forAk , and si is the number of training samples 3 female 20~30 yes low yes
belonging to C i . 4 male <20 yes low no
(b) If Ak is continuous-valued, then the attribute is 5 female 20~30 yes medium no
typically assumed to have a Gaussian distribution so that 6 female 20~30 no medium yes
P( x k C i ) = g ( x k , μ , ) 7 female 31~45 no high yes
Ci σ Ci
2 8 male 31~45 yes medium no
(xk −μ )
1 −
Ci
9 male 31~45 no medium yes
=
2π σ C i
e 2 σ Ci
2
10 female <20 yes low yes
11 female 31~45
g ( xk , μ , ) is the normal density function for no medium yes
Ci σ Ci
Where
12 male 20~30 no medium no
13 male <20
Ak , while μ C i and σ C i are the mean and
yes low no
attribute
14 female >45 no high no
standard deviation, respectively, given the values for attribute 15 male 20~30 yes low yes
Ak for training samples of class C i . C1 correspond to the class creditcard_proposing
Let
5. In order to classify an unknown sample X , =”yes” and C2 correspond to the class
creditcard_proposing=”no”. The unknown sample we wish
P( X C i ) P(C i ) is evaluated for each class C i . Sample X
to classify is
is then assigned to the class Ci if and only if
X = ( sex =" female" , age ="31 ~ 45" ,
P ( X C i ) P (C i ) > P ( X C j ) P (C j ) student =" no" , income =" medium" )
for 1 ≤ j ≤ m, j ≠ i . We need to maximize P( X C i ) P(C i ) , for i = 1,2 .
In other words, it is assigned to the class C i for which
P(C i ) , the prior probability of each class, can be computed
P( X C i ) P(C i ) is the maximum.
based on the training samples:
P (creditcard _ propo sin g =" yes" ) = 10 / 15 = 0.667
IV AN EXAMPLE P (creditcard _ propo sin g =" no" ) = 5 / 15 = 0.333
4. are faced with lots of data from databases, we can make
To compute P( X C i ) , for i = 1,2 , we compute the
classification and prediction by special data mining software
following conditional probabilities: as SPSS Clementine or SQL server 2005.
P(sex =" female" creditcard _ propo sin g =" yes" )
V. CONCLUSION
=7/10=0.7 Data mining provides the technology to analyze mass
volume of data and/or detect hidden patterns in data to
P ( sex =" female" creditcard _ propo sin g =" no" )
convert raw data into valuable information. This paper mainly
=1/5=0.2 focused on the research of the customer classification and
prediction in Customer Relation Management concerned with
P ( age ="31 ~ 45" creditcard _ propo sin g =" yes" )
data mining based on Naive Bayesian classification algorithm,
=4/10=0.4 which have a try to the optimization of the business process.
The study will help the company to analyze and forecast
P ( age ="31 ~ 45" creditcard _ propo sin g =" no" )
customer’s pattern of consumption, and the premise of
=1/5=0.2 personalized marketing services and management. Although
the paper focuses mainly on the banking industry, the issues
P( student =" no" creditcard _ propo sin g =" yes" )
and applications discussed are applicable to other industries,
=7/10=0.7 such as insurance industry, retail industry, manufacture
industries, and so on.
P( student =" no" creditcard _ propo sin g =" no" )
=1/5=0.2 REFERENCES
[1] Jenkins D(1999). “Customer relationship management and the data
P(income =" medium" creditcard _ propo sin g =" yes" ) =
warehouse” [J]. Call center Solutions, 1892):88-92
3/10=0.3 [2] Chung HM and P Gray(1999). “Data mining” [J]. Journal of
MIS,16(1):11-13
P(income =" medium" creditcard _ propo sin g =" no" ) =3/
[3]HOKEY MIN, Developing the Profiles of Supermarket Customers through
5=0.6 Data Mining[J], The Service Industries Journal, Vol.26, No.7, October 2006,
Using the above probabilities, we obtain pp.747–763
P( X creditcard _ propo sin g =" yes" ) [4] Balaji Padmanabhan, Alexander Tuzhilin , On the Use of Optimization for
= 0.7 × 0.4 × 0.7 × 0.3 = 0.0588 Data Mining: Theoretical Interactions and eCRM Opportunities[J],
P( X creditcard _ propo sin g =" no" ) Management Science . Vol. 49, No. 10, October 2003, pp. 1327 1343
= 0.2 × 0.2 × 0.2 × 0.6 = 0.0048 [5] Davis B(1999). “Data mining transformed” [J]. Informationweek,
P( X creditcard _ propo sin g =" yes" ) P(creditcard _ propo sin g =" yes" ) 751:86-88
= 0.0588 × 0.667 = 0.0392 [6] Kuykendall L(1999).”The data-mining toolbox”[J]. Credit Card
P( X creditcard _ propo sin g =" no" ) P(creditcard _ propo sin g =" no" ) Management,12(6):30-40
= 0.0048 × 0.333 = 0.0016 [7] Han, J. and Kamber M., (2001) Data Mining: Concepts and
Therefore, we predicts creditcard_proposing=”yes” for Techniques[M], San Francisco, CA: Morgan Kaufmann Publishers
sample X. Of course, the example above only illustrates the [8] Kantardzic, M. (2003) Data Mining: Concepts, Models, Methods, and
Bayesian classification algorithm with training data. When we Algorithms[M], Piscataway, NJ: IEEE Press.