1. VihangShah
Data mining
Introduction
Data mining is the process of extracting useful information from very large databases. It
automatically searches large volumes of data to discover patterns and trends that go beyond
simple analysis. Data mining is also known as Knowledge Discovery in Data (KDD).
Data mining Process
Problem Definition
In this stage the need for the project, its objectives and its requirements are defined.
From these, a basic plan is drawn up and implemented at a preliminary level.
[Figure: the data mining process: Problem Definition -> Data Gathering & Preparation ->
Model Building & Evaluation -> Knowledge Deployment]
Data gathering & Preparation
The requirements were collected in the earlier phase. In this phase additional data may be
gathered, and some data may be omitted from the later phases. This is also the time to
identify data quality problems. In short, data preparation can significantly improve the
information that can be discovered through data mining. The outcome of data preparation is
the final data set. Once the data sources are identified, the data needs to be selected,
cleaned, constructed and formatted into the desired form.
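
The select, clean, construct and format steps can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the records, the field names and
the derived profit attribute are hypothetical and do not come from any real data source.

```python
# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"customer": " Alice ", "sales": "1200", "cost": "800", "notes": "vip"},
    {"customer": "Bob", "sales": "", "cost": "300", "notes": ""},
    {"customer": "carol", "sales": "950", "cost": "700", "notes": "new"},
]


def prepare(records):
    """Select, clean, construct and format records into a final data set."""
    prepared = []
    for record in records:
        # Select: keep only the fields the mining task needs.
        selected = {key: record[key] for key in ("customer", "sales", "cost")}
        # Clean: drop records that have a missing value in a selected field.
        if any(value.strip() == "" for value in selected.values()):
            continue
        # Format: normalise the text and convert numeric strings to numbers.
        row = {
            "customer": selected["customer"].strip().title(),
            "sales": float(selected["sales"]),
            "cost": float(selected["cost"]),
        }
        # Construct: derive a new attribute from the existing ones.
        row["profit"] = row["sales"] - row["cost"]
        prepared.append(row)
    return prepared


final_data_set = prepare(raw_records)
```

Here the record with a missing sales value is dropped, so the final data set contains two
rows. In practice the cleaning rule (drop, fill in or flag) depends on the project.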
Model Building and evaluation
In this phase various modeling techniques are selected and applied, and their parameters
are calibrated to optimal values. Tests are generated to validate the quality and validity
of the model. One or more models are created and run on the prepared data set.
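
As a minimal sketch of this phase, the Python below builds two candidate models on one part
of a prepared data set and validates them on a held-out part. The data, the two toy models
and the accuracy measure are assumptions chosen for illustration, not a prescribed method.

```python
# Hypothetical prepared data set: (hours_of_use_per_week, customer_stayed).
data = [(1, False), (2, False), (3, False), (8, True), (9, True),
        (10, True), (2, False), (7, True), (1, False), (9, True)]

# Hold out the last four rows for testing; build the models on the rest.
train, test = data[:6], data[6:]


def build_majority_model(rows):
    """Baseline: always predict the most common label in the training data."""
    labels = [label for _, label in rows]
    majority = labels.count(True) >= labels.count(False)
    return lambda value: majority


def build_threshold_model(rows):
    """Predict True when the value exceeds the midpoint of the two class means."""
    true_values = [value for value, label in rows if label]
    false_values = [value for value, label in rows if not label]
    threshold = (sum(true_values) / len(true_values)
                 + sum(false_values) / len(false_values)) / 2
    return lambda value: value > threshold


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(value) == label for value, label in rows) / len(rows)


models = {
    "majority": build_majority_model(train),
    "threshold": build_threshold_model(train),
}
scores = {name: accuracy(model, test) for name, model in models.items()}
best_model = max(scores, key=scores.get)
```

Scoring on rows that were not used to build the models is what makes the test a check of
quality rather than a check of memorisation.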
Knowledge deployment
The knowledge or information gained from the data mining process needs to be presented in
such a way that it can be used whenever it is needed. In this phase, plans for deployment,
maintenance and monitoring have to be created for implementation and for future support.
What can data mining do and Not Do?
Do:
Data mining can help you find patterns and relationships within your data.
Data mining helps you discover hidden information in your data.
Data mining can produce optimized results from huge databases.
Data mining can help you analyze data for future use.
Not Do:
Data mining cannot work entirely on its own.
Data mining cannot tell you the value of the information to your organization.
Data mining does not eliminate the need to know your business and to understand your data.
Data Mining Techniques
Data mining has six basic techniques: association, classification, clustering, prediction,
sequential patterns and decision trees.
Association
Association works on the relationships between items, which is why it is also called the
relation technique. It is used in marketing analysis to identify sets of products that
customers frequently purchase together.
Retailers use the association technique to research customers' buying habits. Based on
historical sales data, retailers might find out that customers who buy bread also buy butter.
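
The bread and butter rule can be checked with two common measures, support and confidence.
The sketch below is a minimal illustration; the transactions are hypothetical, and real
association mining uses algorithms that search over many candidate item sets.

```python
# Hypothetical sales transactions; each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)


def support(itemset, baskets):
    """Fraction of all baskets that contain every item in itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the fraction that also hold the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
```

With this data, bread and butter appear together in three of five baskets, and three of the
four baskets that contain bread also contain butter.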
Classification
Classification is used to assign each item to one of a predefined set of classes or groups.
For example, given the records of all employees who left the company, we can apply
classification to predict who will probably leave the company in a future period.
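
A minimal sketch of the employee example follows. It uses a 1-nearest-neighbour rule, which
is only one of many classification methods and is chosen here for brevity; the attributes
(years at the company, a satisfaction score) and all records are hypothetical.

```python
# Hypothetical employee records: (years_at_company, satisfaction 1-10, label).
history = [
    (1, 3, "left"),
    (2, 4, "left"),
    (1, 2, "left"),
    (6, 8, "stayed"),
    (8, 7, "stayed"),
    (5, 9, "stayed"),
]


def classify(employee, labelled_records):
    """Assign the label of the closest labelled record (1-nearest neighbour)."""
    def squared_distance(record):
        return (record[0] - employee[0]) ** 2 + (record[1] - employee[1]) ** 2

    nearest = min(labelled_records, key=squared_distance)
    return nearest[2]


prediction_new_hire = classify((2, 3), history)
prediction_veteran = classify((7, 8), history)
```

The classes ("left" and "stayed") are fixed in advance; the method only decides which of
them a new employee record belongs to.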
Clustering
In clustering, the technique itself defines the classes and puts the objects into them,
while in classification objects are assigned to predefined classes.
For example, consider book management in a library that holds a wide range of books on
different topics. Readers need an easy way to find books on the same topic, so we form
clusters that keep books with some kind of similarity together in one cluster, or on one
shelf, and label each cluster with a meaningful name.
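
The library example can be sketched as follows. This is a deliberately simple single-pass
grouping by shared keywords, not a standard algorithm such as k-means; the book titles and
keywords are hypothetical.

```python
from collections import Counter

# Hypothetical books, each described by a set of topic keywords.
books = {
    "Intro to SQL": {"databases", "sql"},
    "Database Design": {"databases", "modelling"},
    "Learning Python": {"programming", "python"},
    "Python Cookbook": {"python", "recipes"},
    "Query Tuning": {"databases", "performance"},
}


def cluster_books(catalogue):
    """Single-pass clustering: a book joins the first cluster sharing a keyword."""
    clusters = []  # each cluster holds its titles and a count of its keywords
    for title, keywords in catalogue.items():
        for cluster in clusters:
            if keywords & set(cluster["keywords"]):
                break  # found an existing cluster with a keyword in common
        else:
            # No existing cluster is similar, so start a new one.
            cluster = {"titles": [], "keywords": Counter()}
            clusters.append(cluster)
        cluster["titles"].append(title)
        cluster["keywords"].update(keywords)
    # Label each cluster (shelf) with its most frequent keyword.
    return {c["keywords"].most_common(1)[0][0]: c["titles"] for c in clusters}


shelves = cluster_books(books)
```

No shelf names are given in advance: the groups and their labels emerge from the data,
which is the difference from classification.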
Prediction
Prediction is a technique that discovers the relationships between independent variables
and the relationships between dependent and independent variables.
For instance, the prediction technique can be used in sales to predict future profit: if we
consider sales an independent variable, profit could be the dependent variable.
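
As a sketch of the sales and profit example, the code below fits a straight line by least
squares and uses it to predict profit for a new sales figure. A linear relationship is an
assumption made for illustration, and the figures are invented.

```python
# Hypothetical historical figures (in thousands); the values are invented.
sales = [10.0, 20.0, 30.0, 40.0]
profit = [3.0, 5.0, 7.0, 9.0]


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(sales, profit)

# Predict the dependent variable (profit) for a new value of sales.
predicted_profit = intercept + slope * 50.0
```

For this data the fitted line is profit = 1 + 0.2 * sales, so sales of 50 give a predicted
profit of about 11.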
Sequential Patterns
This technique seeks to discover or identify similar patterns, regular events or trends in
transaction data over a business period.
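
A minimal sketch of the idea: count how many customers bought one item before another and
keep the ordered pairs that occur often enough. The purchase histories and the minimum
count are hypothetical, and real sequential pattern mining handles longer sequences.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, each ordered by time.
sequences = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]


def frequent_ordered_pairs(histories, min_count):
    """Count pairs (a, b) where a is bought before b, at most once per customer."""
    counts = Counter()
    for history in histories:
        # combinations() preserves order, so a always precedes b in the history.
        counts.update(set(combinations(history, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_count}


patterns = frequent_ordered_pairs(sequences, min_count=2)
```

Unlike association, the order matters here: ("phone", "case") is counted, but
("case", "phone") would be a different pattern.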
Decision Tree
It is one of the most used data mining techniques because it is easy to understand. The
root of a decision tree is a simple question or condition that has multiple answers.
Each answer leads to a further set of questions or conditions that help us narrow down the
data so that a final decision can be made.
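
The structure described here, a root question whose answers lead to further questions, can
be sketched as a nested dictionary. The tree below is hand-built and hypothetical (the
questions, answers and decisions are invented); in practice a tree is usually learned from
data rather than written by hand.

```python
# Hypothetical hand-built tree: each node asks a question; each answer leads
# either to another node or to a final decision (a plain string).
tree = {
    "question": "income",
    "answers": {
        "high": "approve",
        "medium": {
            "question": "has_collateral",
            "answers": {"yes": "approve", "no": "review"},
        },
        "low": "reject",
    },
}


def decide(node, record):
    """Follow the record's answers from the root until a decision is reached."""
    while isinstance(node, dict):
        answer = record[node["question"]]
        node = node["answers"][answer]
    return node


decision = decide(tree, {"income": "medium", "has_collateral": "no"})
```

Each path from the root to a decision reads as a plain rule, which is why decision trees
are easy to understand.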
Note: We often combine two or more data mining techniques to form an appropriate process
that meets the business needs.
Data mining Applications
Data mining helps in marketing: it is used for analysis that shows which products were
bought together, when they were bought and in what sequence, and it also helps to
understand customers' behavior.
Data mining helps in the banking and finance sector: it is used to identify customer
loyalty by analyzing data on customers' purchasing activities, and it also helps to
retain credit card customers.
Data mining helps in the health care and insurance sector: it analyzes claims to find
which medical procedures are claimed together, and it also forecasts which customers
will potentially purchase new policies.
Note: Data mining is also used to analyze data in many other sectors.