2. Agenda
• Introduction
• Data Mining Techniques
• Neural Networks for Data Mining?
  - Neural Network Classification
  - Neural Network Pruning
  - Neural Network Rule Extraction
• Conclusion
• Questions?
3. Data Mining
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
It is an essential step in the process of knowledge discovery.
4. Steps of Knowledge Discovery
• data cleaning
• data integration
• data selection
• data transformation
• data mining
• pattern evaluation
• knowledge presentation
5. Data Mining: A KDD Process
Data mining is the core of the knowledge discovery process.
[Figure: the KDD pipeline: Databases → Data Cleaning / Data Integration → Data Warehouse → Selection of task-relevant data → Data Mining → Pattern Evaluation]
6. Why Data Mining?
The data explosion problem
Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
We are drowning in data, but starving for knowledge!
Solution: data warehousing and data mining.
7. Tasks of data mining
Concept Description
Association
Classification
Prediction
Cluster Analysis
Outlier Analysis
8. Classification
It is the process of finding a model that is able to predict the class of objects whose label is unknown.
For example, it can classify which customers can repay a loan, based on the existing records in the bank database.
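As a hedged sketch of this idea in Python (the records and field meanings here are hypothetical, and scikit-learn's decision tree stands in for any classifier):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical bank records: [annual income, existing debt]; label 1 = repaid loan
X_train = [[50000, 5000], [20000, 15000], [80000, 2000], [30000, 25000]]
y_train = [1, 0, 1, 0]

# Learn a model from records whose class labels are known...
model = DecisionTreeClassifier().fit(X_train, y_train)

# ...then predict the class of a new customer whose label is unknown.
print(model.predict([[60000, 4000]]))  # e.g. [1]: likely to repay
```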
9. Classification methods
Decision trees
Bayesian classification
Neural networks
Genetic algorithms
Memory-based reasoning
etc.
10. Why Neural Networks?
High tolerance of noisy data.
Ability to classify patterns on which they have not been trained.
Can be used when there is little knowledge of the relationships between attributes and classes.
11. Why Neural Networks? – Contd.
Well suited for continuous-valued inputs and outputs, unlike most decision tree algorithms.
Rules can easily be extracted from a trained neural network by available techniques.
12. Neural Networks
It is the study of how to make computers make sensible decisions and learn from ordinary experience, as we do.
13. Neurons
The human brain has about 100 billion neurons and 100 trillion connections (synapses) between them.
[Figure: a typical neuron]
Many highly specialized types of neurons exist, and these differ widely in appearance. Characteristically, neurons are highly asymmetric in shape.
14. Multilayer Feedforward Neural Network
It consists of an input layer, one or more hidden layers and an output layer.
[Figure: structure of a multilayer feedforward neural network, with an input layer feeding a hidden layer feeding an output layer]
15. Backpropagation
Backpropagation is a neural network learning algorithm.
It learns by iteratively processing a dataset of training examples, comparing the network's prediction for each example with the actual known target value.
16. Overview of BP
The backpropagation algorithm trains the network by iteratively processing the n_p training examples of a dataset, comparing the network's output o_k for each example with the desired known target value d_k for each target class k in the dataset.
17. Overview of BP – Contd.
Consider a fully connected three-layer feedforward neural network, as in the figure below.
[Figure: inputs x_1, x_2, …, x_i, …, x_l connect to hidden neurons h_1, h_2, …, h_m through weights w_11, w_12, …, w_l1, …, w_lm; the hidden neurons connect to outputs o_1, …, o_n through weights v_11, v_12, …, v_m1, …, v_mn; the hidden and output layers each also receive a bias input of −1]
18. Overview of BP – Contd.
The network consists of l input neurons, m hidden neurons and n output neurons.
Let n_p be the number of examples considered for training.
Let x_ip be the i-th input of the p-th example in the dataset, where i = 1, 2, …, l.
Let w_ij be the weight from input neuron i to hidden neuron j, where j = 1, 2, …, m.
19. Overview of BP – Contd.
Let v_jk be the weight from hidden neuron j to output neuron k, where k = 1, 2, …, n.
Initially, the weights w_ij and v_jk take random values between −1 and 1.
Let h_j be the activation value of hidden neuron j, and o_k the actual output of the k-th output neuron.
20. Overview of BP – Contd.
Bias
• It is a threshold value that serves to vary the activity of the neuron.
• The bias input is fixed and always equals −1.
21. Overview of BP – Contd.
The activation value of hidden neuron h_j for the p-th example can be calculated by
h_j = f(Σ_{i=1..l} w_ij · x_ip + θ_j), where f(x) = 1 / (1 + e^(−x)) is the sigmoid function and θ_j is the bias of hidden neuron j.
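A minimal NumPy sketch of this computation (names follow the slide's notation; the sizes and values are illustrative):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

l, m = 3, 4                                  # l inputs, m hidden neurons
x_p = np.array([0.2, 0.7, 0.1])              # inputs x_ip of the p-th example
W = np.random.uniform(-1, 1, size=(l, m))    # weights w_ij
theta = np.random.uniform(-1, 1, size=m)     # biases of the hidden neurons

h = sigmoid(x_p @ W + theta)                 # activation h_j of each hidden neuron
```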
23. Overview of BP – Contd.
Weights are modified for each example so as to minimize the mean squared error (mse).
The value of mse can be calculated according to the following equation:
mse = (1 / (n_p · n)) Σ_{p=1..n_p} Σ_{k=1..n} (d_k − o_k)²
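For instance, with NumPy, over target vectors D and network outputs O for all examples (values here are illustrative):

```python
import numpy as np

D = np.array([[1.0, 0.0], [0.0, 1.0]])    # desired targets d_k, one row per example
O = np.array([[0.9, 0.2], [0.3, 0.8]])    # actual outputs o_k
mse = np.mean((D - O) ** 2)               # averages over all examples and outputs
```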
24. Overview of BP – Contd.
Weight updates are made in the backward direction, i.e., from the output layer, through the hidden layers, to the input layer.
25. Overview of BP – Contd.
Learning Rate λ
• Helps avoid local minima (where the weights appear to converge but are not at the optimal solution).
• Encourages finding the global minimum.
• Typically takes a value between 0.0 and 1.0.
26. Overview of BP – Contd.
For each unit k in the output layer, compute the error using
Err_k = o_k (1 − o_k)(d_k − o_k)
For each weight v_jk in the network, compute the weight increment using
Δv_jk = λ · Err_k · h_j
and update the weight v_jk using
v_jk = v_jk + Δv_jk
27. Overview of BP – Contd.
For each unit j in the hidden layers, from the last to the first hidden layer, compute the error using
Err_j = h_j (1 − h_j) Σ_k Err_k · v_jk
For each weight w_ij in the network, compute the weight increment using
Δw_ij = λ · Err_j · x_ip
and update the weight w_ij using
w_ij = w_ij + Δw_ij
28. Overview of BP – Contd.
For each bias θ_j in the network, compute the bias increment using
Δθ_j = λ · Err_j
and update the bias using
θ_j = θ_j + Δθ_j
29. Overview of BP – Contd.
The algorithm stops the learning when
• the mean squared error falls below a threshold value, or
• a pre-specified number of epochs has expired.
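Putting slides 16 through 29 together, here is a hedged NumPy sketch of one possible implementation of this training loop (the network sizes, random data, learning rate, and stopping thresholds are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Illustrative sizes: l inputs, m hidden neurons, n outputs, n_p examples
l, m, n, n_p = 4, 5, 3, 30
X = rng.random((n_p, l))                    # training inputs x_ip
D = np.eye(n)[rng.integers(0, n, n_p)]      # one-hot desired targets d_k

# Weights and biases initialized randomly between -1 and 1
W = rng.uniform(-1, 1, (l, m))              # w_ij: input i -> hidden j
V = rng.uniform(-1, 1, (m, n))              # v_jk: hidden j -> output k
theta_h = rng.uniform(-1, 1, m)             # hidden-layer biases
theta_o = rng.uniform(-1, 1, n)             # output-layer biases

lam, mse_threshold, max_epochs = 0.5, 0.01, 1000

for epoch in range(max_epochs):
    sq_errors = []
    for x, d in zip(X, D):                  # weights are modified per example
        h = sigmoid(x @ W + theta_h)        # hidden activations h_j
        o = sigmoid(h @ V + theta_o)        # actual outputs o_k
        err_o = o * (1 - o) * (d - o)       # Err_k (output layer)
        err_h = h * (1 - h) * (V @ err_o)   # Err_j (hidden layer)
        V += lam * np.outer(h, err_o)       # delta v_jk = lam * Err_k * h_j
        W += lam * np.outer(x, err_h)       # delta w_ij = lam * Err_j * x_ip
        theta_o += lam * err_o              # bias increments: lam * Err
        theta_h += lam * err_h
        sq_errors.append(np.mean((d - o) ** 2))
    if np.mean(sq_errors) < mse_threshold:  # stop: mse below threshold,
        break                               # or after max_epochs epochs
```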
30. Data selection methods
Random data selection
The training and testing examples are taken randomly from each class.
Example: the iris dataset has 3 classes with 50 examples per class. From each class, 25 examples are taken randomly for training and another 25 are taken randomly for testing the network.
K-fold cross validation
The dataset is split into k equal folds; each fold is used once for testing while the remaining folds are used for training.
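Both selection schemes can be set up, for example, with scikit-learn (an assumption; the slides do not name a library):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, StratifiedKFold

X, y = load_iris(return_X_y=True)

# Random selection: 25 of the 50 examples per class for training,
# the rest for testing (stratify keeps class proportions equal).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, stratify=y, random_state=0)

# K-fold cross validation: each example is used for testing exactly once.
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # train and evaluate the network on this fold
```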
31. Performance Measures
Accuracy
The percentage of the test dataset that is correctly classified by the classifier.
Speed
The computational time and cost involved in generating and using a given classifier.
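For example, given predicted labels y_pred and true labels y_test (hypothetical arrays):

```python
import numpy as np

y_test = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 2, 2, 0])
accuracy = 100.0 * np.mean(y_pred == y_test)  # percentage correctly classified
print(f"{accuracy:.1f}%")                     # 80.0%
```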
32. Evolving Network Architectures
The success of ANNs largely depends on their architecture.
Small networks require a long training time and can easily get trapped in a local minimum.
Large networks are able to learn fast and avoid local minima, but generalize poorly.
An optimal architecture is a network that is large enough to learn the problem and small enough to generalize well.
33. Approaches for optimizing Neural Networks
Constructive methods
- new hidden units are added during the training process; also called growing methods.
Destructive methods
- a large network is trained and then unimportant nodes or weights are removed; also called pruning methods.
Hybrid methods
- can both add and remove.
34. What is Pruning?
Pruning is defined as network trimming within the assumed initial architecture.
It can be accomplished by estimating the sensitivity of the total error to the exclusion of each weight or neuron in the network.
The weights or neurons that are insensitive to error changes can be discarded after each step of training.
The trimmed network is smaller and is likely to give higher accuracy than before trimming.
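The slides do not fix a particular sensitivity estimate; a simple stand-in is total connection magnitude. A minimal sketch, assuming the W and V weight matrices from the earlier training-loop sketch:

```python
import numpy as np

def prune_hidden_neurons(W, V, keep):
    """Remove hidden neurons whose total connection strength is smallest.

    Uses summed absolute weight as a simple proxy for the
    error-sensitivity estimate described on the slide.
    """
    saliency = np.abs(W).sum(axis=0) + np.abs(V).sum(axis=1)
    keep_idx = np.argsort(saliency)[-keep:]   # indices of the most salient neurons
    return W[:, keep_idx], V[keep_idx, :], keep_idx

# Example: shrink a 19-25-2 network to 19-7-2, then retrain the smaller network.
W = np.random.uniform(-1, 1, (19, 25))
V = np.random.uniform(-1, 1, (25, 2))
W, V, kept = prune_hidden_neurons(W, V, keep=7)
```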
35. Hepatitis Pruning Results

Step   Current Architecture   Acc_test (%)   Epochs   Pruned Neurons
1      19-25-2                78.2           200      18 hidden neurons
2      19-7-2                 80.5           50       5 hidden neurons
3      19-2-2                 83.95          50       Pruning stops

The original network with architecture 19-25-2 and accuracy 78.2% is reduced to the architecture 19-2-2.
It requires 0.76 seconds to obtain the pruned network.
36. Rule Extraction
Why rule extraction?
An important drawback of neural networks is their lack of explanation capability: it is very difficult to understand how an ANN has solved a problem. To overcome this problem, various rule extraction algorithms have been developed.
Rule extraction changes a black-box system into a white-box system by translating the internal knowledge of a neural network into a set of symbolic rules.
The classification process of a neural network can then be described by a set of simple rules.
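One common "pedagogical" approach (named here as an assumption; the slides do not commit to a specific algorithm) fits an interpretable surrogate, such as a small decision tree, to the trained network's own predictions and reads if-then rules off the tree:

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000,
                    random_state=0).fit(X, y)

# Fit a small tree to the *network's* predictions, not the true labels,
# so the tree's rules describe what the network has learned.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, net.predict(X))
print(export_text(surrogate))   # prints if-then rules approximating the network
```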
38. NNs might, in the future, allow:
• robots that can see, feel, and predict the world around them
• improved stock prediction
• common usage of self-driving cars
• composition of music
• handwritten documents to be automatically transformed into formatted word-processing documents
• trends found in the human genome to aid in the understanding of the data compiled by the Human Genome Project
• self-diagnosis of medical problems using neural networks
and much more!