2. Data mining
• Data mining is the process of automatically discovering
useful information in large data repositories.
• It helps to find novel and useful patterns in the data
• It can also predict the outcome of a future observation
• For example, with data mining we can predict whether a
newly arrived customer will spend more than 100 USD
at a department store
• Data mining is an integral part of knowledge discovery in
databases (KDD)
IRENE KAFEZA
3. Approaches in learning algorithms
• Classification: takes as input a collection of records
(also called instances or examples) and maps each record to a
predefined class label
[Figure: Input (attribute set x) -> Classification model -> Output (class label y)]
• Classification: Predicts a certain outcome based on a given
input
• The next slide shows which features define a vertebrate as a
mammal, reptile, bird, fish or amphibian
5. • Suppose that we are given the following
characteristics of a creature called a Gila
monster:
• We can use classification, based on the data of
the previous slide, to determine which class
it belongs to.
6. How to solve a classification problem
• Use a learning algorithm to create a model
that best fits the relationship between the
attribute set and the class label of the input
data.
• Create a training set consisting of records
whose labels are known
• Use a test set to measure the accuracy of your
model
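The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under loud assumptions: the records are made-up toy data (one numeric attribute and a True/False label), and the "learning algorithm" is a simple threshold rule chosen for brevity, not a method taken from these slides. The names `split_records`, `fit_threshold` and `accuracy` are hypothetical.

```python
import random

def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle labelled records and split them into (training set, test set)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(training_set):
    """Toy learning algorithm (an assumption): pick the cut-off on the single
    attribute that classifies the most training records correctly."""
    candidates = sorted({x for x, _ in training_set})
    return max(candidates,
               key=lambda t: sum((x >= t) == label for x, label in training_set))

def accuracy(threshold, labelled_set):
    """Fraction of records whose predicted label matches the known label."""
    correct = sum((x >= threshold) == label for x, label in labelled_set)
    return correct / len(labelled_set)

# Made-up toy data: the label is True when the attribute is at least 50.
records = [(x, x >= 50) for x in range(0, 100, 5)]

training_set, test_set = split_records(records)
model = fit_threshold(training_set)
print("training accuracy:", accuracy(model, training_set))
print("test accuracy:", accuracy(model, test_set))
```

The test set is kept apart from the training set so that the measured accuracy reflects records the model has not seen.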
7. Artificial Neural Networks
• Inspired by attempts to simulate biological neural systems
• The human brain consists of nerve cells called neurons
• Neurons are linked to other neurons via strands
of fiber called axons.
• Axons are used to transmit nerve impulses from one
neuron to another via dendrites which are extensions from
the cell body of the neuron.
• The contact point between a dendrite and an axon is called
a synapse.
• Neurologists have discovered that the human brain learns
by changing the strength of the synaptic connection
between neurons upon repeated stimulation by the same
impulse.
8. • The Perceptron model
• An artificial neural network (ANN) is
composed of nodes and directed links
9. • The Perceptron model
• Nodes are the neurons, and the weights of the links represent
the strength of the synaptic connections between the
neurons.
• As in a biological neural system, training a
perceptron model means adapting the weights of
the links until they fit the input-output
relationships of the underlying data
• In the specific example the output is
• 1 if 0.3*x1 + 0.3*x2 + 0.3*x3 - 0.4 > 0, and it is
• -1 if 0.3*x1 + 0.3*x2 + 0.3*x3 - 0.4 < 0
• The weight on each arc is 0.3, and 0.4 is the bias
factor.
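The output rule of this example can be written as a short Python function. It is a sketch only: the weights 0.3 and the bias factor 0.4 come from the slide, while the function name `perceptron_output` and the handling of a sum of exactly zero (which the slide does not define) are assumptions.

```python
# Weights and bias factor as given in the example; names are hypothetical.
WEIGHTS = (0.3, 0.3, 0.3)
BIAS = 0.4

def perceptron_output(x):
    """Return 1 if the weighted sum of the inputs minus the bias is positive, else -1."""
    weighted_sum = sum(w * xi for w, xi in zip(WEIGHTS, x))
    # The slide leaves the case weighted_sum - BIAS == 0 undefined;
    # this sketch arbitrarily returns -1 there.
    return 1 if weighted_sum - BIAS > 0 else -1

for x in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(x, perceptron_output(x))
```

With inputs restricted to 0 and 1, this rule outputs 1 only when at least two of the three inputs are 1.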
10. • In this example we can see how the data of the given
set are divided into two sets. The line is the decision
boundary found by applying the perceptron
learning algorithm to the data set.
12. Perceptron learning algorithm
• The algorithm maintains a “guess” at good parameters
(weights and bias) as it runs.
• It processes one example at a time.
• For a given example, it makes a prediction.
• It checks to see if this prediction is correct (recall that this
is training data, so we have access to true labels).
• If the prediction is correct, it does nothing.
• Only when the prediction is incorrect does it change its
parameters, and it changes them in such a way that it
would do better on this example next time around.
• It then goes on to the next example. Once it hits the last
example in the training set, it loops back around for a
specified number of iterations
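The loop described above can be sketched as follows. The slides do not spell out how the parameters are changed, so this sketch assumes the commonly used update rule (add y*x to the weights and y to the bias when an example is misclassified). It also adds the bias to the weighted sum, whereas the earlier example subtracts a bias factor; the two conventions differ only in sign. The function names and the toy data (label 1 when at least two of three binary inputs are 1) are hypothetical.

```python
from itertools import product

def predict(weights, bias, x):
    """Predict 1 if the weighted sum plus bias is positive, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1

def train_perceptron(examples, n_iterations=100):
    """Run the perceptron loop for a fixed number of passes over the examples."""
    weights = [0.0] * len(examples[0][0])  # initial "guess" at the parameters
    bias = 0.0
    for _ in range(n_iterations):
        for x, y in examples:              # one example at a time
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            # Treat y * activation <= 0 as an incorrect prediction
            # (this includes the undecided case activation == 0).
            if y * activation <= 0:
                # Assumed update rule: move the parameters toward label y.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
            # If the prediction is correct, nothing changes.
    return weights, bias

# Toy training data with known labels (an assumption for illustration).
examples = [(x, 1 if sum(x) >= 2 else -1) for x in product((0, 1), repeat=3)]

weights, bias = train_perceptron(examples)
print("weights:", weights, "bias:", bias)
print("all training examples correct:",
      all(predict(weights, bias, x) == y for x, y in examples))
```

Because this toy data can be separated by a line (a plane, in three dimensions), the loop stops making changes once every training example is classified correctly; on data that cannot be separated this way, the fixed number of iterations is what ends the loop.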