2. • Activation level: The activation level of a neuron in an artificial neural network is a real number
often limited to the range 0 to 1, or 1 to 1. In the case of an input neuron the value is obtained
externally to the network. In the case of a hidden neuron or output neuron the value is obtained
from the neuron's activation function. The activation of a neuron is sometimes thought of as
corresponding to the average firing rate of a biological neuron.
• Activation function: the activation function is the result of applying a squashing function to the
total net input.
• Attributes: An attribute is a property of an instance that may be used to determine its classification.
For example, when classifying objects into different types in a robotic vision task, the size and
shape of an instance may be appropriate attributes. Determining useful attributes that can be
reasonably calculated may be a difficult job - for example, what attributes of an arbitrary chess end-
game position would you use to decide who can win the game? This particular attribute selection
problem has been solved, but with considerable effort and difficulty.
• Axon: The axon is the "output" part of a biological neuron. When a neuron fires, a pulse of
electrical activity flows along the axon. Towards its end, or ends, the axon splits into a tree. The
ends of the axon come into close contact with the dendrites of other neurons. These junctions are
termed synapses. Axons may be short (a couple of millimeters) or long (e.g. the axons of the nerves
that run down the legs of a reasonably large animal.)
3. Backed-up error estimate: In decision tree pruning one of the issues in deciding
whether to prune a branch of the tree is whether the estimated error in classification is
greater if the branch is present or pruned. To estimate the error if the branch is present,
one takes the estimated errors associated with the children of the branch nodes (which
of course must have been previously computed), multiplies them by the estimated
frequencies that the current branch will classify data to each child node, and adds up
the resulting products. The frequencies are estimated from the numbers of training data
instances that are classified as belonging to each child node. This sum is called the
backed-up error estimate for the branch node. (The concept of a backed-up error
estimate does not make sense for a leaf node.).
Backward pass in back propagation: The phase of the error back propagation learning
algorithm when the weights are updated, using the delta rule or some modification of
it.
Bias: In feed forward and some other neural networks, each hidden unit and
each output unit is connected via a trainable weight to a unit (the bias unit) that always
has an activation level of 1.
4. • Clamping: When a neuron in a neural network has its value forcibly set and fixed to some externally
provided value, it is said to be clamped. Such a neuron serves as an input unit for the net.
• Classes in classification tasks: In a classification task in machine learning, the task is to take
each instance and assign it to a particular class. For example, in a machine vision application, the
task might involve analyzing images of objects on a conveyor belt, and classifying them
as nuts, bolts, or other components of some object being assembled. In an optical character
recognition task, the task would involve taking instances representing images of characters, and
classifying according to which character they are. Frequently in examples, for the sake of simplicity
if nothing else, just two classes, sometimes called positive and negative, are used.
• Connectionism: Connectionism is the neural network approach to solving problems in artificial
intelligence - the idea being that connectionists feel that it is appropriate to encode knowledge in
the weighted connections between nodes in a neural net.
• Covering algorithm: A covering algorithm, in the context of propositional learning systems, is an
algorithm that develops a cover for the set of positive examples - that is, a set of conjunctive
expressions that account for all the examples but none of the non-examples.
5. • Decision trees: A decision tree is a tree in which each non-
leaf node is labeled with an attribute or a question of some
sort, and in which the branches at that node correspond to
the possible values of the attribute, or answers to the
question.
• Delta rule: The delta rule in error back propagation
learning specifies the update to be made to
each weight during backdrop learning. Roughly speaking, it
states that the change to the weight from node I to
node j should be proportional to output of node j and also
proportional to the "local gradient" at node j.
• Dendrite: A dendrite is one of the branches on the input
end of a biological neuron. It has connections,
via synapses to the axons of other neurons.
6. • Epoch: In training a neural net, the term epoch is used to describe a complete pass through all of
the training patterns. The weights in the neural net may be updated after each pattern is presented
to the net, or they may be updated just once at the end of the epoch. Frequently used as a measure
of speed of learning - as in "training was complete after x epochs".
•
• Error back propagation learning algorithm The error back propagation learning algorithm is a form
of supervised learning used to train mainly feed forward neural networks to perform some task. In
outline, the algorithm is as follows:
• Initialization: the weights of the network are initialized to small random values.
• Forward pass: The inputs of each training pattern are presented to the network.
• Backward pass: In this phase, the weights of the net are updated. See the main article on
the backward pass for some more detail.
• Go back to step 2. Continue doing forward and backward passes until the stopping criterion is
satisfied.
• Error surface: When total error of a back propagation-trained neural network is expressed as a
function of the weights, and graphed (to the extent that this is possible with a large number of
weights), the result is a surface termed the error surface. The course of learning can be traced on
the error surface: as learning is supposed to reduce error, when the learning algorithm causes the
weights to change, the current point on the error surface should descend into a valley of the error
surface.
7. • Gradient descent: When an artificial neural
network learning algorithm causes the weights of
the net to change, it will do so in such a way that
the current point on the error surface will
descend into a valley of the error surface, in a
direction that corresponds to the steepest
(downhill) gradient or slope at the current point
on the error surface. For this reason, back prop is
said to be a gradient descent method, and to
perform gradient descent in weight space.
8. • Hidden layer: Neurons or units in a feed forward net are usually
structured into two or more layers. The input units constitute
the input layer. The output units constitute the output layer. Layers
in between the input and output layers (that is, layers that consist
of hidden units) are termed hidden layers.
• Hypothesis language: Term used in analyzing machine
learning methods. The hypothesis language refers to the notation
used by the learning method to represent what it has learned so far.
For example, in ID3, the hypothesis language would be the notation
used to represent the decision tree (including partial descriptions of
incomplete decision trees). In back prop, the hypothesis language
would be the notation used to represent the current set of weights.
In as, the hypothesis language would be the notation used to
represent the class descriptions
9. • ID3: A decision tree induction algorithm, developed by Quinlan. ID3
stands for "Iterative Dichotomize (version) 3".
• Input unit: An input unit in a neural network is a neuron with no
input connections of its own. Its activation thus comes from outside
the neural net. The input unit is said to have its value clamped to
the external value.
• Instance: This term has two, only distantly related, and uses:
• An instance is a term used in machine learning particularly
with symbolic learning algorithm, to describe a single training item,
usually in the form of a description of the item, along with its
intended classification.
• In general AI parlance, an instance frame is a frame representing a
particular individual, as opposed to a generic frame.
10. • Learning program: Normal programs P produce the same output y each time they
receive a particular input x. Learning programs are capable of improving their
performance so that they may produce different (better) results on second or later
times that they receive the same input x.
• Learning rate: a constant used in error back propagation learning and
other artificial neural network learning algorithms to affect the speed of learning.
• Linear threshold unit (LTU): A linear threshold unit is a simple
artificial neuron whose output is its threshold total net input. That is, an LTU with
threshold T calculates the weighted sum of its inputs, and then outputs 0 if this
sum is less than T and 1 if the sum is greater than T. LTU's form the basis
of perceptions.
• Local minimum: When an artificial neural network learning algorithm causes
the total error of the net to descend into a valley of the error surface, that valley
may or may not lead to the lowest point on the entire error surface. If it does not,
the minimum into which the total error will eventually fall is termed a local
minimum. The learning algorithm is sometimes referred to in this case as "trapped
in a local minimum."
11. • Machine learning: Machine learning is said to
occur in a program that can modify some
aspect of it, often referred to as its state, so
that on a subsequent execution with the same
input, a different (hopefully better) output is
produced. See unsupervised
learning and supervised learning, and
also function approximation
algorithms and symbolic learning algorithms.
12. • Neural network: An artificial neural network is a collection of simple artificial neurons connected
by directed weighted connections. When the system is set running, the activation levels of
the input units is clamped to desired values. After this the activation is propagated, at each time
step, along the directed weighted connections to other units. The activations of non-input neurons
are computing using each neuron's activation function. The system might either settle into a stable
state after a number of time steps, or in the case of a feed forward network, the activation might
flow through to output units.
• Neuron (artificial): A simple model of a biological neuron used in neural networks to perform a
small part of some overall computational problem. It has inputs from other neurons, with each of
which is associated a weight - that is, a number which indicates the degree of importance which
this neuron attaches to that input. It also has an activation function, and a bias. The bias acts like a
threshold in a perception.
•
• Noisy data in machine learning: The term "noise" in this context refers to errors in the training data
for machine learning algorithms. If a problem is difficult enough and complicated enough to be
worth doing with machine learning techniques, then any reasonable training set is going to be large
enough that there are likely to be errors in it. This will of course cause problems for the learning
algorithm.
13. • Observation language: Term used in
analyzing machine learning methods. The
observation language refers to the notation
used by the learning method to represent the
data it uses for training. Perception: A
perception is a simple
artificial neuron whose activation
function consists of taking the total net
input and outputting 1 if this is above
a threshold T and 0 otherwise.
14. • Perception learning: The perception learning algorithm:
• All weights are initially set to zero.
• For each training example:
• If the perception outputs 0 when it should output 1, then add the input vector to
the weight vector.
• If the perception outputs 1 when it should output 0, and then subtract the input
vector to the weight vector.
• Repeat step 2 until the perception yields the correct result for each training
example.
• Propositional learning systems: A propositional learning system represents what it
learns about the training instances by expressions equivalent to sentences in some
form of logic
• Pruning decision trees: The data used to generate a decision tree, using an
algorithm like the ID3 tree induction algorithm, can include noise - that
is, instances that contain errors, either in the form of a wrong classification, or in
the form of a wrong attribute value. There could also be instances whose
classification is problematical because the attributes available are not sufficient to
discriminate between some cases.
15. • Recurrent network A recurrent network is
a neural network in which there is at least one
cycle of activation flow. To put it another way,
underlying any neural network there is
a directed graph, obtained by regarding
the neurons as nodes in the graph and
the weights as directed edges in the graph. If
this graph is not acyclic, the network is
recurrent.
16. • Sequence prediction tasks: Feed forward networks are fine for classifying objects, but their units (as distinct from
their weights) have no memory of previous inputs. Consequently they are unable to cope with sequence
prediction tasks - tasks like predicting, given a sequence of sunspot activity counts, what the sunspot activity for
the next time period will be, and financial prediction tasks (e.g. given share prices for the last n days, and
presumably other economic data, etc., predict tomorrow's share price).
• Sigmoid nonlinearity: Another name for the logistic function and certain related functions (such as tanh(x)).
Sigmoid functions are a type of squashing function. They are called because sigma is the Greek letter "s", and the
logistic functions look somewhat like a sloping letter "s" when graphed.
• Squashing function: One of several functions, including sigmoid and step functions (see threshold used to
transform the total net input to a neuron to obtain the final output of the neuron.
• Supervised learning: Supervised learning is a kind of machine learning where the learning algorithm is provided
with a set of inputs for the algorithm along with the corresponding correct outputs, and learning involves the
algorithm comparing its current actual output with the correct or target outputs, so that it knows what its error is,
and modify things accordingly.
• Symbolic learning algorithms: Symbolic learning algorithms learn concepts by constructing a symbolic expression
(such as a decision tree, as in as) that describes a class (or classes) of objects. Many such systems work with
representations equivalent to first order logic.
• Synapse: A synapse, in a biological neuron, is a connection between the axon of one neuron and the dendrite of
another. It corresponds to a weight in an artificial neuron. Synapses have varying strengths and can be changed by
learning (like weights). The operation mechanism of synapses is biochemical in nature, with transmitter
substances crossing the tiny "synaptic gap" between axon and dendrite when the axon fires.
17. • Threshold: One of the parameters of a perceptron. The output of the perceptron is 0 if the total net
input of the perceptron is less than its threshold T, otherwise it is 1.
• Trainable weight: In artificial neural networks that use weighted connections, some of those
connections may have fixed values - i.e. they are specified as not being subject to change when the
network is trained, e.g. using the error back propagation learning algorithm.
• The other weights, the ones that are subject to change when the net is learning, are referred to
as trainable weights.
• Training pattern: This term is used to refer to a set of input values and the corresponding set of
desired or target output values for use in training an artificial neural network. Usually, a largish
number of training patterns would be used in the training of any genuinely useful neural network.
• Tree induction algorithm: This article describes the basic tree induction algorithm used by ID3 and
successors. The basic idea is to pick an attribute A with values a1, a2, ar, split the
training instances into subsetsSa1, Sa2, Sar consisting of those instances that have the
corresponding attribute value. Then if a subset has only instances in a single class, that part of the
tree stops with a leaf node labeled with the single class. If not, then the subset is split again,
recursively, using a different attribute.
18. • Unsupervised learning: Unsupervised learning
signifies a mode of machine learning where the
system is not told the "right answer" - for
example, it is not trained on pairs consisting of an
input and the desired output. Instead the system
is given the input patterns and is left to find
interesting patterns, regularities, or clustering’s
among them. To be contrasted to supervised
learning, as in error back propagation and ID3,
where the system is told the desired or target
outputs to associate with each input pattern used
for training.
19. • Weight: A weight, in an artificial neural
network, is a parameter associated with a
connection from one neuron, M, to another
neuron N. It corresponds to a synapse in
a biological neuron, and it determines how
much notice the neuron N pays to the
activation it receives from neuron N. If the
weight is positive, the connection is
called excitatory, while if the weight is
negative, the connection is called inhibitory.
20. • XOR problem: p XOR q is a binary logical operator which
can be defined either as not (p equiv q), or
as not p and q or p and not q. It rose to fame in the late 60's
when Minsky and Paper demonstrated that perceptron-
based systems cannot learn to compute the XOR function.
Subsequently, when backprogation-trained multi-layer
perceptrons were introduced, one of the first tests
performed was to demonstrate that they could in fact learn
to compute XOR. No one in their right mind would probably
ever have cause to compute XOR in this way in practice, but
XOR and generalizations of XOR remain popular as tests of
back propagation-type systems.
21. • Yandex – Google Equivalent Search engine
from Russian uses Machine Learning
Algorithms.
23. Visit more self help tutorials
• Pick a tutorial of your choice and browse
through it at your own pace.
• The tutorials section is free, self-guiding and
will not involve any additional support.
• Visit us at www.dataminingtools.net