SlideShare una empresa de Scribd logo
1 de 155
Descargar para leer sin conexión
Machine Learning
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Chapter 1-
Introduction
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Chapter 1- Introduction
• The field of machine learning is concerned with
the question of how to construct computer
programs that automatically improve with
experience.
• In recent years many successful machine learning
applications have been developed.
– data-mining programs
– detect fraudulent credit card transactions
– information-filtering systems
– autonomous vehicles
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
• The goal of this course is to present the key
algorithms and theory that form the core of
machine learning.
• Machine learning draws on concepts and
results from many fields, including –
–statistics, artificial intelligence, philosophy,
–information theory, biology, cognitive
science,
–computational complexity, and control
theory. Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
• A few specific achievements provide a
glimpse of the state of the art:
– Programs have been developed that successfully
learn to recognize spoken words (Waibel 1989; Lee
1989)
– predict recovery rates of pneumonia patients
(Cooper et al. 1997)
– detect fraudulent use of credit cards, drive
autonomous vehicles on public highways (Pomerleau
1989)
– play games such as backgammon at levels
approaching the performance of human world
champions (Tesauro 1992,1995).
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
• Learning to recognize spoken words:
• All of the most successful speech recognition systems
employ machine learning in some form.
• For example, the SPHINX system (e.g., Lee 1989) learns
speaker-specific strategies for recognizing the primitive
sounds (phonemes) and words from the observed speech
signal.
• Neural network learning methods (e.g., Waibel et al.
1989) and methods for learning hidden Markov models
(e.g., Lee 1989) are effective for automatically customizing
to individual speakers, vocabularies, microphone
characteristics, background noise, etc.
• Similar techniques have potential applicationsin many
signal-interpretation problems.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
• Automated Transportation
• Machine learning methods have been used to
train computer-controlled vehicles to steer
correctly when driving on a variety of road types.
• For example, the ALVINN system (Pomerleau
1989) has used its learned strategies to drive
unassisted at 70 miles per hour for 90 miles on
public highways among other cars.
• Similar techniques have possible applications in
many sensor-based control problems.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
• Automated Transportation (contd..)
Google began testing a self-driving car in
2012, and since then, the U.S. Department
of Transportation has released definitions of
different levels of automation, with
Google’s car classified as the first level
down from full automation.
Other transportation methods are closer to
full automation, such as buses and trains.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Classify new astronomical structures.
• Machine learning methods have been applied to
a variety of large databases to learn general
regularities implicit in the data.
• For example, decision tree learning algorithms
have been used by NASA to learn how to classify
celestial objects from the second Palomar
Observatory Sky Survey (Fayyad et al. 1995).
• This system is now used to automatically classify
all objects in the Sky Survey, which consists of
three terrabytes of image data.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
To play world-class backgammon.
• The most successful computer programs for playing
games such as backgammon are based on machiie
learning algorithms.
• For example, the world's top computer program for
backgammon, TD-GAMMON(T esauro 1992, 1995).
learned its strategy by playing over one million
practice games against itself.
• It now plays at a level competitive with the human
world champion.
• Similar techniques have applications in many
practical problems where very large search spaces
must be examined efficiently.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Cyborg Technology
• Researcher Shimon Whiteson thinks that in the
future, we will be able to augment ourselves with
computers and enhance many of our own natural
abilities.
• Though many of these possible cyborg
enhancements would be added for convenience,
others might serve a more practical purpose.
• Yoky Matsuka of Nest believes that AI will become
useful for people with amputated limbs, as the
brain will be able to communicate with a robotic
limb to give the patient more control.
• This kind of cyborg technology would significantly
reduce the limitations that amputees deal with on a
daily basis. Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Taking over dangerous jobs
• Robots are already taking over some of the most hazardous
jobs available, including bomb defusing.
• These robots aren’t quite robots yet, according to the
BBC. They are technically drones, being used as the physical
counterpart for defusing bombs, but requiring a human to
control them, rather than using machine learning.
• Other jobs are also being reconsidered for robot integration.
Welding, well known for producing toxic substances, intense
heat, and earsplitting noise, can now be outsourced to
robots in most cases.
• Robot Worx explains that robotic welding cells are already in
use, and have safety features in place to help prevent
human workers from fumes and other bodily harm.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Solving climate change
• Solving climate change might seem like a tall
order from a robot, but as Stuart Russell
explains, machines have more access to data
than one person ever could—storing a mind-
boggling number of statistics.
• Using big data, machine learning could one
day identify trends and use that information
to come up with solutions to the world’s
biggest problems.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Robot as friends
• Who wouldn’t want a friend like C-3PO?
• C-3PO is a humanoid robot character from
the ”Star Wars”.
• Built by Anakin Skywalker, C-3PO was
designed as a protocol droid intended to
assist in etiquette, customs,
and translation, boasting that he is "fluent
in over six million forms
of communication"Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Robot as friends (Contd..)
• At this stage, most robots are still emotionless and
it’s hard to picture a robot you could relate to.
However, a company in Japan has made the first big
steps toward a robot companion—one who can
understand and feel emotions.
• Introduced in 2014, “Pepper” the companion robot
went on sale in 2015, with all 1,000 initial
units selling out within a minute.
• The robot was programmed to read human
emotions, develop its own emotions, and help its
human friends stay happy. Pepper goes on sale in
the U.S. in 2016, and more sophisticated friendly
robots are sure to follow.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Improved elder care and medical care
• For many seniors, everyday life is a struggle, and
many have to hire outside help to manage their
care, or rely on family members.
• Machine learning and AI is at a stage where
replacing this need isn’t too far off, says
Matthew Taylor, computer scientist at
Washington State University.
• “Home” robots could help seniors with everyday
tasks and allow them to stay independent and in
their homes for as long as possible, which
improves their overall well-being.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
Sophia
• The first robot to be awarded citizenship in
the world, has said she not only wants to
start a family but also have her own career, in
addition to developing human emotions in
the future.
• Sophia, the humanoid AI robot has sprung
back to headlines after saying she wants to
start a family of her own and also mentioning
that all droids deserve to have children.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
The robot running for NZ Prime Minister in 2020
• The era of intelligent machines is well and truly
upon us. After Sophia was confirmed as the
world's first robot citizen in Saudi Arabia last
month, it's time for us to gape in awe at SAM,
who's become the world's first AI-powered
virtual politician from New Zealand.
• SAM's creator, Nick Gerritsen, believes that the
virtual politician may be able to contest New
Zealand's general elections in 2020.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Machine Learning
IBM Watson
• Watson is a question answering computer system capable of
answering questions posed in natural language, developed
in IBM's DeepQA project.
• The computer system was specifically developed to answer
questions on the quiz show Jeopardy! and, in 2011, the Watson
computer system competed on Jeopardy! against former
winners Brad Rutter and Ken Jennings] winning the first place
prize of $1 million.
• Watson had access to 200 million pages of structured and
unstructured content consuming four terabytes of disk
storage including the full text of Wikipedia, but was not
connected to the Internet during the game. For each clue,
Watson's three most probable responses were displayed on the
television screen. Watson consistently outperformed its human
opponents on the game's signaling device, but had trouble in a
few categories, notably those having short clues containing only
a few words.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
WELL-POSED LEARNING PROBLEMS
• Definition: A computer program is said to learn
from experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improve
• For example, a computer program that learns to
play checkers might improve its performance as
measured by its ability to win at the class of tasks
involving playing checkers games, through
experience obtained by playing games against
itself with experience E.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
WELL-POSED LEARNING PROBLEMS
• In general, to have a well-defined
learning problem, we must identity
these three features:
• the class of tasks (T)
• the measure of performance to be
improved (P),
• and the source of experience (E).
Dr. Amit Kumar, Dept of CSE, JUET, Guna
WELL-POSED LEARNING PROBLEMS
• A checkers learning problem:
–Task T: playing checkers
–Performance measure P: percent of games
won against opponents
–Training experience E: playing practice
games against itself
• We can specify many learning problems in this
fashion, such as learning to recognize
handwritten words, or learning to drive a
robotic automobile autonomously.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
WELL-POSED LEARNING PROBLEMS
• A handwriting recognition learning
problem:
–Task T: recognizing and classifying
handwritten words within images
–Performance measure P: percent of words
correctly classified
–Training experience E: a database of
handwritten words with given
classifications
Dr. Amit Kumar, Dept of CSE, JUET, Guna
WELL-POSED LEARNING PROBLEMS
• A robot driving learning problem:
–Task T: driving on public four-lane
highways using vision sensors
–Performance measure P: average
distance traveled before an error (as
judged by human overseer)
–Training experience E: a sequence of
images and steering commands recorded
while observing a human driver
Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
• let us consider designing a program
to learn to play checkers,
• with the goal of entering it in the
world checkers tournament.
• We adopt the obvious performance
measure: the percent of games it
wins in this world tournament.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
Choosing the Training Experience:
• The first design choice we face is to
choose the type of training experience
from which our system will learn.
• The type of training experience available
can have a significant impact on success
or failure of the learner.
• One key attribute is whether the training
experience provides direct or indirect
feedback regarding the choices made by
the performance system.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
Choosing the Training Experience:
• For example, in learning to play checkers, the
system might learn from direct training examples
consisting of individual checkers board.
• Alternatively, it might have available only indirect
information consisting of the move sequences
and final outcomes of various games played.
• In this later case, information about the
correctness of specific moves early in the game
must be inferred indirectly from the fact that the
game was eventually won or lost.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
Choosing the Training Experience:
• the learner faces an additional problem of credit
assignment, or determining the degree to which
each move in the sequence deserves credit or
blame for the final outcome.
• Credit assignment can be a particularly difficult
problem because the game can be lost even
when early moves are optimal, if these are
followed later by poor moves.
• Hence, learning from direct training feedback is
typically easier than learning from indirect
feedback.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
• Choosing the Training Experience:
• A second important attribute of the
training experience is the degree to
which the learner controls the
sequence of training examples.
• For example, the learner might rely
on the teacher to select informative
board states and to provide the
correct move for each.Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
• Choosing the Training Experience:
• Alternatively, the learner might itself
propose board states that it finds
particularly confusing and ask the teacher
for the correct move.
• Or the learner may have complete control
over both the board states and (indirect)
training classifications, as it does when it
learns by playing against itself with no
teacher present.Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
• Choosing the Training Experience:
• A third important attribute of the training
experience is how well it represents the
distribution of examples over which the final
system performance P must be measured.
• In our checkers learning scenario, the
performance metric P is the percent of games
the system wins in the world tournament.
• If its training experience E consists only of
games played against itself.Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
• To proceed with our design, let us
decide that our system will train by
playing games against itself.
• This has the advantage that no
external trainer need be present,
• and it therefore allows the system to
generate as much training data as
time permits.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
DESIGNING A LEARNING SYSTEM
A checkers learning problem
• Task T: playing checkers
• Performance measure P: percent of games won
in the world tournament
• Training experience E: games played against
itself
In order to complete the design of the learning
system, we must now choose:
1. the exact type of knowledge to be learned
2. a representation for this target knowledge
3. a learning mechanismDr. Amit Kumar, Dept of CSE, JUET, Guna
A CHECKERS LEARNING PROBLEM
Choosing the Target Function
• The next design choice is to determine exactly
what type of knowledge will be learned and
how this will be used by the performance
program.
• Let us begin with a checkers-playing program
that can generate the legal moves from any
board state.
• The program needs only to learn how to
choose the best move from among these legal
moves. Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CHECKERS LEARNING PROBLEM
Choosing the Target Function
• Let us call this function ChooseMove
and use the notation:
ChooseMove : B  M
to indicate that this function accepts
as input any board from the set of
legal board states B and produces as
output some move from the set of
legal moves M.Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CHECKERS LEARNING PROBLEM
Choosing the Target Function
• Although ChooseMove is an obvious choice
for the target function in our example, this
function will turn out to be very difficult to
learn given the kind of indirect training
experience available to our system.
• An alternative target function and one that
will turn out to be easier to learn in this
setting is an evaluation function that
assigns a numerical score to any given
board state. Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CHECKERS LEARNING PROBLEM
Choosing the Target Function
• Let us call this target function V and
again use the notation V : B  R to
denote that V maps any legal board
state from the set B to some real value
(we use R to denote the set of real
numbers).
• We intend for this target function V to
assign higher scores to better board
states. Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CHECKERS LEARNING PROBLEM
Choosing the Target Function
• If the system can successfully learn such a
target function V, then it can easily use it
to select the best move from any current
board position.
• This can be accomplished by generating
the successor board state produced by
every legal move, then using V to choose
the best successor state and therefore the
best legal move.Dr. Amit Kumar, Dept of CSE, JUET, Guna
Choosing the Target Function
• Let us therefore define the target value V(b) for an
arbitrary board state b in B, as follows:
1. if b is a final board state that is won, then
V(b) = 100
2. if b is a final board state that is lost, then
V(b) = -100
3. if b is a final board state that is drawn, then
V(b) = 0
4. if b is a not a final state in the game, then
V(b) = V(b’),
where b' is the best final board state that can be achieved
starting from b and playing optimally until the end of the
game (assuming the opponent plays optimally, as well).
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Choosing the Target Function
• Thus, we have reduced the learning task in this
case to the problem of discovering an operational
description of the ideal target function V.
• We often expect learning algorithms to acquire
only some approximation to the target function,
and for this reason the process of learning the
target function is often called function
approximation.
• we will use the symbol V* to refer to the function
that is actually learned by our program, to
distinguish it from the ideal target function V.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Choosing a Representation for the
Target Function
• let us choose a simple representation: for any given
board state, the function V* will be calculated as a
linear combination of the following board features:
• x1: the number of black pieces on the board
• x2: the number of red pieces on the board
• x3: the number of black kings on the board
• x4: the number of red kings on the board
• x5: the number of black pieces threatened by red
(i.e., which can be captured on red's next turn)
• X6: the number of red pieces threatened by black
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Choosing a Representation for the
Target Function
• Thus, our learning program will represent V*(b)
as a linear function of the form:
V*(b) = w0+w1x1+w2x2+w3x3+w4x4+w5x5+w6x6
• where w0 through w6 are numerical coefficients,
or weights, to be chosen by the learning
algorithm.
• Learned values for the weights w1 through w6
will determine the relative importance of the
various board features in determining the value
of the board, where as the weight w0 will provide
an additive constant to the board value.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Partial design of a checkers learning program
• Task T: playing checkers
• Performance measure P: percent of games won in
the world tournament
• Training experience E: games played against itself
• Target function:
V : Board  R
• Target function representation:
V*(b) = w0+w1x1+w2x2+w3x3+w4x4+w5x5+w6x6
The first three items above correspond to the
specification of the learning task, whereas the
final two items constitute design choices for the
implementation of the learning program.Dr. Amit Kumar, Dept of CSE, JUET, Guna
Choosing a Function Approximation Algorithm
• In order to learn the target function V* we require a set
of training examples, each describing a specific board
state b and the training value Vtrain(b) for b.
• In other words, each training example is an ordered pair
of the form <b, Vtrain(b)>
• For instance, the following training example describes a
board state b in which black has won the game (note x2
= 0 indicates that red has no remaining pieces)
• and for which the target function value Vtrain(b) is
therefore +100.
<(x1=3, x2 =0, x3=1, x4=0,x5=0x6=6), +100>
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ESTIMATING TRAINING VALUES
• Only training information available to our learner is
whether the game was eventually won or lost.
• On the other hand, we require training examples that
assign specific scores to specific board states.
• it is easy to assign a value to board states that
correspond to the end of the game,
• it is less obvious how to assign training values to the
more numerous intermediate board states that occur
before the game's end.
• Of course the fact that the game was eventually won or
lost does not necessarily indicate that every board state
along the game path was necessarily good or bad.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ESTIMATING TRAINING VALUES
• Despite the ambiguity inherent in estimating training
values for intermediate board states,
• one simple approach has been found to be surprisingly
successful.
• This approach is to assign the training value of Vtrain(b)
for any intermediate board state b to be V*(successor(b))
• where V* is the learner's current approximation to V and
where Successor(b) denotes the next board state
following b for which it is again the program's turn to
move (i.e., the board state following the program‘s move
and the opponent's response).
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ESTIMATING TRAINING VALUES
• This rule for estimating training
values can be summarized as:
Vtrain(b)  V*(Successor(b))
(Rule for estimating training values.)
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ADJUSTING THE WEIGHTS
• All that remains is to specify the learning algorithm for
choosing the weights wi to best fit the set of training
examples { <b, Vtrain(b)>}.
• As a first step we must define what we mean by the best
fit to the training data.
• One common approach is to define the best hypothesis,
or set of weights, as that which minimizes the squared
error E between the training values and the values
predicted by the hypothesis V*.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ADJUSTING THE WEIGHTS
• Several algorithms are known for finding weights of a
linear function that minimize E.
• In our case, we require an algorithm that will
incrementally refine the weights as new training
examples become available and that will be robust to
errors in these estimated training values.
• One such algorithm is called the least mean squares, or
LMS training rule.
• For each observed training example it adjusts the
weights a small amount in the direction that reduces the
error on this training example.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ADJUSTING THE WEIGHTS
The LMS algorithm is defined as follows:
LMS weight update rule.
For each training example <b, Vtrain(b)>
– Use the current weights to calculate V*(b)
– For each weight wi, update it as –
• Wi  wi + η (Vtrain(b) - V*(b)) xi
• Here η is a small constant (e.g., 0.1) that moderates
the size of the weight update.
• To get an intuitive understanding for why this weight
update rule works, notice that when the error
(Vtrain(b) - V*(b)) is zero, no weights are changed.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ADJUSTING THE WEIGHTS
• When (Vtrain(b) - V*(b)) is positive (i.e., when
V*(b) is too low), then each weight is increased
in proportion to the value of its corresponding
feature.
• This will raise the value of V*(b), reducing the
error.
• Notice that if the value of some feature xi is
zero, then its weight is not altered regardless of
the error, so that the only weights updated are
those whose features actually occur on the
training example board.Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Final Design
• The final design of our checkers learning
system can be naturally described by four
distinct program modules that represent
the central components in many learning
systems.
–The Performance System
–The Critic
–The Generalizer
–The Experiment Generator
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Final Design
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Final Design
• The Performance System is the module that
must solve the given performance task, in
this case playing checkers, by using the
learned target function(s). It takes an
instance of a new problem (new game) as
input and produces a trace of its solution
(game history) as output.
• The Critic takes as input the history or trace
of the game and produces as output a set of
training examples of the target function.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Final Design
• The Generalizer takes as input the training
examples and produces an output hypothesis
that is its estimate of the target function.
• The Experiment Generator takes as input the
current hypothesis (currently learned
function) and outputs a new problem (i.e.,
initial board state) for the Performance
System to explore. Its role is to pick new
practice problems that will maximize the
learning rate of the overall system.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Chapter 2
CONCEPT LEARNING AND
THE
GENERAL-TO-SPECIFIC
0RDERING
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INTRODUCTION-CONCEPT LEARNING
• The problem of inducing general functions from
specific training examples is central to concept
learning.
• Concept learning: acquiring the definition of a
general category given a sample of positive and
negative training examples of the category.
• Concept learning can be formulated as a
problem of searching through a predefined
space of potential hypotheses for the
hypothesis that best fits the training examples.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INTRODUCTION-CONCEPT LEARNING
• In many cases this search can be efficiently
organized by taking advantage of a naturally
occurring structure over the hypothesis space - a
general-to-specific ordering of hypotheses.
• There are several learning algorithms and
considers situations under which they converge
to the correct hypothesis.
• We also examine the nature of inductive learning
and the justification by which any program may
successfully generalize beyond the observed
training data. Dr. Amit Kumar, Dept of CSE, JUET, Guna
INTRODUCTION-CONCEPT LEARNING
• Much of learning involves acquiring general
concepts from specific training examples.
• People, for example, continually learn
general concepts or categories such as "bird,"
"car," "situations in which I should study
more in order to pass the exam," etc.
• Each such concept can be viewed as
describing some subset of objects or events
defined over a larger set (e.g., the subset of
animals that constitute birds).Dr. Amit Kumar, Dept of CSE, JUET, Guna
INTRODUCTION-CONCEPT LEARNING
• Alternatively, each concept can be thought
of as a boolean-valued function defined
over this larger set (e.g., a function
defined over all animals, whose value is
true for birds and false for other animals).
• In this chapter we consider the problem of
automatically inferring the general
definition of some concept, given
examples labeled as members or non-
members of the concept.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INTRODUCTION-CONCEPT LEARNING
• This task is commonly referred to as
concept learning, or approximating a
boolean-valued function from
examples.
Concept learning: Inferring a boolean-
valued function from training
examples of its input and output.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CONCEPT LEARNING TASK
• Consider the example task of learning the target
concept "days on which my friend Aldo enjoys
his favorite water sport.“
• Table 2.1 describes a set of example days, each
represented by a set of attributes. The attribute
EnjoySport indicates whether or not Aldo enjoys
his favorite water sport on this day.
• The task is to learn to predict the value of
EnjoySport for an arbitrary day, based on the
values of its other attributes.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CONCEPT LEARNING TASK
• What hypothesis representation shall we
provide to the learner in this case?Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CONCEPT LEARNING TASK
• Let us begin by considering a simple
representation in which each
hypothesis consists of a conjunction of
constraints on the instance attributes.
• In particular, let each hypothesis be a
vector of six constraints, specifying
the values of the six attributes Sky,
AirTemp, Humidity, Wind, Water, and
Forecast. Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CONCEPT LEARNING TASK
• For each attribute, the hypothesis will either-
– indicate by a "?' that any value is
acceptable for this attribute,
–specify a single required value (e.g., Warm,
cold for AirTemp) for the attribute, or
–indicate by a "Ø" that no value is
acceptable.
• If some instance x satisfies all the constraints
of hypothesis h, then h classifies x as a
positive example (h(x) = 1).Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CONCEPT LEARNING TASK
• To illustrate, the hypothesis that
Aldo enjoys his favorite sport only
on cold days with high humidity
(independent of the values of the
other attributes) is represented by
the expression:
<?, Cold, High, ?, ?, ?>
Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CONCEPT LEARNING TASK
• The most general hypothesis-that
every day is a positive example-is
represented by
<?, ?, ?, ?, ?, ?>
• and the most specific possible
hypothesis-that no day is a positive
example-is represented by
<Ø, Ø, Ø, Ø, Ø, Ø>
Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CONCEPT LEARNING TASK- Notation
• The set of items over which the concept is
defined is called the set of instances,
which we denote by X.
• In the current example, X is the set of all
possible days, each represented by the
attributes Sky, AirTemp, Humidity, Wind,
Water, and Forecast.
• The concept or function to be learned is
called the target concept, which we
denote by c. Dr. Amit Kumar, Dept of CSE, JUET, Guna
A CONCEPT LEARNING TASK- Notation
• In general, c can be any boolean
valued function defined over the
instances X; that is, c : X  {0,1}.
• In the current example, the target
concept corresponds to the value of
the attribute EnjoySport (i.e., c(x) = 1 if
EnjoySport = Yes, and c(x) = 0 if
EnjoySport = No).
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The definition of the EnjoySport concept
learning task
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Inductive Learning Hypothesis
• Although the learning task is to determine
a hypothesis h identical to the target concept
c over the entire set of instances X, the only
information available about c is its value over
the training examples.
• Therefore, inductive learning algorithms
can at best guarantee that the output
hypothesis fits the target concept over the
training data.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Inductive Learning Hypothesis
•Lacking any further information, our assumption
is that the best hypothesis regarding unseen
instances is the hypothesis that best fits the
observed training data.
•This is the fundamental assumption of inductive
Learning.
The inductive learning hypothesis. Any hypothesis
found to approximate the target function well over
a sufficiently large set of training examples will
also approximate the target function well over
other unobserved examples.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
CONCEPT LEARNING AS SEARCH
• Concept learning can be viewed as the task of
searching through a large space of hypotheses
implicitly defined by the hypothesis
representation.
• The goal of this search is to find the hypothesis
that best fits the training examples.
• It is important to note that by selecting a
hypothesis representation, the designer of the
learning algorithm implicitly defines the space of
all hypotheses that the program can ever
represent and therefore can ever learn.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
CONCEPT LEARNING AS SEARCH
• Consider, for example, the instances X and
hypotheses H in the EnjoySport learning task.
Given that the attribute Sky has three
possible values, and that AirTemp, Humidity,
Wind, Water, and Forecast each have two
possible values, the instance space X contains
exactly 3.2.2.2.2.2 = 96 distinct instances.
• A similar calculation shows that there are
5.4.4.4.4.4 = 5120 syntactically distinct
hypotheses within H.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
CONCEPT LEARNING AS SEARCH
• Notice, however, that every hypothesis
containing one or more " Ø " symbols represents
the empty set of instances; that is, it classifies
every instance as negative.
• Therefore, the number of semantically distinct
hypotheses is only 1+(4.3.3.3.3.3) = 973.
• EnjoySport example is a very simple learning
task, with a relatively small, finite hypothesis
space.
• Most practical learning tasks involve much larger,
sometimes infinite, hypothesis spaces.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
General-to-Specific Ordering of Hypotheses
• To illustrate the general-to-specific ordering,
consider the two hypotheses
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
• Any instance classified positive by hl will also
be classified positive by h2. Therefore, we say
that h2 is more general than h1.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Relation between instances and hypotheses
Dr. Amit Kumar, Dept of CSE, JUET, Guna
FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
* For each attribute constraint ai in h
If the constraint ai is satisfied by x
Then do nothing
Else replace ai in h by the next more general
constraint that is satisfied by x
3. Output hypothesis h
• To illustrate this algorithm, assume the learner is
given the sequence of training examples from Table
2.1 for the EnjoySport task. The first step of FIND-S
is to initialize h to the most specific hypothesis in H.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
• h <Ø, Ø, Ø, Ø, Ø, Ø>
• Upon observing the first training example from Table
2.1, which happens to be a positive example, it
becomes clear that our hypothesis is too specific.
h <Sunny, Warm, Normal, Strong, Warm, Same>
• This h is still very specific.
• Next, the second training example (also positive in
this case) forces the algorithm to further generalize h,
this time substituting a "?' in place of any attribute
value in h that is not satisfied by the new example.
The refined hypothesis in this case is
h <Sunny, Warm, ?, Strong, Warm, Same>
Dr. Amit Kumar, Dept of CSE, JUET, Guna
FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
•Upon encountering the third training example-in this
case a negative example- the algorithm makes no
change to h. In fact, the FIND-S algorithm simply
ignores every negative example.
• To complete our trace of FIND-S, the fourth(positive)
example leads to a further generalization of h.
h <Sunny, Warm, ?, Strong, ?, ?>
•The FIND-S algorithm illustrates one way in which
the more-general-than partial ordering can be used to
organize the search for an acceptable hypothesis.
•The search moves from hypothesis to hypothesis,
searching from the most specific to progressively
more general hypotheses along one chain of the
partial ordering. Dr. Amit Kumar, Dept of CSE, JUET, Guna
FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
Dr. Amit Kumar, Dept of CSE, JUET, Guna
VERSION SPACES AND THE CANDIDATE-ELIMINATION
ALGORITHM
• The key idea in the CANDIDATE-ELIMINATlON
algorithm is to output a description of the set of all
hypotheses consistent with the training examples.
•The CANDIDATE-ELIMINATlON algorithm has been
applied to problems such as learning regularities in
chemical mass spectroscopy and learning control
rules for heuristic search.
•CANDIDATE-ELIMINATlON algorithm computes the
description of data set without explicitly enumerating
all of its members.
• This is accomplished by again using the more-
general-than partial ordering.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
VERSION SPACES AND THE CANDIDATE-ELIMINATION
ALGORITHM
• The CANDIDATE-ELIMINATlON algorithm finds all
describable hypotheses that are consistent with the
observed training examples.
• In order to define this algorithm precisely, we begin
with a few basic definitions.
• First, let us say that a hypothesis is consistent with
the training examples if it correctly classifies these
examples.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
VERSION SPACES AND THE CANDIDATE-ELIMINATION
ALGORITHM
• The CANDIDATE-ELIMINATlON algorithm
represents the set of all hypotheses consistent with
the observed training examples.
• This subset of all hypotheses is called the version
space with respect to the hypothesis space H and the
training examples D, because it contains all plausible
versions of the target concept.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The LIST-THEN-ELIMINATION Algorithm
•The LIST-THEN-ELIMINATION Algorithm first
initializes the version space to contain all hypotheses
in H,
• Then eliminates any hypothesis found inconsistent
with any training example.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
General and Specific Boundary
Dr. Amit Kumar, Dept of CSE, JUET, Guna
VERSION SPACE THEORM
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Dr. Amit Kumar, Dept of CSE, JUET, Guna
This is the version space with its general and specific boundary sets. The
version space includes all six hypotheses shown here, but can be represented
more simply by S and G. Arrows indicate instances of the more-general-than
relation. This is the version space for the Enjoysport concept learning problem
and training examples described in Table 2.1.Dr. Amit Kumar, Dept of CSE, JUET, Guna
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Chapter 3
DECISION TREE
LEARNING
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INTRODUCTION
• Decision tree learning is one of the most
widely used and practical methods for
inductive inference.
• It is a method for approximating discrete-
valued functions that is robust to noisy data
and capable of learning disjunctive
expressions.
• This chapter describes a family of decision
tree learning algorithms that includes widely
used algorithms.Dr. Amit Kumar, Dept of CSE, JUET, Guna
INTRODUCTION
• Decision tree learning algorithms such as
ID3, ASSISTANT, and C4.5 are mostly used.
• These methods search a completely
expressive hypothesis space and thus avoid
the difficulties of restricted hypothesis
spaces.
• Their inductive bias is a preference for small
trees over large trees.
• Learned trees can also be re-represented as
sets of if-then rules to improve human
readability.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
DECISION TREE REPRESENTATION
• Decision trees classify instances by
sorting them down the tree from the
root to some leaf node, which provides
the classification of the instance.
• Each node in the tree specifies a test of
some attribute of the instance, and each
branch descending from that node
corresponds to one of the possible
values for this attribute.Dr. Amit Kumar, Dept of CSE, JUET, Guna
DECISION TREE REPRESENTATION
• An instance is classified by starting at
the root node of the tree, testing the
attribute specified by this node, then
moving down the tree branch
corresponding to the value of the
attribute in the given example.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
DECISION TREE REPRESENTATION
Dr. Amit Kumar, Dept of CSE, JUET, Guna
APPROPRIATE PROBLEMS FOR DECISION TREE
LEARNING
• Decision tree learning is generally best suited to
problems with the following characteristics:
• Instances are represented by attribute-value
pairs.
• The target function has discrete output values.
• Disjunctive descriptions may be required.
• The training data may contain errors.
• The training data may contain missing attribute
values. Dr. Amit Kumar, Dept of CSE, JUET, Guna
THE BASIC DECISION TREE LEARNING ALGORITHM
• Most algorithms that have been
developed for learning decision trees are
variations on a core algorithm that
employs a top-down, greedy search
through the space of possible decision
trees.
• This approach is exemplified by the ID3
algorithm (Quinlan 1986) and its
successor C4.5 (Quinlan 1993).Dr. Amit Kumar, Dept of CSE, JUET, Guna
THE BASIC DECISION TREE LEARNING ALGORITHM
• Algorithm, ID3, learns decision trees by
constructing them topdown, beginning with
the question "which attribute should be
tested at the root of the tree?”
• To answer this question, each instance
attribute is evaluated using a statistical test
to determine how well it alone classifies the
training examples.
• The best attribute is selected and used as the
test at the root node of the tree.Dr. Amit Kumar, Dept of CSE, JUET, Guna
THE ID3 LEARNING ALGORITHM
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Which Attribute Is the Best Classifier?
• The central choice in the ID3 algorithm is
selecting which attribute to test at each node
in the tree.
• We will define a statistical property, called
information gain, that measures how well a
given attribute separates the training
examples according to their target
classification.
• ID3 uses this information gain measure to
select among the candidate attributes at
each step while growing the tree.Dr. Amit Kumar, Dept of CSE, JUET, Guna
ENTROPY MEASURES HOMOGENEITY OF EXAMPLES
• In order to define information gain precisely,
we begin by defining a measure commonly
used in information theory, called entropy,
that characterizes the (im)purity of an
arbitrary collection of examples.
• Given a collection S, containing positive and
negative examples of some target concept,
the entropy of S relative to this boolean
classification is:
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ENTROPY MEASURES HOMOGENEITY OF EXAMPLES
• where p is the proportion of positive
examples in S and pΘ is the proportion of
negative examples in S.
• In all calculations involving entropy we define
0 log 0 to be 0.
• To illustrate, suppose S is a collection of 14
examples of some boolean concept,
including 9 positive and 5 negative examples
(we adopt the notation [9+, 5-] to summarize
such a sample of data). Then the entropy of S
relative to this boolean classification is:Dr. Amit Kumar, Dept of CSE, JUET, Guna
ENTROPY MEASURES HOMOGENEITY OF EXAMPLES
Entropy ([9+,5-]) = - (9/14) log2 (9/14)
- (5/14) log2 (5/14) = 0.940
• Notice that the entropy is 0 if all members of
S belong to the same class.
• Note the entropy is 1 when the collection
contains an equal number of positive and
negative examples.
• If the collection contains unequal numbers of
positive and negative examples, the entropy
is between 0 and 1.Dr. Amit Kumar, Dept of CSE, JUET, Guna
ENTROPY MEASURES HOMOGENEITY OF EXAMPLES
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ENTROPY MEASURES HOMOGENEITY OF EXAMPLES
• More generally, if the target attribute can take on
c different values, then the entropy of S relative to
this c-wise classification is defined as –
where pi is the proportion of S belonging to class i.
Note the logarithm is still base 2 because entropy is
a measure of the expected encoding length
measured in bits.
• Note also that if the target attribute can take on c
possible values, the entropy can be as large as log,
c. Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES THE
EXPECTED REDUCTION IN ENTROPY
• Given entropy as a measure of the impurity
in a collection of training examples,
• we can now define a measure of the
effectiveness of an attribute in classifying the
training data.
• The measure we will use, called information
gain, is simply the expected reduction in
entropy caused by partitioning the examples
according to this attribute.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES THE
EXPECTED REDUCTION IN ENTROPY
• More precisely, the information gain, Gain(S,
A) of an attribute A, relative to a collection of
examples S, is defined as:
where Values(A) is the set of all possible
values for attribute A, and Sv is the subset
of S for which attribute A has value v .
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES THE
EXPECTED REDUCTION IN ENTROPY
• Note the first term in Equation is just the
entropy of the original collection S, and the
second term is the expected value of the
entropy after S is partitioned using attribute
A.
• The expected entropy described by this
second term is simply the sum of the
entropies of each subset Sv weighted by the
fraction of examples that belong to Sv.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES THE
EXPECTED REDUCTION IN ENTROPY
• Gain(S,A) is therefore the expected
reduction in entropy caused by knowing the
value of attribute A. Put another way, Gain(S,
A) is the information provided about the
target & action value, given the value of
some other attribute A.
• The value of Gain(S, A) is the number of bits
saved when encoding the target value of an
arbitrary member of S, by knowing the value
of attribute A. Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES THE
EXPECTED REDUCTION IN ENTROPY
• For example, suppose S is a collection of
training-example days described by
attributes including Wind, which can have
the values Weak or Strong.
• As before, assume S is a collection containing
14 examples, [9+, 5-].
• Of these 14 examples, suppose 6 of the
positive and 2 of the negative examples have
Wind = Weak, and the remainder have Wind
= Strong. Dr. Amit Kumar, Dept of CSE, JUET, Guna
Training Examples for the target concept
PalyTennis
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES THE
EXPECTED REDUCTION IN ENTROPY
• The information gain due to sorting the original 14
examples by the attribute Wind may then be calculated
as:
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES
• ID3 determines the information gain for each
candidate attribute (i.e., Outlook, Temperature,
Humidity, and Wind), then selects the one with
highest information gain. The computation of
information gain for two of these attributes is
shown in Figure (last slide). The information gain
values for all four attributes are:
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029Dr. Amit Kumar, Dept of CSE, JUET, Guna
INFORMATION GAIN MEASURES
• According to the information gain measure, the
Outlook attribute provides the best prediction of
the target attribute, PlayTennis over the training
examples.
• Therefore, Outlook is selected as the decision
attribute for the root node, and branches are
created below the root for each of its possible
values (i.e., Sunny, Overcast, and Rain).
• The partially learned decision tree resulting from
the first step of ID3 is shown on next slide.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Dr. Amit Kumar, Dept of CSE, JUET, Guna
HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING
• As with other inductive learning methods, ID3
can be characterized as searching a space of
hypotheses for one that fits the training
examples.
• The hypothesis space searched by ID3 is the set
of possible decision trees.
• ID3 performs a simple-to-complex, hill-climbing
search through this hypothesis space, beginning
with the empty tree, then considering
progressively more elaborate hypotheses in
search of a decision tree that correctly classifies
the training data.Dr. Amit Kumar, Dept of CSE, JUET, Guna
HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING
By viewing ID3 in terms of its search space and
search strategy, we can get some insight into its
capabilities and limitations.
• ID3’s hypothesis space of all decision trees is a
complete space of finite discrete-valued
functions, relative to the available attributes.
Because every finite discrete-valued function can
be represented by some decision tree, ID3 avoids
one of the major risks of methods that search
incomplete hypothesis spaces, that the
hypothesis space might not contain the target
function.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING
• ID3 maintains only a single current
hypothesis as it searches through the space
of decision trees. This contrasts, for
example, with the earlier version space
candidate-Elimination method, which
maintains the set of all hypotheses
consistent with the available training
examples. By determining only a single
hypothesis, ID3 loses the capabilities that
follow from explicitly representing all
consistent hypotheses.Dr. Amit Kumar, Dept of CSE, JUET, Guna
HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING
• ID3 in its pure form performs no backtracking in
its search. Once it, selects an attribute to test at
a particular level in the tree, it never backtracks
to reconsider this choice. Therefore, it is
susceptible to the usual risks of hill-climbing
search without backtracking: converging to
locally optimal solutions that are not globally
optimal. In the case of ID3, a locally optimal
solution corresponds to the decision tree it
selects along the single search path it explores.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING
• ID3 uses all training examples at each step in the
search to make statistically based decisions
regarding how to refine its current hypothesis.
This contrasts with methods that make decisions
incrementally, based on individual training
examples (e.g., FIND-S or CANDIDATE-
ELIMINATOIN ). One advantage of using
statistical properties of all the examples (e.g.,
information gain) is that the resulting search is
much less sensitive to errors in individual
training examples.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING
• ID3 can be easily extended to handle noisy
training data by modifying its termination
criterion to accept hypotheses that imperfectly
fit the training data.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INDUCTIVE BIAS IN DECISION TREE LEARNING
• Inductive bias is the set of assumptions that,
together with the training data, deductively
justify the classifications assigned by the learner
to future instances.
• Given a collection of training examples, there are
typically many decision trees consistent with
these examples.
• Describing the inductive bias of ID3 therefore
consists of describing the basis by which it
chooses one of these consistent hypotheses over
the others.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INDUCTIVE BIAS IN DECISION TREE LEARNING
• Which of these decision trees does ID3 choose?
• It chooses the first acceptable tree it encounters
in its simple-to-complex, hill climbing search
through the space of possible trees.
• Roughly speaking, then, the ID3 search strategy
(a) selects in favor of shorter trees over longer
ones, and
(b) selects trees that place the attributes with
highest information gain closest to the root.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INDUCTIVE BIAS IN DECISION TREE LEARNING
• Because of the subtle interaction
between the attribute selection
heuristic used by ID3 and the particular
training examples it encounters, it is
difficult to characterize precisely the
inductive bias exhibited by ID3.
• However, we can approximately
characterize its bias as a preference for
short decision trees over complex
trees.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INDUCTIVE BIAS IN DECISION TREE LEARNING
• Approximate inductive bias of ID3:
Shorter trees are preferred over larger
trees.
• A closer approximation to the
inductive bias of ID3: Shorter trees are
preferred over longer trees. Trees that
place high information gain attributes
close to the root are preferred over
those that do not.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Why Prefer Short Hypotheses?
• Is ID3's inductive bias favoring shorter decision
trees a sound basis for generalizing beyond the
training data?
• Philosophers and others have debated this
question for centuries, and the debate remains
unresolved to this day.
• William of Occam was one of the first to discusst
the question, around the year 1320, so this bias
often goes by the name of Occam's razor.
• Occam's razor: Prefer the simplest hypothesis
that fits the data.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
ISSUES IN DECISION TREE LEARNING?
• Practical issues in learning decision trees include:
• Determining how deeply to grow the decision
tree, (Avoiding Overfitting the Data, Reduced
Error Pruning, Rule Post-pruning)
• Handling continuous attributes,
• Choosing an appropriate attribute selection
measure,
• Handling training data with missing attribute
values,
• Handling attributes with differing costs, and
• Improving computational efficiency.Dr. Amit Kumar, Dept of CSE, JUET, Guna
Chapter 4
ARTIFICIAL
NEURAL
NETWORKS
Dr. Amit Kumar, Dept of CSE, JUET, Guna
INTRODUCTION
• Artificial neural networks (ANNs) provide a
general, practical method for learning real-
valued, discrete-valued, and vector-valued
functions from examples.
• Algorithms such as BACKPROPAGATION, gradient
descent to tune network parameters to best fit a
training set of input-output pairs.
• ANN learning is robust to errors in the training
data and has been successfully applied to
problems such as interpreting visual scenes,
speech recognition, and learning robot control
strategies. Dr. Amit Kumar, Dept of CSE, JUET, Guna
Biological Motivation
• The study of artificial neural networks (ANNs)
has been inspired in part by the observation
that biological learning systems are built of
very complex webs of interconnected
neurons.
• The human brain is estimated to contain a
densely interconnected network of
approximately 1011 neurons, each connected,
on average, to 104 others.
• Neuron activity is typically excited or inhibited
through connections to other neurons.Dr. Amit Kumar, Dept of CSE, JUET, Guna
Biological Motivation
• The fastest neuron switching times are
known to be on the order of 10-3 seconds-
quite slow compared to computer switching
speeds of 10-10 seconds.
• Yet humans are able to make surprisingly
complex decisions, surprisingly quickly.
• For example, it requires approximately 10-1
seconds to visually recognize your mother.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Biological Motivation
• This observation has led many to speculate
that the information-processing abilities of
biological neural systems must follow from
highly parallel processes operating on
representations that are distributed over
many neurons.
• One motivation for ANN systems is to
capture this kind of highly parallel
computation based on distributed
representations.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Biological Motivation
• Most ANN software runs on sequential
machines emulating distributed processes,
although faster versions of the algorithms
have also been implemented on highly
parallel machines and on specialized
hardware designed specifically for ANN
applications.
• While ANNs are loosely motivated by
biological neural systems, there are many
complexities to biological neural systems that
are not modeled by ANNs.Dr. Amit Kumar, Dept of CSE, JUET, Guna
Biological Motivation
• Historically, two groups of researchers have
worked with artificial neural networks.
• One group has been motivated by the goal of
using ANNs to study and model biological
learning processes.
• A second group has been motivated by the
goal of obtaining highly effective machine
learning algorithms, independent of whether
these algorithms mirror biological processes.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
NEURAL NETWORK - Research History
• McCulloch and Pitts (1943) are generally recognized
as the designers of the first neural network.
• They combined many simple processing units
together that could lead to an overall increase in
computational power.
• They suggested many ideas like : a neuron has a
threshold level and once that level is reached the
neuron fires.
• It is still the fundamental way in which ANNs
operate. The McCulloch and Pitts's network had a
fixed set of weights.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
NEURAL NETWORK - Research History
• Hebb (1949) developed the first learning rule, that
is if two neurons are active at the same time then
the strength between them should be increased.
• In the 1950 and 60's, many researchers (Block,
Minsky, Papert, and Rosenblatt worked on
perceptron.
• The neural network model could be proved to
converge to the correct weights, that will solve the
problem.
• The weight adjustment (learning algorithm) used in
the perceptron was found more powerful than the
learning rules used by Hebb.Dr. Amit Kumar, Dept of CSE, JUET, Guna
NEURAL NETWORK - Research History
• Minsky & Papert (1969) showed that perceptron
could not learn those functions which are not
linearly separable.
• The neural networks research declined throughout
the 1970 and until mid 80's because the perceptron
could not learn certain important functions.
• Neural network regained importance in 1985-86.
The researchers, Parker and LeCun discovered a
learning algorithm for multi-layer networks called
back propagation that could solve problems that
were not linearly separable.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Artificial Neuron Model
• An artificial neuron is a mathematical function
conceived as a simple model of a real (biological)
neuron.
• The McCulloch-Pitts Neuron: This is a simplified
model of real neurons, known as a Threshold Logic
Unit.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
PERCEPTRONS
• One type of ANN system is based on a unit called a
perceptron. A perceptron takes a vector of real-
valued inputs, calculates a linear combination of
these inputs, then outputs a 1 if the result is
greater than some threshold and -1 otherwise.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
PERCEPTRONS
• More precisely, given inputs x1 through xn the
output o(x1, . . . , xn) computed by the perceptron
is:
• where each wi is a real-valued constant, or weight,
that determines the contribution of input xi to the
perceptron output.
• To simplify notation, we imagine an additional
constant input x0 = 1, allowing us to write the
above inequality as > 0
Dr. Amit Kumar, Dept of CSE, JUET, Guna
PERCEPTRONS
• In vector form as For brevity, we will
sometimes write the perceptron function as :
• Where
• Learning a perceptron involves choosing values for
the weights w0, . . . , wn.
• Therefore, the space H of candidate hypotheses
considered in perceptron learning is the set of all
possible real-valued weight vectors.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Representational Power of Perceptrons
• We can view the perceptron as representing a
hyperplane decision surface in the n-dimensional
space of instances (i.e., points).
• The perceptron outputs a 1 for instances lying on
one side of the hyperplane and outputs a -1 for
instances lying on the other side, as illustrated in
Figure The decision surface represented by a
two-input perceptron.
(a) A set of training examples and the
decision surface of a perceptron that
classifies them correctly.
(b) A set of training examples that is
not linearly separable (i.e., that cannot
be correctly classified by any straight
line). xl and x2 are the Perceptron
inputs.Dr. Amit Kumar, Dept of CSE, JUET, Guna
Representational Power of Perceptrons
• To implement the AND function is to set the
weights w0 = -0.8 , and w1 = w2 = 0.5.
• This perceptron can be made to represent the OR
function instead by altering the threshold to
w0 = - 0.3.
• In fact, AND and OR can be viewed as special cases
of m-of-n functions: that is, functions where at
least m of the n inputs to the perceptron must be
true.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Perceptron Training Rule
• Several algorithms are known to solve
learning problem.
• The perceptron rule and the delta rule .
These two algorithms are guaranteed to
converge to somewhat different acceptable
hypotheses, under somewhat different
conditions.
• They are important to ANNs because they
provide the basis for learning networks of
many units.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Perceptron Training Rule
• One way to learn an acceptable weight
vector is to begin with random weights, then
iteratively apply the perceptron to each
training example, modifying the perceptron
weights whenever it misclassifies an
example.
• This process is repeated, iterating through
the training examples as many times as
needed until the perceptron classifies all
training examples correctly.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
The Perceptron Training Rule
• Weights are modified at each step according to the
perceptron training rule, which revises the weight
wi associated with input xi according to the rule –
Where
t = target output, o = observed output, η = learning
rate. ∆ wi = weight update
Dr. Amit Kumar, Dept of CSE, JUET, Guna
Gradient descent algorithm for training a
linear unit.
Dr. Amit Kumar, Dept of CSE, JUET, Guna
MULTILAYER NETWORKS AND THE BACKPROPAGATION
ALGORITHM
Dr. Amit Kumar, Dept of CSE, JUET, Guna

Más contenido relacionado

La actualidad más candente

Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter TuningJon Lederman
 
Machine learning seminar ppt
Machine learning seminar pptMachine learning seminar ppt
Machine learning seminar pptRAHUL DANGWAL
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)SwatiTripathi44
 
introduction to machin learning
introduction to machin learningintroduction to machin learning
introduction to machin learningnilimapatel6
 
Classification using back propagation algorithm
Classification using back propagation algorithmClassification using back propagation algorithm
Classification using back propagation algorithmKIRAN R
 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning TutorialAmr Rashed
 
Machine learning
Machine learningMachine learning
Machine learningeonx_32
 
Autoencoders
AutoencodersAutoencoders
AutoencodersCloudxLab
 
Using prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannUsing prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannswapnac12
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachinePulse
 
Machine Learning
Machine LearningMachine Learning
Machine LearningRahul Kumar
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and RegressionMegha Sharma
 
Back propagation
Back propagationBack propagation
Back propagationNagarajan
 

La actualidad más candente (20)

Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 
Machine learning seminar ppt
Machine learning seminar pptMachine learning seminar ppt
Machine learning seminar ppt
 
Machine learning
Machine learningMachine learning
Machine learning
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)
 
introduction to machin learning
introduction to machin learningintroduction to machin learning
introduction to machin learning
 
Classification using back propagation algorithm
Classification using back propagation algorithmClassification using back propagation algorithm
Classification using back propagation algorithm
 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning Tutorial
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Introduction to Deep Learning
Introduction to Deep Learning Introduction to Deep Learning
Introduction to Deep Learning
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Machine learning
Machine learningMachine learning
Machine learning
 
Using prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannUsing prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbann
 
Machine learning
Machine learningMachine learning
Machine learning
 
machine learning
machine learningmachine learning
machine learning
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Back propagation
Back propagationBack propagation
Back propagation
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 

Similar a Machine learning

Machine Learning & The Future Of AI
Machine Learning & The Future Of AIMachine Learning & The Future Of AI
Machine Learning & The Future Of AIAnkan Das
 
Artificial Intelligence and Machine Learning
Artificial Intelligence and Machine Learning Artificial Intelligence and Machine Learning
Artificial Intelligence and Machine Learning Aditya Singh
 
Hala GPT - Samer Desouky.pdf
Hala GPT - Samer Desouky.pdfHala GPT - Samer Desouky.pdf
Hala GPT - Samer Desouky.pdfSamer Desouky
 
ARTIFICIAL INTELLIGENCE.pptx
ARTIFICIAL INTELLIGENCE.pptxARTIFICIAL INTELLIGENCE.pptx
ARTIFICIAL INTELLIGENCE.pptxBryCunal
 
Machine-Learning-and-Robotics.pptx
Machine-Learning-and-Robotics.pptxMachine-Learning-and-Robotics.pptx
Machine-Learning-and-Robotics.pptxshohel rana
 
Artificial intelligence 13cs041
Artificial intelligence 13cs041Artificial intelligence 13cs041
Artificial intelligence 13cs041Manik Behl
 
Selected topics in Computer Science
Selected topics in Computer Science Selected topics in Computer Science
Selected topics in Computer Science Melaku Bayih Demessie
 
ARTIFICIAL intelligence ppt presentation
ARTIFICIAL intelligence ppt presentationARTIFICIAL intelligence ppt presentation
ARTIFICIAL intelligence ppt presentationGuruPrakash763066
 
Big data, big opportunities
Big data, big opportunitiesBig data, big opportunities
Big data, big opportunitiesChouaieb NEMRI
 
Artificial Intteligence-unit 1.pptx
Artificial Intteligence-unit 1.pptxArtificial Intteligence-unit 1.pptx
Artificial Intteligence-unit 1.pptxhoneydv1979
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligencefalepiz
 
Unit No. 1 Artificial Intelligence Overview.pptx
Unit No. 1 Artificial Intelligence Overview.pptxUnit No. 1 Artificial Intelligence Overview.pptx
Unit No. 1 Artificial Intelligence Overview.pptxswagatkarve
 
Chapter four AI and modern world part two.pptx
Chapter four AI and modern world part two.pptxChapter four AI and modern world part two.pptx
Chapter four AI and modern world part two.pptxbestboybulshaawi
 
Artificialintelligence
ArtificialintelligenceArtificialintelligence
ArtificialintelligenceRavi Rao
 
Machine Learning in Robotics
Machine Learning in RoboticsMachine Learning in Robotics
Machine Learning in Roboticsijtsrd
 
Artificial Intelligence and life in 2030
Artificial Intelligence and life  in 2030Artificial Intelligence and life  in 2030
Artificial Intelligence and life in 2030Muazzam ali
 

Similar a Machine learning (20)

Machine Learning & The Future Of AI
Machine Learning & The Future Of AIMachine Learning & The Future Of AI
Machine Learning & The Future Of AI
 
Artificial Intelligence and Machine Learning
Artificial Intelligence and Machine Learning Artificial Intelligence and Machine Learning
Artificial Intelligence and Machine Learning
 
Hala GPT - Samer Desouky.pdf
Hala GPT - Samer Desouky.pdfHala GPT - Samer Desouky.pdf
Hala GPT - Samer Desouky.pdf
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 
ARTIFICIAL INTELLIGENCE.pptx
ARTIFICIAL INTELLIGENCE.pptxARTIFICIAL INTELLIGENCE.pptx
ARTIFICIAL INTELLIGENCE.pptx
 
Machine-Learning-and-Robotics.pptx
Machine-Learning-and-Robotics.pptxMachine-Learning-and-Robotics.pptx
Machine-Learning-and-Robotics.pptx
 
Artificial intelligence 13cs041
Artificial intelligence 13cs041Artificial intelligence 13cs041
Artificial intelligence 13cs041
 
Selected topics in Computer Science
Selected topics in Computer Science Selected topics in Computer Science
Selected topics in Computer Science
 
ARTIFICIAL intelligence ppt presentation
ARTIFICIAL intelligence ppt presentationARTIFICIAL intelligence ppt presentation
ARTIFICIAL intelligence ppt presentation
 
Big data, big opportunities
Big data, big opportunitiesBig data, big opportunities
Big data, big opportunities
 
Artificial Intteligence-unit 1.pptx
Artificial Intteligence-unit 1.pptxArtificial Intteligence-unit 1.pptx
Artificial Intteligence-unit 1.pptx
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Ai introduction ppt
Ai introduction pptAi introduction ppt
Ai introduction ppt
 
Artificial Intelligence (AI).pptx
Artificial Intelligence (AI).pptxArtificial Intelligence (AI).pptx
Artificial Intelligence (AI).pptx
 
Unit No. 1 Artificial Intelligence Overview.pptx
Unit No. 1 Artificial Intelligence Overview.pptxUnit No. 1 Artificial Intelligence Overview.pptx
Unit No. 1 Artificial Intelligence Overview.pptx
 
Chapter four AI and modern world part two.pptx
Chapter four AI and modern world part two.pptxChapter four AI and modern world part two.pptx
Chapter four AI and modern world part two.pptx
 
Overview of AI.pptx
Overview of AI.pptxOverview of AI.pptx
Overview of AI.pptx
 
Artificialintelligence
ArtificialintelligenceArtificialintelligence
Artificialintelligence
 
Machine Learning in Robotics
Machine Learning in RoboticsMachine Learning in Robotics
Machine Learning in Robotics
 
Artificial Intelligence and life in 2030
Artificial Intelligence and life  in 2030Artificial Intelligence and life  in 2030
Artificial Intelligence and life in 2030
 

Más de Amit Kumar Rathi

Hybrid Systems using Fuzzy, NN and GA (Soft Computing)
Hybrid Systems using Fuzzy, NN and GA (Soft Computing)Hybrid Systems using Fuzzy, NN and GA (Soft Computing)
Hybrid Systems using Fuzzy, NN and GA (Soft Computing)Amit Kumar Rathi
 
Fundamentals of Genetic Algorithms (Soft Computing)
Fundamentals of Genetic Algorithms (Soft Computing)Fundamentals of Genetic Algorithms (Soft Computing)
Fundamentals of Genetic Algorithms (Soft Computing)Amit Kumar Rathi
 
Fuzzy Systems by using fuzzy set (Soft Computing)
Fuzzy Systems by using fuzzy set (Soft Computing)Fuzzy Systems by using fuzzy set (Soft Computing)
Fuzzy Systems by using fuzzy set (Soft Computing)Amit Kumar Rathi
 
Fuzzy Set Theory and Classical Set Theory (Soft Computing)
Fuzzy Set Theory and Classical Set Theory (Soft Computing)Fuzzy Set Theory and Classical Set Theory (Soft Computing)
Fuzzy Set Theory and Classical Set Theory (Soft Computing)Amit Kumar Rathi
 
Associative Memory using NN (Soft Computing)
Associative Memory using NN (Soft Computing)Associative Memory using NN (Soft Computing)
Associative Memory using NN (Soft Computing)Amit Kumar Rathi
 
Back Propagation Network (Soft Computing)
Back Propagation Network (Soft Computing)Back Propagation Network (Soft Computing)
Back Propagation Network (Soft Computing)Amit Kumar Rathi
 
Fundamentals of Neural Network (Soft Computing)
Fundamentals of Neural Network (Soft Computing)Fundamentals of Neural Network (Soft Computing)
Fundamentals of Neural Network (Soft Computing)Amit Kumar Rathi
 
Introduction to Soft Computing (intro to the building blocks of SC)
Introduction to Soft Computing (intro to the building blocks of SC)Introduction to Soft Computing (intro to the building blocks of SC)
Introduction to Soft Computing (intro to the building blocks of SC)Amit Kumar Rathi
 
Sccd and topological sorting
Sccd and topological sortingSccd and topological sorting
Sccd and topological sortingAmit Kumar Rathi
 
Recurrence and master theorem
Recurrence and master theoremRecurrence and master theorem
Recurrence and master theoremAmit Kumar Rathi
 

Más de Amit Kumar Rathi (20)

Hybrid Systems using Fuzzy, NN and GA (Soft Computing)
Hybrid Systems using Fuzzy, NN and GA (Soft Computing)Hybrid Systems using Fuzzy, NN and GA (Soft Computing)
Hybrid Systems using Fuzzy, NN and GA (Soft Computing)
 
Fundamentals of Genetic Algorithms (Soft Computing)
Fundamentals of Genetic Algorithms (Soft Computing)Fundamentals of Genetic Algorithms (Soft Computing)
Fundamentals of Genetic Algorithms (Soft Computing)
 
Fuzzy Systems by using fuzzy set (Soft Computing)
Fuzzy Systems by using fuzzy set (Soft Computing)Fuzzy Systems by using fuzzy set (Soft Computing)
Fuzzy Systems by using fuzzy set (Soft Computing)
 
Fuzzy Set Theory and Classical Set Theory (Soft Computing)
Fuzzy Set Theory and Classical Set Theory (Soft Computing)Fuzzy Set Theory and Classical Set Theory (Soft Computing)
Fuzzy Set Theory and Classical Set Theory (Soft Computing)
 
Associative Memory using NN (Soft Computing)
Associative Memory using NN (Soft Computing)Associative Memory using NN (Soft Computing)
Associative Memory using NN (Soft Computing)
 
Back Propagation Network (Soft Computing)
Back Propagation Network (Soft Computing)Back Propagation Network (Soft Computing)
Back Propagation Network (Soft Computing)
 
Fundamentals of Neural Network (Soft Computing)
Fundamentals of Neural Network (Soft Computing)Fundamentals of Neural Network (Soft Computing)
Fundamentals of Neural Network (Soft Computing)
 
Introduction to Soft Computing (intro to the building blocks of SC)
Introduction to Soft Computing (intro to the building blocks of SC)Introduction to Soft Computing (intro to the building blocks of SC)
Introduction to Soft Computing (intro to the building blocks of SC)
 
Topological sorting
Topological sortingTopological sorting
Topological sorting
 
String matching, naive,
String matching, naive,String matching, naive,
String matching, naive,
 
Shortest path algorithms
Shortest path algorithmsShortest path algorithms
Shortest path algorithms
 
Sccd and topological sorting
Sccd and topological sortingSccd and topological sorting
Sccd and topological sorting
 
Red black trees
Red black treesRed black trees
Red black trees
 
Recurrence and master theorem
Recurrence and master theoremRecurrence and master theorem
Recurrence and master theorem
 
Rabin karp string matcher
Rabin karp string matcherRabin karp string matcher
Rabin karp string matcher
 
Minimum spanning tree
Minimum spanning treeMinimum spanning tree
Minimum spanning tree
 
Merge sort analysis
Merge sort analysisMerge sort analysis
Merge sort analysis
 
Loop invarient
Loop invarientLoop invarient
Loop invarient
 
Linear sort
Linear sortLinear sort
Linear sort
 
Heap and heapsort
Heap and heapsortHeap and heapsort
Heap and heapsort
 

Último

NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 

Último (20)

NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 

Machine learning

  • 1. Machine Learning Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 2. Chapter 1- Introduction Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 3. Machine Learning Chapter 1- Introduction • The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. • In recent years many successful machine learning applications have been developed. – data-mining programs – detect fraudulent credit card transactions – information-filtering systems – autonomous vehicles Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 4. Machine Learning • The goal of this course is to present the key algorithms and theory that form the core of machine learning. • Machine learning draws on concepts and results from many fields, including – –statistics, artificial intelligence, philosophy, –information theory, biology, cognitive science, –computational complexity, and control theory. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 5. Machine Learning • A few specific achievements provide a glimpse of the state of the art: – Programs have been developed that successfully learn to recognize spoken words (Waibel 1989; Lee 1989) – predict recovery rates of pneumonia patients (Cooper et al. 1997) – detect fraudulent use of credit cards, drive autonomous vehicles on public highways (Pomerleau 1989) – play games such as backgammon at levels approaching the performance of human world champions (Tesauro 1992,1995). Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 6. Machine Learning • Learning to recognize spoken words: • All of the most successful speech recognition systems employ machine learning in some form. • For example, the SPHINX system (e.g., Lee 1989) learns speaker-specific strategies for recognizing the primitive sounds (phonemes) and words from the observed speech signal. • Neural network learning methods (e.g., Waibel et al. 1989) and methods for learning hidden Markov models (e.g., Lee 1989) are effective for automatically customizing to individual speakers, vocabularies, microphone characteristics, background noise, etc. • Similar techniques have potential applicationsin many signal-interpretation problems. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 7. Machine Learning • Automated Transportation • Machine learning methods have been used to train computer-controlled vehicles to steer correctly when driving on a variety of road types. • For example, the ALVINN system (Pomerleau 1989) has used its learned strategies to drive unassisted at 70 miles per hour for 90 miles on public highways among other cars. • Similar techniques have possible applications in many sensor-based control problems. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 8. Machine Learning • Automated Transportation (contd..) Google began testing a self-driving car in 2012, and since then, the U.S. Department of Transportation has released definitions of different levels of automation, with Google’s car classified as the first level down from full automation. Other transportation methods are closer to full automation, such as buses and trains. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 9. Machine Learning Classify new astronomical structures. • Machine learning methods have been applied to a variety of large databases to learn general regularities implicit in the data. • For example, decision tree learning algorithms have been used by NASA to learn how to classify celestial objects from the second Palomar Observatory Sky Survey (Fayyad et al. 1995). • This system is now used to automatically classify all objects in the Sky Survey, which consists of three terrabytes of image data. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 10. Machine Learning To play world-class backgammon. • The most successful computer programs for playing games such as backgammon are based on machiie learning algorithms. • For example, the world's top computer program for backgammon, TD-GAMMON(T esauro 1992, 1995). learned its strategy by playing over one million practice games against itself. • It now plays at a level competitive with the human world champion. • Similar techniques have applications in many practical problems where very large search spaces must be examined efficiently. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 11. Machine Learning Cyborg Technology • Researcher Shimon Whiteson thinks that in the future, we will be able to augment ourselves with computers and enhance many of our own natural abilities. • Though many of these possible cyborg enhancements would be added for convenience, others might serve a more practical purpose. • Yoky Matsuka of Nest believes that AI will become useful for people with amputated limbs, as the brain will be able to communicate with a robotic limb to give the patient more control. • This kind of cyborg technology would significantly reduce the limitations that amputees deal with on a daily basis. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 12. Machine Learning Taking over dangerous jobs • Robots are already taking over some of the most hazardous jobs available, including bomb defusing. • These robots aren’t quite robots yet, according to the BBC. They are technically drones, being used as the physical counterpart for defusing bombs, but requiring a human to control them, rather than using machine learning. • Other jobs are also being reconsidered for robot integration. Welding, well known for producing toxic substances, intense heat, and earsplitting noise, can now be outsourced to robots in most cases. • Robot Worx explains that robotic welding cells are already in use, and have safety features in place to help prevent human workers from fumes and other bodily harm. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 13. Machine Learning Solving climate change • Solving climate change might seem like a tall order from a robot, but as Stuart Russell explains, machines have more access to data than one person ever could—storing a mind- boggling number of statistics. • Using big data, machine learning could one day identify trends and use that information to come up with solutions to the world’s biggest problems. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 14. Machine Learning Robot as friends • Who wouldn’t want a friend like C-3PO? • C-3PO is a humanoid robot character from the ”Star Wars”. • Built by Anakin Skywalker, C-3PO was designed as a protocol droid intended to assist in etiquette, customs, and translation, boasting that he is "fluent in over six million forms of communication"Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 15. Machine Learning Robot as friends (Contd..) • At this stage, most robots are still emotionless and it’s hard to picture a robot you could relate to. However, a company in Japan has made the first big steps toward a robot companion—one who can understand and feel emotions. • Introduced in 2014, “Pepper” the companion robot went on sale in 2015, with all 1,000 initial units selling out within a minute. • The robot was programmed to read human emotions, develop its own emotions, and help its human friends stay happy. Pepper goes on sale in the U.S. in 2016, and more sophisticated friendly robots are sure to follow. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 16. Machine Learning Improved elder care and medical care • For many seniors, everyday life is a struggle, and many have to hire outside help to manage their care, or rely on family members. • Machine learning and AI is at a stage where replacing this need isn’t too far off, says Matthew Taylor, computer scientist at Washington State University. • “Home” robots could help seniors with everyday tasks and allow them to stay independent and in their homes for as long as possible, which improves their overall well-being. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 17. Machine Learning Sophia • The first robot to be awarded citizenship in the world, has said she not only wants to start a family but also have her own career, in addition to developing human emotions in the future. • Sophia, the humanoid AI robot has sprung back to headlines after saying she wants to start a family of her own and also mentioning that all droids deserve to have children. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 18. Machine Learning The robot running for NZ Prime Minister in 2020 • The era of intelligent machines is well and truly upon us. After Sophia was confirmed as the world's first robot citizen in Saudi Arabia last month, it's time for us to gape in awe at SAM, who's become the world's first AI-powered virtual politician from New Zealand. • SAM's creator, Nick Gerritsen, believes that the virtual politician may be able to contest New Zealand's general elections in 2020. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 19. Machine Learning IBM Watson • Watson is a question answering computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project. • The computer system was specifically developed to answer questions on the quiz show Jeopardy! and, in 2011, the Watson computer system competed on Jeopardy! against former winners Brad Rutter and Ken Jennings] winning the first place prize of $1 million. • Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage including the full text of Wikipedia, but was not connected to the Internet during the game. For each clue, Watson's three most probable responses were displayed on the television screen. Watson consistently outperformed its human opponents on the game's signaling device, but had trouble in a few categories, notably those having short clues containing only a few words. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 20. WELL-POSED LEARNING PROBLEMS • Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improve • For example, a computer program that learns to play checkers might improve its performance as measured by its ability to win at the class of tasks involving playing checkers games, through experience obtained by playing games against itself with experience E. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 21. WELL-POSED LEARNING PROBLEMS • In general, to have a well-defined learning problem, we must identity these three features: • the class of tasks (T) • the measure of performance to be improved (P), • and the source of experience (E). Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 22. WELL-POSED LEARNING PROBLEMS • A checkers learning problem: –Task T: playing checkers –Performance measure P: percent of games won against opponents –Training experience E: playing practice games against itself • We can specify many learning problems in this fashion, such as learning to recognize handwritten words, or learning to drive a robotic automobile autonomously. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 23. WELL-POSED LEARNING PROBLEMS • A handwriting recognition learning problem: –Task T: recognizing and classifying handwritten words within images –Performance measure P: percent of words correctly classified –Training experience E: a database of handwritten words with given classifications Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 24. WELL-POSED LEARNING PROBLEMS • A robot driving learning problem: –Task T: driving on public four-lane highways using vision sensors –Performance measure P: average distance traveled before an error (as judged by human overseer) –Training experience E: a sequence of images and steering commands recorded while observing a human driver Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 25. DESIGNING A LEARNING SYSTEM • let us consider designing a program to learn to play checkers, • with the goal of entering it in the world checkers tournament. • We adopt the obvious performance measure: the percent of games it wins in this world tournament. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 26. DESIGNING A LEARNING SYSTEM Choosing the Training Experience: • The first design choice we face is to choose the type of training experience from which our system will learn. • The type of training experience available can have a significant impact on success or failure of the learner. • One key attribute is whether the training experience provides direct or indirect feedback regarding the choices made by the performance system. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 27. DESIGNING A LEARNING SYSTEM Choosing the Training Experience: • For example, in learning to play checkers, the system might learn from direct training examples consisting of individual checkers board. • Alternatively, it might have available only indirect information consisting of the move sequences and final outcomes of various games played. • In this later case, information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 28. DESIGNING A LEARNING SYSTEM Choosing the Training Experience: • the learner faces an additional problem of credit assignment, or determining the degree to which each move in the sequence deserves credit or blame for the final outcome. • Credit assignment can be a particularly difficult problem because the game can be lost even when early moves are optimal, if these are followed later by poor moves. • Hence, learning from direct training feedback is typically easier than learning from indirect feedback. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 29. DESIGNING A LEARNING SYSTEM • Choosing the Training Experience: • A second important attribute of the training experience is the degree to which the learner controls the sequence of training examples. • For example, the learner might rely on the teacher to select informative board states and to provide the correct move for each.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 30. DESIGNING A LEARNING SYSTEM • Choosing the Training Experience: • Alternatively, the learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move. • Or the learner may have complete control over both the board states and (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 31. DESIGNING A LEARNING SYSTEM • Choosing the Training Experience: • A third important attribute of the training experience is how well it represents the distribution of examples over which the final system performance P must be measured. • In our checkers learning scenario, the performance metric P is the percent of games the system wins in the world tournament. • If its training experience E consists only of games played against itself.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 32. DESIGNING A LEARNING SYSTEM • To proceed with our design, let us decide that our system will train by playing games against itself. • This has the advantage that no external trainer need be present, • and it therefore allows the system to generate as much training data as time permits. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 33. DESIGNING A LEARNING SYSTEM A checkers learning problem • Task T: playing checkers • Performance measure P: percent of games won in the world tournament • Training experience E: games played against itself In order to complete the design of the learning system, we must now choose: 1. the exact type of knowledge to be learned 2. a representation for this target knowledge 3. a learning mechanismDr. Amit Kumar, Dept of CSE, JUET, Guna
  • 34. A CHECKERS LEARNING PROBLEM Choosing the Target Function • The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program. • Let us begin with a checkers-playing program that can generate the legal moves from any board state. • The program needs only to learn how to choose the best move from among these legal moves. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 35. A CHECKERS LEARNING PROBLEM Choosing the Target Function • Let us call this function ChooseMove and use the notation: ChooseMove : B  M to indicate that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 36. A CHECKERS LEARNING PROBLEM Choosing the Target Function • Although ChooseMove is an obvious choice for the target function in our example, this function will turn out to be very difficult to learn given the kind of indirect training experience available to our system. • An alternative target function and one that will turn out to be easier to learn in this setting is an evaluation function that assigns a numerical score to any given board state. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 37. A CHECKERS LEARNING PROBLEM Choosing the Target Function • Let us call this target function V and again use the notation V : B  R to denote that V maps any legal board state from the set B to some real value (we use R to denote the set of real numbers). • We intend for this target function V to assign higher scores to better board states. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 38. A CHECKERS LEARNING PROBLEM Choosing the Target Function • If the system can successfully learn such a target function V, then it can easily use it to select the best move from any current board position. • This can be accomplished by generating the successor board state produced by every legal move, then using V to choose the best successor state and therefore the best legal move.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 39. Choosing the Target Function • Let us therefore define the target value V(b) for an arbitrary board state b in B, as follows: 1. if b is a final board state that is won, then V(b) = 100 2. if b is a final board state that is lost, then V(b) = -100 3. if b is a final board state that is drawn, then V(b) = 0 4. if b is a not a final state in the game, then V(b) = V(b’), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally, as well). Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 40. Choosing the Target Function • Thus, we have reduced the learning task in this case to the problem of discovering an operational description of the ideal target function V. • We often expect learning algorithms to acquire only some approximation to the target function, and for this reason the process of learning the target function is often called function approximation. • we will use the symbol V* to refer to the function that is actually learned by our program, to distinguish it from the ideal target function V. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 41. Choosing a Representation for the Target Function • let us choose a simple representation: for any given board state, the function V* will be calculated as a linear combination of the following board features: • x1: the number of black pieces on the board • x2: the number of red pieces on the board • x3: the number of black kings on the board • x4: the number of red kings on the board • x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn) • X6: the number of red pieces threatened by black Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 42. Choosing a Representation for the Target Function • Thus, our learning program will represent V*(b) as a linear function of the form: V*(b) = w0+w1x1+w2x2+w3x3+w4x4+w5x5+w6x6 • where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm. • Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board, where as the weight w0 will provide an additive constant to the board value. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 43. Partial design of a checkers learning program • Task T: playing checkers • Performance measure P: percent of games won in the world tournament • Training experience E: games played against itself • Target function: V : Board  R • Target function representation: V*(b) = w0+w1x1+w2x2+w3x3+w4x4+w5x5+w6x6 The first three items above correspond to the specification of the learning task, whereas the final two items constitute design choices for the implementation of the learning program.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 44. Choosing a Function Approximation Algorithm • In order to learn the target function V* we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b. • In other words, each training example is an ordered pair of the form <b, Vtrain(b)> • For instance, the following training example describes a board state b in which black has won the game (note x2 = 0 indicates that red has no remaining pieces) • and for which the target function value Vtrain(b) is therefore +100. <(x1=3, x2 =0, x3=1, x4=0,x5=0x6=6), +100> Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 45. ESTIMATING TRAINING VALUES • Only training information available to our learner is whether the game was eventually won or lost. • On the other hand, we require training examples that assign specific scores to specific board states. • it is easy to assign a value to board states that correspond to the end of the game, • it is less obvious how to assign training values to the more numerous intermediate board states that occur before the game's end. • Of course the fact that the game was eventually won or lost does not necessarily indicate that every board state along the game path was necessarily good or bad. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 46. ESTIMATING TRAINING VALUES • Despite the ambiguity inherent in estimating training values for intermediate board states, • one simple approach has been found to be surprisingly successful. • This approach is to assign the training value of Vtrain(b) for any intermediate board state b to be V*(successor(b)) • where V* is the learner's current approximation to V and where Successor(b) denotes the next board state following b for which it is again the program's turn to move (i.e., the board state following the program‘s move and the opponent's response). Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 47. ESTIMATING TRAINING VALUES • This rule for estimating training values can be summarized as: Vtrain(b)  V*(Successor(b)) (Rule for estimating training values.) Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 48. ADJUSTING THE WEIGHTS • All that remains is to specify the learning algorithm for choosing the weights wi to best fit the set of training examples { <b, Vtrain(b)>}. • As a first step we must define what we mean by the best fit to the training data. • One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis V*. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 49. ADJUSTING THE WEIGHTS • Several algorithms are known for finding weights of a linear function that minimize E. • In our case, we require an algorithm that will incrementally refine the weights as new training examples become available and that will be robust to errors in these estimated training values. • One such algorithm is called the least mean squares, or LMS training rule. • For each observed training example it adjusts the weights a small amount in the direction that reduces the error on this training example. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 50. ADJUSTING THE WEIGHTS The LMS algorithm is defined as follows: LMS weight update rule. For each training example <b, Vtrain(b)> – Use the current weights to calculate V*(b) – For each weight wi, update it as – • Wi  wi + η (Vtrain(b) - V*(b)) xi • Here η is a small constant (e.g., 0.1) that moderates the size of the weight update. • To get an intuitive understanding for why this weight update rule works, notice that when the error (Vtrain(b) - V*(b)) is zero, no weights are changed. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 51. ADJUSTING THE WEIGHTS • When (Vtrain(b) - V*(b)) is positive (i.e., when V*(b) is too low), then each weight is increased in proportion to the value of its corresponding feature. • This will raise the value of V*(b), reducing the error. • Notice that if the value of some feature xi is zero, then its weight is not altered regardless of the error, so that the only weights updated are those whose features actually occur on the training example board.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 52. The Final Design • The final design of our checkers learning system can be naturally described by four distinct program modules that represent the central components in many learning systems. –The Performance System –The Critic –The Generalizer –The Experiment Generator Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 53. The Final Design Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 54. The Final Design • The Performance System is the module that must solve the given performance task, in this case playing checkers, by using the learned target function(s). It takes an instance of a new problem (new game) as input and produces a trace of its solution (game history) as output. • The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 55. The Final Design • The Generalizer takes as input the training examples and produces an output hypothesis that is its estimate of the target function. • The Experiment Generator takes as input the current hypothesis (currently learned function) and outputs a new problem (i.e., initial board state) for the Performance System to explore. Its role is to pick new practice problems that will maximize the learning rate of the overall system. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 56. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 57. Chapter 2 CONCEPT LEARNING AND THE GENERAL-TO-SPECIFIC 0RDERING Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 58. INTRODUCTION-CONCEPT LEARNING • The problem of inducing general functions from specific training examples is central to concept learning. • Concept learning: acquiring the definition of a general category given a sample of positive and negative training examples of the category. • Concept learning can be formulated as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 59. INTRODUCTION-CONCEPT LEARNING • In many cases this search can be efficiently organized by taking advantage of a naturally occurring structure over the hypothesis space - a general-to-specific ordering of hypotheses. • There are several learning algorithms and considers situations under which they converge to the correct hypothesis. • We also examine the nature of inductive learning and the justification by which any program may successfully generalize beyond the observed training data. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 60. INTRODUCTION-CONCEPT LEARNING • Much of learning involves acquiring general concepts from specific training examples. • People, for example, continually learn general concepts or categories such as "bird," "car," "situations in which I should study more in order to pass the exam," etc. • Each such concept can be viewed as describing some subset of objects or events defined over a larger set (e.g., the subset of animals that constitute birds).Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 61. INTRODUCTION-CONCEPT LEARNING • Alternatively, each concept can be thought of as a boolean-valued function defined over this larger set (e.g., a function defined over all animals, whose value is true for birds and false for other animals). • In this chapter we consider the problem of automatically inferring the general definition of some concept, given examples labeled as members or non- members of the concept. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 62. INTRODUCTION-CONCEPT LEARNING • This task is commonly referred to as concept learning, or approximating a boolean-valued function from examples. Concept learning: Inferring a boolean- valued function from training examples of its input and output. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 63. A CONCEPT LEARNING TASK • Consider the example task of learning the target concept "days on which my friend Aldo enjoys his favorite water sport.“ • Table 2.1 describes a set of example days, each represented by a set of attributes. The attribute EnjoySport indicates whether or not Aldo enjoys his favorite water sport on this day. • The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 64. A CONCEPT LEARNING TASK • What hypothesis representation shall we provide to the learner in this case?Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 65. A CONCEPT LEARNING TASK • Let us begin by considering a simple representation in which each hypothesis consists of a conjunction of constraints on the instance attributes. • In particular, let each hypothesis be a vector of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 66. A CONCEPT LEARNING TASK • For each attribute, the hypothesis will either- – indicate by a "?' that any value is acceptable for this attribute, –specify a single required value (e.g., Warm, cold for AirTemp) for the attribute, or –indicate by a "Ø" that no value is acceptable. • If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 67. A CONCEPT LEARNING TASK • To illustrate, the hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity (independent of the values of the other attributes) is represented by the expression: <?, Cold, High, ?, ?, ?> Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 68. A CONCEPT LEARNING TASK • The most general hypothesis-that every day is a positive example-is represented by <?, ?, ?, ?, ?, ?> • and the most specific possible hypothesis-that no day is a positive example-is represented by <Ø, Ø, Ø, Ø, Ø, Ø> Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 69. A CONCEPT LEARNING TASK- Notation • The set of items over which the concept is defined is called the set of instances, which we denote by X. • In the current example, X is the set of all possible days, each represented by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. • The concept or function to be learned is called the target concept, which we denote by c. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 70. A CONCEPT LEARNING TASK- Notation • In general, c can be any boolean valued function defined over the instances X; that is, c : X  {0,1}. • In the current example, the target concept corresponds to the value of the attribute EnjoySport (i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No). Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 71. The definition of the EnjoySport concept learning task Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 72. The Inductive Learning Hypothesis • Although the learning task is to determine a hypothesis h identical to the target concept c over the entire set of instances X, the only information available about c is its value over the training examples. • Therefore, inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 73. The Inductive Learning Hypothesis •Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the hypothesis that best fits the observed training data. •This is the fundamental assumption of inductive Learning. The inductive learning hypothesis. Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 74. CONCEPT LEARNING AS SEARCH • Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. • The goal of this search is to find the hypothesis that best fits the training examples. • It is important to note that by selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 75. CONCEPT LEARNING AS SEARCH • Consider, for example, the instances X and hypotheses H in the EnjoySport learning task. Given that the attribute Sky has three possible values, and that AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, the instance space X contains exactly 3.2.2.2.2.2 = 96 distinct instances. • A similar calculation shows that there are 5.4.4.4.4.4 = 5120 syntactically distinct hypotheses within H. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 76. CONCEPT LEARNING AS SEARCH • Notice, however, that every hypothesis containing one or more " Ø " symbols represents the empty set of instances; that is, it classifies every instance as negative. • Therefore, the number of semantically distinct hypotheses is only 1+(4.3.3.3.3.3) = 973. • EnjoySport example is a very simple learning task, with a relatively small, finite hypothesis space. • Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 77. General-to-Specific Ordering of Hypotheses • To illustrate the general-to-specific ordering, consider the two hypotheses h1 = <Sunny, ?, ?, Strong, ?, ?> h2 = <Sunny, ?, ?, ?, ?, ?> • Any instance classified positive by hl will also be classified positive by h2. Therefore, we say that h2 is more general than h1. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 78. Relation between instances and hypotheses Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 79. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS 1. Initialize h to the most specific hypothesis in H 2. For each positive training instance x * For each attribute constraint ai in h If the constraint ai is satisfied by x Then do nothing Else replace ai in h by the next more general constraint that is satisfied by x 3. Output hypothesis h • To illustrate this algorithm, assume the learner is given the sequence of training examples from Table 2.1 for the EnjoySport task. The first step of FIND-S is to initialize h to the most specific hypothesis in H. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 80. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS • h <Ø, Ø, Ø, Ø, Ø, Ø> • Upon observing the first training example from Table 2.1, which happens to be a positive example, it becomes clear that our hypothesis is too specific. h <Sunny, Warm, Normal, Strong, Warm, Same> • This h is still very specific. • Next, the second training example (also positive in this case) forces the algorithm to further generalize h, this time substituting a "?' in place of any attribute value in h that is not satisfied by the new example. The refined hypothesis in this case is h <Sunny, Warm, ?, Strong, Warm, Same> Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 81. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS •Upon encountering the third training example-in this case a negative example- the algorithm makes no change to h. In fact, the FIND-S algorithm simply ignores every negative example. • To complete our trace of FIND-S, the fourth(positive) example leads to a further generalization of h. h <Sunny, Warm, ?, Strong, ?, ?> •The FIND-S algorithm illustrates one way in which the more-general-than partial ordering can be used to organize the search for an acceptable hypothesis. •The search moves from hypothesis to hypothesis, searching from the most specific to progressively more general hypotheses along one chain of the partial ordering. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 82. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 83. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM • The key idea in the CANDIDATE-ELIMINATlON algorithm is to output a description of the set of all hypotheses consistent with the training examples. •The CANDIDATE-ELIMINATlON algorithm has been applied to problems such as learning regularities in chemical mass spectroscopy and learning control rules for heuristic search. •CANDIDATE-ELIMINATlON algorithm computes the description of data set without explicitly enumerating all of its members. • This is accomplished by again using the more- general-than partial ordering. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 84. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM • The CANDIDATE-ELIMINATlON algorithm finds all describable hypotheses that are consistent with the observed training examples. • In order to define this algorithm precisely, we begin with a few basic definitions. • First, let us say that a hypothesis is consistent with the training examples if it correctly classifies these examples. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 85. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM • The CANDIDATE-ELIMINATlON algorithm represents the set of all hypotheses consistent with the observed training examples. • This subset of all hypotheses is called the version space with respect to the hypothesis space H and the training examples D, because it contains all plausible versions of the target concept. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 86. The LIST-THEN-ELIMINATION Algorithm •The LIST-THEN-ELIMINATION Algorithm first initializes the version space to contain all hypotheses in H, • Then eliminates any hypothesis found inconsistent with any training example. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 87. General and Specific Boundary Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 88. VERSION SPACE THEORM Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 89. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 90. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 91. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 92. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 93. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 94. This is the version space with its general and specific boundary sets. The version space includes all six hypotheses shown here, but can be represented more simply by S and G. Arrows indicate instances of the more-general-than relation. This is the version space for the Enjoysport concept learning problem and training examples described in Table 2.1.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 95. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 96. Chapter 3 DECISION TREE LEARNING Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 97. INTRODUCTION • Decision tree learning is one of the most widely used and practical methods for inductive inference. • It is a method for approximating discrete- valued functions that is robust to noisy data and capable of learning disjunctive expressions. • This chapter describes a family of decision tree learning algorithms that includes widely used algorithms.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 98. INTRODUCTION • Decision tree learning algorithms such as ID3, ASSISTANT, and C4.5 are mostly used. • These methods search a completely expressive hypothesis space and thus avoid the difficulties of restricted hypothesis spaces. • Their inductive bias is a preference for small trees over large trees. • Learned trees can also be re-represented as sets of if-then rules to improve human readability. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 99. DECISION TREE REPRESENTATION • Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance. • Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 100. DECISION TREE REPRESENTATION • An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 101. DECISION TREE REPRESENTATION Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 102. APPROPRIATE PROBLEMS FOR DECISION TREE LEARNING • Decision tree learning is generally best suited to problems with the following characteristics: • Instances are represented by attribute-value pairs. • The target function has discrete output values. • Disjunctive descriptions may be required. • The training data may contain errors. • The training data may contain missing attribute values. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 103. THE BASIC DECISION TREE LEARNING ALGORITHM • Most algorithms that have been developed for learning decision trees are variations on a core algorithm that employs a top-down, greedy search through the space of possible decision trees. • This approach is exemplified by the ID3 algorithm (Quinlan 1986) and its successor C4.5 (Quinlan 1993).Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 104. THE BASIC DECISION TREE LEARNING ALGORITHM • Algorithm, ID3, learns decision trees by constructing them topdown, beginning with the question "which attribute should be tested at the root of the tree?” • To answer this question, each instance attribute is evaluated using a statistical test to determine how well it alone classifies the training examples. • The best attribute is selected and used as the test at the root node of the tree.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 105. THE ID3 LEARNING ALGORITHM Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 106. Which Attribute Is the Best Classifier? • The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree. • We will define a statistical property, called information gain, that measures how well a given attribute separates the training examples according to their target classification. • ID3 uses this information gain measure to select among the candidate attributes at each step while growing the tree.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 107. ENTROPY MEASURES HOMOGENEITY OF EXAMPLES • In order to define information gain precisely, we begin by defining a measure commonly used in information theory, called entropy, that characterizes the (im)purity of an arbitrary collection of examples. • Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is: Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 108. ENTROPY MEASURES HOMOGENEITY OF EXAMPLES • where p is the proportion of positive examples in S and pΘ is the proportion of negative examples in S. • In all calculations involving entropy we define 0 log 0 to be 0. • To illustrate, suppose S is a collection of 14 examples of some boolean concept, including 9 positive and 5 negative examples (we adopt the notation [9+, 5-] to summarize such a sample of data). Then the entropy of S relative to this boolean classification is:Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 109. ENTROPY MEASURES HOMOGENEITY OF EXAMPLES Entropy ([9+,5-]) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940 • Notice that the entropy is 0 if all members of S belong to the same class. • Note the entropy is 1 when the collection contains an equal number of positive and negative examples. • If the collection contains unequal numbers of positive and negative examples, the entropy is between 0 and 1.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 110. ENTROPY MEASURES HOMOGENEITY OF EXAMPLES Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 111. ENTROPY MEASURES HOMOGENEITY OF EXAMPLES • More generally, if the target attribute can take on c different values, then the entropy of S relative to this c-wise classification is defined as – where pi is the proportion of S belonging to class i. Note the logarithm is still base 2 because entropy is a measure of the expected encoding length measured in bits. • Note also that if the target attribute can take on c possible values, the entropy can be as large as log, c. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 112. INFORMATION GAIN MEASURES THE EXPECTED REDUCTION IN ENTROPY • Given entropy as a measure of the impurity in a collection of training examples, • we can now define a measure of the effectiveness of an attribute in classifying the training data. • The measure we will use, called information gain, is simply the expected reduction in entropy caused by partitioning the examples according to this attribute. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 113. INFORMATION GAIN MEASURES THE EXPECTED REDUCTION IN ENTROPY • More precisely, the information gain, Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as: where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v . Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 114. INFORMATION GAIN MEASURES THE EXPECTED REDUCTION IN ENTROPY • Note the first term in Equation is just the entropy of the original collection S, and the second term is the expected value of the entropy after S is partitioned using attribute A. • The expected entropy described by this second term is simply the sum of the entropies of each subset Sv weighted by the fraction of examples that belong to Sv. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 115. INFORMATION GAIN MEASURES THE EXPECTED REDUCTION IN ENTROPY • Gain(S,A) is therefore the expected reduction in entropy caused by knowing the value of attribute A. Put another way, Gain(S, A) is the information provided about the target & action value, given the value of some other attribute A. • The value of Gain(S, A) is the number of bits saved when encoding the target value of an arbitrary member of S, by knowing the value of attribute A. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 116. INFORMATION GAIN MEASURES THE EXPECTED REDUCTION IN ENTROPY • For example, suppose S is a collection of training-example days described by attributes including Wind, which can have the values Weak or Strong. • As before, assume S is a collection containing 14 examples, [9+, 5-]. • Of these 14 examples, suppose 6 of the positive and 2 of the negative examples have Wind = Weak, and the remainder have Wind = Strong. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 117. Training Examples for the target concept PalyTennis Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 118. INFORMATION GAIN MEASURES THE EXPECTED REDUCTION IN ENTROPY • The information gain due to sorting the original 14 examples by the attribute Wind may then be calculated as: Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 119. INFORMATION GAIN MEASURES Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 120. INFORMATION GAIN MEASURES • ID3 determines the information gain for each candidate attribute (i.e., Outlook, Temperature, Humidity, and Wind), then selects the one with highest information gain. The computation of information gain for two of these attributes is shown in Figure (last slide). The information gain values for all four attributes are: Gain(S, Outlook) = 0.246 Gain(S, Humidity) = 0.151 Gain(S, Wind) = 0.048 Gain(S, Temperature) = 0.029Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 121. INFORMATION GAIN MEASURES • According to the information gain measure, the Outlook attribute provides the best prediction of the target attribute, PlayTennis over the training examples. • Therefore, Outlook is selected as the decision attribute for the root node, and branches are created below the root for each of its possible values (i.e., Sunny, Overcast, and Rain). • The partially learned decision tree resulting from the first step of ID3 is shown on next slide. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 122. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 123. HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING • As with other inductive learning methods, ID3 can be characterized as searching a space of hypotheses for one that fits the training examples. • The hypothesis space searched by ID3 is the set of possible decision trees. • ID3 performs a simple-to-complex, hill-climbing search through this hypothesis space, beginning with the empty tree, then considering progressively more elaborate hypotheses in search of a decision tree that correctly classifies the training data.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 124. HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING By viewing ID3 in terms of its search space and search strategy, we can get some insight into its capabilities and limitations. • ID3’s hypothesis space of all decision trees is a complete space of finite discrete-valued functions, relative to the available attributes. Because every finite discrete-valued function can be represented by some decision tree, ID3 avoids one of the major risks of methods that search incomplete hypothesis spaces, that the hypothesis space might not contain the target function. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 125. HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING • ID3 maintains only a single current hypothesis as it searches through the space of decision trees. This contrasts, for example, with the earlier version space candidate-Elimination method, which maintains the set of all hypotheses consistent with the available training examples. By determining only a single hypothesis, ID3 loses the capabilities that follow from explicitly representing all consistent hypotheses.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 126. HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING • ID3 in its pure form performs no backtracking in its search. Once it, selects an attribute to test at a particular level in the tree, it never backtracks to reconsider this choice. Therefore, it is susceptible to the usual risks of hill-climbing search without backtracking: converging to locally optimal solutions that are not globally optimal. In the case of ID3, a locally optimal solution corresponds to the decision tree it selects along the single search path it explores. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 127. HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING • ID3 uses all training examples at each step in the search to make statistically based decisions regarding how to refine its current hypothesis. This contrasts with methods that make decisions incrementally, based on individual training examples (e.g., FIND-S or CANDIDATE- ELIMINATOIN ). One advantage of using statistical properties of all the examples (e.g., information gain) is that the resulting search is much less sensitive to errors in individual training examples. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 128. HYPOTHESIS SPACE SEARCH IN DECISION TREE LEARNING • ID3 can be easily extended to handle noisy training data by modifying its termination criterion to accept hypotheses that imperfectly fit the training data. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 129. INDUCTIVE BIAS IN DECISION TREE LEARNING • Inductive bias is the set of assumptions that, together with the training data, deductively justify the classifications assigned by the learner to future instances. • Given a collection of training examples, there are typically many decision trees consistent with these examples. • Describing the inductive bias of ID3 therefore consists of describing the basis by which it chooses one of these consistent hypotheses over the others. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 130. INDUCTIVE BIAS IN DECISION TREE LEARNING • Which of these decision trees does ID3 choose? • It chooses the first acceptable tree it encounters in its simple-to-complex, hill climbing search through the space of possible trees. • Roughly speaking, then, the ID3 search strategy (a) selects in favor of shorter trees over longer ones, and (b) selects trees that place the attributes with highest information gain closest to the root. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 131. INDUCTIVE BIAS IN DECISION TREE LEARNING • Because of the subtle interaction between the attribute selection heuristic used by ID3 and the particular training examples it encounters, it is difficult to characterize precisely the inductive bias exhibited by ID3. • However, we can approximately characterize its bias as a preference for short decision trees over complex trees. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 132. INDUCTIVE BIAS IN DECISION TREE LEARNING • Approximate inductive bias of ID3: Shorter trees are preferred over larger trees. • A closer approximation to the inductive bias of ID3: Shorter trees are preferred over longer trees. Trees that place high information gain attributes close to the root are preferred over those that do not. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 133. Why Prefer Short Hypotheses? • Is ID3's inductive bias favoring shorter decision trees a sound basis for generalizing beyond the training data? • Philosophers and others have debated this question for centuries, and the debate remains unresolved to this day. • William of Occam was one of the first to discusst the question, around the year 1320, so this bias often goes by the name of Occam's razor. • Occam's razor: Prefer the simplest hypothesis that fits the data. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 134. ISSUES IN DECISION TREE LEARNING? • Practical issues in learning decision trees include: • Determining how deeply to grow the decision tree, (Avoiding Overfitting the Data, Reduced Error Pruning, Rule Post-pruning) • Handling continuous attributes, • Choosing an appropriate attribute selection measure, • Handling training data with missing attribute values, • Handling attributes with differing costs, and • Improving computational efficiency.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 135. Chapter 4 ARTIFICIAL NEURAL NETWORKS Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 136. INTRODUCTION • Artificial neural networks (ANNs) provide a general, practical method for learning real- valued, discrete-valued, and vector-valued functions from examples. • Algorithms such as BACKPROPAGATION, gradient descent to tune network parameters to best fit a training set of input-output pairs. • ANN learning is robust to errors in the training data and has been successfully applied to problems such as interpreting visual scenes, speech recognition, and learning robot control strategies. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 137. Biological Motivation • The study of artificial neural networks (ANNs) has been inspired in part by the observation that biological learning systems are built of very complex webs of interconnected neurons. • The human brain is estimated to contain a densely interconnected network of approximately 1011 neurons, each connected, on average, to 104 others. • Neuron activity is typically excited or inhibited through connections to other neurons.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 138. Biological Motivation • The fastest neuron switching times are known to be on the order of 10-3 seconds- quite slow compared to computer switching speeds of 10-10 seconds. • Yet humans are able to make surprisingly complex decisions, surprisingly quickly. • For example, it requires approximately 10-1 seconds to visually recognize your mother. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 139. Biological Motivation • This observation has led many to speculate that the information-processing abilities of biological neural systems must follow from highly parallel processes operating on representations that are distributed over many neurons. • One motivation for ANN systems is to capture this kind of highly parallel computation based on distributed representations. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 140. Biological Motivation • Most ANN software runs on sequential machines emulating distributed processes, although faster versions of the algorithms have also been implemented on highly parallel machines and on specialized hardware designed specifically for ANN applications. • While ANNs are loosely motivated by biological neural systems, there are many complexities to biological neural systems that are not modeled by ANNs.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 141. Biological Motivation • Historically, two groups of researchers have worked with artificial neural networks. • One group has been motivated by the goal of using ANNs to study and model biological learning processes. • A second group has been motivated by the goal of obtaining highly effective machine learning algorithms, independent of whether these algorithms mirror biological processes. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 142. NEURAL NETWORK - Research History • McCulloch and Pitts (1943) are generally recognized as the designers of the first neural network. • They combined many simple processing units together that could lead to an overall increase in computational power. • They suggested many ideas like : a neuron has a threshold level and once that level is reached the neuron fires. • It is still the fundamental way in which ANNs operate. The McCulloch and Pitts's network had a fixed set of weights. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 143. NEURAL NETWORK - Research History • Hebb (1949) developed the first learning rule, that is if two neurons are active at the same time then the strength between them should be increased. • In the 1950 and 60's, many researchers (Block, Minsky, Papert, and Rosenblatt worked on perceptron. • The neural network model could be proved to converge to the correct weights, that will solve the problem. • The weight adjustment (learning algorithm) used in the perceptron was found more powerful than the learning rules used by Hebb.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 144. NEURAL NETWORK - Research History • Minsky & Papert (1969) showed that perceptron could not learn those functions which are not linearly separable. • The neural networks research declined throughout the 1970 and until mid 80's because the perceptron could not learn certain important functions. • Neural network regained importance in 1985-86. The researchers, Parker and LeCun discovered a learning algorithm for multi-layer networks called back propagation that could solve problems that were not linearly separable. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 145. Artificial Neuron Model • An artificial neuron is a mathematical function conceived as a simple model of a real (biological) neuron. • The McCulloch-Pitts Neuron: This is a simplified model of real neurons, known as a Threshold Logic Unit. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 146. PERCEPTRONS • One type of ANN system is based on a unit called a perceptron. A perceptron takes a vector of real- valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and -1 otherwise. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 147. PERCEPTRONS • More precisely, given inputs x1 through xn the output o(x1, . . . , xn) computed by the perceptron is: • where each wi is a real-valued constant, or weight, that determines the contribution of input xi to the perceptron output. • To simplify notation, we imagine an additional constant input x0 = 1, allowing us to write the above inequality as > 0 Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 148. PERCEPTRONS • In vector form as For brevity, we will sometimes write the perceptron function as : • Where • Learning a perceptron involves choosing values for the weights w0, . . . , wn. • Therefore, the space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 149. Representational Power of Perceptrons • We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances (i.e., points). • The perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs a -1 for instances lying on the other side, as illustrated in Figure The decision surface represented by a two-input perceptron. (a) A set of training examples and the decision surface of a perceptron that classifies them correctly. (b) A set of training examples that is not linearly separable (i.e., that cannot be correctly classified by any straight line). xl and x2 are the Perceptron inputs.Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 150. Representational Power of Perceptrons • To implement the AND function is to set the weights w0 = -0.8 , and w1 = w2 = 0.5. • This perceptron can be made to represent the OR function instead by altering the threshold to w0 = - 0.3. • In fact, AND and OR can be viewed as special cases of m-of-n functions: that is, functions where at least m of the n inputs to the perceptron must be true. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 151. The Perceptron Training Rule • Several algorithms are known to solve learning problem. • The perceptron rule and the delta rule . These two algorithms are guaranteed to converge to somewhat different acceptable hypotheses, under somewhat different conditions. • They are important to ANNs because they provide the basis for learning networks of many units. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 152. The Perceptron Training Rule • One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. • This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 153. The Perceptron Training Rule • Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi according to the rule – Where t = target output, o = observed output, η = learning rate. ∆ wi = weight update Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 154. Gradient descent algorithm for training a linear unit. Dr. Amit Kumar, Dept of CSE, JUET, Guna
  • 155. MULTILAYER NETWORKS AND THE BACKPROPAGATION ALGORITHM Dr. Amit Kumar, Dept of CSE, JUET, Guna