Reinforcement learning in neuro fuzzy traffic signal control

By:-
Abhishek Vishnoi(112501)
NITTTR Chandhigarh

INTRODUCTION
 A fuzzy traffic signal controller uses simple “if–then”
rules which involve linguistic concepts such as medium
or long, presented as membership functions.
 In neuro-fuzzy traffic signal control, a neural network
adjusts the fuzzy controller by fine-tuning the form
and location of the membership functions.
 The learning algorithm of the neural network is
reinforcement learning, which gives credit for
successful system behaviour and punishes for poor
behaviour;

Basics of Neuro-Fuzzy
 A combination of a neural network and a fuzzy system
is called a neurofuzzy system. In neurofuzzy control,
the parameters of the fuzzy controller are adjusted
using a neural network.
 Neurofuzzy systems utilize both the linguistic,
human-like reasoning of fuzzy systems and the
powerful computing ability of neural networks.
 They can avoid some of the drawbacks of solely fuzzy
or neural systems

Fuzzy system
 Fuzzy logic were brought to public attention by Zadeh.
 Fuzzy sets provide a mathematical interpretation for
natural language terms.
 A fuzzy set is a set without a crisp, clearly defined
boundary.
 It can contain elements with a partial membership
function
 Fuzzy control uses a rule base, where the rules are
propositions of the form “if X is S, then Y is T”.
Here X and Y are linguistic variables,

 Membership function (MF) is a curve that defines how
each point in the input space is mapped to a membership
value (or degree of membership) between 0 and 1.
 Each linguistic terms is associated with a fuzzy set defined
by a corresponding membership function.

Fuzzy traffic signal control

 the controller receives measurements of incoming
traffic and chooses the length of the green signal
accordingly.
 Advantage of fuzzy control systems over traditional
ones:-
 Their ability to use expert knowledge in the form of
fuzzy rule.
 Small number of parameters needed.

Fuzzy traffic signal control
• The traffic simulation environment used in this work is a two-
phase controlled intersection of two-lane streets.
 In each approaching lane there are two traffic detectors, the first
one before the stop line and the other at the stop line.
 These detectors send input measurements of traffic to the fuzzy
controller:
 APP, number of approaching vehicles in the green direction
and QUE, number of queuing vehicles in the red direction.
Depending on the traffic situation, the green phase can be
extended with one or several seconds.
 Output of the fuzzy controller is EXT, green time extension (in
seconds).
 The linguistic values of
APP are zero, a few, medium and many;
QUE are a few, medium and too long;
EXT are zero, short, medium and long.

Fuzzy rule
 The rule base consists of five rule sets. The choice of the rule
set depends on how many green extensions have already been
given.
 The objective of the rules is to split the green time and find
the right moment of green termination so that the delay of
vehicles is minimized
After minimum green (5 seconds)
 if APP is zero, then EXT is zero
 if APP is a few and if QUE is less than medium, then EXT is short
 if APP is more than a few, then EXT is medium
 if APP is medium, then EXT is long

After the first extension
 if APP is medium, then EXT is medium
 if APP is many, then EXT is long

After the second extension
 if APP is medium and if QUE is less than medium,
then EXT is medium
 if APP is many and if QUE is less than medium, then EXT is long

After the third extension
 if QUE is too long, then EXT is zero
 if APP is more than a few and if QUE is less than medium,
then EXT is short
 if APP is medium and if QUE is less than medium,
then EXT is medium
 if APP is many and if QUE is less than a few, then EXT is long
After the fourth extension
 if QUE is too long, then EXT is zero
 if APP is more than a few and if QUE is a few, then EXT is short
 if APP is medium and if QUE is less than a few,
then EXT is medium
 if APP is many and if QUE is less than a few, then EXT is long

Reinforcement learning
Why is reinforcement learning needed?
 The parameters of the fuzzy controller could be
updated using the back propagation
algorithm common in supervised learning in neural
networks.
 Backpropagation algorithm cannot be used; instead, a
learning algorithm called reinforcement learning is
used.
 In reinforcement learning, the system evaluates
whether the previous control action was good or not. If
the action had good consequences, the tendency to
produce that action is strengthened, that is,
reinforced.

Structure of the neurofuzzy control system
 The evaluation network gathers
information about the decisions
of the fuzzy controller and the
delays of the vehicles.
 This reinforcement information
is used in fine-tuning the
membership functions of the
fuzzy controller, which is also
presented as a neural network.
 Thus there are actually two
neural networks in the system:
an evaluation network and a
fuzzy controller network.

Evaluation network

 The evaluation network evaluates the goodness of the
actions of the fuzzy controller based on information it
has gathered by observing the process.
 The network fine-tunes the membership functions of
the fuzzy controller by updating the parameters of the
membership functions.
 It is a feedforward, multilayer perceptron-type
network.
 The input variables of the network are APP and QUE,
measurements of incoming traffic in green and red
directions, respectively.
.

 The hidden layer activation function
is a sigmoidal function
zj(xj)=1/(1+exp(−xj)),
 where The size h of the hidden layer
may vary, and there are no precise
rules for determining how many
cells it should contain. Increasing
the size gives a more powerful and
flexible network but requires a
longer learning time. In our work
the size of h=10 was found suitable

 The network output v is a measure of the goodness of
the state of the network, a prediction of future
reinforcement. h

v = b1 APP + b2 QUE + cjzj
j 1
 The gradient descent algorithm is used in the learning
phase of the evaluation network.
 If a positive reinforcement signal is received, the
network weights are rewarded by being changed in the
direction which increases their contribution to the
total sum.
 If a negative signal is received, the weights are
punished by being changed in the direction
which decreases their contribution.

Fuzzy controller network

 Consider the first rule set, whose neural network
presentation in figure.

Experimental results

Fig-Average vehicular delays (in Fig-Average vehicular delays (in
seconds) before (dashed line) and after seconds) before (dashed line) and
(solid line) the learning at traffic after (solid line) the learning at traffic
volumes of 300, 500 and 1000 vehicles volumes of 300, 500 and 1000 vehicles
per hour. The location of the first traffic per hour. The location of the first
detector is 50 m from the stop line. traffic detector is 100 m from the stop
line.

Volume (veh/h) Det. (M) dini dnew D
500 50 9.99 9.42 0.57
1000 50 14.78 13.96 0.81
500 100 9.11 8.82 0.81
1000 100 15.18 14.52 0.66

Here dini and dnew are the average vehicular delays using the initial
and the new membership functions, respectively, D=dini−dnew is the
difference of individual observations

Membership functions zero, a few, medium and many of APP before (dotted
line) and after (solid line) the learning at a traffic volume of 500 vehicles per
hour. The location of the first traffic detector is 50 m from the stop line.
Horizontal axis: number of approaching vehicles. Vertical axis: value of
membership function.

 Fig Membership functions a few, medium and too long of QUE before
(dotted line) and after (solid line) the learning at a traffic volume of
500 vehicles per hour. The location of the first traffic detector is 50 m
from the stop line. Horizontal axis: number of queuing vehicles.
Vertical axis: value of membership function.

 Fig. membership function zero, short , medium and long
of EXT before (dotted line) and after (solid line) the learning at a
traffic volume of 500 vehicles per hour. The location of the first
traffic detector is 50 m from the stop line. Horizontal axis: green
signal extension in seconds. Vertical axis: value of membership
function.

Reinforcement learning in neuro fuzzy traffic signal control

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (12)

Destacado

Destacado (9)

Similar a Reinforcement learning in neuro fuzzy traffic signal control

Similar a Reinforcement learning in neuro fuzzy traffic signal control (20)

Último

Último (20)

Reinforcement learning in neuro fuzzy traffic signal control