A temporal classifier system using spiking neural networks

A temporal classifier system using spiking neural networks Gerard David Howard, Larry Bull & Pier-Luca Lanzi {david4.howard, larry.bull} @uwe.ac.uk pierluca.lanzi @polimi.it 1

Contents Intro & Motivation System architecture – Spiking XCSF Constructivism (nodes and connections) Working in continuous space Comparison to MLP / Q-learner Taking time into consideration Comparison to MLP Simulated robotics 2

Motivation Many real-world tasks incorporate continuous space and continuous time Autonomous robotics are an unanswered question: will require some degree of knowledge “self-shaping” or control over their internal knowledge representation We introduce an LCS containing spiking networks and demonstrate the usefulness of the representation Handles continuous space and continuous time Representation structure dependent on environment 3

XCSF Includes computed prediction, which is calculated from input state (augmented by constant x0) and a weight vector – each classifier has a weight vector Weights are updated linearly using modified delta rule Main differences from canonical: SNN replaces condition and calculates action Self-adaptive parameters give autonomous learning control Topology of networks altered in GA cycle Generalisation from computed prediction, computed actions and network topologies 4

Spiking networks Spiking networks have temporal functionality We use Integrate-and-Fire (IAF) neurons Each neuron has a membrane potential (m) that varies through time When m exceeds a threshold, the neuron sends a spike to every neuron in the network that it has a forward connection to, and resets m Membrane potential is a way of implementing memory 5

Spiking networks IAF Spiking network replaces condition and action, 2 input, 3 output nodes Each input state processed 5 times by spiking network. Neural outputs are spike trains: high (>=3) or low (<3) spikes in 5-element output window. Classifier doesn’t match if !M node is high ,[object Object],6 00101 = LOW L Input state (one node per element) R 01110 = HIGH ![M] 10001 = LOW

Self-adaptive parameters During a GA cycle, a parent’s µ value is copied to its offspring and altered The offspring then applies its own µ to itself (bounded [0-1]) before being inserted into the population. Similar to ES mutation alteration Mutate µ  µ * e N(0,1) Insert Copy µ [A] Mutate µ  µ * e N(0,1) Copy µ Insert 7

Constructivism Neural Constructivism - interaction with environment guides learning process by growing/pruning dendritic connectivity Constructivism can add or remove neurons from the hidden layer during a GA event Two new self-adaptive values control NC , ψ (probability of constructivism event occurring) and ω (probability of adding rather than removing a node). These are modified during a GA cycle as with µ 8 Randomly initialised weights

Connection selection Automatic feature selection often used in conjunction with neural networks – allows reduction in number of inputs to only highest utility features We apply feature selection to every connection in a network, connection is enabled/disabled on satisfaction of new self-adaptive parameter τ. All connections initially enabled, connections created via node addition are enabled with 50% probability per connection ,[object Object],9

Effects of SA, NC & CS Self Adaptation allows the system to control the amount of search taking place in an environmental niche without having to predetermine suitable parameter values Neural Constructivism allows classifier to automatically grow networks to match task complexity ,[object Object],10

Continuous Grid World Two dimensional continuous grid environment Runs from 0 – 1 in both x and y axes Goal state is where (x+y>1.9) – darker regions of grid represent higher expected payoff. Reaching goal returns a reward of 1000, else 0 Agent starts randomly anywhere in the grid except the goal state, aims to reach goal (moving 0.05) in fewest possible steps (avg. opt. 18.6) 1.00 0.50 Agent 0.00 0.50 1.00 11

Discrete movement Agent can make a single discrete movement (N,E,S,W) N=(HIGH,HIGH), E=(HIGH,LOW) etc… Experimental parameters N=20000,γ =0.95, β=0.2, ε0=0.005, θGA=50, θDEL=50 XCSF parameters as normal. Initial prediction error in new classifiers=0.01, initial fitness=0.1 Additional trial from fixed location lets us perform t-tests. “Stability” shows first step that 50 consecutive trials reach goal state from this location. 12

Discrete movement 13 ,[object Object]

Fewer macroclassifiers = greater generalisation

Lower mutation rate = more stable evolutionary process,[object Object]

Continuous duration actions Reward usually calculated as Reward is now calculated as Two discount factors that favour overall effectiveness and efficient state transitions respectively =0.05, ρ=0.1 tt = total steps for entire trial ti = duration of a single action Timeout=20; new steps to goal is 1.5 15

Continuous Grid World TCS 16 ,[object Object]

Smaller step size 18 ,[object Object]

Tabular Q-learner cannot learn – too many (s,a) combinations, long action chains!

Spiking non-TCS cannot learn – too many (s,a) combinations, long action chains!

MLP TCS cannot learn – lack of memory?

Spiking TCS canlearn to optimally solve this environment by extending an action set across multiple states and recalculating actions where necessary

Aided by temporal element of networks,[object Object],[object Object]

Mountain-car 21 ,[object Object]

Guide a car out of a valley, sometimes requiring non-obvious behaviour

State comprises position and velocity

Actions increase/decrease velocity: HIGH/HIGH = increase, LOW/LOW = decrease, anything else = no change.

A temporal classifier system using spiking neural networks

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a A temporal classifier system using spiking neural networks

Similar a A temporal classifier system using spiking neural networks (20)

Más de Daniele Loiacono

Más de Daniele Loiacono (20)

Último

Último (20)

A temporal classifier system using spiking neural networks