4. Hierarchical memories and Sparse Code 1. Associative memories
[Figure: network diagram. $w_{ji}$ is the weight, i.e. the strength of the link from component $i$ of the input vector $x$ to component $j$ of the output vector $y$.]
5. Hierarchical memories and Sparse Code 1. Associative memories
Example pattern pair $\mu$:

input $x^\mu = (x_1^\mu, x_2^\mu, x_3^\mu, x_4^\mu, x_5^\mu) = (0, 1, 1, 0, 0)$
output $y^\mu = (y_1^\mu, \dots, y_6^\mu) = (0, 0, 0, 0, 1, 0)$

[Figure: network diagram with links $w_{11}, w_{12}, w_{13}, w_{14}, w_{15}$; each $w_{ji}$ links component $i$ of the input vector to component $j$ of the output vector.]
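Retrieval is only shown graphically in the deck; as a minimal sketch in Julia, assuming the standard Lernmatrix threshold rule (function and variable names are mine):

    # Binary associative-memory retrieval, assuming the standard threshold
    # rule: output neuron j fires iff sum_i w_ji * x_i >= theta.
    function retrieve(W::AbstractMatrix{Int}, x::AbstractVector{Int}, theta::Int)
        s = W * x                 # dendritic sums, one per output neuron
        return Int.(s .>= theta)  # thresholded binary output y
    end

With theta = sum(x) (the activity K of the input), a stored input retrieves its associated output as long as the memory load is low.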
9. Behavior of matrix associative memories
• The memory load increases with the number of patterns learned
[Plot: memory load $p_1$ as a function of $p$, the number of patterns learned (stored); $p$ ranges over 0–1800, $p_1$ over 0.0%–0.7%. $p_1$ is the probability that a synapse is active after storing $p$ patterns.]
$$p_1 = 1 - \left(1 - \frac{KL}{MN}\right)^p$$

which, for equal activity levels $K = L$ and equal dimensions $M = N$, reduces to

$$p_1 = 1 - \left(1 - \frac{K^2}{N^2}\right)^p$$
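As a quick sanity check of the second formula in Julia (the parameter values are illustrative, not taken from the plot):

    # Memory load after storing p patterns (symmetric case K = L, M = N).
    p1(p, K, N) = 1 - (1 - K^2 / N^2)^p
    p1(1600, 4, 2000)   # ≈ 0.0064, i.e. ≈ 0.6% of the synapses are active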
10. Hierarchical memories and Sparse Code 1. Associative memories
• The quality of the retrieved outputs deteriorates with increasing memory load
[Plot: number of add-errors in the output as a function of memory load in layer r = 3.]
11. Research question
• How can we increase network performance, i.e., increase the number of stored patterns for the same number of computations, without compromising the quality of retrieval?
↔
• How can we reduce the number of computational steps for the same number of stored patterns?
12. Research question
Solution
• Reorganizing matrices
20. 2. c) Optimal capacity
Maximum number $C_{stor}$ of associations stored is $O(\ln N^2)$:

$$C_{stor} = \frac{P}{M} = \ln(2)\,\frac{N^2}{K^2}$$

with:
P – number of associations
M – number of neurons (dimension of the output)
N – dimension of the input
K – activity level (number of 1s per vector)

for an optimal $O(\ln N)$ activity level $K$:

$$K = \log_2\frac{N}{4} \qquad (\textit{sparse code})$$
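The two formulas combine into a short computation; a sketch in Julia with an illustrative N:

    # Optimal sparse code and storage capacity for an illustrative N.
    N = 1024
    K = round(Int, log2(N / 4))   # sparse code: K = log2(N/4) = 8
    Cstor = log(2) * N^2 / K^2    # ≈ 1.1e4 associations per output neuron (P/M)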
37. 2. a) Learning with weights
• Hebb’s learning rule
Storing two pattern pairs with Hebb's rule, $W = \sum_\mu y^\mu (x^\mu)^T$:

$x^1 = (1, 0, 0, 0, 0, 0)$, $y^1 = (0, 1, 1, 0, 0)$
$x^2 = (1, 0, 0, 0, 0, 1)$, $y^2 = (1, 1, 0, 0, 0)$

$$W = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 \\ 2 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

This is a codification of the positions of the correlations!
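The same storage step as a Julia sketch (variable names are mine; the numbers reproduce the example above):

    # Hebbian storage: W accumulates the outer products y * x'.
    x1 = [1, 0, 0, 0, 0, 0];  y1 = [0, 1, 1, 0, 0]
    x2 = [1, 0, 0, 0, 0, 1];  y2 = [1, 1, 0, 0, 0]
    W = y1 * x1' + y2 * x2'   # 5x6 matrix; w_ji counts correlations of x_i with y_j
    # W reproduces the matrix above; e.g. w_21 = 2 after the second pair.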
38. 4. a) Motivation
4. b) Idea
4. c) Solution: Ordered Indexes Hierarchical Associative Memory
41. 4.c) Solution: Ordered Indexes Hierarchical Associative Memory
Auxiliary structures:
[Figure: AllSequenceList_initial = (1 2 3 4 5 6 7 8 9 10 11 12), the sequence of all the column indexes; it is partitioned into subsequences, each held by a node.]
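The deck only names these structures, so the following Julia sketch is one hypothetical realization (all type and field names are guesses):

    # Hypothetical realization of the auxiliary structures.
    mutable struct Node
        indexes::Vector{Int}    # subsequence of column indexes held by this node
        children::Vector{Node}  # filled in when the subsequence is split
    end
    all_sequence_list = Node(collect(1:12), Node[])  # initially the whole sequence 1..12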
42. Hierarchical memories and Sparse Code 4. Ordered indexes memory
Base of the algorithm:

    for each line L of W(r=3):
        for each subsequence SS in L:
            if OnesAndZerosUnclustered?(SS):
                SplitAndOrder(SS)
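A Julia sketch of this loop, reusing the hypothetical Node structure from above; the unclustered predicate and the split are my reading of the slide, not the authors' exact code:

    # Base algorithm over the r=3 weight matrix W3, splitting the column-index
    # subsequences until each is homogeneous for the rows seen so far.
    function order_indexes!(root::Node, W3::AbstractMatrix{Int})
        for row in eachrow(W3)                    # for each line L of W(r=3)
            for node in leaves(root)              # for each subsequence SS in L
                one_cols  = [i for i in node.indexes if row[i] == 1]
                zero_cols = [i for i in node.indexes if row[i] == 0]
                if !isempty(one_cols) && !isempty(zero_cols)  # 1s and 0s mixed
                    # SplitAndOrder: separate the 1-columns from the 0-columns
                    node.children = [Node(one_cols, Node[]), Node(zero_cols, Node[])]
                end
            end
        end
    end
    leaves(n::Node) = isempty(n.children) ? [n] : reduce(vcat, leaves.(n.children))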
43. Hierarchical memories and Sparse Code 4. Ordered indexes memory
Note: variants of the model use more intelligent strategies to select the next line to be tested.
66. Hierarchical memories and Sparse Code 4. Ordered indexes memory
[Figure: weight matrices at levels r = 3, r = 2, r = 1; shading shows the value of $w_{ij}$ (1 vs. 0).]
68. 4. a) Motivation
4. b) Idea
4. c) Solution: Ordered Indexes Hierarchical Associative Memory
4. d) Empirical experiments
69. 4. d) Empirical experiments
• Method: for each retrieval, measure:
  • quality (number of add-errors), the same as before
  • performance (number of computations) of retrieval:
• additions
• multiplications
• threshold-comparisons
• fire-shots
70. 4. d) Empirical experiments
• (default) Experiment – in the Julia programming language
• Database of 1600 patterns
• 120 tests
• For each test:
  • performance and quality as a function of memory load,
    i.e., the probability that $w_{ij} = 1$; the memory load is varied by varying $p$ (the number of patterns learned)
  • retrieval of 20 patterns
• Fixed N (number of neurons)
• Fixed K (activity level, i.e. number of 1s per vector), with a Gaussian distribution
• Each test is run on 5 models…
71. 4. d) Empirical experiments
(...)
• Each test is run on 5 models:
  • Lernmatrix
  • Hierarchical Associative Memory
  • 3 models of the Ordered Indexes Hierarchical Ass. Memory:
    • lines for iteration chosen naively
    • lines with more 1s chosen first
    • lines with more 0s chosen first + right null-columns discarded
72. Hierarchical memories and Sparse Code 4. Ordered indexes memory
[Plot: total number of steps as a function of memory load in layer r = 3, for each test and each model: Lernmatrix; Hierarchical Ass. Mem.; (naively) Ordered H.A.M.; (1s first) Ordered H.A.M.; Ord. H.A.M. with right null-columns discarded.]
74. Hierarchical memories and Sparse Code 4. Ordered indexes memory
[Plot legend: Lernmatrix; Hierarchical Ass. Mem.; (naively) Ordered H.A.M.; (1s first) Ordered H.A.M.; Ord. H.A.M. with right null-columns discarded.]
Finding: the curves of the hierarchical models lie well below the Lernmatrix curve (≈ 80% fewer steps).
75. Hierarchical memories and Sparse Code 4. Ordered indexes memory
Why? Pruning of ≈ 80% of the columns.
76. Hierarchical memories and Sparse Code 4. Ordered indexes memory
[Plot legend: Hierarchical Ass. Mem.; (naively) Ordered H.A.M.; (1s first) Ordered H.A.M.; Ord. H.A.M. with right null-columns discarded. Annotations: "Original" vs. "Ordered models".]
Finding: the performance of the ordered hierarchical models >> the original hierarchical model.
Why? 1. Reordering improves the aggregations for pruning.
77. Hierarchical memories and Sparse Code 4. Ordered indexes memory
[Plot: total steps (y) as a function of memory load in r = 1 (x); legend: Hierarchical Ass. Mem. (H.A.M.); (naively) Ordered H.A.M.; (1s first) Ordered H.A.M.; Ord. H.A.M. with right null-columns discarded. Annotation: a "Shift" between the curves.]
Why? 1. Reordering optimizes the aggregations for pruning. 2. Reordering frees space.
78. Hierarchical memories and Sparse Code 4. Ordered indexes memory
Finding: performance of the model that discards columns > the other ordered column-indexes models.
Why? The right side of the matrix is not even visited.
[Plot legend: Hierarchical Ass. Mem.; (naively) Ordered H.A.M.; (1s first) Ordered H.A.M.; Ord. H.A.M. with right null-columns discarded.]
80. 5.1) Achievements
• Considerable savings
  • only 2–20% (worst case) of the total number of steps
    (ordered column-indexes hierarchical model with right null-columns discarded, relative to the Lernmatrix)
• Worthy trade-offs
  • more resources are spent on infrastructure and computation in the learning phase, which is done only once
  • this cost is outweighed by the benefits at retrieval time
81. 5.2) Future work
• Variable aggregation factor
  • adapt the window to the density of different zones of the matrices
• Different distributions for the correlations
  • check how non-uniform activity patterns affect the models
• Check the cost of hierarchical models of neural networks in biology