NIPS 2007: Sensory Coding and Hierarchical Representations
1. Sensory Coding & Hierarchical Representations
Michael S. Lewicki
Computer Science Department &
Center for the Neural Basis of Cognition
Carnegie Mellon University
2. [diagram: the visual processing hierarchy, retina → LGN → V1 → V2 → V4 → “IT”]
NIPS 2007 Tutorial · Michael S. Lewicki · Carnegie Mellon
3. Fig. 1. Anatomical and functional hierarchies in the macaque visual system (Hegdé and Felleman, 2007). The human visual system (not shown) is believed to be roughly similar. A, A schematic summary of the laminar patterns of feed-forward (or ascending) and feed-back (or descending) connections for visual area V1. The laminar patterns vary somewhat from one visual area to the next. But in general, the connections are complementary, so that the ascending connections terminate in the granular layer (layer 4) and the descending connections avoid it. The connections are generally reciprocal, in that an area that sends feed-forward connections to another area also receives feedback connections from it. The visual anatomical hierarchy is defined based on, among other things, the laminar patterns of these interconnections among the various areas. See text for details. B, A simplified version of the visual anatomical hierarchy in the macaque monkey. For the complete version, see Felleman and Van Essen (1991). See text for additional details. AIT = anterior inferotemporal; LGN = lateral geniculate nucleus; LIP = lateral intraparietal; MT = middle temporal; MST = medial superior temporal; PIT = posterior inferotemporal; V1 = visual area 1; V2 = visual area 2; V4 = visual area 4; VIP = ventral intraparietal. C, A model of hierarchical processing of visual information proposed by David Marr (1982). D, A schematic illustration of the presumed parallels between the anatomical and functional hierarchies. It is widely presumed not only that visual processing is hierarchical but also that the functional hierarchy parallels the anatomical one.
4. Palmer and Rock, 1994
5. V1 simple cells
Hubel and Wiesel, 1959; DeAngelis et al., 1995
8. Out of the retina
• >23 distinct neural pathways; no simple functional division
• Suprachiasmatic nucleus: circadian rhythm
• Accessory optic system: stabilizes the retinal image
• Superior colliculus: integrates visual and auditory information with head movements; directs the eyes
• Pretectum: plays a role in adjusting pupil size and tracking large moving objects
• Pregeniculate: cells responsive to ambient light
• Lateral geniculate nucleus (LGN): main “relay” to visual cortex; contains 6 distinct layers, each with 2 sublayers; organization is very complex
9. V1 simple cell integration of LGN cells
[diagram: LGN cells converging onto a single V1 simple cell]
Hubel and Wiesel, 1963
10. Hyper-column model
Hubel and Wiesel, 1978
11. Anatomical circuitry in V1
Figure 1 (from Callaway, 1998): Contributions of individual neurons to local excitatory connections between cortical layers. From left to right, typical spiny neurons in layers 4C, 2–4B, 5, and 6 are shown. Dendritic arbors are illustrated with thick lines and axonal arbors with finer lines. Each cell projects axons specifically to only a subset of the layers. This simplified diagram focuses on connections between layers 2–4B, 4C, 5, and 6 and does not incorporate details of circuitry specific for subdivisions of these layers. A model of the interactions between these neurons is shown in Figure 2.
12. Where is this headed?
Ramon y Cajal
13. A wing would be a most mystifying structure
if one did not know that birds flew.
Horace Barlow, 1961
An algorithm is likely to be understood more
readily by understanding the nature of the
problem being solved than by examining the
mechanism in which it is embodied.
David Marr, 1982
14. A theoretical approach
• Look at the system from a functional perspective:
What problems does it need to solve?
• Abstract from the details:
Make predictions from theoretical principles.
• Models are bottom-up; theories are top-down.
What are the relevant computational principles?
15. Representing structure in natural images
image from Field (1994)
18. Representing structure in natural images
Need to describe all image structure in the scene.
What representation is best?
image from Field (1994)
19. V1 simple cells
Hubel and Wiesel, 1959; DeAngelis et al., 1995
20. Oriented Gabor models of individual simple cells
figure from Daugman, 1990; data from Jones and Palmer, 1987
21. 2D Gabor wavelet captures spatial structure
• 2D Gabor functions
• Wavelet basis generated by dilations, translations, and rotations of a single basis function
• Can also control phase and aspect ratio
• (Drifting) Gabor functions are what the eye “sees best”
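A minimal sketch of such a 2D Gabor function, a sinusoidal carrier under a Gaussian envelope, with dilation, translation, and rotation controlled by its parameters. This is an illustrative implementation, not the tutorial's code; the parameter names and default values are assumptions.

```python
import numpy as np

def gabor_2d(size, wavelength, theta, phase=0.0, sigma=None, aspect=1.0):
    """2D Gabor: a sinusoidal carrier under a Gaussian envelope."""
    if sigma is None:
        sigma = 0.5 * wavelength          # envelope width tied to wavelength
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # rotate coordinates by theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (aspect * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# one member of the family: dilation (wavelength), rotation (theta)
g = gabor_2d(size=32, wavelength=8.0, theta=np.pi / 4)
```

Varying `wavelength`, `theta`, `phase`, and `aspect` generates the wavelet-style family described above.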
22. Recoding with Gabor functions
Pixel entropy = 7.57 bits; after recoding with 2D Gabor functions, coefficient entropy = 2.55 bits.
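Entropies like these can be estimated by histogramming values and summing −p log₂ p. A hedged sketch with synthetic stand-in data (the 7.57 / 2.55 bit figures come from the actual image and its Gabor coefficients, which are not reproduced here): a broad pixel distribution has high entropy, a peaked coefficient distribution has low entropy.

```python
import numpy as np

def entropy_bits(values, bins=256):
    """Estimate the entropy (in bits) of a sample by discretizing into bins."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                        # 0 * log 0 = 0
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
uniform_pixels = rng.integers(0, 256, size=100_000)   # broad distribution
sparse_coeffs = rng.laplace(0.0, 2.0, size=100_000)   # peaked, heavy-tailed
print(entropy_bits(uniform_pixels), entropy_bits(sparse_coeffs))
```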
23. How is the V1 population organized?
24. A general approach to coding: redundancy reduction
[scatter plot: intensity of each pixel vs. its neighbor, showing that adjacent pixels are highly correlated]
Redundancy reduction is equivalent to efficient coding.
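The adjacent-pixel correlation can be measured on any image. A sketch using a synthetic stand-in (smoothed noise) in place of a natural image, which is an assumption for self-containment; any grayscale image array would do.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic stand-in for a natural image: white noise smoothed by local averaging
noise = rng.standard_normal((256, 256))
kernel = np.ones(5) / 5
img = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, noise)

# correlate each pixel with its horizontal neighbor
left = img[:, :-1].ravel()
right = img[:, 1:].ravel()
corr = np.corrcoef(left, right)[0, 1]
print(f"correlation of horizontally adjacent pixels: {corr:.2f}")
```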
25. Describing signals with a simple statistical model
Principle
Good codes capture the statistical distribution of sensory patterns.
How do we describe the distribution?
• Goal is to encode the data to desired precision:
    x = a₁s₁ + a₂s₂ + · · · + a_L s_L + ε = As + ε
• Can solve for the coefficients in the no-noise case:
    ŝ = A⁻¹x
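In the noiseless case the coefficients follow directly from a linear solve; a minimal numpy sketch (basis and coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4
A = rng.standard_normal((L, L))      # columns a_i are basis functions
s = rng.laplace(size=L)              # coefficients
x = A @ s                            # x = a_1 s_1 + ... + a_L s_L (no noise)

s_hat = np.linalg.solve(A, x)        # s_hat = A^{-1} x, without forming the inverse
assert np.allclose(s_hat, s)         # coefficients recovered exactly
```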
26. An algorithm for deriving efficient linear codes: ICA
Learning objective: maximize coding efficiency, i.e. maximize P(x|A) over A.

Probability of the pattern ensemble:
    P(x₁, x₂, ..., x_N | A) = ∏_k P(x_k | A)

To obtain P(x|A), marginalize over s:
    P(x|A) = ∫ ds P(x|A, s) P(s) = P(s) / |det A|

Using independent component analysis (ICA) to optimize A:
    ∆A ∝ A Aᵀ ∂/∂A log P(x|A) = −A(z sᵀ + I),   where z = (log P(s))′

This learning rule:
• learns the features that capture the most structure
• optimizes the efficiency of the code

What should we use for P(s)?
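A toy sketch of a natural-gradient ICA update of this kind, assuming a Laplacian prior P(s) ∝ exp(−|s|), so z = (log P(s))′ = −sign(s). The update used here is ΔA ∝ −A(z sᵀ + I); sign conventions for the identity term vary across write-ups, and the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 2, 5000
S = rng.laplace(size=(N, T))                 # independent Laplacian sources
A_true = np.array([[1.0, 0.6], [0.2, 1.0]])
X = A_true @ S                               # observed mixtures

A = np.eye(N)                                # initial basis
lr = 0.02
for _ in range(300):
    s = np.linalg.solve(A, X)                # s = A^{-1} x for the whole batch
    z = -np.sign(s)                          # z = (log P(s))' for a Laplacian prior
    dA = -A @ (z @ s.T / T + np.eye(N))      # natural-gradient update, batch-averaged
    A = A + lr * dA

def avg_loglik(A, X):
    """Average log P(x|A) = log P(s) - log|det A|, up to constants."""
    s = np.linalg.solve(A, X)
    return float((-np.abs(s).sum(0) - np.log(abs(np.linalg.det(A)))).mean())

print(avg_loglik(A, X), "vs initial", avg_loglik(np.eye(N), X))
```

The likelihood of the data under the learned basis should exceed the likelihood under the initial identity basis.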
27. Modeling Non-Gaussian distributions
• Typical coefficient distributions of natural signals are non-Gaussian.
[plots: sharply peaked, heavy-tailed histograms of coefficients s1 and s2]
28. The generalized Gaussian distribution
    P(x|q) ∝ exp(−½ |x|^q)

• Or equivalently, an exponential power distribution (Box and Tiao, 1973):
    P(x|µ, σ, β) = (ω(β)/σ) exp( −c(β) |(x − µ)/σ|^{2/(1+β)} )
• β varies monotonically with the kurtosis κ₂:
    β = −0.75, κ₂ = −1.08;  β = −0.25, κ₂ = −0.45;  β = +0.00 (Normal), κ₂ = +0.00;
    β = +0.50 (ICA tanh), κ₂ = +1.21;  β = +1.00 (Laplacian), κ₂ = +3.00;  β = +2.00, κ₂ = +9.26
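The monotone exponent/kurtosis relation can be checked numerically with the P(x|q) ∝ exp(−½|x|^q) form, computing moments by summation on a grid (a sketch; grid range and resolution are arbitrary choices). For q = 1 (Laplacian) the excess kurtosis is +3, for q = 2 (Gaussian) it is 0, and for q > 2 it goes negative.

```python
import numpy as np

def kurtosis_excess(q):
    """Excess kurtosis E[x^4]/E[x^2]^2 - 3 of P(x|q) proportional to exp(-|x|^q / 2)."""
    x = np.linspace(-60, 60, 400_001)
    w = np.exp(-0.5 * np.abs(x) ** q)   # unnormalized density on a uniform grid
    w /= w.sum()                         # normalize
    m2 = np.sum(x**2 * w)
    m4 = np.sum(x**4 * w)
    return m4 / m2**2 - 3.0

for q in (1.0, 2.0, 4.0):
    print(q, round(kurtosis_excess(q), 2))
```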
29. Modeling Gaussian distributions with PCA
• Principal component analysis (PCA) describes the principal axes of variation in the data distribution.
• This is equivalent to fitting the data with a multivariate Gaussian.
[plots: Gaussian fits to the marginal distributions of s1 and s2]
30. Modeling non-Gaussian distributions
• What about non-Gaussian marginals?
• How would this distribution be modeled by PCA?
[plots: heavy-tailed marginal distributions of s1 and s2]
31. Modeling non-Gaussian distributions
• What about non-Gaussian marginals?
• How would this distribution be modeled by PCA?
• How should the distribution be described? The non-orthogonal ICA solution captures the non-Gaussian structure.
[plots: heavy-tailed marginal distributions of s1 and s2]
32. Efficient coding of natural images: Olshausen and Field, 1996
[network diagram: natural scene → visual input image → model units; basis functions correspond to receptive fields, shown before and after learning]
Network weights are adapted to maximize coding efficiency: this minimizes redundancy and maximizes the independence of the outputs.
33. Model predicts local and global receptive field properties
Learned basis for natural images Overlaid basis function properties
from Lewicki and Olshausen, 1999
34. Algorithm selects best of many possible sensory codes
Learned, Gabor, Haar wavelet, Fourier, and PCA bases compared (from Lewicki and Olshausen, 1999).
Theoretical perspective: not edge “detectors,” but an efficient code for natural images.
35. Comparing coding efficiency on natural images
[bar chart: estimated bits per pixel for Gabor, wavelet, Haar, Fourier, PCA, and learned codes]
36. Responses in primary visual cortex to visual motion
from Wandell, 1995
37. Sparse coding of time-varying images (Olshausen, 2002)
[schematic: image sequence I(x, y, t) decomposed into time-varying coefficients aᵢ(t) and spatiotemporal basis functions φᵢ(x, y, t)]
    I(x, y, t) = Σᵢ Σ_t′ aᵢ(t′) φᵢ(x, y, t − t′) + ε(x, y, t)
               = Σᵢ aᵢ(t) ∗ φᵢ(x, y, t) + ε(x, y, t)
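The model above states that the shifted-kernel sum over t′ is a convolution in time. That identity is easy to verify in 1D (a sketch; the two spatial dimensions are dropped for brevity, and the coefficient series and kernel are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 64, 9
a = rng.standard_normal(T)      # coefficient time series a_i(t)
phi = rng.standard_normal(K)    # temporal kernel phi_i(t)

# direct shifted sum over t' of a(t') * phi(t - t')
direct = np.zeros(T + K - 1)
for t_prime in range(T):
    direct[t_prime:t_prime + K] += a[t_prime] * phi

# identical to a convolution
assert np.allclose(direct, np.convolve(a, phi))
```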
41. Theory:
  ✓ explains data from principles
  ✓ requires idealization and abstraction
Principle:
  ✓ code signals accurately and efficiently
  ✓ adapted to natural sensory environment
Idealization:
  ✓ cell response is linear
Methodology:
  ✓ information theory, natural images
Prediction:
  ✓ explains individual receptive fields
  ✓ explains population organization
42. How general is the efficient coding principle?
Can it explain auditory coding?
43. Limitations of the linear model
• linear
• only optimal for block, not whole signal
• no phase-locking
• representation depends on block alignment
44. A continuous filterbank does not form an efficient code
[diagram: a signal convolved with a continuous filterbank yields multiple highly redundant filtered versions of the signal]
Goal: find a representation that is both time-relative and efficient.
45. Efficient signal representation using time-shiftable kernels (spikes)
Smith and Lewicki (2005) Neural Comp. 17:19-45
    x(t) = Σ_{m=1..M} Σ_{i=1..n_m} s_{m,i} φ_m(t − τ_{m,i}) + ε(t)
• Each spike encodes the precise time and magnitude of an acoustic feature
• Two important theoretical abstractions for “spikes”: not binary, not probabilistic
• Can convert to a population of stochastic, binary spikes
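A sketch of decoding such a spike code: given (kernel index m, time τ, amplitude s) triples, the signal estimate is the sum of shifted, scaled kernels. The kernel and spike values here are made up for illustration.

```python
import numpy as np

def reconstruct(spikes, kernels, n_samples):
    """x(t) = sum over spikes of s * phi_m(t - tau); spikes are (m, tau, s) triples."""
    x = np.zeros(n_samples)
    for m, tau, s in spikes:
        phi = kernels[m]
        end = min(tau + len(phi), n_samples)
        x[tau:end] += s * phi[:end - tau]
    return x

t = np.arange(32)
# one toy gammatone-like kernel: windowed sinusoid
kernels = [np.sin(2 * np.pi * t[:16] / 8) * np.hanning(16)]
spikes = [(0, 5, 1.0), (0, 40, -0.5)]
x_hat = reconstruct(spikes, kernels, n_samples=64)
```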
46. Coding audio signals with spikes
[spikegram: kernel center frequency (100–5000 Hz) vs. time (0–25 ms), with input, reconstruction, and residual waveforms]
47. Comparing a spike code to a spectrogram
[figure: (a) waveform, (b) spikegram (kernel CF, 100–5000 Hz, vs. time), and (c) spectrogram (frequency, 0–5000 Hz, vs. time) of the same 800 ms sound]
How do we compute the spikes?
48. “can” with filter-threshold
[spikegram of “can” coded by filter-threshold: kernel CF (100–5000 Hz) vs. time (0–20 ms), with input, reconstruction, and residual waveforms]
58. Spike Coding with Matching Pursuit
1. convolve signal with kernels
2. find largest peak over convolution set
3. fit signal with kernel
4. subtract kernel from signal, record
spike, and adjust convolutions
5. repeat . . .
6. halt when desired fidelity is reached
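The loop above can be sketched directly. An illustrative implementation, not the authors' code: for simplicity the convolutions are recomputed each iteration rather than incrementally adjusted, and the test signal, kernel, and fidelity target are made up.

```python
import numpy as np

def matching_pursuit(x, kernels, snr_db=20.0, max_spikes=500):
    """Greedy spike coding: repeatedly fit and subtract the best-matching kernel."""
    residual = x.astype(float).copy()
    target_power = np.mean(x**2) / 10 ** (snr_db / 10)   # halt at desired fidelity
    spikes = []
    for _ in range(max_spikes):
        # 1. convolve (correlate) residual with each unit-norm kernel
        corrs = [np.correlate(residual, k, mode="valid") for k in kernels]
        # 2. find the largest peak over the convolution set
        m = int(np.argmax([np.abs(c).max() for c in corrs]))
        tau = int(np.abs(corrs[m]).argmax())
        s = corrs[m][tau]                 # 3. fit amplitude (kernels are unit norm)
        # 4. subtract the fitted kernel and record the spike
        residual[tau:tau + len(kernels[m])] -= s * kernels[m]
        spikes.append((m, tau, s))
        if np.mean(residual**2) < target_power:   # 6. halt when fidelity is reached
            break                                  # 5. otherwise repeat
    return spikes, residual

k = np.hanning(16)
k /= np.linalg.norm(k)
signal = np.zeros(128)
signal[20:36] += 2.0 * k
signal[60:76] -= 1.0 * k
spikes, residual = matching_pursuit(signal, [k], snr_db=40.0)
```

On this toy signal the two planted events are recovered exactly in two iterations.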
59. “can” 5 dB SNR, 36 spikes, 145 sp/sec
[spikegram: kernel CF (100–5000 Hz) vs. time (0–200 ms), with input, reconstruction, and residual waveforms]
60. “can” 10 dB SNR, 93 spikes, 379 sp/sec
[spikegram: kernel CF (100–5000 Hz) vs. time (0–200 ms), with input, reconstruction, and residual waveforms]
61. “can” 20 dB SNR, 391 spikes, 1700 sp/sec
[spikegram: kernel CF (100–5000 Hz) vs. time (0–200 ms), with input, reconstruction, and residual waveforms]
62. “can” 40 dB SNR, 1285 spikes, 5238 sp/sec
[spikegram: kernel CF (100–5000 Hz) vs. time (0–200 ms), with input, reconstruction, and residual waveforms]
63. Efficient auditory coding with optimized kernel shapes
Smith and Lewicki (2006) Nature 439:978-982
What are the optimal kernel shapes?
    x(t) = Σ_{m=1..M} Σ_{i=1..n_m} s_{m,i} φ_m(t − τ_{m,i}) + ε(t)
Adapt the algorithm of Olshausen (2002).
64. Optimizing the probabilistic model
    x(t) = Σ_{m=1..M} Σ_{i=1..n_m} s_{m,i} φ_m(t − τ_{m,i}) + ε(t),   ε(t) ∼ N(0, σ_ε)

    p(x|Φ) = ∫∫ p(x|Φ, s, τ) p(s) p(τ) ds dτ ≈ p(x|Φ, ŝ, τ̂) p(ŝ) p(τ̂)

Learning (Olshausen, 2002):
    ∂/∂φ_m log p(x|Φ) = ∂/∂φ_m log p(x|Φ, ŝ, τ̂) + ∂/∂φ_m log p(ŝ) p(τ̂)
                      = −(1/2σ_ε) ∂/∂φ_m [x − Σ_m Σ_i ŝ_{m,i} φ_m(t − τ_{m,i})]²   (the prior terms do not depend on φ_m)
                      = (1/σ_ε) Σ_i [x − x̂] ŝ_{m,i}

Also adapt kernel lengths.
65. Adapting the optimal kernel shapes
69. Using efficient coding theory to make theoretical predictions

A simple model of auditory coding, adapted to the natural sound environment:
    sound waveform → (static) filterbank → stochastic spiking population → spike code

Compare physiological data against the optimal kernels:
• auditory nerve non-linearities and filter shapes
• population properties and trends
• coding efficiency

Auditory revcor filters: gammatones (deBoer and deJongh, 1978; Carney and Yin, 1988)
[plot: auditory nerve filter bandwidth (kHz) vs. center frequency (kHz), with the speech-based prediction]

More theoretical questions:
• Why gammatones?
• Why spikes?
• How is sound coded by the spike population?

How do we develop a theory? We only compare to the data after optimizing; we do not fit the data. The prediction depends on the sound ensemble.
70. Learned kernels share features of auditory nerve filters
Auditory nerve filters Optimized kernels
from Carney, McDuffy, and Shekhter, 1999; scale bar = 1 msec
71. Learned kernels closely match individual auditory nerve filters
For each kernel, find the closest matching auditory nerve filter in Laurel Carney’s database of ~100 filters.
72. Learned kernels overlaid on selected auditory nerve filters
For almost all learned kernels there is a closely matching auditory nerve filter.
73. Coding of a speech consonant
[figure: (a) input, reconstruction, and residual waveforms; (b) spikegram, kernel CF (100–5000 Hz) vs. time, with inset at 38–40 ms (3400–4800 Hz); (c) spectrogram, 0–5000 Hz over 10–60 ms]
74. How is this achieving an efficient, time-relative code?
Time-relative coding of glottal pulses
[figure: (a) speech waveform; (b) spikegram, kernel CF (100–5000 Hz) vs. time (0–100 ms), showing spikes aligned to individual glottal pulses]
75. Theory:
  ✓ explains data from principles
  ✓ requires idealization and abstraction
Principle:
  ✓ code signals accurately and efficiently
  ✓ adapted to natural sensory environment
Idealization:
  ✓ analog spikes
Methodology:
  ✓ information theory, natural sounds
  ✓ optimization
Prediction:
  ✓ explains individual receptive fields
  ✓ explains population organization
76. A gap in the theory?
[diagram: center-surround receptive field (from Hubel, 1995) alongside a learned basis function]
77. Redundancy reduction for noisy channels (Atick, 1992)
s → [input channel] → x → [recoding A] → [output channel] → y
    without recoding: y = x + noise;   with recoding: y = Ax + noise

Mutual information:
    I(x, s) = Σ_{s,x} P(x, s) log₂ [ P(x, s) / (P(s) P(x)) ]

I(x, s) = 0 iff P(x, s) = P(x)P(s), i.e. x and s are independent.
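The definition can be evaluated directly for a small discrete joint distribution; a sketch (the joint tables are made up to show the two extremes):

```python
import numpy as np

def mutual_information_bits(P_xs):
    """I(x,s) = sum P(x,s) log2 [P(x,s) / (P(x)P(s))] for a joint table P[x, s]."""
    P_x = P_xs.sum(axis=1, keepdims=True)
    P_s = P_xs.sum(axis=0, keepdims=True)
    nz = P_xs > 0                        # skip zero-probability cells
    return float(np.sum(P_xs[nz] * np.log2(P_xs[nz] / (P_x * P_s)[nz])))

independent = np.outer([0.5, 0.5], [0.25, 0.75])        # P(x,s) = P(x)P(s): I = 0
perfectly_coupled = np.array([[0.5, 0.0], [0.0, 0.5]])  # x determines s: I = 1 bit
print(mutual_information_bits(independent), mutual_information_bits(perfectly_coupled))
```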
78. Profiles of optimal filters
• high SNR: reduce redundancy → center-surround filters
• low SNR: average → low-pass filter
• matches behavior of retinal ganglion cells
79. An observation: Contrast sensitivity of ganglion cells
Luminance level decreases by one log unit from each curve to the next lower one.
80. Natural images have a 1/f amplitude spectrum
Field, 1987
81. Components of predicted filters
whitening filter
low-pass filter
optimal filter
from Atick, 1992
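The composition of these components can be sketched in the frequency domain: against a 1/f input spectrum the whitening filter grows ∝ f, and multiplying by a low-pass filter (so that noise-dominated high frequencies are not amplified) yields a band-pass optimal filter. The cutoff and exponent below are illustrative choices, not Atick's values.

```python
import numpy as np

f = np.linspace(0.1, 50, 500)        # spatial frequency axis (illustrative range)
amplitude = 1.0 / f                   # 1/f amplitude spectrum of natural images
whitening = f                         # flattens the 1/f input spectrum
lowpass = np.exp(-(f / 20.0) ** 3)    # suppresses noise-dominated high frequencies
optimal = whitening * lowpass         # band-pass: peaks at an intermediate frequency

peak = f[np.argmax(optimal)]
print(f"optimal filter peaks at ~{peak:.1f} (band-pass, not high-pass)")
```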
82. Predicted contrast sensitivity functions match neural data
from Atick, 1992
83. Robust coding of natural images
• Theory refined:
- image is noisy and blurred
- neural population size changes
- neurons are noisy
from Hubel, 1995
[figure: (a) undistorted image, (b) foveal retinal image, (c) 40 degrees eccentricity; intensity profiles over visual angle (arc min); from Doi and Lewicki, 2006]
84. Problem 1: Real neurons are “noisy”
Estimates of neural information capacity
system (area) stimulus bits / sec bits / spike
fly visual (H1) motion 64 ~1
monkey visual (MT) motion 5.5 - 12 0.6 - 1.5
frog auditory (auditory nerve) noise & call 46 & 133 1.4 & 7.8
salamander visual (ganglion cells) rand. spots 3.2 1.6
cricket cercal (sensory afferent) mech. motion 294 3.2
cricket cercal (sensory afferent) wind noise 75 - 220 0.6 - 3.1
cricket cercal (10-2 and 10-3) wind noise 8 - 80 avg. = 1
Electric fish (P-afferent) amp. modulation 0 - 200 0 - 1.2
After Borst and Theunissen, 1999
Limited capacity neural codes need to be robust.
85. Traditional codes are not robust
[diagram: sensory input → encoding neurons]
Original image
86. Traditional codes are not robust
[diagram: sensory input → encoding neurons, with added noise equivalent to 1 bit precision]
Original | 1x efficient coding reconstruction at 1 bit precision (34% error)
87. Redundancy plays an important role in neural coding

Responses of salamander retinal ganglion cells to natural movies (from Puchalla et al., 2005).

Redundancy was defined from the mutual information that each cell conveyed about the stimulus and the information the pair conveyed jointly, I(Ra,Rb;S), measured as a fraction of the minimum of the individual cells’ information, min{I(Ra;S), I(Rb;S)}. The fractional redundancy between two cells therefore can be no greater than 1; it is 0 when the two cells convey independent information about the stimulus, and 1 when one cell’s information is a subset of the other’s (the cells encode exactly the same information).

[plot: redundancy vs. cell pair distance (µm)]
88. Robust coding (Doi and Lewicki, 2005, 2006)

Robust coding introduces redundancy into the code. Unlike PCA or ICA, the apparent dimensionality of the representation can be increased (instead of reduced) while the coding precision of individual elements is limited; robust coding should make optimal use of such an arbitrary number of coding elements.

Model: limited precision is modeled using additive channel noise. The data x is N-dimensional with zero mean and known covariance Σ_x; with encoder W ∈ R^{M×N} and decoder A ∈ R^{N×M}, the representation of each data point is perturbed by channel noise n ∼ N(0, σ_n² I_M):
    r = Wx + n = u + n
The rows of W are the encoding vectors and the columns of A the decoding vectors. The reconstruction of the noisy representation is
    x̂ = Ar = AWx + An.

Caveat: linear, but can evaluate for different noise levels.
89. Generalizing the model: sensory noise and optical blur
[figure: (a) undistorted image, (b) foveal retinal image, (c) 40 degrees eccentricity; intensity profiles over visual angle (arc min)]

image s → optical blur H → observation x (+ sensory noise ν) → encoder W → representation r (+ channel noise δ) → decoder A → reconstruction ŝ

Can also add sparseness and resource constraints.
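A sketch of the noisy-channel part of this model with the optimal linear decoder for a fixed encoder: A = Σ_x Wᵀ(WΣ_xWᵀ + σ²I)⁻¹ is the standard MMSE decoder for a linear Gaussian channel. Blur and sensory noise are omitted, and W here is a random encoder rather than an optimized one; the covariance spectrum is a made-up stand-in for natural image statistics. Adding more noisy units should reduce the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma_n = 8, 0.5

# data covariance with decaying eigenvalues, loosely like natural images
eigvals = 1.0 / np.arange(1, N + 1) ** 2
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
Sigma_x = Q @ np.diag(eigvals) @ Q.T

def avg_error(M):
    """Mean-squared reconstruction error with M noisy channel units."""
    W = rng.standard_normal((M, N)) / np.sqrt(N)   # random encoder (a sketch)
    A = Sigma_x @ W.T @ np.linalg.inv(W @ Sigma_x @ W.T + sigma_n**2 * np.eye(M))
    # E||x - A(Wx + n)||^2 = tr[(I-AW) Sigma_x (I-AW)^T] + sigma_n^2 tr[A A^T]
    I_AW = np.eye(N) - A @ W
    return float(np.trace(I_AW @ Sigma_x @ I_AW.T) + sigma_n**2 * np.trace(A @ A.T))

print([round(avg_error(M), 3) for M in (4, 8, 32)])
```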
90. Robust coding is distinct from “reading” noisy populations

[figure, from Pouget et al., 2001 (the standard population coding model): a | Bell-shaped tuning curves to direction for 16 neurons. b | A population pattern of activity across 64 neurons with bell-shaped tuning curves in response to an object moving at −40°; the overall activity looks like a ‘noisy’ hill centred around the stimulus direction. c | Population vector decoding fits a cosine function to the observed activity and uses the peak of the cosine, ŝ, as an estimate of the encoded direction. d | Maximum likelihood fits a template derived from the tuning curves of the cells.]

Here, we want to learn an optimal image code using noisy neurons.
91. How do we learn robust codes?

image s → optical blur H → observation x (+ sensory noise ν) → encoder W → representation r (+ channel noise δ) → decoder A → reconstruction ŝ

Objective: find W and A that minimize the reconstruction error, with the cost
    (Cost) = (Error) + λ (Var. Const.),   (Var. Const.) = Σ_{i=1..M} ln(⟨u_i²⟩ / σ_u²)
W is optimized by gradient descent on this cost, with A set to its optimal value:
    A = Σ_x Wᵀ (σ_n² I_M + W Σ_x Wᵀ)⁻¹   (⇔ ∂(Cost)/∂A = 0)

• Channel capacity of the i-th neuron: C_i = ½ ln(SNR_i + 1)
• To limit capacity, fix the coefficient signal-to-noise ratio: SNR_i = ⟨u_i²⟩/σ_n², with ⟨u_i²⟩ = σ_u²

Now robust coding is formulated as a constrained optimization problem.
92. Robust coding of natural images
[diagram: sensory input → encoding neurons, with added noise equivalent to 1 bit precision]
Original | 1x efficient coding reconstruction at 1 bit precision (34% error)
93. Robust coding of natural images
[diagram: sensory input → encoding neurons, with weights adapted for optimal robustness]
Original | 1x robust coding reconstruction at 1 bit precision (3.8% error)
94. Reconstruction improves by adding neurons
[diagram: sensory input → encoding neurons, with weights adapted for optimal robustness]
Original | 8x robust coding at 1 bit precision, reconstruction error: 0.6%
95. Adding precision to 1x robust coding
original images | 1 bit: 12.5% error | 2 bit: 3.1% error
96. Can derive minimum theoretical average error bound

2-D isotropic data:

    E = 2σ_x² / (M·SNR + 2)

2-D anisotropic data:

    E = (λ_1 + λ_2) / (M·SNR + 2)            if SNR ≥ SNR_c
    E = (√λ_1 + √λ_2)² / (2(1 + M·SNR))      if SNR ≤ SNR_c

High-dimensional data:

    E = [1 / ((M/N)·SNR + 1)] · (1/N) Σ_{i=1..N} λ_i

where N is the input dimensionality, M the number of coding units (neurons), and λ_i ≡ D_ii the eigenvalues of the data covariance in its eigenvalue decomposition Σ_x = E D Eᵀ. The bound takes a convenient form when W is re-expressed through a linear transform V with unit-norm row vectors. For high-dimensional data the optimum is found numerically by gradient descent on the objective, with W (or V) constrained by the channel capacity constraint diag(⟨u uᵀ⟩) = diag(W Σ_x Wᵀ) = σ_u² 1_M.

Results on a 512×512 pixel test image (1-bit precision):

    M/N     Algorithm   Bound
    0.5x    20.3%       19.9%
    1x      12.5%       12.4%
    8x      2.0%        2.0%

Balcan, Doi, and Lewicki, 2007; Balcan and Lewicki, 2007
NIPS 2007 Tutorial 96 Michael S. Lewicki ! Carnegie Mellon
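The high-dimensional bound is easy to evaluate. A minimal sketch of a bound of the form E = (1/N) Σ λ_i / ((M/N)·SNR + 1); the power-law eigenvalue spectrum and the SNR value are illustrative assumptions, not the tutorial's data:

```python
import numpy as np

def error_bound(eigs, M, snr):
    """E = (1/N) * sum(lambda_i) / ((M/N) * SNR + 1):
    average-error lower bound for N-dim data coded by M noisy units."""
    eigs = np.asarray(eigs, dtype=float)
    return eigs.mean() / ((M / eigs.size) * snr + 1.0)

lam = 1.0 / np.arange(1, 65) ** 2     # hypothetical power-law eigenvalue spectrum
snr = 3.0                              # illustrative per-unit channel SNR
for ratio in (0.5, 1, 8):              # overcompleteness M/N, as in the table
    M = int(ratio * lam.size)
    print(f"{ratio}x: E = {error_bound(lam, M, snr):.4f}")
```

The bound falls as the code becomes more overcomplete, matching the qualitative trend in the table: more noisy units buy back precision.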
97. Sparseness localizes the vectors and increases coding efficiency

[Figure: encoding vectors (W) and decoding vectors (A), with normalized histograms of coefficient kurtosis]

    robust coding:          avg. kurtosis = 4.9
    robust sparse coding:   avg. kurtosis = 7.3

average error is unchanged (12.5% and 12.5%)
NIPS 2007 Tutorial 97 Michael S. Lewicki ! Carnegie Mellon
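Kurtosis, the sparseness measure used on this slide, is straightforward to compute. A minimal sketch comparing a Gaussian (dense) and a Laplacian (sparse, heavy-tailed) coefficient distribution; the sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def kurtosis(x):
    """Standard (non-excess) kurtosis E[(x-mu)^4] / Var[x]^2; a Gaussian gives 3."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 4) / np.var(x) ** 2

gauss = rng.standard_normal(200_000)    # dense, non-sparse coefficients
laplace = rng.laplace(size=200_000)     # sparse, heavy-tailed coefficients

print(f"Gaussian kurtosis:  {kurtosis(gauss):.2f}")    # ~3
print(f"Laplacian kurtosis: {kurtosis(laplace):.2f}")  # ~6
```

Higher kurtosis means the coefficient distribution is more peaked at zero with heavier tails, i.e. each input is described by fewer strongly active units.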
98. Non-zero resource constraints localize weights

[Figure, panels (a)-(d): weights adapted at channel SNRs from 20dB down to -10dB; magnitude vs. spatial frequency; eccentricities from the fovea out to 40 degrees]

average error is also unchanged
NIPS 2007 Tutorial 98 Michael S. Lewicki ! Carnegie Mellon
99. Theory
• explains data from principles
• requires idealization and abstraction
Principle
• code signals accurately, efficiently, robustly
• adapted to natural sensory environment
Idealization
• cell response is linear, noise is additive
• simple, additive resource constraints
Methodology
• information theory, natural images
• constrained optimization
Prediction
• explains individual receptive fields
• explains non-linear response
• explains population organization
NIPS 2007 Tutorial 99 Michael S. Lewicki ! Carnegie Mellon
100. What happens next?
[Figure: learned center-surround receptive field (+ center, - surround); from Hubel, 1995]
NIPS 2007 Tutorial 100 Michael S. Lewicki ! Carnegie Mellon
102. Joint statistics of filter outputs show magnitude dependence

[Figure from Schwartz and Simoncelli (2001): a natural image seen through two linear filters, L1 and L2 (e.g., spatially shifted or differently tuned pairs). Conditional histograms histo{L2 | L1} show that the filter outputs are roughly decorrelated, but not statistically independent: the spread of the distribution of L2 increases with the magnitude of L1. The accompanying text discusses divisive normalization, a form of automatic gain control used to account for many nonlinear steady-state behaviors of neurons in primary visual cortex.]

Schwartz and Simoncelli (2001), Nature Neuroscience, vol. 4, no. 8, August 2001
NIPS 2007 Tutorial 102 Michael S. Lewicki ! Carnegie Mellon
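The "decorrelated but not independent" structure in this figure can be reproduced with a standard toy model, a Gaussian scale mixture, in which two filter outputs share a common random scale. This is an illustrative sketch, not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100_000

# Gaussian scale mixture: two filter outputs share a common random scale,
# a standard toy model of neighboring filter coefficients in natural images.
s = np.exp(0.75 * rng.standard_normal(T))   # common log-normal scale
L1 = s * rng.standard_normal(T)
L2 = s * rng.standard_normal(T)

corr = np.corrcoef(L1, L2)[0, 1]            # roughly decorrelated ...
v_small = L2[np.abs(L1) < np.quantile(np.abs(L1), 0.25)].var()
v_large = L2[np.abs(L1) > np.quantile(np.abs(L1), 0.75)].var()

print(f"corr(L1, L2) = {corr:+.3f}")
print(f"var(L2 | small |L1|) = {v_small:.2f}")   # ... but the spread of L2
print(f"var(L2 | large |L1|) = {v_large:.2f}")   # grows with |L1|
```

Dividing out the shared scale is exactly what the gain-control model on the next slide does.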
103. Using sensory gain control to reduce redundancy
Schwartz and Simoncelli (2001)

[Figure: a divisive normalization model. Each linear filter (F1, F2, ...) is applied to the stimulus; its squared response is divided by a weighted sum of the other squared filter responses plus a constant.]

Adapt the weights to minimize higher-order dependencies from natural images.
NIPS 2007 Tutorial 103 Michael S. Lewicki ! Carnegie Mellon
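A minimal sketch of the divisive normalization computation described above. The weight values and constant are hypothetical, chosen by hand rather than adapted to natural images as in the paper:

```python
import numpy as np

def divisive_normalization(L, w, sigma):
    """R_i = L_i^2 / (sigma^2 + sum_j w_ij * L_j^2): each squared filter
    response is divided by a weighted sum of squared responses plus a constant."""
    L = np.asarray(L, dtype=float)
    return L ** 2 / (sigma ** 2 + w @ (L ** 2))

# Hypothetical example: 3 filters with uniform, hand-chosen weights.
w = np.full((3, 3), 0.5)
L = np.array([2.0, 1.0, 0.1])               # linear filter outputs
print(divisive_normalization(L, w, sigma=0.5))
```

Because each unit's gain is set by the pooled activity of its neighbors, a strongly driven neighborhood suppresses all responses together, removing the shared-magnitude dependence seen in the previous slide.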
104. Model accounts for several non-linear response properties
Schwartz and Simoncelli (2001)

[Figure: mean firing rate vs. signal contrast for a V1 cell and for the model (cell data from Cavanaugh et al., 2000). Panels a, b: responses to a signal grating with a superimposed mask at contrasts 0 (no mask), 0.13, and 0.5. Increasing mask contrast shifts the contrast-response curves, as in visual masking data; rate-level curves for a masking tone show the analogous effect in auditory data.]

NIPS 2007 Tutorial 104 Michael S. Lewicki ! Carnegie Mellon
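The qualitative effect of a mask on the contrast-response curve falls out of a divisive normalization equation. A hedged sketch with a hypothetical parameterization (the `w`, `sigma`, and `rmax` values are illustrative, not the paper's fitted parameters):

```python
import numpy as np

def response(c_signal, c_mask, w=1.0, sigma=0.1, rmax=80.0):
    """R = rmax * c^2 / (sigma^2 + c^2 + w * c_mask^2): the mask enters
    the normalization pool and suppresses the signal-driven response."""
    return rmax * c_signal ** 2 / (sigma ** 2 + c_signal ** 2 + w * c_mask ** 2)

contrasts = np.array([0.03, 0.1, 0.3, 1.0])
for c_mask in (0.0, 0.13, 0.5):              # mask contrasts as in the figure
    rates = response(contrasts, c_mask)
    print(f"mask {c_mask}: " + "  ".join(f"{r:5.1f}" for r in rates))
```

At every signal contrast the masked response is lower, and the curve shifts along the contrast axis rather than saturating at a lower ceiling, the signature of contrast gain control in the data.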
105. What about image structure?
• Bottom-up approaches focus on the non-linearity.
• Our aim here is to focus on the computational problem:
How do we learn the intrinsic dimensions of natural image structure?
• Idea: characterize how the local image distribution changes.
NIPS 2007 Tutorial 105 Michael S. Lewicki ! Carnegie Mellon
106. Perceptual organization in natural scenes
image of Kyoto, Japan from E. Doi
NIPS 2007 Tutorial 106 Michael S. Lewicki ! Carnegie Mellon
107. Palmer and Rock’s theory of perceptual organization
Palmer and Rock, 1994
NIPS 2007 Tutorial 107 Michael S. Lewicki ! Carnegie Mellon
109. Malik and Perona’s (1991) model of texture segregation
input image
output of dark
bar filters
output of light
bar filters
Visual Features
NIPS 2007 Tutorial 109 Michael S. Lewicki ! Carnegie Mellon
110. Malik and Perona algorithm can find texture boundaries
Painting by Gustav Klimt; texture boundaries found by the Malik and Perona algorithm
NIPS 2007 Tutorial 110 Michael S. Lewicki ! Carnegie Mellon