A SELF-ORGANIZING NEURAL SYSTEM FOR LEARNING TO
RECOGNIZE TEXTURED SCENES
Stephen Grossberg1 and James R. Williamson2
Department of Cognitive and Neural Systems
and Center for Adaptive Systems
Boston University
Vision Research, 39 (1999) 1385-1406.
All correspondence should be addressed to:
Professor Stephen Grossberg
Department of Cognitive and Neural Systems
Boston University
677 Beacon Street
Boston, MA 02215
Phone: 617-353-7858
Fax: 617-353-7755
E-mail: steve@cns.bu.edu
Keywords: pattern recognition, boundary segmentation, surface representation,
filling-in, texture classification, neural network, adaptive resonance theory
1 Supported in part by the Defense Research Projects Agency and the Office of Naval Research
(ONR N00014-95-1-0409) and the Office of Naval Research (ONR N00014-95-1-0657).
2 Supported in part by the Defense Research Projects Agency and the Office of Naval Research
(ONR N00014-95-1-0409).
Abstract
A self-organizing ARTEX model is developed to categorize and classify textured image
regions. ARTEX specializes the FACADE model of how the visual cortex sees, and the
ART model of how temporal and prefrontal cortices interact with the hippocampal system
to learn visual recognition categories and their names. FACADE processing generates a
vector of boundary and surface properties, notably texture and brightness properties, by
utilizing multi-scale filtering, competition, and diffusive filling-in. Its context-sensitive
local measures of textured scenes can be used to recognize scenic properties that
gradually change across space, as well as abrupt texture boundaries. ART incrementally
learns recognition categories that classify FACADE output vectors, class names of these
categories, and their probabilities. Top-down expectations within ART encode learned
prototypes that pay attention to expected visual features. When novel visual information
creates a poor match with the best existing category prototype, a memory search
selects a new category with which to classify the novel data. ARTEX is compared with
psychophysical data, and is benchmarked on classification of natural textures and
synthetic aperture radar images. It outperforms state-of-the-art systems that use rule-based,
backpropagation, and K-nearest neighbor classifiers.
1 Introduction
1.1 Background and Benchmarks
The brain's unparalleled ability to perceive and recognize a rapidly changing world has
inspired an increasing number of models aimed at exploiting these properties for purposes
of automatic target recognition. On the perceptual side, the brain can cope with variable
illumination levels and noisy scenic data that combine information about edges, textures,
shading, and depth that are overlaid in all parts of a scene. This type of general-purpose
processing enables the brain to deal with a wide range of imagery, both familiar and
unfamiliar. On the recognition side, the brain can autonomously discover and learn
recognition categories and predictive classifications that shape themselves to the statistics
of a changing environment in real time. The present article develops a new self-organizing
neural architecture that combines perceptual and recognition models that exhibit these
desirable properties.
These models have individually been derived to explain and predict data about how
the brain generates perceptual representations in the striate and prestriate visual
cortices (e.g., Arrington, 1994; Baloch & Grossberg, 1997; Francis & Grossberg, 1996; Gove,
Grossberg, & Mingolla, 1995; Grossberg, 1994, 1997; Grossberg, Mingolla, & Ross, 1997;
Pessoa, Mingolla, & Neumann, 1995) and uses these representations to learn attentive
recognition categories and predictions through interactions between inferotemporal,
prefrontal, and hippocampal cortices (e.g., Bradski & Grossberg, 1995; Carpenter &
Grossberg, 1993; Grossberg, 1995; Grossberg & Merrill, 1996). The perceptual theory in
question is called FACADE theory. It consists of subsystems called the Boundary Contour
System (BCS) and the Feature Contour System (FCS) that generate 3-D boundary and
surface representations that model the cortical interblob and blob processing streams,
respectively. The adaptive categorization and predictive theory is called Adaptive
Resonance Theory, or ART. ART models are capable of stably self-organizing their recognition
codes using either unsupervised or supervised incremental learning in any combination
through time (Carpenter & Grossberg, 1991; Carpenter et al., 1992).
The present work develops the ARTEX model to classify scenes that include complex
textures, both natural and artificial. The ARTEX architecture was built up from
specialized versions of FACADE and ART models that have been designed to achieve high
competence in classifying textured scenes without also incorporating mechanisms that
are not essential for understanding this competence. Just as the properties of the
FACADE and ART models are emergent properties that are due to interactions of their
various parts, the properties of the ARTEX architecture are also emergent properties due
to interactions within and between its FACADE and ART modules. These new emergent
properties are not merely the sum of the parts of the modules from which they are derived,
and need to be analysed on their own terms.
In order to understand the emergent properties that are achieved by joining a FACADE
vision preprocessor to an ART adaptive classifier, ARTEX is benchmarked against
state-of-the-art alternative models of texture classification. Our most striking results are
derived through benchmark studies that classify natural textures from the Brodatz (1966)
texture album, which is often used as a standardized test of texture classification models.
ARTEX benchmarks emulated the conditions under which others benchmarked their algorithms
on Brodatz textures. A single trial of on-line incremental category learning by ARTEX
can outperform another leading model's off-line batch learning using a complex rule-based
system (Greenspan, 1996; Greenspan et al., 1994). ARTEX also outperforms K-nearest
neighbor models in both accuracy and data compression, and multilayer perceptrons (back
propagation) in both accuracy and processing time.
The classification errors that ARTEX does produce are compared with human
perception of texture similarities (Rao & Lohse, 1993, 1996). A correlation exists between
the psychophysically measured similarity between two textures and the probability that
ARTEX will confuse them.
ARTEX is also used to classify regions in real-world scenes that have been processed
by synthetic aperture radar (SAR). SAR imagery has recently become popular in many
satellite image processing applications because the SAR sensor can penetrate variable
weather conditions (Novak et al., 1990; Waxman et al., 1995). The SAR images present
a challenge for texture classification [...] five orders of magnitude and are corrupted by
high levels of multiplicative noise, yielding incomplete and discontinuous boundary and
surface representations. Results below on natural texture and SAR images illustrate how
pattern recognition models that are based on biological principles and mechanisms can
outperform models that have been derived from more traditional engineering concepts.
1.2 Psychophysical Data and Model Properties
At least two different approaches exist to texture classification. In one approach, the focus
is on separating regions with different textures by finding the boundaries between them
(Bergen & Adelson, 1988; Fogel & Sagi, 1989; Gurnsey & Browse, 1989; Malik & Perona,
1990; Rubenstein & Sagi, 1990; Bergen & Landy, 1991). Another approach attempts to
classify the textures within small regions of a scene (Caelli, 1985, 1988; Bovik, Clark, &
Geisler, 1990; Jain & Farrokhnia, 1991; Greenspan et al., 1994). Such an approach
discovers texture boundaries by classifying the textures within each region differently. It
can also classify local regions whose textural properties vary gradually across space, and
thus are not separated by a distinct boundary.
Gurnsey and Laundry (1992) have provided psychophysical data in support of the
latter type of processing by showing that human texture recognition is only slightly
impaired when the boundaries between different textures in a texture mosaic are blurred.
ARTEX does the latter type of classification. It derives a 17-dimensional feature
vector from multiple-scale boundary features of the BCS and a surface brightness feature
of the FCS. [...] filters of four different scales, as suggested by
psychophysical experiments (Harvey & Gervais, 1978; Richards, 1979; Wilson & Bergen,
1979). The spatial filters are evaluated at four different orientations, thereby leading to a
16-dimensional (4 × 4) feature vector. The 17th dimension is a surface brightness feature.
The ARTEX model uses these feature vectors to generate a context-sensitive classification
of local texture properties. These BCS and FCS operations are designed to be as simple
and fast as possible without incurring a loss of accuracy in classifying texture data.
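As a minimal sketch of how such a feature vector could be assembled, assume that the 16
oriented-contrast values arrive as a 4 × 4 array (scales by orientations) and that the
brightness feature is a single number. The function name and the array layout are
illustrative assumptions, not part of the ARTEX equations given in the appendices.

```python
import numpy as np

N_SCALES = 4          # spatial scales, as stated in the text
N_ORIENTATIONS = 4    # orientations per scale, as stated in the text

def build_feature_vector(oriented_contrast, brightness):
    """Concatenate 16 oriented-contrast values with one brightness value.

    oriented_contrast: array of shape (4, 4), indexed [scale, orientation]
                       (this layout is an assumption made for the sketch).
    brightness:        scalar surface brightness feature.
    """
    oriented_contrast = np.asarray(oriented_contrast, dtype=float)
    if oriented_contrast.shape != (N_SCALES, N_ORIENTATIONS):
        raise ValueError("expected a (4, 4) array of oriented contrast values")
    # 16 boundary features followed by the single surface feature: 17 dimensions.
    return np.concatenate([oriented_contrast.ravel(), [float(brightness)]])

features = build_feature_vector(np.zeros((4, 4)), 0.5)
```

Keeping the brightness value in the last position is only a convention of this sketch; any
fixed ordering would serve, provided it is the same at training and test time.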
A large psychophysical literature supports the FACADE hypothesis that the human
brain forms distinct boundary and surface representations before they are bound together
by object recognition categories. Experimental results that support the role of boundary
representations include the following: (1) Object superiority effects occur using outline
stimuli with little surface detail (Davidoff & Donnelly, 1990; Homa, Haver, & Schwartz,
1976). (2) The number of errors in tachistoscopic recognition and the speed of
identification are often comparable using appropriately and inappropriately colored objects
(Mial, Smith, Doherty, & Smith, 1979; Ostergaard & Davidoff, 1985). (3) There is no
difference in recognition speed using black-and-white photographs or line drawings that are
carefully derived from them (Biederman & Ju, 1988).
Several types of data also implicate a separate surface brightness and color process.
These include the following: (4) Colored surfaces may be bound to an incorrect form
during illusory conjunctions (McLean, Broadbent, & Broadbent, 1983; Stefurak & Boynton,
1986; Treisman & Schmidt, 1982). (5) Color can facilitate object naming if the objects
to be named are structurally similar or degraded (Christ, 1975; Price & Humphreys,
1989). (6) Colors are coded categorically prior to the processing stage at which they
are named (Davidoff, 1991; Rosch, 1975). Two of the more recent studies in support
of the boundary-surface distinction were carried out by Elder and Zucker (1998) and
Rogers-Ramachandran and Ramachandran (1998).
FACADE theory proposes that 3-D boundary and surface features that are formed
in the prestriate visual cortex are categorized in the inferotemporal cortex (Grossberg,
1994, 1997). Both boundary and surface properties are proposed to be combined during
the categorization process within bottom-up and top-down adaptive pathways that are
modeled by an ART system. Two consequences of this conception are that unambiguous
boundaries can generate category recognition by themselves, and that boundaries can
prime 3-D object representations even if they need to be supplemented by 3-D surface
information in order to achieve unambiguous recognition. Cavanagh (1997) has reported
data consistent with this latter prediction.
In the ARTEX implementation of this concept, the feature vectors that are formed
from the 17-dimensional boundary and surface features of the FACADE preprocessor are
input to an ART classifier, which categorizes the textures using a biologically-motivated
learning algorithm. Humans learn to discriminate textures by looking at them and
becoming sensitive to their statistical properties in small regions. This is how our model is
trained. Intuitively speaking, model training is like having an observer look at a number
of locations and trying to learn to categorize them based on their local properties. The
ART classifier we used, called Gaussian ARTMAP, or GAM, incrementally constructs
internal categories that have Gaussian receptive fields in the input space, and that map
to output class predictions (Williamson, 1996, 1997). Cells with Gaussian receptive fields
are ubiquitous in the brain, and have been used to model data about how the
inferotemporal cortex learns to categorize visual input patterns (Logothetis et al., 1994). Such
models are not, however, typically able to self-organize their own recognition categories
and to autonomously search for new ones with which to classify novel input patterns.
ART models overcome this weakness by showing how complementary attentional and
orienting systems are designed with which to balance between the processing of familiar and
expected events, on the one hand, and unfamiliar and unexpected events on the other
(Carpenter & Grossberg, 1991; Grossberg, 1980; Grossberg & Merrill, 1996). All learned
categorization goes on within the attentional system. The orienting subsystem is
activated in response to events that are too novel for the attentional system to successfully
categorize them. Interactions between the attentional and orienting subsystems then lead
to a memory search which discovers a more appropriate population of cells with which
to categorize the novel information. These interactions are designed to explain how the
brain continues to learn quickly about huge amounts of new information throughout life,
without being forced to just as quickly forget useful information that it has previously
learned.
After each input is presented (i.e., each location is observed), GAM automatically
activates cells whose receptive fields adapt to represent the input by amounts proportional
to their level of match with the input. However, if the input is too novel for any existing
receptive field to match the input well enough, then a memory search is triggered which
leads to the selection of a previously uncommitted cell population with which a new
category can be learned. During unsupervised learning, the correct names of the regions that
are being classified are not supplied, and the level of match that is required for a category
to learn is constant. The parameter that determines this degree of match is called the
vigilance parameter because it computationally realizes the intuitive process of being
more or less vigilant in response to information of variable importance (Carpenter &
Grossberg, 1991). Low vigilance allows the network to learn general categories in which many
input exemplars may share the same category prototype. High vigilance enables the
network to learn more specific categories, even categories in which only a single exemplar may
be represented. Thus the choice of vigilance can trade between prototype and exemplar
learning, even within a single ART system. Experimental evidence consistent with
vigilance control has been reported in monkeys when they attempt to perform classifications
during easy vs. difficult discriminations (Spitzer, Desimone, & Moran, 1988).
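The vigilance-gated choice between reusing an existing category and recruiting a new one
can be illustrated with a deliberately simplified sketch. It assumes isotropic Gaussian
receptive fields with one fixed, shared width and omits the adaptation of means and
widths; `gaussian_match` and `assign_or_create` are hypothetical names, and these are not
the GAM equations, for which see Williamson (1996, 1997).

```python
import numpy as np

def gaussian_match(x, mean, sigma):
    """Match in (0, 1]: equals 1 when x sits exactly on the category mean."""
    z = (x - mean) / sigma
    return float(np.exp(-0.5 * np.dot(z, z)))

def assign_or_create(x, means, sigma, vigilance):
    """Return the index of the category chosen for x.

    means:     list of category means, extended in place when a new category is needed.
    vigilance: required match level; higher values yield more specific categories.
    """
    x = np.asarray(x, dtype=float)
    if means:
        matches = [gaussian_match(x, m, sigma) for m in means]
        best = int(np.argmax(matches))
        if matches[best] >= vigilance:
            return best               # good enough match: reuse the category
    means.append(x.copy())            # too novel: recruit an uncommitted category
    return len(means) - 1

means = []
first = assign_or_create([0.0, 0.0], means, sigma=1.0, vigilance=0.5)
low = assign_or_create([0.5, 0.0], means, sigma=1.0, vigilance=0.5)
high = assign_or_create([0.5, 0.0], means, sigma=1.0, vigilance=0.95)
```

With the same second input, the low vigilance setting reuses the first category while the
high setting creates a new one, which is the prototype versus exemplar trade-off described
in the text.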
Learning typically starts with a low vigilance value, which leads to the formation
of the most general categories that are consistent with the input data. Because ART
models are self-organizing, such learning can proceed on its own in an unsupervised mode.
Starting with a low vigilance value conserves memory resources, but it can also create the
tendency, also found in children, to overgeneralize until further learning leads to category
refinement (Chapman, et al., 1986; Clark, 1973; Smith et al., 1985; Smith & Kemler, 1978;
Ward, 1983). For example, it might happen that, after learning a category that classifies
variations on the letter E, the letter F will also activate that category, based on the
visual similarity between the two types of letters. The difference between the letters E
and F is determined by cultural factors, not by visual similarity. Supervised learning
is often essential to prevent errors based on input similarity which do not correspond to
cultural understandings, or other environmentally dependent factors. ART models can
operate in both unsupervised and supervised learning modes, and can switch between the
two seamlessly during the course of learning.
During supervised learning, the vigilance parameter, or required match level, is raised
if an incorrect prediction is made (e.g., if there is negative reinforcement) by just
enough to trigger a memory search for a new category. This type of vigilance control
sacrifices [...] when more specific categories are needed to match the
statistical properties of a given environment. Categories of variable generality are hereby
automatically learned based upon the success or failure of previously learned categories
in predicting the correct classification. A block diagram of the ARTEX architecture is
shown in Figure 1.
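The rule of raising vigilance by just enough after an incorrect prediction can be sketched
as follows. This is a simplified illustration under stated assumptions, not the GAM
algorithm: categories are dictionaries with a fixed mean, a class label and a usage count,
the search order is a count-weighted match (an assumption made so that search order and
match criterion differ), and no receptive field is adapted. The name `supervised_step` and
the parameter `epsilon` are hypothetical.

```python
import numpy as np

def supervised_step(x, label, categories, sigma=1.0, baseline_vigilance=0.0, epsilon=1e-6):
    """Choose or create a category for (x, label); returns the category index."""
    x = np.asarray(x, dtype=float)
    matches = [
        float(np.exp(-0.5 * np.sum(((x - np.asarray(c["mean"])) / sigma) ** 2)))
        for c in categories
    ]
    # Search order: count-weighted match, so frequently used categories are tried first.
    order = sorted(range(len(categories)),
                   key=lambda j: categories[j]["count"] * matches[j],
                   reverse=True)
    vigilance = baseline_vigilance
    for j in order:
        if matches[j] < vigilance:
            continue                          # fails the match criterion: keep searching
        if categories[j]["label"] == label:
            categories[j]["count"] += 1       # correct prediction: accept this category
            return j
        # Incorrect prediction: raise vigilance just above this category's match,
        # which rejects it and forces the search to continue.
        vigilance = matches[j] + epsilon
    # No existing category both matches well enough and predicts correctly.
    categories.append({"mean": x.copy(), "label": label, "count": 1})
    return len(categories) - 1

cats = [
    {"mean": [0.0, 0.0], "label": "grass", "count": 5},
    {"mean": [0.2, 0.0], "label": "bark", "count": 1},
]
chosen_bark = supervised_step([0.15, 0.0], "bark", cats)
chosen_sand = supervised_step([0.15, 0.0], "sand", cats)
```

In the first call the frequently used "grass" category is tried first, predicts the wrong
label, and is rejected by the raised vigilance; the closer "bark" category then passes. In
the second call no category predicts "sand", so a new one is recruited.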
2 Multiple-scale Oriented Filter
The ARTEX multiple-scale oriented filter [...] BCS filter that was
introduced to explain texture data in Grossberg and Mingolla (1985). Variants of this BCS
filter have since become standard in many texture segmentation algorithms (Malik &
Perona, 1989; Sutter, Beck, & Graham, 1989; Bovik et al., 1990; Bergen, 1991; Bergen &
Landy, 1991; Jain & Farrokhnia, 1991; Graham, Beck, & Sutter, 1992; Greenspan et al.,
1994).
Figure 1: Block diagram of ARTEX image classification system. Diagram stages, from input
to output: Input Image; Discount Illuminant; Multiple Scale BCS Orientational Contrast
Features and Single Scale FCS Surface Brightness Feature; Gaussian ARTMAP Classifier;
Output Prediction of Region Type.
Figure 2 diagrams the ARTEX version of BCS processing (Stages 1-5) for a single
spatial scale. As in Richards (1979), we used 4 spatial frequency channels. Each
channel computed 4 orientational contrast features. These filter equations and parameters
are described in Appendix I. A functional description is given here. Stage 1 of the BCS
filter uses an on-center off-surround network whose cells obey membrane equations, or
shunting laws (Grossberg, 1980, 1983) to discount the illuminant, compute contrast
ratios of the image, and normalize image intensities. Stage 2 accomplishes multiple-scale
oriented filtering [...] filters at the 4 orientations and spatial
scales. Stage 3 computes a local measure of absolute orientational contrast by full-wave
rectifying the filter activities from Stage 2. These operations are neurally interpreted as
follows: Stage 1 operations occur in the retina and LGN, Stage 2 operations at
cortical simple cells, and Stage 3 operations at cortical complex cells (Grossberg & Mingolla,
1985). Stage 4 simplifies the BCS operations of boundary grouping by computing a
smooth, reliable measure of orientational contrast that spatially pools responses within the
same orientation. Stage 5 performs an optional orientational invariance operation which
shifts orientational responses at each scale into a canonical ordering. This computation
shifts, with wraparound, the smoothed orientational responses from Stage 4 so that the
orientation with maximal amplitude is in the first orientation plane. The usefulness of
this operation is task-dependent, as shown by our simulations below.
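The Stage 5 shift with wraparound can be sketched with a circular roll. The sketch assumes
that the Stage 4 responses at one image location are held in an array indexed by scale and
orientation, and that the shift is chosen independently at each scale; both the layout and
the per-scale choice are assumptions of this illustration, and Appendix I of the paper
remains the authority on the actual operation.

```python
import numpy as np

def canonical_orientation_order(responses):
    """Circularly shift each scale's orientation responses so the largest comes first.

    responses: array of shape (n_scales, n_orientations) for one image location.
    Returns a new array of the same shape.
    """
    responses = np.asarray(responses, dtype=float)
    shifted = np.empty_like(responses)
    for s, row in enumerate(responses):
        k = int(np.argmax(row))        # index of the maximal-amplitude orientation
        shifted[s] = np.roll(row, -k)  # wraparound shift that moves it to position 0
    return shifted

ordered = canonical_orientation_order([[1, 3, 2, 0],
                                       [5, 1, 1, 1]])
```

Because the shift is circular, the relative order of the orientations is preserved; only
the absolute orientation of the texture is discarded.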
Graham et al. (1992) also simplified Stage 4 of the BCS by pooling responses from
Stage 3. They then used a hand-crafted sigmoidal discrimination measure to convert
Stage 4 output into a probabilistic output function that could be compared with subjects'
ratings of texture discriminability. In the present benchmark studies, the BCS filter
outputs form part of the input vector to a GAM classifier which autonomously learns
the probabilistic recognition categories with which texture discriminations are made. We
note in Section 3 how the Graham et al. (1992) study has been extended to explain a larger
data base about texture discrimination using additional FACADE theory mechanisms.
3 Filled-in Surface Brightness
The FACADE model suggests how the BCS and FCS interact to generate filled-in 3-D
surface representations within the FCS. These surface representations are derived from
scenic data after the illuminant has been discounted, as in Stage 1 of Figure 2. In general,
these surface representations combine information about brightness, color, depth, and
form. Our simulations below demonstrate the utility of using a filled-in surface brightness
feature to help learn recognition categories for texture discrimination.
The simplest surface feature is one that is based on first-order differences in
illumination intensity. An improved surface feature discounts the illuminant to compute a
measure of local contrast. Such a feature, however, can still be corrupted by various sorts
of specular noise in an image. In the brain, such noise can be due to the blind spot, retinal
veins, and the retinal layers through which light must pass to activate photodetectors.
In artificial sensors, too, such noise can derive from sensor characteristics. Discounting
the illuminant is also insensitive to contextual groupings of image features. A filled-in
surface brightness feature overcomes these deficiencies by smoothing local contrast
values when they belong to the same region, while maintaining contrast differences when
they belong to different regions. Filling-in hereby smooths over image noise in a
form-sensitive way, and generates a representation that reflects properties of a region's form by
being contained within the region boundaries. It also tends to maximize the separability,
in brightness space, of different region types by minimizing within-region variance while
maximizing between-region variance. This sort of preattentive and automatic separation
simplifies [...] classifier such as GAM.
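The smoothing behaviour described above can be illustrated in one dimension. The sketch
below is an assumption-laden toy, not the FCS equations of Appendix II: it uses a plain
explicit diffusion step whose exchange between neighbouring cells is divided down wherever
a boundary signal is present. The function name and the parameters `rate` and `gate` are
hypothetical.

```python
import numpy as np

def fill_in_1d(contrast, boundary, n_steps=200, rate=0.25, gate=1000.0):
    """Diffuse a 1-D contrast signal within regions delimited by boundary signals.

    contrast: length-n array of local contrast values.
    boundary: length-(n-1) array; boundary[i] is the boundary strength between
              cells i and i+1 (0 means no boundary).
    """
    signal = np.asarray(contrast, dtype=float).copy()
    # Permeability between neighbours falls toward 0 where a boundary is present.
    permeability = 1.0 / (1.0 + gate * np.asarray(boundary, dtype=float))
    for _ in range(n_steps):
        # Amount passed from cell i+1 to cell i in this step (negative: the reverse).
        flow = rate * permeability * (signal[1:] - signal[:-1])
        signal[:-1] += flow
        signal[1:] -= flow
    return signal

noisy = [1.0, 3.0, 2.0, 2.0, 8.0, 10.0, 9.0, 9.0]
edges = [0, 0, 0, 1, 0, 0, 0]      # one boundary between the 4th and 5th cells
filled = fill_in_1d(noisy, edges)
```

Within each region the values settle near the regional average, so within-region variance
shrinks, while the difference between the two regions is largely preserved because little
activity crosses the boundary.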
In Grossberg et al. (1995), a multiple-scale FACADE network was developed to
process noisy SAR images for use by human operators. There the goal was to generate
reconstructions of SAR images that were pleasing to the eyes of expert photointerpreters.
The BCS in this simulation used a grouping network with a feedback process that
can complete and sharpen boundary representations. These boundary groupings created
sharply delineated image regions and filled-in surfaces. Although such a feedback
grouping network has the remarkable property of converging within 1 to 3 feedback iterations,
it still has the disadvantage, at least in software simulations, of slowing down processing
time.
Figure 2: Boundary and surface preprocessing stages. OV = orientationally variant
representation; OI = orientationally invariant representation. Either OV or OI, but not both, are
used for a given problem. Diagram stages: Input Image; 1: Center-surround Processing;
2: Orientational Filtering; Texture Processing (3: Full-Wave Rectification, 4: Spatial Pooling,
5: Orientational Invariance); Boundary Processing (6: Center-surround Processing,
7: Half-Wave Rectification, 8: Sum Across Orientations); Surface Processing (9: Boundary-Gated
Diffusion); Gaussian ARTMAP.
Here we replace the full BCS filter and grouping network by a multiple-scale BCS
filter and a single scale of one-pass feedforward boundary processing to control filling-in
of the brightness feature. Computer simulations summarized below demonstrate that this
simplification does not impair performance on classification benchmarks on Brodatz
textures and on SAR textured scenes. The simplified boundary segmentation is, moreover,
computationally 75 times faster than the feedback network. The slower feedback benchmarks
are not reported here. Accurate texture classification thus does not seem to depend upon
photorealism of the corresponding percept. Stages 6-9 of Figure 2 show how the BCS filter
output is used to derive the one-pass boundary segmentation. Appendix II contains the
equations and parameters of this simplified filling-in process.
These FACADE preprocessing results can be placed into a larger framework to better
understand their relevance for understanding human texture discrimination. Three issues
need to be considered: (1) the use of a simplified Stage 4 spatial pooling operation instead
of long-range grouping by a feedback network; (2) the role of surface representations;
and (3) the need for 3-D boundary and surface representations. When are long-range
groupings, such as illusory contours, not needed to improve texture discriminability?
This is more true when the images contain dense enough textures to obviate the need for
grouping over long distances. Not all of the data considered even by Graham et al. (1992)
were of this type, however, since their displays contained regularly placed features that
could group together in orientations colinear, perpendicular, or oblique to their defining
edges. Cruthirds et al. (1993) showed that a multiple-scale BCS filter, supplemented by
the long-range groupings of a feedback network, could simulate the pairwise ordering of
human ratings of texture discriminability better than the Graham et al. (1992) variant
of the BCS filter on its own.
Grossberg and Pessoa (1997) have simulated a variant of FACADE theory in which
both 2-D and 3-D boundary and surface operations were needed to simulate
psychophysical data about the discrimination of textured regions composed of regular arrays of
equiluminant colored regions on backgrounds of variable luminance, as in the
experiments of Beck (1994) and Pessoa, Beck, & Mingolla (1996). This latter simulation study
was restricted, however, to textures composed of colored squares on achromatic
backgrounds, rather than the stochastic factors that arise in Brodatz and SAR textures. The
Grossberg and Pessoa (1997) study also does not analyze how recognition categories for
discriminating textures are learned. Taken together, however, these several studies
provide converging evidence that FACADE mechanisms can explain challenging properties
of data concerning human texture segregation.
4 Heuristics
The 16-dimensional feature vector produced by Stages 1-5 (representing orientational
contrast at 4 orientations and 4 spatial scales) and the single filled-in brightness feature
produced by Stages 6-9 yield a 17-dimensional boundary-surface feature vector. GAM
must learn a mapping from the input space populated by these feature vectors to a discrete
output space of associated region class labels. As noted above, GAM shares a number of
key properties with other ARTMAP architectures (Carpenter, Grossberg, and Reynolds,
1991; Carpenter et al., 1992). GAM learns mappings incrementally, without any prior
knowledge of the problem domain, by self-organizing an efficient set of recognition
categories that shape themselves to the statistics of the input environment, as well as a map
from recognition categories to class labels, which are supplied during supervised learning.
Because GAM learns its mappings incrementally, a previously trained GAM network may
be retrained with new input/output contingencies, including new class labels, without
any need to retrain the network on the previous data. Finally, although GAM is trained
only with individual class labels, it also learns to accurately estimate the probabilities of
its class label predictions, as we show in our simulations below.
In a typical ART network (Carpenter & Grossberg, 1987, 1991), an input vector
activates feature selective cells within the attentional system that store the vector in
short-term memory. This short-term memory pattern then activates bottom-up pathways
whose signals are filtered by learned adaptive weights, or long-term memory traces. The
filtered signals are added up at target category nodes which compete via recurrent lateral
inhibition to determine which category activities will be stored in short-term memory and
thereby represent the input vector. The degree of activation of a category provides an
estimate of the likelihood that an input belongs to the category. Activating a category is
like making a hypothesis.
As they are being activated, the selected categories read-out learned top-down
expectations, or prototypes, which are matched against the input vector at the feature detectors.
This matching process plays the role of testing the hypothesis. The vigilance parameter
defines the criterion for a good enough match. As noted above, low vigilance leads to the
learning of general categories, whereas high vigilance leads to the learning of specialized
categories, even a single exemplar, in the limit of very high vigilance. By varying vigilance,
an ART system can hereby learn both abstract prototypes and concrete exemplars.
If the chosen category's match function exceeds the vigilance parameter, then the
bottom-up and top-down exchange of feedback signals locks the system into a resonant
state. The resonant state signifies that the hypothesis matches the data well enough to be
accepted by the system. ART proposes that these resonant states focus attention upon
relevant feature combinations, and that only resonant states enter conscious awareness
(Grossberg, 1980). Resonance triggers learning in both the bottom-up adaptive weights
that are used to activate the selected recognition category, and in the top-down weights
that represent its prototype. This learning incorporates the new information supplied by
the input vector into the long-term memory of the attentional system.
If the category's match function does not exceed vigilance, this designates that the
hypothesis is too novel to be incorporated into the prototype of the active category. A
bout of memory search, or hypothesis testing, is then triggered through activation of the
orienting system. Memory search either discovers a category that can better represent
the data or, if no such learned category already exists, automatically chooses uncommitted
cells with which to learn a new category. ART hereby incrementally discovers new
categories whose degree of generalization varies inversely with the size of the vigilance
parameter. Neurobiological data about recognition learning in inferotemporal cortex that
are consistent with these hypotheses are reviewed by Carpenter and Grossberg (1993) and
Grossberg and Merrill (1996).
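The hypothesis-testing cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the ARTEX implementation: `art_search`, `overlap`, and their arguments are hypothetical names, and the toy `overlap` function stands in for whatever activation and match functions a particular ART model defines.

```python
def art_search(x, prototypes, activation, match, vigilance):
    """Return the index of the resonating category, or None if every one is reset."""
    # Hypothesis generation: rank categories by bottom-up activation.
    order = sorted(
        range(len(prototypes)),
        key=lambda j: activation(x, prototypes[j]),
        reverse=True,
    )
    for j in order:
        # Hypothesis testing: compare the top-down prototype with the input.
        if match(x, prototypes[j]) > vigilance:
            return j  # resonance: this category may now learn
        # Otherwise the category is reset and memory search continues.
    return None  # the caller would recruit an uncommitted category


def overlap(x, w):
    """Toy activation/match (an assumption): share of the input's mass covered by w."""
    return sum(min(a, b) for a, b in zip(x, w)) / sum(x)


prototypes = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
x = [0.6, 0.4, 0.0]
chosen_low = art_search(x, prototypes, overlap, overlap, vigilance=0.8)
chosen_high = art_search(x, prototypes, overlap, overlap, vigilance=0.95)
```

With the lower vigilance the second prototype is accepted; with the higher vigilance no stored prototype matches well enough, which is the case in which a new category would be learned.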
All of the above properties proceed autonomously in ART networks as they undergo
unsupervised learning. ARTMAP extends these ART designs to include both supervised
and unsupervised learning (Carpenter, Grossberg, & Reynolds, 1991; Carpenter et al.,
1992). In ARTMAP, the chosen ART categories learn to make predictions which take the
form of mappings to the names of output classes. In such an ARTMAP system, many
different recognition categories can all learn to map into the same output name, much as
many different visual fonts of a given letter of the alphabet can be grouped into several
different visual recognition categories, based upon visual similarity, before these visual
categories are mapped into the same auditory category that is used to name that letter.
ARTMAP systems propose how to correct a prediction, as in the case where the letter
E is disconfirmed by environmental feedback that the correct letter is F, using only
local operations in environments that may be filled with unexpected events. ARTMAP
does this using a minimax learning principle, which conjointly maximizes predictive
generalization while it minimizes predictive error. ARTMAP does this by trying to form the
largest categories that are consistent with environmental feedback. A match tracking
process realizes this principle by increasing the vigilance value after each disconfirmation until
it exceeds the chosen category's match function. This vigilance increase is the minimal
one that can trigger new hypothesis testing on that learning trial. Match tracking hereby
gives up the minimum amount of generalization that is required to correct the error. In
summary, an ARTMAP system organizes its categorization of experience based both on
the similarity of the input feature vectors and upon feedback from the environmental
response, whether culturally or otherwise determined, to the names or other behaviors
that its categories predict.
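Match tracking can be illustrated with a small sketch. It assumes the candidate categories have already been ordered by decreasing activation and that each carries a precomputed match value and predicted label; the function name, the `epsilon` increment, and the example values are all hypothetical.

```python
def match_tracking(candidates, true_label, baseline_vigilance, epsilon=1e-6):
    """candidates: list of (match value, predicted label), by decreasing activation.

    Returns (index of the accepted category or None, final vigilance).
    """
    vigilance = baseline_vigilance
    for j, (match, label) in enumerate(candidates):
        if match <= vigilance:
            continue  # fails the vigilance test: reset and keep searching
        if label == true_label:
            return j, vigilance  # prediction confirmed: resonance and learning
        # Prediction disconfirmed: raise vigilance just above this match value,
        # the smallest increase that forces a new search on this trial.
        vigilance = match + epsilon
    return None, vigilance  # no fit: an uncommitted category would be recruited


# The first hypothesis predicts E, but feedback says the letter is F.
candidates = [(0.7, "E"), (0.6, "F"), (0.9, "F")]
accepted, final_vigilance = match_tracking(candidates, "F", baseline_vigilance=0.5)
```

In this example the raised vigilance also rules out the second candidate, even though it predicts F, because its match (0.6) no longer satisfies the criterion; the third candidate is accepted.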
5 Gaussian ARTMAP
Gaussian ART (Williamson, 1996, 1997) provides a means for an ART system to learn
the statistics of an input environment. Each of its categories defines a Gaussian
distribution in the input space, with a mean and variance in each input dimension, as well as an
overall a priori probability. The Gaussian ART bottom-up activation function evaluates
the probability that the input belongs to a category, given its Gaussian distribution and
a priori probability. The match function evaluates how well the input fits the category's
distribution, which is normalized to a unit height. This match is a measure of the
distance, in units of standard deviation, between the input vector and the category's mean.
Vigilance specifies the maximum allowable size of this distance.
Gaussian ART also uses distributed learning, in which multiple categories can all
cooperate to classify an input event. Gaussian ART hereby avoids the problems incurred
by grandmother cell models of recognition. Each such category is assigned credit based
on its proportion of the net activation, which is determined by all categories whose match
functions satisfy the vigilance criterion. Each category then learns by an amount that
is determined by its credit. When Gaussian ART is extended to Gaussian ARTMAP
to enable it to benefit from both supervised and unsupervised learning, each category's
credit is determined by its proportion of the net activation of its ensemble, which consists
of all categories that map to the same output prediction. The normalized strength of
each ensemble's prediction is a probability estimate for that prediction. The equations
and parameters for Gaussian ARTMAP are found in Appendix III.
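The activation, match, and ensemble-probability computations can be sketched as follows. The exact equations are in Appendix III, which is not reproduced here, so this sketch makes its own assumptions: a separable Gaussian with one standard deviation per dimension, a prior equal to each category's share of the sample count, and hypothetical names (`Category`, `match`, `activation`, `class_probabilities`).

```python
import math


class Category:
    def __init__(self, mean, sigma, count, label):
        self.mean = list(mean)    # per-dimension mean
        self.sigma = list(sigma)  # per-dimension standard deviation
        self.count = count        # samples credited so far (sets the prior)
        self.label = label        # output class this category maps to


def match(x, c):
    """Unit-height Gaussian: 1.0 at the mean, decaying with distance in sigma units."""
    d2 = sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, c.mean, c.sigma))
    return math.exp(-0.5 * d2)


def activation(x, c, total_count):
    """Prior times Gaussian density; factors shared by all categories are dropped."""
    prior = c.count / total_count
    return prior * match(x, c) / math.prod(c.sigma)


def class_probabilities(x, categories, vigilance):
    """Normalized activation of each ensemble (categories sharing a label)."""
    total = sum(c.count for c in categories)
    # Only categories whose match satisfies vigilance contribute.
    active = [c for c in categories if match(x, c) > vigilance]
    acts = [activation(x, c, total) for c in active]
    net = sum(acts)
    probs = {}
    for c, a in zip(active, acts):
        probs[c.label] = probs.get(c.label, 0.0) + a / net
    return probs


cats = [
    Category([0.0, 0.0], [1.0, 1.0], count=1, label="a"),
    Category([4.0, 4.0], [1.0, 1.0], count=1, label="b"),
]
probs = class_probabilities([0.0, 0.0], cats, vigilance=0.0)
```

Each category's share `a / net` plays the role of its credit, and summing the shares within an ensemble gives the class probability estimate.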
6 Some Alternative Texture Classifiers
6.1 Comparison of Feature Extraction Methods
In order to evaluate the promise of any vision system, particularly one that attempts to
explain such a complex competence as textured scene classification, one needs to evaluate
that it really works. This is particularly the case when the key behavioral properties
emerge due to interactions across the entire system. There is thus no substitute for running
such a system on benchmarks on which competing systems have also been evaluated.
Our benchmark comparisons, presented in Section 7, evaluate ARTEX under conditions
that are as similar as possible to those under which these competing systems have been
evaluated.
ARTEX performance is first compared to that of a system that was used to classify
natural textures in Greenspan et al. (1994) and Greenspan (1996). We call their model
the Hybrid System because it is a hybrid architecture that used a log-Gabor Gaussian
pyramid for feature extraction followed by one of three alternative classifiers. Although
the Hybrid System was not developed to explain biological data, it has the virtue of
having been developed to the point that it could be successfully tested on benchmark
databases that use textures or textured scenes as their inputs. Most other biologically
derived models have not yet reached this level of development.
The Hybrid System's log-Gabor pyramid uses three levels, or spatial scales, and four
orientations at each scale. Each level, after the first one, of the Gaussian pyramid is
obtained by blurring the previous lower level (i.e., smaller spatial scale) with a Gaussian
kernel (with standard deviation $\sigma = 1$) and then decimating the image (i.e., removing
3 out of 4 pixels in each 2 × 2 pixel block). Due to decimation, the Gaussian at each
successive level effectively has twice the $\sigma$ of the Gaussian used in the previous level. The
final outputs of all three pyramid levels of the Hybrid System have the same net amount
of blurring, produced by three successive blur/decimate steps. This amount of blurring
is equivalent to convolving with a single Gaussian kernel with
$\sigma = \sqrt{21} = \sqrt{1^2 + 2^2 + 4^2}$,
which produces an 8 × 8 pixel resolution. That is, each patch of 8 × 8 pixels in the input
image yields a single pixel in an output image for each oriented contrast feature. In
Greenspan (1996), classification results at 16 × 16, 32 × 32, and 64 × 64 resolution were
also reported.
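The net blur quoted above follows from the fact that variances add when Gaussian kernels are convolved in succession. A short check, with `net_sigma` as a hypothetical helper name:

```python
import math


def net_sigma(levels, sigma_per_level=1.0):
    """Effective sigma, in input pixels, after `levels` blur/decimate steps."""
    variance = 0.0
    for k in range(levels):
        # After k decimations by 2, one pixel spans 2**k input pixels,
        # so a sigma of 1 at that level equals 2**k input pixels.
        variance += (sigma_per_level * 2 ** k) ** 2
    return math.sqrt(variance)


three_level_sigma = net_sigma(3)  # sqrt(1 + 4 + 16) = sqrt(21), about 4.58
```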
Without further preprocessing, ARTEX produces feature images at single pixel
resolution. To make a fair comparison with the results reported by Greenspan et al. (1994)
and Greenspan (1996), ARTEX feature images need to be reduced, via blurring and
decimation, to the same resolution used there. For example, to change the ARTEX features
to 8 × 8 resolution, the smaller-scale ARTEX features require additional blurring prior to
decimation so that their net amount of blurring is equivalent to convolving with a single
Gaussian kernel with $\sigma = \sqrt{21}$.
The net amount of blurring is a crucial consideration for the two types of tasks on which
the systems are compared. The first task is classification of a library of texture images.
Because this task does not include transitions between different textures, performance
monotonically improves as blurring is increased, since blurring reduces variance and thus
improves the signal-to-noise ratio. The second task is classification of a texture mosaic.
Here, texture transitions need to be accurately resolved, so performance degrades with
over-blurring. We demonstrate both of these phenomena below.
6.2 Comparison of Classifiers
In the Hybrid System's first classification scheme, the extracted features are clustered
independently in each feature dimension using the K-means procedure. Mappings from
these clusters to class labels are then formed using a batch learning, rule-based algorithm
called ITRULE (Goodman et al., 1992). The clusters in this scheme are formed to
discretize the input, so that ITRULE can form explicit rules mapping them to the output
classes. ITRULE forms a large number of rules. The exact number is never stated in
Greenspan (1996). On the large problems, however, a maximum of 10,000 is allowed, and
as many as 430 rules per class are reported for discriminating only two textures. Another
drawback of this approach is that unsupervised discretization via K-means clustering
throws away potentially important information because the clusters may span
discrimination boundaries in the input space. Finally, GAM enjoys a major practical advantage
in that it uses a simple incremental learning procedure as opposed to the complex and
computationally expensive batch learning procedure used by ITRULE.
The other two classifiers used in Greenspan (1996) are standard incremental
learning schemes: the K-nearest neighbor (K-NN) classifier and the multilayer perceptron
(MLP), backpropagation algorithm. These two approaches have complementary
advantages and drawbacks. K-NN learns quickly (one training epoch) but achieves no data
compression. MLP, on the other hand, achieves better data compression but learns very
slowly (500 slow-learning training epochs in Greenspan, 1996). An additional drawback
of MLP is that it uses a form of mismatch learning that may suffer from catastrophic
forgetting if trained on new data with different contingencies from previous data. As
demonstrated by our results below, GAM combines the good properties of the above three
classifiers: like ITRULE, GAM predicts the posterior probabilities of the output classes;
like K-NN, GAM learns local mappings quickly; like MLP, GAM achieves significant data
compression. Although GAM uses a more local representation than MLP, and thus could,
in principle, require more memory, GAM compensates for this by constructively forming
a representation of appropriate size for whatever problem it is trained on.
7 Texture Classification Results
7.1 10-Texture Library
ARTEX was first compared to the Hybrid System on the library of ten textures shown in
Figure 3A, whose top row contains structured textures and whose bottom row contains
unstructured textures. Each texture image consists of 128 × 128 pixels. Three other
images of each texture are not shown. In Greenspan (1996), classification results of
the Hybrid System using ITRULE, K-NN, and MLP classifiers were published for this
database. The classifiers were trained on data at three different levels of spatial resolution,
with a different number of training samples per class at each resolution: 300 samples at
8 × 8 resolution, 125 samples at 16 × 16 resolution, and 40 samples at 32 × 32 resolution.
ARTEX was trained on the same data set under the same conditions. Like the Hybrid
System, ARTEX used an orientationally variant, or OV, representation on this problem
since generalization to novel orientations of the same texture during testing was not
required. ARTEX was evaluated with five random orderings of the data, and the results
were averaged.
Table 1 shows comparative results for the Hybrid System and ARTEX at the three
spatial resolutions. Table 1 lists the classification rate, number of epochs, and number
of categories (or hidden units, stored exemplars, etc.) for each system configuration.
The number of epochs indicates how many training trials were needed. The number of
categories indicates how well the model compresses the data. In the case of K-NN, there
is no compression, so each input or exemplar forms a different category. The number
of weights indicates the memory resources, or computational complexity, that are needed
to achieve this degree of compression. The goal is to minimize the number of epochs,
categories, and weights. 60 hidden units are listed for MLP because the average MLP
Figure 3: (Next page). (A) 10-texture database of textures corresponding to Figure 2 [...]
et al. (1994). Top row consists of structured textures, and bottom row of unstructured [...]
Textures from Brodatz album are labeled with plate number. Top row (left to right):
herringbone weave (D17), french canvas (D21), cotton canvas (D77), jeans. Bottom [...]
right): grass (D9), pressed cork (D4), handmade paper (D57), pigskin (D92), and wo[...]
42-texture database from Brodatz album. ROW 1: reptile skin (D3), cork (D4), wire [...]
(D9), bark (D12), straw (D15). ROW 2: herringbone (D17), wool (D19), french canvas [...]
(D24), sand (D29), water (D38). ROW 3: straw matting (D55), handmade paper (D57) [...]
(D68), cotton canvas (D77), raffia looped (D84), pigskin (D92). ROW 4: fur (D93), [...]
skin (D10), homespun wool (D11), raffia weave (D18), ceramic brick (D26), netting (D[...]
ROW 5: lizard skin (D36), straw screening (D49), raffia woven (D50), oriental cloth ([...]
cloth (D53), oriental rattan (D65). ROW 6: plastic pellets (D66), oriental gra[...]
oriental cloth (D78), oriental cloth (D80), oriental cloth (D82), woven matting [...]
straw matting (D85), sea fan (D87), brick (D95), burlap (D103), cheesecloth (D105) [...]
(D110).
Configuration                                  Class. Rate  Samples/Class  Epochs  Categories  Weights

8 × 8 Resolution:
Hybrid System, ITRULE                             94.3          300        Batch       -          -
Hybrid System, MLP                                94.5          300         500        60       1,500
Hybrid System, K-NN                               87.0          300           1     3,000      48,000
ARTEX, all features                               95.8          300           1      26.6         958
ARTEX, all features                               96.3          300           5      34.0       1,224
ARTEX, no large-scale features                    97.1          300           5      41.0       1,148
ARTEX, no brightness feature                      95.6          300           5      38.4       1,306
ARTEX, no large-scale or brightness features      95.7          300           5      47.2       1,227

16 × 16 Resolution:
Hybrid System, ITRULE                             95.0          125        Batch       -          -
Hybrid System, MLP                                96.0          125         500        60       1,500
Hybrid System, K-NN                               93.0          125           1     1,250      20,000
ARTEX, all features                               97.2          125           1      17.4         626

32 × 32 Resolution:
Hybrid System, ITRULE                             97.8           40        Batch       -          -
Hybrid System, MLP                               100.0           40         500        60       1,500
Hybrid System, K-NN                               99.0           40           1       400       6,400
ARTEX, all features                              100.0           40           1      10.6         382
Table 1: Recognition statistics on 10-texture library at three pixel resolutions: 8 × 8, 16 × 16,
and 32 × 32. The number of weights is determined by multiplying the number of categories by
the number of weights per category, which is calculated based on the dimensionality of the
input space and the number of output classes. The input dimensionality is 15 for the Hybrid
System and 17 for ARTEX, and there are 10 output classes because there are 10 textures. For
MLP, the number of weights per hidden unit is 15 + 10 = 25. For K-NN, the number per stored
exemplar is 15 + 1 = 16. For ARTEX with all features, the number per category is
2 × 17 + 2 = 36. For ARTEX with no large-scale features (13 dimensions), it is 28. For ARTEX
with no brightness feature (16 dimensions), it is 34. For ARTEX with no large-scale or
brightness features (12 dimensions), it is 26. For example, the 48,000 weights for K-NN are
computed as follows. The Hybrid System uses 15 features per input sample. With K-NN, these
15 features plus the correct class label are stored for each training sample. Therefore, the
number of weights that must be stored is 16 × (number of training samples). Since there are
300 samples/class and 10 classes, there are 3,000 training samples. In all,
16 × 3,000 = 48,000 weights.
results were reported for 30, 60, and 90 hidden units.
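The weight counts in Table 1 can be reproduced with a few lines of arithmetic. The sketch below is an illustration rather than part of the reported method: the function names are hypothetical, and the ARTEX rule (two weights per input dimension plus two more per category) is inferred from the tabulated values; what those weights represent is not stated in this section.

```python
def weights_per_category(model, n_features, n_classes=10):
    if model == "MLP":
        return n_features + n_classes  # input plus output weights per hidden unit
    if model == "K-NN":
        return n_features + 1          # stored features plus the class label
    if model == "ARTEX":
        return 2 * n_features + 2      # inferred from Table 1 (assumption)
    raise ValueError(f"unknown model: {model}")


def total_weights(model, n_features, n_categories):
    """Categories times weights per category, rounded as in Table 1."""
    return round(weights_per_category(model, n_features) * n_categories)


knn_8x8 = total_weights("K-NN", 15, 3000)    # 16 weights x 3,000 exemplars
mlp_8x8 = total_weights("MLP", 15, 60)       # 25 weights x 60 hidden units
artex_8x8 = total_weights("ARTEX", 17, 26.6)  # 36 weights x 26.6 categories
```

Fractional category counts arise because the ARTEX results are averaged over five orderings of the data.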
ARTEX was tested with several configurations, with different subsets of its features
removed. With its full 17-dimensional feature set, ARTEX achieved 95.8% correct after
only one incremental training epoch, and 96.3% after five epochs. By comparison, the
Hybrid System with K-NN achieved only 87.0% correct after one training epoch, at the
cost of 3,000 stored exemplars compared to 23 internal categories for ARTEX. With
much longer training times (i.e., 500 training epochs using MLP, or the computationally
expensive batch-learning procedures using K-means and ITRULE), the Hybrid System
did not match the performance of ARTEX with only one incremental learning epoch, and
exhibited 49% more errors than ARTEX with 5 training epochs.
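The 49% figure can be checked from the classification rates in Table 1, assuming the comparison is between the Hybrid System's best 8 × 8 result (MLP, 94.5%) and ARTEX after five epochs (96.3%). The helper name is hypothetical.

```python
def relative_excess_errors(acc_a, acc_b):
    """Fraction by which system A's error rate exceeds system B's (rates in percent)."""
    err_a = 100.0 - acc_a
    err_b = 100.0 - acc_b
    return err_a / err_b - 1.0


# 5.5% errors versus 3.7% errors: roughly 49% more.
excess = relative_excess_errors(94.5, 96.3)
```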
Three alternative ARTEX configurations were also tested to elucidate why ARTEX
achieved better results than the Hybrid System. ARTEX uses four spatial scales versus
only three for the Hybrid System. Therefore, perhaps its largest spatial scale conferred an
advantage to ARTEX. This possibility was tested by removing the largest scale, resulting
in a slight performance increment (97.1%). Another unique feature used by ARTEX is its
filled-in surface brightness feature, which seems to be more effective than the multi-scale
Gaussian blurring used by the Hybrid System. Removing the brightness feature resulted
in a performance decrement (95.6%). This difference quantifies how much surface as
opposed to boundary properties influence recognition accuracy on these data. Finally,
both the large-scale and the brightness features were removed. This resulted in a similar
performance decrement (95.7%).
The modest role played by the surface brightness feature in classifying these data is
consistent with cognitive evidence summarized above suggesting that boundary inputs
that go directly to the human cognitive recognition system are often sufficient to
accurately recognize many objects. Surface brightness and color properties become more
important insofar as the boundary information, by itself, is ambiguous. Given that
boundaries are predicted to be perceptually invisible within the BCS itself (viz., the interblob
cortical processing stream), these results are consistent with the possibility of being able
to quickly begin to recognize certain objects using their invisible boundaries even before
these objects become visible through their surface properties.
The ARTEX advantage, even with five ARTEX features removed, is probably due to
some remaining differences between the systems: (1) the nature of band-pass [...] The first
difference is in the Stage 1 band-pass filtering operation prior to the orientational Gabor
filtering. The Hybrid System uses a Laplacian pyramid in which both the center and
surround Gaussians that make up the band-pass filter double in size with each scale. In
ARTEX, on the other hand, only the surround Gaussian grows with each successive spatial
scale. It preserves on-center resolution while varying the scale of image normalization
and noise suppression. Thus, the Hybrid System is much more restrictive in the range of
spatial frequencies that are passed through to its orientational [...] defined with
higher-frequency sinewaves (50% higher frequency; see Appendix I for parameters). The third
difference is that Stage 4 of ARTEX performs spatial pooling following orientational
filtering at each spatial scale. The Hybrid System does not do this in its largest spatial
frequency channel at 8 × 8 resolution. Therefore, this discrepancy might help explain why
ARTEX outperforms the Hybrid System at 8 × 8 resolution, but not at lower resolutions.
The fourth difference is the classification stage. The advantages of the self-organizing
Gaussian ARTMAP classifier over those used by the Hybrid System are described above.
7.2 Larger Texture Libraries
In Greenspan (1996), recognition statistics of the Hybrid System on a 30-texture library
were presented. This library consists of 19 textures from the Brodatz album, and 11
additional textures of comparable complexity. We were unable to obtain this database,
and so we chose to evaluate ARTEX on a library of similar textures obtained solely from
the Brodatz album, which contains the 19 textures used in Greenspan (1996) as a subset.
Figure 3B shows this library of 42 Brodatz textures. The plate numbers from the Brodatz
album are listed in the caption. The 19 textures evaluated in Greenspan (1996) comprise
the