A SELF-ORGANIZING NEURAL SYSTEM FOR LEARNING TO RECOGNIZE TEXTURED SCENES

Stephen Grossberg [1] and James R. Williamson [2]
Department of Cognitive and Neural Systems and Center for Adaptive Systems
Boston University

Vision Research, 39 (1999) 1385-1406.

All correspondence should be addressed to:
Professor Stephen Grossberg
Department of Cognitive and Neural Systems
Boston University
677 Beacon Street
Boston, MA 02215
Phone: 617-353-7858
Fax: 617-353-7755
E-mail: steve@cns.bu.edu

Keywords: pattern recognition, boundary segmentation, surface representation, filling-in, texture classification, neural network, adaptive resonance theory

[1] Supported in part by the Defense Research Projects Agency and the Office of Naval Research (ONR N00014-95-1-0409) and the Office of Naval Research (ONR N00014-95-1-0657).
[2] Supported in part by the Defense Research Projects Agency and the Office of Naval Research (ONR N00014-95-1-0409).

Abstract

A self-organizing ARTEX model is developed to categorize and classify textured image regions. ARTEX specializes the FACADE model of how the visual cortex sees, and the ART model of how temporal and prefrontal cortices interact with the hippocampal system to learn visual recognition categories and their names. FACADE processing generates a vector of boundary and surface properties, notably texture and brightness properties, by utilizing multi-scale filtering, competition, and diffusive filling-in. Its context-sensitive local measures of textured scenes can be used to recognize scenic properties that gradually change across space, as well as abrupt texture boundaries. ART incrementally learns recognition categories that classify FACADE output vectors, class names of these categories, and their probabilities. Top-down expectations within ART encode learned prototypes that pay attention to expected visual features. When novel visual information creates a poor match with the best existing category prototype, a memory search selects a new category with which to classify the novel data. ARTEX is compared with psychophysical data, and is benchmarked on classification of natural textures and synthetic aperture radar images. It outperforms state-of-the-art systems that use rule-based, backpropagation, and K-nearest neighbor classifiers.

1 Introduction

1.1 Background and Benchmarks

The brain's unparalleled ability to perceive and recognize a rapidly changing world has inspired an increasing number of models aimed at exploiting these properties for purposes of automatic target recognition. On the perceptual side, the brain can cope with variable illumination levels and noisy scenic data that combine information about edges, textures, shading, and depth that are overlaid in all parts of a scene. This type of general-purpose processing enables the brain to deal with a wide range of imagery, both familiar and unfamiliar. On the recognition side, the brain can autonomously discover and learn recognition categories and predictive classifications that shape themselves to the statistics of a changing environment in real time. The present article develops a new self-organizing neural architecture that combines perceptual and recognition models that exhibit these desirable properties.

These models have individually been derived to explain and predict data about how the brain generates perceptual representations in the striate and prestriate visual cortices (e.g., Arrington, 1994; Baloch & Grossberg, 1997; Francis & Grossberg, 1996; Gove, Grossberg, & Mingolla, 1995; Grossberg, 1994, 1997; Grossberg, Mingolla, & Ross, 1997; Pessoa, Mingolla, & Neumann, 1995) and uses these representations to learn attentive recognition categories and predictions through interactions between inferotemporal, prefrontal, and hippocampal cortices (e.g., Bradski & Grossberg, 1995; Carpenter & Grossberg, 1993; Grossberg, 1995; Grossberg & Merrill, 1996). The perceptual theory in question is called FACADE theory. It consists of subsystems called the Boundary Contour System (BCS) and the Feature Contour System (FCS) that generate 3-D boundary and surface representations that model the cortical interblob and blob processing streams, respectively. The adaptive categorization and predictive theory is called Adaptive Resonance Theory, or ART. ART models are capable of stably self-organizing their recognition codes using either unsupervised or supervised incremental learning in any combination through time (Carpenter & Grossberg, 1991; Carpenter et al., 1992).

The present work develops the ARTEX model to classify scenes that include complex textures, both natural and artificial. The ARTEX architecture was built up from specialized versions of FACADE and ART models that have been designed to achieve high competence in classifying textured scenes without also incorporating mechanisms that are not essential for understanding this competence. Just as the properties of the FACADE and ART models are emergent properties that are due to interactions of their various parts, the properties of the ARTEX architecture are also emergent properties due to interactions within and between its FACADE and ART modules. These new emergent properties are not merely the sum of the parts of the modules of which they are derived, and need to be analysed on their own terms.

In order to understand the emergent properties that are achieved by joining a FACADE
vision preprocessor to an ART adaptive classifier, ARTEX is benchmarked against state-of-the-art alternative models of texture classification. Our most striking results are derived through benchmark studies that classify natural textures from the Brodatz (1966) texture album, which is often used as a standardized test of texture classification models. ARTEX benchmarks emulated the conditions under which others benchmarked their algorithms on Brodatz textures. A single trial of on-line incremental category learning by ARTEX can outperform another leading model's off-line batch learning using a complex rule-based system (Greenspan, 1996; Greenspan et al., 1994). ARTEX also outperforms K-nearest neighbor models in both accuracy and data compression, and multilayer perceptrons (back propagation) in both accuracy and processing time.

The classification errors that ARTEX does produce are compared with human perception of texture similarities (Rao & Lohse, 1993, 1996). A correlation exists between the psychophysically measured similarity between two textures and the probability that ARTEX will confuse them.

ARTEX is also used to classify regions in real-world scenes that have been processed by synthetic aperture radar (SAR). SAR imagery has recently become popular in many satellite image processing applications because the SAR sensor can penetrate variable weather conditions (Novak et al., 1990; Waxman et al., 1995). The SAR images present a challenge for texture classifiers because they contain pixel intensities that vary over five orders of magnitude and are corrupted by high levels of multiplicative noise, yielding incomplete and discontinuous boundary and surface representations. Results below on natural texture and SAR images illustrate how pattern recognition models that are based on biological principles and mechanisms can outperform models that have been derived from more traditional engineering concepts.

1.2 Psychophysical Data and Model Properties

At least two different approaches exist to texture classification. In one approach, the focus is on separating regions with different textures by finding the boundaries between them (Bergen & Adelson, 1988; Fogel & Sagi, 1989; Gurnsey & Browse, 1989; Malik & Perona, 1990; Rubenstein & Sagi, 1990; Bergen & Landy, 1991). Another approach attempts to classify the textures within small regions of a scene (Caelli, 1985, 1988; Bovik, Clark, & Geisler, 1990; Jain & Farrokhnia, 1991; Greenspan et al., 1994). Such an approach discovers texture boundaries by classifying the textures within each region differently. It can also classify local regions whose textural properties vary gradually across space, and thus are not separated by a distinct boundary.

Gurnsey and Laundry (1992) have provided psychophysical data in support of the latter type of processing by showing that human texture recognition is only slightly impaired when the boundaries between different textures in a texture mosaic are blurred. ARTEX does the latter type of classification. It derives a 17-dimensional feature vector from multiple-scale boundary features of the BCS and a surface brightness feature of the FCS. This feature vector utilizes filters of four different scales, as suggested by psychophysical experiments (Harvey & Gervais, 1978; Richards, 1979; Wilson & Bergen, 1979). The spatial filters are evaluated at four different orientations, thereby leading to a 16-dimensional (4 × 4) feature vector. The 17th dimension is a surface brightness feature. The ARTEX model uses these feature vectors to generate a context-sensitive classification of local texture properties. These BCS and FCS operations are designed to be as simple and fast as possible without incurring a loss of accuracy in classifying texture data.

A large psychophysical literature supports the FACADE hypothesis that the human brain forms distinct boundary and surface representations before they are bound together by object recognition categories. Experimental results that support the role of boundary representations include the following: (1) Object superiority effects occur using outline stimuli with little surface detail (Davidoff & Donnelly, 1990; Homa, Haver, & Schwartz, 1976). (2) The number of errors in tachistoscopic recognition and the speed of identification are often comparable using appropriately and inappropriately colored objects (Mial, Smith, Doherty, & Smith, 1979; Ostergaard & Davidoff, 1985). (3) There is no difference in recognition speed using black-and-white photographs or line drawings that are carefully derived from them (Biederman & Ju, 1988).

Several types of data also implicate a separate surface brightness and color process. These include the following: (4) Colored surfaces may be bound to an incorrect form during illusory conjunctions (McLean, Broadbent, & Broadbent, 1983; Stefurak & Boynton, 1986; Treisman & Schmidt, 1982). (5) Color can facilitate object naming if the objects to be named are structurally similar or degraded (Christ, 1975; Price & Humphreys, 1989). (6) Colors are coded categorically prior to the processing stage at which they are named (Davidoff, 1991; Rosch, 1975). Two of the most recent studies in support of the boundary-surface distinction were carried out by Elder and Zucker (1998) and Rogers-Ramachandran and Ramachandran (1998).

FACADE theory proposes that 3-D boundary and surface features that are formed in the prestriate visual cortex are categorized in the inferotemporal cortex (Grossberg, 1994, 1997). Both boundary and surface properties are proposed to be combined during the categorization process within bottom-up and top-down adaptive pathways that are modeled by an ART system. Two consequences of this conception are that unambiguous boundaries can generate category recognition by themselves, and that boundaries can prime 3-D object representations even if they need to be supplemented by 3-D surface information in order to achieve unambiguous recognition. Cavanagh (1997) has reported data consistent with this latter prediction.

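As a minimal sketch of the feature layout described earlier in this section (orientational contrast at 4 spatial scales and 4 orientations, plus one surface brightness value), the following Python fragment shows one way such a 17-dimensional vector could be assembled for a single image location. The function name, the scale-major ordering of the 16 contrast values, and the placement of brightness in the last slot are illustrative assumptions; the text specifies only the dimensions.

```python
N_SCALES = 4        # spatial scales of the BCS filter (stated in the text)
N_ORIENTATIONS = 4  # orientations evaluated per scale (stated in the text)


def assemble_feature_vector(orientational_contrast, brightness):
    """Return a 17-element list: 16 boundary features, then 1 surface feature.

    orientational_contrast: 4 rows (one per scale) of 4 values (one per
    orientation) measured at one image location.
    brightness: the single surface brightness value at the same location.
    The scale-major ordering is an assumption made for this sketch.
    """
    if len(orientational_contrast) != N_SCALES:
        raise ValueError("expected one row of contrast values per spatial scale")
    features = []
    for row in orientational_contrast:
        if len(row) != N_ORIENTATIONS:
            raise ValueError("expected one contrast value per orientation")
        features.extend(float(v) for v in row)
    features.append(float(brightness))
    return features


if __name__ == "__main__":
    # Hypothetical contrast values for one image location.
    contrast = [[0.1 * (s + 1) * (o + 1) for o in range(N_ORIENTATIONS)]
                for s in range(N_SCALES)]
    print(len(assemble_feature_vector(contrast, 0.5)))
```

Keeping the boundary and surface features in one flat vector reflects the statement that both kinds of property are combined during categorization; any fixed ordering would serve equally well as long as it is used consistently.
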

In the ARTEX implementation of this concept, the feature vectors that are formed from the 17-dimensional boundary and surface features of the FACADE preprocessor are input to an ART classifier, which categorizes the textures using a biologically-motivated learning algorithm. Humans learn to discriminate textures by looking at them and becoming sensitive to their statistical properties in small regions. This is how our model is trained. Intuitively speaking, model training is like having an observer look at a number of locations and trying to learn to categorize them based on their local properties.

The ART classifier we used, called Gaussian ARTMAP, or GAM, incrementally constructs internal categories that have Gaussian receptive fields in the input space, and that map to output class predictions (Williamson, 1996, 1997). Cells with Gaussian receptive fields are ubiquitous in the brain, and have been used to model data about how the inferotemporal cortex learns to categorize visual input patterns (Logothetis et al., 1994). Such models are not, however, typically able to self-organize their own recognition categories and to autonomously search for new ones with which to classify novel input patterns. ART models overcome this weakness by showing how complementary attentional and orienting systems are designed with which to balance between the processing of familiar and expected events, on the one hand, and unfamiliar and unexpected events on the other (Carpenter & Grossberg, 1991; Grossberg, 1980; Grossberg & Merrill, 1996). All learned categorization goes on within the attentional system. The orienting subsystem is activated in response to events that are too novel for the attentional system to successfully categorize them. Interactions between the attentional and orienting subsystems then lead to a memory search which discovers a more appropriate population of cells with which to categorize the novel information. These interactions are designed to explain how the brain continues to learn quickly about huge amounts of new information throughout life, without being forced to just as quickly forget useful information that it has previously learned.

After each input is presented (i.e., each location is observed), GAM automatically activates cells whose receptive fields adapt to represent the input by amounts proportional to their level of match with the input. However, if the input is too novel for any existing receptive field to match the input well enough, then a memory search is triggered which leads to the selection of a previously uncommitted cell population with which a new category can be learned. During unsupervised learning, the correct names of the regions that are being classified are not supplied, and the level of match that is required for a category to learn is constant. The parameter that determines this degree of match is called the vigilance parameter because it computationally realizes the intuitive process of being more or less vigilant in response to information of variable importance (Carpenter & Grossberg, 1991). Low vigilance allows the network to learn general categories in which many input exemplars may share the same category prototype. High vigilance enables the network to learn more specific categories, even categories in which only a single exemplar may be represented. Thus the choice of vigilance can trade between prototype and exemplar learning, even within a single ART system. Experimental evidence consistent with vigilance control has been reported in monkeys when they attempt to perform classifications during easy vs. difficult discriminations (Spitzer, Desimone, & Moran, 1988).

Learning typically starts with a low vigilance value, which leads to the formation of the most general categories that are consistent with the input data. Because ART models are self-organizing, such learning can proceed on its own in an unsupervised mode. Starting with a low vigilance value conserves memory resources, but it can also create the tendency, also found in children, to overgeneralize until further learning leads to category refinement (Chapman, et al., 1986; Clark, 1973; Smith et al., 1985; Smith & Kemler, 1978; Ward, 1983). For example, it might happen that, after learning a category that classifies variations on the letter E, the letter F will also activate that category, based on the visual similarity between the two types of letters. The difference between the letters E and F is determined by cultural factors, not by visual similarity. Supervised learning is often essential to prevent errors based on input similarity which do not correspond to cultural understandings, or other environmentally dependent factors. ART models can operate in both unsupervised and supervised learning modes, and can switch between the two seamlessly during the course of learning.

During supervised learning, the vigilance parameter, or required match level, is raised if an incorrect prediction is made (e.g., if there is negative reinforcement) by just enough to trigger a memory search for a new category. This type of vigilance control sacrifices category generality only when more specific categories are needed to match the statistical properties of a given environment. Categories of variable generality are hereby automatically learned based upon the success or failure of previously learned categories in predicting the correct classification. A block diagram of the ARTEX architecture is shown in Figure 1.

2 Multiple-scale Oriented Filter

The ARTEX multiple-scale oriented
filter further develops the BCS filter that was introduced to explain texture data in Grossberg and Mingolla (1985). Variants of this BCS filter have since become standard in many texture segmentation algorithms (Malik & Perona, 1989; Sutter, Beck, & Graham, 1989; Bovik et al., 1990; Bergen, 1991; Bergen & Landy, 1991; Jain & Farrokhnia, 1991; Graham, Beck, & Sutter, 1992; Greenspan et al., 1994). Figure 2 diagrams the ARTEX version of BCS processing (Stages 1-5) for a single spatial scale. As in Richards (1979), we used 4 spatial frequency channels. Each channel computed 4 orientational contrast features. These filter equations and parameters are described in Appendix I. A functional description is given here.

Stage 1 of the BCS filter uses an on-center off-surround network whose cells obey membrane equations, or shunting laws (Grossberg, 1980, 1983) to discount the illuminant, compute contrast ratios of the image, and normalize image intensities. Stage 2 accomplishes multiple-scale oriented filtering using odd-symmetric Gabor filters at the 4 orientations and spatial scales. Stage 3 computes a local measure of absolute orientational contrast by full-wave rectifying the filter activities from Stage 2. These operations are neurally interpreted as follows: Stage 1 operations occur in the retina and LGN, Stage 2 operations at cortical simple cells, and Stage 3 operations at cortical complex cells (Grossberg & Mingolla, 1985). Stage 4 simplifies the BCS operations of boundary grouping by computing a smooth, reliable measure of orientational contrast that spatially pools responses within the same orientation. Stage 5 performs an optional orientational invariance operation which shifts orientational responses at each scale into a canonical ordering. This computation shifts, with wraparound, the smoothed orientational responses from Stage 4 so that the orientation with maximal amplitude is in the first orientation plane. The usefulness of this operation is task-dependent, as shown by our simulations below.

[Figure 1: Block diagram of ARTEX image classification subsystems. Box labels: Input Image; Discount Illuminant; Multiple Scale BCS (Orientational Contrast Features); Single Scale FCS (Surface Brightness Feature); Gaussian ARTMAP Classifier; Output Prediction of Region Type.]

Graham et al. (1992) also simplified Stage 4 of the BCS by pooling responses from Stage 3. They then used a hand-crafted sigmoidal discrimination measure to convert Stage 4 output into a probabilistic output function that could be compared with subjects' ratings of texture discriminability. In the present benchmark studies, the BCS filter outputs form part of the input vector to a GAM classifier which autonomously learns the probabilistic recognition categories with which texture discriminations are made. We note in Section 3 how the Graham et al. (1992) study has been extended to explain a larger data base about texture discrimination using additional FACADE theory mechanisms.

3 Filled-in Surface Brightness

The FACADE model suggests how the BCS and FCS interact to generate
filled-in 3-D surface representations within the FCS. These surface representations are derived from scenic data after the illuminant has been discounted, as in Stage 1 of Figure 2. In general, these surface representations combine information about brightness, color, depth, and form. Our simulations below demonstrate the utility of using a filled-in surface brightness feature to help learn recognition categories for texture discrimination.

The simplest surface feature is one that is based on first-order differences in illumination intensity. An improved surface feature discounts the illuminant to compute a measure of local contrast. Such a feature, however, can still be corrupted by various sorts of specular noise in an image. In the brain, such noise can be due to the blind spot, retinal veins, and the retinal layers through which light must pass to activate photodetectors. In artificial sensors, too, such noise can derive from sensor characteristics. Discounting the illuminant is also insensitive to contextual groupings of image features. A filled-in surface brightness feature overcomes these deficiencies by smoothing local contrast values when they belong to the same region, while maintaining contrast differences when they belong to different regions. Filling-in hereby smoothes over image noise in a form-sensitive way, and generates a representation that reflects properties of a region's form by being contained within the region boundaries. It also tends to maximize the separability, in brightness space, of different region types by minimizing within-region variance while maximizing between-region variance. This sort of preattentive and automatic separation simplifies the task of an attentive pattern classifier such as GAM.

In Grossberg et al. (1995), a multiple-scale FACADE network was developed to process noisy SAR images for use by human operators. There the goal was to generate reconstructions of SAR images that were pleasing to the eyes of expert photointerpreters. The BCS in this simulation used a grouping network with a feedback process that can complete and sharpen boundary representations. These boundary groupings created sharply delineated image regions and filled-in surfaces. Although such a feedback grouping network has the remarkable property of converging within 1 to 3 feedback iterations, it still has the disadvantage, at least in software simulations, of slowing down processing time. Here we replace the full BCS filter and grouping network by a multiple-scale BCS filter and a single scale of one-pass feedforward boundary processing to control filling-in of the brightness feature. Computer simulations summarized below demonstrate that this simplification does not impair classification benchmarks on Brodatz textures and on SAR textured scenes. The simplified boundary segmentation is, moreover, computationally 75 times faster than the feedback network. The slower feedback benchmarks are not reported here. Accurate texture classification thus does not seem to depend upon photorealism of the corresponding percept. Stages 6-9 of Figure 2 show how the BCS filter output is used to derive the one-pass boundary segmentation. Appendix II contains the equations and parameters of this simplified brightness filling-in process.

[Figure 2: Boundary and surface preprocessing stages. Panel labels: Texture Processing, Boundary Processing, Surface Processing. Stage labels: 1 Center-surround Processing; 2 Orientational Filtering; 3 Full-Wave Rectification; 4 Spatial Pooling; 5 Orientational Invariance; 6 Center-surround Processing; 7 Half-Wave Rectification; 8 Sum Across Orientations; 9 Boundary-Gated Diffusion. The Input Image enters at Stage 1 and the outputs feed Gaussian ARTMAP. OV = orientationally variant representation; OI = orientationally invariant representation. Either OV or OI, but not both, are used for a given problem.]

These FACADE preprocessing results can be placed into a larger framework to better understand their relevance for understanding human texture discrimination. Three issues need to be considered: (1) the use of a simplified Stage 4 spatial pooling operation instead of long-range grouping by a feedback network; (2) the role of surface representations; and (3) the need for 3-D boundary and surface representations. When are long-range groupings, such as illusory contours, not needed to improve texture discriminability? This is more true when the images contain dense enough textures to obviate the need for grouping over long distances. Not all of the data considered even by Graham et al. (1992) were of this type, however, since their displays contained regularly placed features that could group together in orientations colinear, perpendicular, or oblique to their defining edges. Cruthirds et al. (1993) showed that a multiple-scale BCS filter, supplemented by the long-range groupings of a feedback network, could simulate the pairwise ordering of human ratings of texture discriminability better than the Graham et al. (1992) variant of the BCS filter on its own.

Grossberg and Pessoa (1997) have simulated a variant of FACADE theory in which both 2-D and 3-D boundary and surface operations were needed to simulate psychophysical data about the discrimination of textured regions composed of regular arrays of equiluminant colored regions on backgrounds of variable luminance, as in the experiments of Beck (1994) and Pessoa, Beck, & Mingolla (1996). This latter simulation study was restricted, however, to textures composed of colored squares on achromatic backgrounds, rather than the stochastic factors that arise in Brodatz and SAR textures. The Grossberg and Pessoa (1997) study also does not analyze how recognition categories for discriminating textures are learned. Taken together, however, these several studies provide converging evidence that FACADE mechanisms can explain challenging properties of data concerning human texture segregation.

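The filling-in property described in this section, namely smoothing contrast values within a region while preserving differences across region boundaries, can be illustrated with a one-dimensional sketch of boundary-gated diffusion. This is not the Appendix II process, whose equations are not reproduced here: the permeability form 1 / (1 + gate * boundary), the parameter values, and the function name are all assumptions chosen for illustration.

```python
import random


def boundary_gated_diffusion(signal, boundary, n_iter=200, rate=0.25, gate=1000.0):
    """Smooth a 1-D signal within regions while blocking flow across boundaries.

    signal:   list of n contrast (brightness) values.
    boundary: list of n - 1 non-negative boundary strengths, one per link
              between neighbouring cells i and i + 1.
    """
    if len(boundary) != len(signal) - 1:
        raise ValueError("need one boundary value per pair of neighbouring cells")
    s = [float(v) for v in signal]
    # Assumed gating: permeability is 1 where no boundary is present and
    # close to 0 across a strong boundary.
    perm = [1.0 / (1.0 + gate * b) for b in boundary]
    for _ in range(n_iter):
        # Flow into cell i from cell i + 1, computed from the current state
        # before any cell is updated.
        flow = [p * (s[i + 1] - s[i]) for i, p in enumerate(perm)]
        for i, f in enumerate(flow):
            s[i] += rate * f
            s[i + 1] -= rate * f
    return s


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical regions with different mean contrast plus noise.
    noisy = ([0.2 + rng.gauss(0.0, 0.05) for _ in range(20)]
             + [0.8 + rng.gauss(0.0, 0.05) for _ in range(20)])
    edges = [0.0] * 39
    edges[19] = 1.0  # a single boundary between the two regions
    print([round(v, 2) for v in boundary_gated_diffusion(noisy, edges)])
```

Because activity is only exchanged between neighbours, the total activity is conserved; the boundary signal decides where it may spread. With the boundary removed, the same routine blurs the step between the two regions, which is the failure mode that boundary gating is meant to avoid.
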

4 Heuristics

The 16-dimensional feature vector produced by Stages 1-5 (representing orientational contrast at 4 orientations and 4 spatial scales) and the single filled-in brightness feature produced by Stages 6-9 yield a 17-dimensional boundary-surface feature vector. GAM must learn a mapping from the input space populated by these feature vectors to a discrete output space of associated region class labels. As noted above, GAM shares a number of key properties with other ARTMAP architectures (Carpenter, Grossberg, and Reynolds, 1991; Carpenter et al., 1992). GAM learns mappings incrementally, without any prior knowledge of the problem domain, by self-organizing an efficient set of recognition categories that shape themselves to the statistics of the input environment, as well as a map from recognition categories to class labels, which are supplied during supervised learning. Because GAM learns its mappings incrementally, a previously trained GAM network may be retrained with new input/output contingencies, including new class labels, without any need to retrain the network on the previous data. Finally, although GAM is trained only with individual class labels, it also learns to accurately estimate the probabilities of its class label predictions, as we show in our simulations below.

In a typical ART network (Carpenter & Grossberg, 1987, 1991), an input vector activates feature selective cells within the attentional system that store the vector in short-term memory. This short-term memory pattern then activates bottom-up pathways whose signals are filtered by learned adaptive weights, or long-term memory traces. The filtered signals are added up at target category nodes which compete via recurrent lateral inhibition to determine which category activities will be stored in short-term memory and thereby represent the input vector. The degree of activation of a category provides an estimate of the likelihood that an input belongs to the category. Activating a category is like making a hypothesis.

As they are being activated, the selected categories read-out learned top-down expectations, or prototypes, which are matched against the input vector at the feature detectors. This matching process plays the role of testing the hypothesis. The vigilance parameter defines the criterion for a good enough match. As noted above, low vigilance leads to the learning of general categories, whereas high vigilance leads to the learning of specialized categories, even a single exemplar, in the limit of very high vigilance. By varying vigilance, an ART system can hereby learn both abstract prototypes and concrete exemplars.

If the chosen category's match function exceeds the vigilance parameter, then the bottom-up and top-down exchange of feedback signals locks the system into a resonant state. The resonant state signifies that the hypothesis matches the data well enough to be accepted by the system. ART proposes that these resonant states focus attention upon relevant feature combinations, and that only resonant states enter conscious awareness (Grossberg, 1980). Resonance triggers learning in both the bottom-up adaptive weights that are used to activate the selected recognition category, and in the top-down weights that represent its prototype. This learning incorporates the new information supplied by the input vector into the long-term memory of the attentional system.

If the category's match function does not exceed vigilance, this designates that the hypothesis is too novel to be incorporated into the prototype of the active category. A bout of memory search, or hypothesis testing, is then triggered through activation of the orienting system. Memory search either discovers a category that can better represent the data or, if no such learned category already exists, automatically chooses uncommitted cells with which to learn a new category. ART hereby incrementally discovers new categories whose degree of generalization varies inversely with the size of the vigilance parameter. Neurobiological data about recognition learning in inferotemporal cortex that are consistent with these hypotheses are reviewed by Carpenter and Grossberg (1993) and Grossberg and Merrill (1996).

All of the above properties proceed autonomously in ART networks as they undergo unsupervised learning. ARTMAP extends these ART designs to include both supervised and unsupervised learning (Carpenter, Grossberg, & Reynolds, 1991; Carpenter et al., 1992). In ARTMAP, the chosen ART categories learn to make predictions which take the form of mappings to the names of output classes. In such an ARTMAP system, many different recognition categories can all learn to map into the same output name, much as many different visual fonts of a given letter of the alphabet can be grouped into several different visual recognition categories, based upon visual similarity, before these visual categories are mapped into the same auditory category that is used to name that letter.

ARTMAP systems propose how to correct a prediction, as in the case where the letter E is discon

91. rm by envi ronm ed ental f eedback that the correct l etter i s F, usi ng onl y l ocal operati ons i n envi ronm that m be

92. l l ed w th unexpected events. A T A ents ay i R MP does thi s usi ng a m ni m l earni ng pri nci pl e, w ch conj oi ntl y m m zes predi cti ve gen- i ax hi axi i eral i zati on w l e i t m ni m zes predi cti ve error. A T A hi i i R MPdoes thi s by tryi ng to f ormthe l argest categori es that are consi stent w th envi ronm i ental f eedback. Am ch t racki ng pro- at cess real i zes thi s pri nci pl e by i ncreasi ng the vi gi l ance val ue af ter eachdi scon

93. rm onunti l ati i t exceeds the chosen category' s m f uncti on. T s vi gi l ance i ncrease i s the m ni m atch hi i al one that can tri gger newhypothesi s testi ng on that l earni ng tri al . M tracki ng hereby atch gi ves up the m ni m ount of general i zati on that i s requi red to correct the error. In i umam sum ary, an A T A m R MPsystemorgani zes i ts categori zati on of experi ence based both on the si m l ari ty of the i nput f eature vectors and upon f eedback f romthe envi ronm i ental response, w hether cul tural l y or otherw se determ ned, to the nam or other behavi ors i i es that i ts categori es predi ct. 5 aus s i an P Gaussi an A T(Wl l i am 1996, 1997) provi des a m f or an A Tsystemto l earn R i son, eans R the stati sti cs of an i nput envi ronm E of i ts categori es de

94. nes a G ent. ach aussi an di stri buti on i n the i nput space, w th a m and vari ance i n each i nput di m on, as w l as an i ean ensi el 12

overall a priori probability. The Gaussian ART bottom-up activation function evaluates the probability that the input belongs to a category, given its Gaussian distribution and a priori probability. The match function evaluates how well the input fits the category's distribution, which is normalized to a unit height. This match is a measure of the distance, in units of standard deviation, between the input vector and the category's mean. Vigilance specifies the maximum allowable size of this distance.

Gaussian ART also uses distributed learning, in which multiple categories can all cooperate to classify an input event. Gaussian ART hereby avoids the problems incurred by grandmother cell models of recognition. Each such category is assigned credit based on its proportion of the net activation, which is determined by all categories whose match functions satisfy the vigilance criterion. Each category then learns by an amount that is determined by its credit. When Gaussian ART is extended to Gaussian ARTMAP to enable it to benefit from both supervised and unsupervised learning, each category's credit is determined by its proportion of the net activation of its ensemble, which consists of all categories that map to the same output prediction. The normalized strength of each ensemble's prediction is a probability estimate for that prediction. The equations and parameters for Gaussian ARTMAP are found in Appendix III.

6 Some Alternative Texture Classifiers

6.1 Comparison of Feature Extraction Methods

In order to evaluate the promise of any vision system, particularly one that attempts to explain such a complex competence as textured scene classification, one needs to evaluate that it really works. This is particularly the case when the key behavioral properties emerge due to interactions across the entire system. There is thus no substitute for running such a system on benchmarks on which competing systems have also been evaluated. Our benchmark comparisons, presented in Section 7, evaluate ARTEX under conditions that are as similar as possible to those under which these competing systems have been evaluated. ARTEX performance is
first compared to that of a system that was used to classify natural textures in Greenspan et al. (1994) and Greenspan (1996). We call their model the Hybrid System because it is a hybrid architecture that used a log-Gabor Gaussian pyramid for feature extraction followed by one of three alternative classifiers. Although the Hybrid System was not developed to explain biological data, it has the virtue of having been developed to the point that it could be successfully tested on benchmark databases that use textures or textured scenes as their inputs. Most other biologically derived models have not yet reached this level of development.

The Hybrid System's log-Gabor pyramid uses three levels, or spatial scales, and four orientations at each scale. Each level, after the first one, of the Gaussian pyramid is obtained by blurring the previous lower level (i.e., smaller spatial scale) with a Gaussian kernel (with standard deviation $\sigma = 1$) and then decimating the image (i.e., removing 3 out of 4 pixels in each 2 × 2 pixel block). Due to decimation, the Gaussian at each successive level effectively has twice the $\sigma$ of the Gaussian used in the previous level. The final outputs of all three pyramid levels of the Hybrid System have the same net amount of blurring, produced by three successive blur/decimate steps. This amount of blurring is equivalent to convolving with a single Gaussian kernel with $\sigma = \sqrt{21} = \sqrt{1^2 + 2^2 + 4^2}$, which produces an 8 × 8 pixel resolution. That is, each patch of 8 × 8 pixels in the input image yields a single pixel in an output image for each oriented contrast feature. In Greenspan (1996), classification results at 16 × 16, 32 × 32, and 64 × 64 resolution were also reported.

Without further preprocessing, ARTEX produces feature images at single pixel resolution. To make a fair comparison with the results reported by Greenspan et al. (1994) and Greenspan (1996), ARTEX feature images need to be reduced, via blurring and decimation, to the same resolution used there. For example, to change the ARTEX features to 8 × 8 resolution, the smaller-scale ARTEX features require additional blurring prior to decimation so that their net amount of blurring is equivalent to convolving with a single Gaussian kernel with $\sigma = \sqrt{21}$.

The net amount of blurring is a crucial consideration for the two types of tasks on which the systems are compared. The first task is classification of a library of texture images. Because this task does not include transitions between different textures, performance monotonically improves as blurring is increased, since blurring reduces variance and thus improves the signal-to-noise ratio. The second task is classification of a texture mosaic. Here, texture transitions need to be accurately resolved, so performance degrades with over-blurring. We demonstrate both of these phenomena below.

6.2 Comparison of Classification Methods

In the Hybrid System's first classification scheme, the extracted features are clustered independently in each feature dimension using the K-means procedure. Mappings from these clusters to class labels are then formed using a batch learning, rule-based algorithm called ITRULE (Goodman et al., 1992). The clusters in this scheme are formed to discretize the input, so that ITRULE can form explicit rules mapping them to the output classes. ITRULE forms a large number of rules. The exact number is never stated in Greenspan (1996). On the large problems, however, a maximum of 10,000 is allowed, and as many as 430 rules per class are reported for discriminating only two textures. Another drawback of this approach is that unsupervised discretization via K-means clustering throws away potentially important information because the clusters may span discrimination boundaries in the input space. Finally, GAM enjoys a major practical advantage in that it uses a simple incremental learning procedure as opposed to the complex and computationally expensive batch learning procedure used by ITRULE.
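The point that unsupervised discretization can discard class information may be easier to see on a toy case. The sketch below is not the Hybrid System's implementation, and ITRULE itself is not reproduced. The helper `kmeans_1d`, its evenly spaced initial centers, the eight sample values, and the class names "A" and "B" are all assumptions chosen for illustration. It clusters one feature dimension with plain K-means, ignoring the labels, and then counts which labels fall into each resulting bin.

```python
from collections import Counter


def kmeans_1d(values, k, iters=100):
    """Lloyd-style K-means on scalars (hypothetical helper, assumes k >= 2).

    Centers start evenly spaced over the data range; class labels are never used.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    bins = []
    for _ in range(iters):
        # Assign every value to its nearest center.
        bins = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its members (keep it if it has none).
        new_centers = []
        for j in range(k):
            members = [v for v, b in zip(values, bins) if b == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, bins


# Toy one-dimensional feature: the true class boundary lies between 0.30 and 0.35.
values = [0.0, 0.1, 0.2, 0.3, 0.35, 0.4, 0.9, 1.0]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

centers, bins = kmeans_1d(values, k=2)

# Count the class labels that ended up in each discretization bin.
per_bin = {}
for b, label in zip(bins, labels):
    per_bin.setdefault(b, Counter())[label] += 1

# Any rule keyed only on the bin index must misclassify the minority labels.
errors = sum(sum(c.values()) - max(c.values()) for c in per_bin.values())
print(per_bin, errors)
```

In this toy case the first bin mixes both classes, so no rule defined on bin membership alone can separate them, whatever rule learner follows the discretization step.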

The two alternative classifiers used in Greenspan (1996) are standard incremental learning schemes: the K-nearest neighbor (K-NN) classifier and the multilayer perceptron (MLP), backpropagation algorithm. These two approaches have complementary advantages and flaws. K-NN learns quickly (one training epoch) but achieves no data compression. MLP, on the other hand, achieves better data compression but learns very slowly (500 slow-learning training epochs in Greenspan, 1996). An additional drawback of MLP is that it uses a form of mismatch learning that may suffer from catastrophic forgetting if trained on new data with different contingencies from previous data. As is demonstrated by our results below, GAM combines the good properties of the above three classifiers: like ITRULE, GAM predicts the posterior probabilities of the output classes; like K-NN, GAM learns local mappings quickly; like MLP, GAM achieves significant data compression. Although GAM uses a more local representation than MLP, and thus could, in principle, require more memory, GAM compensates for this by constructively forming a representation of appropriate size for whatever problem it is trained on.

7 Texture Classification Results

7.1 10-Texture Library

ARTEX was first compared to the Hybrid System on the library of ten textures shown in Figure 3A, whose top row contains structured textures and whose bottom row contains unstructured textures. Each texture image consists of 128 × 128 pixels. Three other images of each texture are not shown. In Greenspan (1996), classification results of the Hybrid System using ITRULE, K-NN, and MLP classifiers were published for this database. The classifiers were trained on data at three different levels of spatial resolution, with a different number of training samples per class at each resolution: 300 samples at 8 × 8 resolution, 125 samples at 16 × 16 resolution, and 40 samples at 32 × 32 resolution. ARTEX was trained on the same data set under the same conditions. Like the Hybrid System, ARTEX used an orientationally variant, or OV, representation on this problem, since generalization to novel orientations of the same texture during testing was not required. ARTEX was evaluated with five random orderings of the data, and the results were averaged.

Table 1 shows comparative results for the Hybrid System and ARTEX at the three spatial resolutions. Table 1 lists the classification rate, number of epochs, and number of categories (or hidden units, stored exemplars, etc.) for each system configuration. The number of epochs indicates how many training trials were needed. The number of categories indicates how well the model compresses the data. In the case of K-NN, there is no compression, so each input or exemplar forms a different category. The number of weights indicates the memory resources, or computational complexity, needed to achieve this degree of compression. The goal is to minimize the number of epochs, categories, and weights. 60 hidden units are listed for MLP because the average MLP

Figure 3: (Next page.) A) 10-texture database of textures corresponding to Figure 2 [...] et al. (1994). Top row consists of structured textures, and bottom row of unstructured textures. Textures from Brodatz album are labeled with plate number. Top row (left to right): herringbone weave (D17), french canvas (D21), cotton canvas (D77), jeans [...]. Bottom row (left to right): grass (D9), pressed cork (D4), handmade paper (D57), pigskin (D92), and wo[...]. B) 42-texture database from Brodatz album. ROW 1: reptile skin (D3), cork (D4), wire [...] (D9), bark (D12), straw (D15). ROW 2: herringbone (D17), wool (D19), french canvas [...] (D24), sand (D29), water (D38). ROW 3: straw matting (D55), handmade paper (D57), [...] (D68), cotton canvas (D77), raffia looped (D84), pigskin (D92). ROW 4: fur (D93), [...] skin (D10), homespun wool (D11), raffia weave (D18), ceramic brick (D26), netting [...]. ROW 5: lizard skin (D36), straw screening (D49), raffia woven (D50), oriental cloth [...], cloth (D53), oriental rattan (D65). ROW 6: plastic pellets (D66), oriental gra[...], oriental cloth (D78), oriental cloth (D80), oriental cloth (D82), woven matting [...]. [...] straw matting (D85), sea fan (D87), brick (D95), burlap (D103), cheesecloth (D105), [...] (D110).


10-Texture Problem

Configuration                                  Class. Rate (%)  Samples/Class  Epochs  Categories  Weights

8 × 8 Resolution:
Hybrid System, ITRULE                          94.3             300            Batch   -           -
Hybrid System, MLP                             94.5             300            500     60          1,500
Hybrid System, K-NN                            87.0             300            1       3,000       48,000
ARTEX, all features                            95.8             300            1       26.6        958
ARTEX, all features                            96.3             300            5       34.0        1,224
ARTEX, no large-scale features                 97.1             300            5       41.0        1,148
ARTEX, no brightness feature                   95.6             300            5       38.4        1,306
ARTEX, no large-scale or brightness features   95.7             300            5       47.2        1,227

16 × 16 Resolution:
Hybrid System, ITRULE                          95.0             125            Batch   -           -
Hybrid System, MLP                             96.0             125            500     60          1,500
Hybrid System, K-NN                            93.0             125            1       1,250       20,000
ARTEX, all features                            97.2             125            1       17.4        626

32 × 32 Resolution:
Hybrid System, ITRULE                          97.8             40             Batch   -           -
Hybrid System, MLP                             100.0            40             500     60          1,500
Hybrid System, K-NN                            99.0             40             1       400         6,400
ARTEX, all features                            100.0            40             1       10.6        382

Table 1: Recognition statistics on 10-texture library at three pixel resolutions: 8 × 8, 16 × 16, and 32 × 32. The number of weights is determined by multiplying the number of categories by the number of weights per category, $w$. $w$ is calculated based on the dimensionality of the input space, $M$, and the number of output classes, $N$. $M = 15$ for the Hybrid System, $M = 17$ for ARTEX, and $N = 10$ because there are 10 textures. For MLP, $w = M + N = 25$. For K-NN, $w = M + 1 = 16$. For ARTEX with all features, $w = 2M + 2 = 36$. For ARTEX with no large-scale features ($M = 13$), $w = 28$. For ARTEX with no brightness feature, $w = 34$. For ARTEX with no large-scale or brightness features ($M = 12$), $w = 26$. For example, the 48,000 weights for K-NN are computed as follows. The Hybrid System uses 15 features per input sample. With K-NN, these 15 features plus the correct class label are stored for each training sample. Therefore, the number of weights that must be stored is 16 × (number of training samples). Since there are 300 samples/class and 10 classes, there are 3,000 training samples. In all, 16 × 3,000 = 48,000 weights.
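The weight counts in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes the per-category formulas as they read in the Table 1 caption ($M + N$ for MLP, $M + 1$ for K-NN, $2M + 2$ for ARTEX); the function name `weights_per_category`, the row layout, and the value $M = 16$ for the no-brightness configuration (inferred as 17 minus one feature) are illustrative assumptions, not part of the original paper.

```python
def weights_per_category(model, m, n_classes=10):
    """Weights stored per category (or hidden unit, or exemplar).

    m is the input dimensionality, n_classes the number of output classes.
    Formulas follow the Table 1 caption; the function itself is hypothetical.
    """
    formulas = {
        "MLP": m + n_classes,  # input weights plus output weights per hidden unit
        "K-NN": m + 1,         # stored features plus one class label
        "ARTEX": 2 * m + 2,    # caption formula for a Gaussian ARTMAP category
    }
    return formulas[model]


# (description, model, input dimension M, average categories, weights listed in Table 1)
rows = [
    ("Hybrid System, MLP, 8x8",              "MLP",   15, 60,   1500),
    ("Hybrid System, K-NN, 8x8",             "K-NN",  15, 3000, 48000),
    ("ARTEX, all features, 1 epoch, 8x8",    "ARTEX", 17, 26.6, 958),
    ("ARTEX, all features, 5 epochs, 8x8",   "ARTEX", 17, 34.0, 1224),
    ("ARTEX, no large-scale features",       "ARTEX", 13, 41.0, 1148),
    ("ARTEX, no brightness feature",         "ARTEX", 16, 38.4, 1306),  # M = 16 inferred
    ("ARTEX, no large-scale or brightness",  "ARTEX", 12, 47.2, 1227),
    ("Hybrid System, K-NN, 16x16",           "K-NN",  15, 1250, 20000),
    ("ARTEX, all features, 16x16",           "ARTEX", 17, 17.4, 626),
    ("Hybrid System, K-NN, 32x32",           "K-NN",  15, 400,  6400),
    ("ARTEX, all features, 32x32",           "ARTEX", 17, 10.6, 382),
]

# Total weights = (average number of categories) x (weights per category), rounded.
computed = [round(cats * weights_per_category(model, m)) for _, model, m, cats, _ in rows]
listed = [w for *_, w in rows]

for (name, *_), c, l in zip(rows, computed, listed):
    print(f"{name:40s} computed={c:6d} listed={l:6d}")
```

Because the ARTEX category counts are averages over five random orderings, the products are rounded to the nearest integer before being compared with the table.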

results were reported for 30, 60, and 90 hidden units.

ARTEX was tested with several configurations, with different subsets of its features removed. With its full 17-dimensional feature set, ARTEX achieved 95.8% correct after only one incremental training epoch, and 96.3% after five epochs. By comparison, the Hybrid System with K-NN achieved only 87.0% correct after one training epoch, at the cost of 3,000 stored exemplars compared to 23 internal categories for ARTEX. With much longer training times (i.e., 500 training epochs using MLP, or the computationally expensive batch-learning procedures using K-means and ITRULE), the Hybrid System did not match the performance of ARTEX with only one incremental learning epoch, and exhibited 49% more errors than ARTEX with 5 training epochs.

Three alternative ARTEX configurations were also tested to elucidate why ARTEX achieved better results than the Hybrid System. ARTEX uses four spatial scales versus only three for the Hybrid System. Therefore, perhaps its largest spatial scale conferred an advantage to ARTEX. This possibility was tested by removing the largest scale, resulting in a slight performance increment (97.1%). Another unique feature used by ARTEX is its filled-in surface brightness feature, which seems to be more effective than the multi-scale Gaussian blurring used by the Hybrid System. Removing the brightness feature resulted in a performance decrement (95.6%). This difference quantifies how much surface as opposed to boundary properties influence recognition accuracy on these data. Finally, both the large-scale and the brightness features were removed. This resulted in a similar performance decrement (95.7%).

The modest role played by the surface brightness feature in classifying these data is consistent with cognitive evidence summarized above suggesting that boundary inputs that go directly to the human cognitive recognition system are often sufficient to accurately recognize many objects. Surface brightness and color properties become more important insofar as the boundary information, by itself, is ambiguous. Given that boundaries are predicted to be perceptually invisible within the BCS itself (viz., the interblob cortical processing stream), these results are consistent with the possibility of being able to quickly begin to recognize certain objects using their invisible boundaries even before these objects become visible through their surface properties.

The ARTEX advantage, even with
five ARTEX features removed, is probably due to some remaining differences between the systems: (1) the nature of band-pass filtering prior to orientational filtering, (2) the bandwidth characteristics of the orientational filters, (3) spatial pooling at the third spatial scale, and/or (4) the classification scheme.

The first difference is in the Stage 1 band-pass filtering operation prior to the orientational Gabor filtering. The Hybrid System uses a Laplacian pyramid in which both the center and surround Gaussians that make up the band-pass filter double in size with each scale. In ARTEX, on the other hand, only the surround Gaussian grows with each successive spatial scale. It preserves on-center resolution while varying the scale of image normalization and noise suppression. Thus, the Hybrid System is much more restrictive in the range of spatial frequencies that are passed through to its orientational filtering stage.

The second difference is that the oriented filters used by the two models have different bandwidth characteristics: the ARTEX Gabor filters are defined with higher-frequency sine waves (50% higher frequency; see Appendix I for parameters).

The third difference is that Stage 4 of ARTEX performs spatial pooling following orientational filtering at each spatial scale. The Hybrid System does not do this in its largest spatial frequency channel at 8 × 8 resolution. Therefore, this discrepancy might help explain why ARTEX outperforms the Hybrid System at 8 × 8 resolution, but not at lower resolutions.

The fourth difference is the classification stage. The advantages of the self-organizing Gaussian ARTMAP classifier over those used by the Hybrid System are described above.

7.2 Larger Texture Libraries

In Greenspan (1996), recognition statistics of the Hybrid System on a 30-texture library were presented. This library consists of 19 textures from the Brodatz album, and 11 additional textures of comparable complexity. We were unable to obtain this database, and so we chose to evaluate ARTEX on a library of similar textures obtained solely from the Brodatz album, which contains the 19 textures used in Greenspan (1996) as a subset. Figure 3B shows this library of 42 Brodatz textures. The plate numbers from the Brodatz album are listed in the caption. The 19 textures evaluated in Greenspan (1996) comprise the first three rows of Figure 3, as well as the
