
Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16

Can Cognitive Neuroscience Provide a Theory of Deep Learning Capacity?: Deep neural networks have achieved learning feats for video, image, and speech recognition that leave other techniques far behind. For example, the error rate on the ImageNet 2012 object recognition challenge was halved with the introduction of deep convolutional nets, and they now dominate these competitions. At the same time, the industry is busy putting them to use in applications ranging from autonomous driving to product recommenders, and researchers continue to propose more elaborate topologies and intricate training techniques. But our theoretical understanding of how these networks encode representations of the “things they see” is far behind, as is our understanding of their limitations.

To advance deep neural network design from “black magic” to an engineering problem, we need to understand the impact that the choice of topology and parameters has on learnt representations and on the processing that a network is capable of. How many representations can a given network store? How does representation “reuse” impact learning rate and learning capacity? How many tasks can a given network perform?

In this talk, I’ll describe why the human brain, with its seemingly unlimited parallel distributed processing, is downright terrible at multi-tasking and why this is totally logical. And I’ll describe the theoretical implications this may have for artificial neural networks. I’ll also describe very recent work that sheds some light on how representations are encoded and how our research team is extending this work to develop best practices for network design.

  1. Can Cognitive Neuroscience Provide a Theory of Deep Learning Capacity? Ted Willke and the Mind’s Eye Team, Intel Labs, May 20, 2016
  2. “Breakthrough innovation occurs when we bring down boundaries and encourage disciplines to learn from each other” (Gyan Nagpal, Talent Economics: The Fine Line Between Winning and Losing the Global War for Talent)
  3. Cognitive Neuroscience
  4. Cognitive Neuroscience
      • Is the study of the neurobiological mechanisms that underlie cognitive processes, like attention, control, and decision making
      • Answers questions like: How does the brain coordinate behaviour to achieve goals? What are the brain structures upon which these functions depend? How does brain function differ amongst people?
      • Draws upon brain imaging/recordings and other observations to derive models
  5. Context-Dependent Decision Making. Michael Shvartsman, Vibhav Srivatsava, Narayanan Sundaram, Jonathan D. Cohen, “Using behavior to decode allocation of attention in context dependent decision making”, accepted at International Conference on Cognitive Modeling, 2016.
  6. Selective Forgetting. Kim, Ghootae and Lewis-Peacock, Jarrod A. and Norman, Kenneth A. and Turk-Browne, Nicholas B., “Pruning of memories by context-based prediction error,” Proceedings of the National Academy of Sciences, 2014
  7. Production and comprehension of naturalistic narrative speech. Silbert LJ, Honey CJ, Simony E, Poeppel D, Hasson U (2014) Coupled neural systems underlie the production and comprehension of naturalistic narrative speech. Proc Natl Acad Sci USA 111:E4687-4696.
  8. CRACKS APPEAR, DISRUPTIVE IDEAS 30 years on. MIT Press, 1986
  9. Cognitive Neuroscience. Adapted from Marvin Minsky in Artificial Intelligence at MIT, Expanding Frontiers, Patrick H. Winston (Ed.), Vol. 1, MIT Press, 1990. Reprinted in AI Magazine, Summer 1991
  10. Neural networks
  11. Neural Network preliminaries
  12. Neural Network preliminaries. LeCun et al., “Deep Learning” in Nature (2015)
  13. Arbitrary functions
  14. The original tenets of parallel distributed processing (roughly)
        1. Cognitive processes arise from the real-time propagation of activation via weighted connections
        2. Active representations are patterns of activation distributed over ensembles of units
        3. Processing is interactive (bidirectional)
        4. Knowledge is encoded in the connection weights (not in a separate store)
        5. Learning and long-term memory depend on changes to these weights
        6. Processing, learning, and representation are graded and continuous
        7. Processing, learning, and representation depend on the environment
      T.T. Rogers, J.L. McClelland / Cognitive Science 28 (2014)
  15. Brain-Inspired machine learning
      Structure-Inspired Learning
        • Neurons (e.g., spiking models)
        • Networks (e.g., deep belief networks)
        • Architectures (e.g., Human Brain Project)
      Cognitive-Inspired Learning
        • Reinforcement Learning
        • Context-based Memory
        • Noisy Decision Making
      Image: "Gray754" by Henry Vandyke Carter - Henry Gray (1918) Anatomy of the Human Body
  16. Deep learning takes advantage of parallel distributed processing
  17. Winning top spots in visual recognition challenges, etc.
      Datasets shown: MS COCO (Common Objects in Context); CityScapes Datasets (Semantic Understanding); ImageNet (Object Localization); LSUN (Saliency Prediction)
      References: (1) Lin et al., 2015; (3) Deng et al., 2009
  18. What are sitting in the basket on a bicycle? Yang et al. (2015)
  19. Stacked Attention Networks for Image Question Answering. Yang et al. (2015)
  20. The Glory and the remaining mystery
      We have achieved…
        • Exceeding human-level performance on visual recognition tasks
        • Mastering more and more complex games (Go)
        • Demonstrating human-level control in reinforcement learning (Atari)
        • Question-answering and other AI services are upon us
      but we still don’t know…
        • How learnt (feature) representations are encoded (or if they converge for the same networks trained on the same data)
        • The capacity for learning representations
        • The trade-off between efficiency of representation and flexibility of processing
        • How things learnt interfere with each other
  21. Representations and Learning Capacity
  22. Representation encoding: meaningful and consistent? Can we reliably map feature representations between these networks? Li et al. (ICLR 2016)
  23. Convergent Learning? Li et al. (ICLR 2016)
      Conclusions:
        1. Some features are learned reliably in multiple nets (some are not)
        2. Units learn to span low-D subspaces, which are common (but specific basis vectors are not)
        3. Representations are encoded as a mix of single unit and slightly distributed codes
        4. Mean activation values across different networks converge to a nearly identical distribution
  24. Can cognitive neuroscience provide any insight into the nature of learning and task capacity?
  25. The appeal of highly-parallel neural networks
      Both cognitive neuroscience and machine learning applications exploit the following two features of neural networks to great benefit:
        a) The ability to learn and process complex representations, taking into account a large number of interrelated and interacting constraints
        b) The ability for the same network to process a wide range of potentially disparate representations (or tasks), sometimes called “multitask learning.”
      But what are their limits?
  26. The brain: The black box at the end of our necks
      • Facts: only 2% of body weight but uses up to 20% of energy; ~200B neurons; neurons fire up to ~10 kHz; 1K to 10K connections per neuron
      • Cerebral neocortex: ~20B neurons; ~125 trillion synapses
      There are more ways to organize the neocortex’s ~125 trillion synapses than stars in the known universe
  27. The paradox – one task at a time
  28. A fundamental puzzle concerning human processing
      Why, in some circumstances, is the brain capable of a remarkable degree of parallelism (e.g., locomotion, navigation, speech, and bimanual gesticulation), while in others its capacity for parallelism is radically limited (e.g., the inability to conduct mental arithmetic while constructing a grocery list at the same time)?
  29. A theory
      • The difference in multitasking ability may reflect the degree to which different tasks rely on shared representations
      • The more that different processes interact, the stronger the imposition of seriality
      • May reflect a fundamental trade-off in neural network architectures between the efficiency of shared representations (and the capacity for generalization that they afford) and the effectiveness of multitasking.
  30. Multi-tasking and cross-talk. Feng et al. (CABN 2014)
  31. You will see a sequence of words. Quickly say the color of the letters.
  32. SNOW
  33. Ready!
  34. BLUE
  35. RED
  36. BLACK
  37. GREEN
  38. BLACK
  39. BLUE
  40. GREEN
  41. RED
  42. BLUE
  43. Now with the words upside down.
  44. BLACK
  45. GREEN
  46. BLACK
  47. RED
  48. Were you faster to answer?
  49. A Demonstration of interference. Stroop (1935)
  50. Multi-tasking interference (in the Stroop test). Cohen et al. (1990). Figure labels: Color, Word, Verbalize, Task
  51. Control-Demanding Behavior (Feng et al. 2014)
      • First to describe the trade-off between the efficiency of representation (“multiplexing”) and the simultaneous engagement of different processing pathways (“multitasking”)
      • Showed that even a modest amount of multiplexing rapidly introduces cross-talk among processing pathways
      • Proposed that the large advantages of efficient encoding have driven the human brain to favour this over the capacity for control-demanding processes.
  52. Types of interference
  53. Maximum independent set (MIS)
      The MIS is the largest set of processes in the network that can be simultaneously executed without interference.
  54. Network structure (distribution complexity)
      • The network capacity for multitasking depends on the distribution of in-degrees and out-degrees of the network (we only play with in-degree of output components though)
      • We represent this with a “distribution complexity” symmetry measure (maximized for uniform distribution)
      • We study the characteristics of the network with DC fixed
  55. Takeaway: Even modest amounts of process overlap impose dramatic constraints on parallel processing capability
  56. Trade-off between generalization and parallelism: Feed-Forward simulation
  57. Training/Test details
      Training
        • 20 network groups, 20 random initializations per group
        • All networks trained on same stimuli, 16 tasks
        • Trained to generate 1-hot task outputs (MSE < 0.0001)
      Test
        • 70/30 split
        • Generalization is MSE(ave) for ALL stimuli in test set
        • Parallel processing is measured response to (2,3,4) tasks simultaneously activated, measuring MSE for target pattern
  58. Shared Representations. Smaller weights (a), Larger weights (b)
  59. Generalization vs parallel processing capability
  60. Parallel processing capability vs max initial weights
  61. Future work
      • Extend analysis to weighted graphs
      • Study more complex networks (i.e., deeper structures, recurrent connections)
      • Study human performance (via neuroimaging data)!
  62. C. elegans. The OpenWorm Project (image generated by neuroConstruct). Since 1986
  63. Thank you!
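
The maximum independent set idea from the slides (the largest set of processes that can run at once without interference) can be illustrated with a small sketch. This is not the speakers' code. It assumes, purely for illustration, that each process is a hypothetical (input component, output component) pair and that two processes interfere whenever they share an input or an output component; the slides do not spell out the exact interference rule. The names `interferes`, `maximum_independent_set`, and the example components are made up here, and the brute-force search is only practical for very small networks.

```python
from itertools import combinations


def interferes(a, b):
    """Assumed rule (not from the slides): two processes interfere
    if they share an input component or an output component."""
    return a[0] == b[0] or a[1] == b[1]


def maximum_independent_set(processes):
    """Return one largest subset of processes with no pairwise interference.

    Brute force: try subset sizes from largest to smallest and return the
    first subset whose members are all mutually non-interfering.
    """
    for size in range(len(processes), 0, -1):
        for subset in combinations(processes, size):
            if all(not interferes(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []


# Hypothetical Stroop-like network: two inputs, two outputs, four processes.
processes = [
    ("color", "verbal"),
    ("word", "verbal"),
    ("color", "manual"),
    ("word", "manual"),
]

mis = maximum_independent_set(processes)
print(len(mis), mis)
```

Under these assumptions, only two of the four processes can run together, because every other pair shares either an input or an output. That is a toy version of the takeaway that modest overlap between processes sharply limits parallel processing.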