
Deep Learning for Natural Language Processing

In this presentation I go over the theory and practical aspects of applying Deep Learning to solve NLP problems, more specifically developing models for sentiment analysis.

All the code used in the demo can be found here:

https://github.com/ekholabs/DLinK
https://github.com/ekholabs/automated_ml

The presentation is available on YouTube: https://www.youtube.com/watch?v=eZavheF5TBE

I start at 1:06:02.


Deep Learning for Natural Language Processing

  1. DEEP LEARNING FOR NATURAL LANGUAGE PROCESSING
  2. WILDER RODRIGUES: MACHINE LEARNING ENGINEER • Coursera Mentor • City.AI Ambassador • IBM Watson AI XPRIZE contestant • Kaggler • Guest attendee at the AI for Good Global Summit at the UN • X-Men geek • family man and father of 5 (3 kids and 2 cats). @wilderrodrigues https://medium.com/@wilder.rodrigues/
  3. AGENDA: WHAT IS IN THERE FOR YOU? • The Basics: Vector Representation of Words • The Shallow: [Deep] Neural Networks for NLP • The Deep: Convolutional Networks for NLP • The Recurrent: Long Short-Term Memory for NLP • Where do we go from here? Automation of AWS GPUs with Terraform
  4. THE BASICS: VECTOR REPRESENTATION OF WORDS
  5. REPRESENTATIONS OF LANGUAGE
  6. WORD2VEC: HOW DOES IT WORK? • Cosine distance between words in the vector space: • X = vector("biggest") − vector("big") + vector("small") • X ≈ vector("smallest"), the nearest word by cosine distance. • Algorithms: • Skip-Gram: predicts the context words from the target word. • CBOW: predicts the target word from the bag of all context words. • The CBOW architecture predicts the current word based on the context, and Skip-gram predicts surrounding words given the current word. [Figure: cosine distance vs. Euclidean distance.]
  7. DEMO: WORD2VEC
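
The slides do not include the demo code itself; below is a minimal sketch of the word-analogy query from the previous slide, assuming the gensim library and a set of pre-trained word2vec vectors (the file name is illustrative, and the actual notebook in the DLinK repo may differ).

    # Load pre-trained word2vec vectors (file name illustrative).
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(
        'GoogleNews-vectors-negative300.bin', binary=True)

    # vector("biggest") - vector("big") + vector("small") should land near "smallest".
    print(vectors.most_similar(positive=['biggest', 'small'], negative=['big'], topn=3))

    # Cosine similarity between two words in the vector space.
    print(vectors.similarity('big', 'small'))

most_similar ranks candidate words by cosine similarity, which is why the analogy works even when the raw Euclidean distances between the vectors are large.
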
  8. THE SHALLOW: [DEEP] NEURAL NETWORKS
  9. WHERE TO FOCUS FOR NOW?
  10. DEMO: SENTIMENT ANALYSIS
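
A minimal sketch of what a shallow, fully connected sentiment classifier could look like with TensorFlow's Keras API, assuming the built-in IMDB movie-review dataset; the hyperparameters are illustrative and the actual demo notebooks in the DLinK repo may differ.

    from tensorflow.keras.datasets import imdb
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, Flatten, Dense, Dropout

    vocab_size, max_len = 5000, 100
    (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
    x_train = pad_sequences(x_train, maxlen=max_len)
    x_test = pad_sequences(x_test, maxlen=max_len)

    model = Sequential([
        Embedding(vocab_size, 64),        # word vectors learned end to end
        Flatten(),                        # concatenate the word vectors of a review
        Dense(64, activation='relu'),
        Dropout(0.5),                     # regularisation (see the Dropout reference)
        Dense(1, activation='sigmoid')    # binary sentiment: positive vs. negative
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=3, batch_size=128,
              validation_data=(x_test, y_test))
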
  11. THE DEEP: CONVOLUTIONAL NEURAL NETWORKS
  12. CNNS: HOW DO THEY WORK? • Filters • Kernel • Strides • Padding • One equation to rule them all: output size per dimension = (n + 2p − f) / s + 1, where n is the input size, f the filter size, p the padding, and s the stride. Example: a 6x6x3 input convolved with sixteen 3x3x3 filters, no padding and stride 1, gives a 4x4x16 output, since (6 + 2·0 − 3) / 1 + 1 = 4. [Figure: worked convolution example, 6x6x3 → 4x4x16 → 2x2x16.]
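
A worked version of that formula as a tiny helper, matching the numbers on the slide:

    def conv_output_size(n, f, p=0, s=1):
        # Output size per spatial dimension: (n + 2p - f) / s + 1.
        return (n + 2 * p - f) // s + 1

    # 6x6x3 input, 3x3x3 filters, no padding, stride 1 -> 4x4 per filter;
    # with 16 such filters the output volume is 4x4x16.
    print(conv_output_size(6, 3))   # 4
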
  13. CNNS: HOW DO THEY WORK WITH TEXT? • Each row of the input matrix corresponds to a word/token; that is, each row is a low-dimensional vector representing that word/token. • The width of the filters is usually the same as the width of the input matrix (the embedding dimension). • The height may vary, but it is typically between 2 and 5; a filter of height 2, for example, covers 2 words per sliding window.
  14. DEMO: SENTIMENT ANALYSIS
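
A minimal sketch of the convolutional variant, in the spirit of the slide above: the embedding rows are the word vectors, and each Conv1D filter spans the full embedding width while sliding over a few words at a time. The layer sizes are illustrative, not taken from the demo notebooks.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout

    vocab_size = 5000
    model = Sequential([
        Embedding(vocab_size, 64),                       # each row is a 64-d word vector
        Conv1D(256, kernel_size=3, activation='relu'),   # filter height 3: three words per window
        GlobalMaxPooling1D(),                            # keep the strongest response per filter
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Training uses the same fit call as in the dense sketch above; only the architecture changes.
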
  15. THE RECURRENT: LONG SHORT-TERM MEMORY
  16. RNNS: LONG-TERM DEPENDENCY PROBLEMS • Small vs. large gap between the relevant information and where it is needed for the prediction: • "the clouds are in the sky." • "I grew up in France… I speak fluent French."
  17. LSTMS: HOW DO THEY WORK? • An LSTM's gates: • Forget: a sigmoid decides which parts of the previous cell state are kept and which are discarded. • Input: a sigmoid decides which values to update, and a tanh outputs the candidate state; the cell state is then updated to the kept previous state plus the scaled candidate. • Output: a sigmoid decides which parts of the state will be output, and a tanh of the new cell state is multiplied by that sigmoid result.
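
For reference, a single LSTM step written out with NumPy, mirroring the gate list on this slide; the shapes and initial values are made up for illustration, and real frameworks fuse these operations.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        # W maps the concatenation [h_prev; x] to the four gate pre-activations.
        z = W @ np.concatenate([h_prev, x]) + b
        f, i, o, g = np.split(z, 4)
        f = sigmoid(f)            # forget gate: how much of the old cell state to keep
        i = sigmoid(i)            # input gate: which values to update
        g = np.tanh(g)            # candidate state
        o = sigmoid(o)            # output gate: which parts of the state to emit
        c = f * c_prev + i * g    # new cell state: kept old state plus scaled candidate
        h = o * np.tanh(c)        # new hidden state / output
        return h, c

    hidden, inputs = 4, 3
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4 * hidden, hidden + inputs))
    b = np.zeros(4 * hidden)
    h, c = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, b)
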
  18. DEMO: SENTIMENT ANALYSIS
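
And a minimal Keras sketch of the recurrent variant of the sentiment model; again the sizes are illustrative and the actual demo notebook may differ.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    vocab_size = 5000
    model = Sequential([
        Embedding(vocab_size, 64),
        LSTM(128, dropout=0.2, recurrent_dropout=0.2),   # the gates from the previous slide
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
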
  19. WHERE DO WE GO FROM HERE? TERRAFORM
  20. BUILDING A LANDSCAPE: INFRASTRUCTURE AS CODE • Abstracts resources and providers: physical hardware, virtual machines, and containers. • Multi-tier applications • Multi-cloud deployment • Software demos
  21. DEMO: PUT IT ALL TOGETHER
  22. REFERENCES: WHERE DID I GET THIS STUFF FROM? • Efficient Estimation of Word Representations in Vector Space: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Google, 2013. • A Neural Probabilistic Language Model: Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin. Université de Montréal, Montréal, Québec, Canada, 2003. • Dropout: A Simple Way to Prevent Neural Networks from Overfitting: Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. University of Toronto, Toronto, Ontario, Canada, 2014. • https://medium.com/cityai/deep-learning-for-natural-language-processing-part-i-8369895ffb98 • https://medium.com/cityai/deep-learning-for-natural-language-processing-part-ii-8b2b99b3fa1e • https://medium.com/cityai/deep-learning-for-natural-language-processing-part-iii-96cfc6acfcc3 • http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/ • https://github.com/ekholabs/DLinK • https://github.com/ekholabs/automated_ml
