This document provides an overview of recurrent neural networks and their applications. It discusses how RNNs can remember previous inputs through feedback loops and internal states. Long short-term memory networks are presented as an improvement over standard RNNs in dealing with long-term dependencies. The document also introduces word embeddings to map words to vectors, and transformers which provide an alternative to RNNs using self-attention. Code examples of RNNs in TensorFlow 2.0 are also shown.
9. Things you will learn today
• Handling of sequences in Neural Networks
• RNNs in Natural Language Processing tasks
• Basics of TF2
• Show me the code!!
ATTENTION: Some changes ahead!
10. Things you will not learn today
• What is a…
• Neural network
• Autoencoder
• …
• What is TensorFlow
• How to code in Python
• Star Trek lore
12. Language is a process of free creation; its laws and principles are fixed; but the manner in which the principles of generation are used is free and infinitely varied. Even the interpretation and use of words involves a process of free creation.
Noam Chomsky
15. Dynamic Structures
• Consider time as a variable
• Use energy functions to describe the information inside the network
• For the first time, the network consumes its own generated information → feedback loop
16. Recurrent Neural Networks
• Dynamic properties → feedback loops
• Short term memory
• Adaptive behaviour
• Differential equation system
• Applications
• Signal processing
• Time series forecasting
17. How can we remember things?
Use an internal state
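As a rough illustration of that idea, here is a minimal NumPy sketch (not code from the talk; all sizes and weights are made-up assumptions) of a vanilla RNN step whose internal state is fed back at every time step:
import numpy as np

# Minimal sketch: a vanilla RNN keeps an internal state h and feeds it back
# at every time step. Sizes (8 inputs, 16 hidden units) are illustrative.
rng = np.random.default_rng(0)
W_xh = rng.uniform(-0.1, 0.1, (8, 16))   # input  -> hidden weights
W_hh = rng.uniform(-0.1, 0.1, (16, 16))  # hidden -> hidden weights (the feedback loop)
b_h = np.zeros(16)

def rnn_step(x_t, h_prev):
    # The new state mixes the current input with the previous state.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

h = np.zeros(16)                      # the internal state starts empty
sequence = rng.normal(size=(5, 8))    # 5 time steps of 8 features each
for x_t in sequence:
    h = rnn_step(x_t, h)              # h carries a memory of earlier inputs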
20. Far, far ahead in the future
The problem of long-term dependencies
21. LSTM to the rescue!
• LSTMs are explicitly designed to avoid the long-term dependency problem
• Remembering is the default behavior
• How? → 3-in-1 gating operation: Forget, Update, Output (sketch below)
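To make the 3-in-1 operation concrete, here is a rough NumPy sketch of a single LSTM step (not code from the talk; the sizes and random weights are illustrative assumptions): the forget gate decides what to erase from the cell state, the update gate what new information to write, and the output gate what to expose.
import numpy as np

# Sketch of the "3-in-1" LSTM step. Sizes (8 inputs, 16 hidden units) and
# random weights are illustrative assumptions.
rng = np.random.default_rng(0)
n_in, n_h = 8, 16
W_f, W_i, W_c, W_o = [rng.uniform(-0.1, 0.1, (n_in + n_h, n_h)) for _ in range(4)]
b_f = b_i = b_c = b_o = np.zeros(n_h)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(z @ W_f + b_f)                   # Forget: what to drop from the cell state
    i = sigmoid(z @ W_i + b_i)                   # Update: what new information to write
    c = f * c_prev + i * np.tanh(z @ W_c + b_c)  # remembering is the default path
    o = sigmoid(z @ W_o + b_o)                   # Output: what to expose as the new state
    h = o * np.tanh(c)
    return h, c

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h))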
25. Word embeddings
• Words or phrases are mapped to vectors of real numbers
• Relationships of words
• Their representation is learned from how the words are used in context (sketch below)
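A minimal sketch, assuming the tf.keras Embedding layer and a made-up vocabulary and vector size, of how words (as integer ids) are mapped to dense vectors that the network learns during training:
import tensorflow as tf

# Sketch: an Embedding layer maps integer word ids to dense real-valued
# vectors. Vocabulary size (10,000) and vector size (50) are assumptions.
embed = tf.keras.layers.Embedding(input_dim=10000, output_dim=50)
word_ids = tf.constant([[12, 7, 431, 2]])  # one sentence encoded as word ids
vectors = embed(word_ids)                  # shape (1, 4, 50): one 50-d vector per word
Because these vectors are trained together with the rest of the model, words used in similar contexts end up with similar representations.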
30. Some code about it
import tensorflow as tf
import numpy as np

# TF 1.x style: first build a static graph of a dense layer with ReLU...
b = tf.Variable(tf.zeros((100,)))
W = tf.Variable(tf.random_uniform((784, 100), -1, 1))
x = tf.placeholder(tf.float32, (None, 784))
h_i = tf.nn.relu(tf.matmul(x, W) + b)

# ...then execute it inside a session, feeding data through the placeholder
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(h_i, {x: np.random.random((64, 784))})
31. TF 2.0: Eager Execution
import tensorflow as tf
import numpy as np

# Eager execution: the same dense layer runs immediately, with no graph,
# session or placeholders
b = np.zeros((100,))
W = np.random.uniform(-1, 1, (784, 100))
x = np.random.random((64, 784))
h_i = tf.nn.relu(tf.matmul(x, W) + b)
32. TF2.0 – They did it!!!
• Eager execution as default
• No need to call tf.enable_eager_execution()
• Blocks or functions can still be executed in graph mode (see the tf.function sketch below)
• Fewer conventions; a more object-oriented and Pythonic design
• variable_scope removed
• And yes, Keras is built in
• Clean-up of libraries and tf.contrib
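A minimal sketch of those two points, eager by default plus opting a single function back into graph mode with tf.function; the layer sizes simply mirror the earlier example:
import tensorflow as tf

# Eager execution is the default in TF 2.0; tf.function traces one function
# into a graph.
W = tf.Variable(tf.random.uniform((784, 100), -1, 1))
b = tf.Variable(tf.zeros((100,)))

@tf.function                  # traced and executed as a graph
def dense_relu(x):
    return tf.nn.relu(tf.matmul(x, W) + b)

h = dense_relu(tf.random.uniform((64, 784)))  # called like a normal Python function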
34. So LSTMs are here to stay, right?
Ermmm…
• Computation cannot be parallelized within a training example (each step depends on the previous state)
• Large memory requirements limit parallelization across training examples
36. Transformer
Jun 2017
• Google Research:
• "Attention is all you need" [arXiv:1706.03762]
• Vs. RNNs:
• Order-of-magnitude improvement in training time
• Vs. convolutional models:
• Complexity grows with distance/length in convolutional models
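To show what replaces recurrence, here is a minimal sketch of the scaled dot-product self-attention at the core of "Attention is all you need" (shapes are illustrative assumptions):
import tensorflow as tf

# Scaled dot-product self-attention: every token attends to every other
# token via matrix multiplications. Shapes (10 tokens, 64-dimensional
# queries/keys/values) are assumptions.
def scaled_dot_product_attention(Q, K, V):
    d_k = tf.cast(tf.shape(K)[-1], tf.float32)
    scores = tf.matmul(Q, K, transpose_b=True) / tf.sqrt(d_k)  # token-to-token affinities
    weights = tf.nn.softmax(scores, axis=-1)                   # attention distribution
    return tf.matmul(weights, V)                               # weighted mix of values

Q = K = V = tf.random.normal((10, 64))
out = scaled_dot_product_attention(Q, K, V)
Unlike an RNN, there is no step-by-step dependence on a previous state, so the whole sequence can be processed in parallel during training.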
44. Reality
No labelled language data
Learn by extracting relationships from raw text
Embedding structures are similar across languages
Language-agnostic models → Yes!!
Cross-lingual embeddings
45. Beyond words…
How about machines that make machines?
[Diagram: four encoder-decoder combinations across English (en) and Spanish (es): en→en, en→es, es→es, es→en]
47. Huge models
• GPT-2 (OpenAI project)
• Training on larger datasets: books, web pages…
• Improves reading comprehension, translation, summarization and QA
The bigger they are, the harder they fall
49. Help me!!
Decision Making Support
HR and Hiring Processes
Test Grading
Law, Regulation and Compliance
Contract Analysis
50. Help me!!
Harder, Better, Faster, Stronger
Large and Multiple Documents
Multi-hop Reasoning
Contextualized Information in Dialogues