Deep learning Malaysia presentation 12/4/2017
Overview of NLP using Deep Learning, Part 1
• Commercially available tools
• Open source tools
• Word2vec word embedding model
• RNN for generating Bahasa text
• LSTM in theory
• Integrating the word embedding model into LSTM
• Automatic question and reply engine
Open Source Tools
• Python: NLTK and TextBlob
– POS tagging: part-of-speech tagging
– Named entity recognition:
• "Steve Jobs" -> person
• "Apple" -> organization
– Semantic identification
– Lemmatization and stemming: reduce words to their base form
• R: TM
http://textminingonline.com/getting-started-with-
- Word Tokenization
- Sentence Tokenization
- Part-of-speech tagging
- Noun phrase extraction
- Sentiment analysis
- Word Pluralization
- Word Singularization
- Spelling correction
- Parsing
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Word and phrase frequencies
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- JSON serialization
- Add new models or languages through extensions
- WordNet integration
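A minimal, illustrative sketch (not part of the original slides) of the TextBlob and NLTK features listed above. It assumes TextBlob and NLTK are installed and their corpora have been downloaded (python -m textblob.download_corpora); the sample sentence is made up.

from textblob import TextBlob, Word
import nltk

text = "Steve Jobs founded Apple. The cats are chasing the dogs happily."
blob = TextBlob(text)

print(blob.words)               # word tokenization
print(blob.sentences)           # sentence tokenization
print(blob.tags)                # part-of-speech tagging, e.g. ('Steve', 'NNP')
print(blob.noun_phrases)        # noun phrase extraction
print(blob.sentiment)           # sentiment: polarity and subjectivity
print(blob.ngrams(n=2))         # n-grams
print(blob.word_counts["the"])  # word and phrase frequencies

print(Word("cats").singularize())      # word singularization
print(Word("cat").pluralize())         # word pluralization
print(Word("chasing").lemmatize("v"))  # lemmatization
print(TextBlob("I havv a speling eror").correct())  # spelling correction

# Named entity recognition with plain NLTK (requires the punkt, averaged_perceptron_tagger,
# maxent_ne_chunker and words data packages); labels spans such as PERSON and ORGANIZATION.
print(nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text))))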
Word2vec
(Architecture diagrams: words mapped to word vectors.)
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
Using Word2vec
• Consider the training corpus having the following sentences:
– “the dog saw a cat”
– “the dog chased the cat”
– “the cat climbed a tree”
• The corpus vocabulary has eight words. Once ordered alphabetically, each word can be referenced by its index, i.e. {a, cat, chased, climbed, dog, saw, the, tree}. For this example, the neural network will have eight input neurons and eight output neurons. Let us assume that we decide to use three neurons in the hidden layer. This means that Winput and Woutput will be 8×3 and 3×8 matrices, respectively. Before training begins, these matrices are initialized to small random values, as is usual in neural network training. Just for illustration's sake, let us assume Winput and Woutput are initialized to the following values:
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
Winput =
        [,1]        [,2]         [,3]
[1,] -0.92513658 -0.743787260  1.6273785
[2,]  0.08458616 -1.258307794  0.4852640
[3,]  0.83675919 -0.001426922 -0.1703800
[4,]  0.94409916  0.018061199 -0.6304152
[5,]  0.04691568 -1.599246381 -0.8439630
[6,] -0.82112415 -1.084833252  0.1231866
[7,] -0.93035265  0.003375370 -1.3572083
[8,] -1.31701003  0.659632590  0.1134216

Woutput =
        [,1]       [,2]       [,3]       [,4]       [,5]       [,6]       [,7]      [,8]
[1,] -1.3907119  0.5259696  1.0829041  1.9993983  0.3370346  1.4518856  0.4802576 0.3931751
[2,] -0.5347725  0.6776164 -0.3288658 -0.1287490 -0.5626609  0.6886097 -1.3618653 1.0593093
[3,] -0.9392639 -0.3924568  0.6765556  0.5703951  0.6843841 -0.9567421 -0.3512964 1.8581440
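The following NumPy sketch (not from the slides; the names and random seed are illustrative) sets up the same toy problem: the eight-word vocabulary, the 1-out-of-V encoding, and randomly initialised 8×3 and 3×8 weight matrices.

import numpy as np

vocab = ["a", "cat", "chased", "climbed", "dog", "saw", "the", "tree"]  # sorted corpus vocabulary
V, N = len(vocab), 3            # V = vocabulary size, N = hidden layer size

rng = np.random.default_rng(0)
W_input = rng.normal(scale=0.5, size=(V, N))    # 8 x 3, small random initial values
W_output = rng.normal(scale=0.5, size=(N, V))   # 3 x 8

def one_hot(word):
    # 1-out-of-V encoding, e.g. "cat" -> [0 1 0 0 0 0 0 0]
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0
    return x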
Using Word2vec
• Suppose we want the network to learn the relationship between the words “cat” and “climbed”. That is, the network should show a high probability for “climbed” when “cat” is input to the network. In word embedding terminology, “cat” is referred to as the context word and “climbed” is referred to as the target word.
• cat --> climbed
• In this case, the input vector X will be [0 1 0 0 0 0 0 0]. Notice that only the second component of the vector is 1, because the input word “cat” holds the second position in the sorted list of corpus words. Given that the target word is “climbed”, the target vector will be [0 0 0 1 0 0 0 0].
• With the input vector representing “cat”, the output at the hidden layer neurons can be computed as:
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
Ht = X · Winput = [0.08458616 -1.258307794 0.4852640] (the second row of Winput)
Using Word2vec
• It should not surprise us that the vector Ht of hidden neuron outputs mimics the weights of the second row of the Winput matrix, because of the 1-out-of-V representation. So the function of the input-to-hidden-layer connections is basically to copy the input word vector to the hidden layer. Carrying out similar manipulations for the hidden-to-output layer, the activation vector for the output layer neurons can be written as:
• Since the goal is to produce probabilities for the words in the output layer, Pr(wordk | wordcontext) for k = 1, …, V, reflecting their next-word relationship with the context word at the input, we need the sum of the neuron outputs in the output layer to add to one. Word2vec achieves this by converting the activation values of the output layer neurons to probabilities using the softmax function. Thus, the output of the k-th neuron is computed by the following expression, where activation(n) represents the activation value of the n-th output layer neuron:
Pr(wordk | wordcontext) = exp(activation(k)) / Σn=1..V exp(activation(n))
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
Ht · Woutput = [0.09948251 -0.9986054 0.8337211 0.6079195 1.068616 -1.207946 1.583797 -0.3979896]
Target vector = [0 0 0 1 0 0 0 0 ].
Using Word2vec
• Thus, the probabilities for the eight words in the corpus are:
[0.0768 0.0256 0.1602 0.1270 0.2026 0.0207 0.3390 0.0467]
• The fourth probability, 0.1270, is for the chosen target word “climbed”. Given the target vector [0 0 0 1 0 0 0 0], the error vector for the output layer is easily computed by subtracting the probability vector from the target vector. Once the error is known, the weights in the matrices Winput and Woutput can be updated using backpropagation. Training then proceeds by presenting different context-target word pairs from the corpus. In essence, this is how Word2vec learns relationships between words and, in the process, develops vector representations for the words in the corpus.
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
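Continuing the NumPy sketch above, this reproduces the forward pass and error computation just described for the pair ("cat", "climbed"). The softmax and gradient expressions follow the standard derivation rather than being taken verbatim from the slides; with the specific Winput and Woutput values from the earlier slide plugged in, the probabilities match the vector above, while the random initialisation in the setup sketch will give different numbers.

# (uses vocab, one_hot, W_input, W_output from the setup sketch above)
def softmax(u):
    e = np.exp(u - u.max())     # subtract the max for numerical stability
    return e / e.sum()

x = one_hot("cat")              # context word
t = one_hot("climbed")          # target word

h = x @ W_input                 # Ht: copies the "cat" row of W_input
u = h @ W_output                # output layer activations
p = softmax(u)                  # Pr(word_k | "cat") for every word k
error = p - t                   # error vector for the output layer

# One backpropagation step with learning rate eta updates both matrices:
eta = 0.1
grad_h = W_output @ error
W_output -= eta * np.outer(h, error)
W_input[vocab.index("cat")] -= eta * grad_h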
Word2vec
• CBOW (Continuous Bag of Words):
– predicts the current word based on the context
– the order of words in the history does not influence the projection
– faster and more appropriate for larger corpora
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
Word2vec
• Continuous Skip-gram model:
– maximizes classification of a word based on another word in the same sentence
– better word vectors for frequent words, but slower to train
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
Using Word2vec - CBOW
• The above description and architecture are meant for learning relationships between a pair of words. In the continuous bag-of-words model, the context is represented by multiple words for a given target word. For example, we could use “cat” and “tree” as context words for “climbed” as the target word. This calls for a modification to the neural network architecture: replicating the input-to-hidden-layer connections C times, where C is the number of context words, and adding a divide-by-C operation in the hidden layer neurons.
• With this configuration to specify C context words, each word being coded using the 1-out-of-V representation, the hidden layer output is the average of the word vectors corresponding to the context words at the input. The output layer remains the same; see the sketch below.
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
(Diagram: CBOW architecture, illustrated with the words “the”, “cat”, “sat”, “mat”.)
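A short, illustrative continuation of the NumPy sketch (not the slides' code) for the CBOW case: the hidden layer is the average of the context words' input vectors, and a single output layer produces one probability vector.

# (uses vocab, one_hot, softmax, W_input, W_output from the sketches above)
context = ["cat", "tree"]       # C = 2 context words
target = "climbed"

# Replicated input connections + divide-by-C: average the context word vectors.
h = np.mean([one_hot(w) @ W_input for w in context], axis=0)
p = softmax(h @ W_output)       # one probability vector over the vocabulary
error = p - one_hot(target)     # error for backpropagation, as before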
Using Word2vec - SGM
• The skip-gram model reverses the use of target and context words. In this case, the target word is fed at the input, the hidden layer remains the same, and the output layer of the neural network is replicated multiple times to accommodate the chosen number of context words. Taking the example of “cat” and “tree” as context words and “climbed” as the target word, the input vector in the skip-gram model would be [0 0 0 1 0 0 0 0], while the two output layers would have [0 1 0 0 0 0 0 0] and [0 0 0 0 0 0 0 1] as target vectors respectively.
• In place of producing one vector of probabilities, two such vectors would be produced for the current example. The error vector for each output layer is computed in the manner discussed above; however, the error vectors from all output layers are summed up to adjust the weights via backpropagation. This ensures that the weight matrix Woutput remains identical for each output layer all through training (see the sketch below).
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
(Diagram: skip-gram architecture, illustrated with the words “the”, “cat”, “sat”, “mat”.)
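And the mirror-image skip-gram case, again as an illustrative continuation of the same sketch: the target word is the input, the shared output layer is conceptually replicated once per context word, and the per-context errors are summed before backpropagation.

# (uses vocab, one_hot, softmax, W_input, W_output from the sketches above)
target = "climbed"              # fed at the input in the skip-gram model
context = ["cat", "tree"]       # one replicated output layer per context word

h = one_hot(target) @ W_input   # hidden layer, exactly as before
p = softmax(h @ W_output)       # shared W_output, so the same probabilities at each output layer

# One error vector per context word; summing them keeps W_output identical
# across the replicated output layers during training.
total_error = sum(p - one_hot(w) for w in context)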
Using Word2vec
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
• Word vectors capture many linguistic similarities.
• Vector operations:
– V(Paris) - V(France) + V(Italy) results in a vector which is very close to V(Rome)
– V(Beijing) - V(China) + V(Poland) results in a vector which is very close to V(Warsaw)
– V(King) - V(Man) + V(Woman) results in a vector which is very close to V(Queen)
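The analogy arithmetic above can be reproduced with gensim's pretrained vectors. This is a hedged sketch: it assumes gensim is installed and that the "word2vec-google-news-300" vectors are available through gensim's downloader (a large one-time download).

import gensim.downloader as api

wv = api.load("word2vec-google-news-300")   # returns a KeyedVectors object

# V(King) - V(Man) + V(Woman) ~ V(Queen)
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# V(Paris) - V(France) + V(Italy) ~ V(Rome)
print(wv.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=3))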
Using Word2vec
• Original: http://word2vec.googlecode.com/svn/trunk/
• C++11 version: https://github.com/jdeng/word2vec
• Python: http://radimrehurek.com/gensim/models/word2vec.html
• R: https://github.com/bmschmidt/wordVectors
• Java: https://github.com/ansjsun/word2vec_java
• Parallel Java: https://github.com/siegfang/word2vec
• Spark: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/mllib/feature/Word2Vec.html
• CUDA version: https://github.com/whatupbiatch/cuda-word2vec
• NumPy implementation: https://github.com/lazyprogrammer/machine_learning_examples/blob/master/nlp_class2/word2vec.py
Sources: https://iksinc.wordpress.com/tag/word2vec/ and https://www.slideshare.net/hen_drik/word2vec-from-theory-to-practice
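For the Python/gensim entry above, here is a minimal training sketch on the three-sentence toy corpus from the earlier slides. It assumes gensim >= 4.0 (where the dimensionality argument is called vector_size), and the hyperparameter values are only illustrative.

from gensim.models import Word2Vec

sentences = [
    ["the", "dog", "saw", "a", "cat"],
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "climbed", "a", "tree"],
]

model = Word2Vec(
    sentences,
    vector_size=3,   # three hidden neurons, as in the worked example
    window=2,
    min_count=1,
    sg=0,            # 0 = CBOW, 1 = skip-gram
    epochs=50,
)

print(model.wv["cat"])               # the learned 3-dimensional vector for "cat"
print(model.wv.most_similar("cat"))  # nearest words by cosine similarity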
RNN for generating Bahasa text
• A simple TensorFlow RNN implementation for a word-level language model:
• https://github.com/hunkim/word-rnn-tensorflow/blob/master/model.py
• Small training dataset (a 3.1 MB text file) of cleaned Malay text with complete sentences.
• Hyperparameters: 50 hidden states, 2 RNN layers, RNN sequence length of 20, and 10 epochs.
• When you read, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again; your thoughts have persistence. Unlike traditional neural networks, RNNs have loops in them, allowing information to persist. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They are the natural neural network architecture to use for such data (sequence and list data: speech recognition, language modeling, translation, image captioning, etc.).
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
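The referenced repository builds the model with TensorFlow's RNN cells; the snippet below is not that code but a minimal Keras sketch of the same idea, a word-level language model with the hyperparameters quoted above (two recurrent layers of 50 units, sequence length 20). The vocabulary size is an assumption.

import tensorflow as tf

vocab_size = 10000   # assumed size of the Malay word vocabulary
seq_len = 20
hidden = 50

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, hidden),                # word ids -> vectors
    tf.keras.layers.SimpleRNN(hidden, return_sequences=True),     # recurrent layer 1
    tf.keras.layers.SimpleRNN(hidden, return_sequences=True),     # recurrent layer 2
    tf.keras.layers.Dense(vocab_size, activation="softmax"),      # next-word distribution
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x, y, epochs=10)  # x: word-id sequences of length 20; y: the next word at each position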
RNN for generating Bahasa text
• The hidden state at time step t is h_t. It is a function of the input at the same time step, x_t, modified by a weight matrix W (like the one we used for feedforward nets), added to the hidden state of the previous time step, h_t-1, multiplied by its own hidden-state-to-hidden-state matrix U, otherwise known as a transition matrix and similar to a Markov chain; in symbols, h_t = φ(W·x_t + U·h_t-1). The weight matrices are filters that determine how much importance to accord to both the present input and the past hidden state. The error they generate will return via backpropagation and be used to adjust their weights until the error can’t go any lower.
• The sum of the weighted input and hidden state is squashed by the function φ – either a logistic sigmoid function or tanh, depending – which is a standard tool for condensing very large or very small values into a logistic space, as well as making gradients workable for backpropagation.
• Because this feedback loop occurs at every time step in the series, each hidden state contains traces not only of the previous hidden state, but also of all those that preceded h_t-1 for as long as memory can persist.
• Given a series of letters, a recurrent network will use the first character to help determine its perception of the second character, such that an initial q might lead it to infer that the next letter will be u, while an initial t might lead it to infer that the next letter will be h.
• Since recurrent nets span time, they are probably best illustrated with animation (the first vertical line of nodes to appear can be thought of as a feedforward network, which becomes recurrent as it unfurls over time).
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
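The recurrence described in the first bullet, h_t = φ(W·x_t + U·h_t-1), written out as a tiny NumPy sketch (illustrative names, with tanh as the squashing function φ).

import numpy as np

def rnn_forward(xs, W, U, phi=np.tanh):
    # xs: list of input vectors x_1..x_T; returns the hidden states h_1..h_T
    h = np.zeros(U.shape[0])            # h_0
    states = []
    for x in xs:                        # one pass per time step
        h = phi(W @ x + U @ h)          # present input + past hidden state, squashed by phi
        states.append(h)
    return states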
Overview of NLP using Deep Learning, Part 2
• Commercially available tools
• Open source tools
• Word2vec word embedding model
• RNN for generating Bahasa text
• LSTM in theory
• Integrating the word embedding model into LSTM
• Automatic question and reply engine
Personal Profile
• Advisory Data Scientist (IBM Malaysia)
• LinkedIn: https://www.linkedin.com/in/brian-ho-34068a36/
• GitHub Blog: https://kimusu2008.github.io/
• GitHub Account: https://github.com/kimusu2008