Unsupervised Learning of Sentence Embeddings using
Compositional n-Gram Features
2018/07/12 M1 Hiroki Shimanaka
Matteo Pagliardini, Prakhar Gupta and Martin Jaggi
NAACL 2018
Abstract & Introduction (1)
• Unsupervised word representations such as Word2Vec [Mikolov et al., 2013] are now routinely trained on very large amounts of raw text data and have become ubiquitous building blocks of a majority of current state-of-the-art NLP applications.
• While very useful semantic representations are available for words, it remains challenging to produce and learn such semantic embeddings for longer pieces of text such as sentences.
Abstract & Introduction (2)
• A strong trend in deep learning for NLP leads towards increasingly powerful and complex models.
  ➢ While extremely strong in expressiveness, the increased model complexity makes such models much slower to train on larger datasets.
• On the other hand, simpler "shallow" models such as matrix factorizations (or bilinear models) can benefit from training on much larger sets of data, which can be a key advantage, especially in the unsupervised setting.
Abstract & Introduction (3)
• The authors present a simple but efficient unsupervised objective to train distributed representations of sentences.
• Their method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
Approach
• Their proposed model (Sent2Vec) can be seen as an extension of the C-BOW [Mikolov et al., 2013] training objective, training sentence instead of word embeddings.
• Sent2Vec is a simple unsupervised model that composes sentence embeddings from word vectors along with n-gram embeddings, simultaneously training the composition and the embedding vectors themselves.
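To make the composition concrete, here is a minimal sketch in Python/NumPy, assuming a trained lookup table `vectors` mapping tokens and underscore-joined n-grams to their source embeddings (the names and toy data are hypothetical; the authors' released implementation is a separate fastText-style code base). The sentence embedding is simply the average of the embeddings of all unigrams and n-grams in the sentence:

```python
import numpy as np

def extract_ngrams(tokens, n_max=2):
    """Return R(S): all unigrams plus contiguous n-grams up to length n_max."""
    grams = list(tokens)
    for n in range(2, n_max + 1):
        grams += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def sentence_embedding(tokens, vectors, n_max=2):
    """Average the source embeddings of all unigrams and n-grams in the sentence."""
    grams = [g for g in extract_ngrams(tokens, n_max) if g in vectors]
    return np.mean([vectors[g] for g in grams], axis=0)

# Toy usage: random vectors stand in for trained source embeddings.
rng = np.random.default_rng(0)
toks = "the cat sat on the mat".split()
vecs = {g: rng.normal(size=100) for g in extract_ngrams(toks)}
emb = sentence_embedding(toks, vecs)  # shape: (100,)
```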
Model (1)
• Their model is inspired by simple matrix factorization models (bilinear models) such as those recently used very successfully in unsupervised learning of word embeddings.

U ∈ ℝ^{k×h}: target word vectors
V ∈ ℝ^{h×|𝒱|}: learnt source word vectors
𝒱: vocabulary
h: hidden size
ι_S ∈ {0, 1}^{|𝒱|}: binary vector encoding S
S: sentence
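The slide lists the symbol definitions but not the factorization objective itself. As a hedged reconstruction from the paper (the training corpus 𝒞 and the per-sentence cost function f_S are not defined on this slide), the general bilinear objective has the form:

```latex
\min_{U, V} \; \sum_{S \in \mathcal{C}} f_S\!\left( U V \iota_S \right)
```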
Model (2)
• Conceptually, Sent2Vec can be interpreted as a natural extension of the word-contexts from C-BOW [Mikolov et al., 2013] to a larger sentence context.

R(S): the list of n-grams (including unigrams) present in sentence S
ι_{R(S)} ∈ {0, 1}^{|ℛ|}: binary vector encoding R(S), where ℛ is the vocabulary of n-grams
S: sentence
v_w: source (or context) embedding
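Under this extension the sentence embedding is the average of the source embeddings of all n-grams in R(S). A hedged reconstruction of the formula the slide alludes to, in the notation defined above:

```latex
v_S := \frac{1}{|R(S)|} \, V \, \iota_{R(S)} \;=\; \frac{1}{|R(S)|} \sum_{w \in R(S)} v_w
```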
Model (3)
• Negative sampling

N_{w_t}: the set of words sampled negatively for the word w_t ∈ S
S: sentence
v_w: source (or context) embedding
u_w: target embedding
f_w: the normalized frequency of w in the corpus
Model (4)
• Subsampling
  • To select the possible target unigrams (positives), they use subsampling as in [Joulin et al., 2017; Bojanowski et al., 2017], each word w being discarded with probability 1 − q_p(w) (sketched in code below).

N_{w_t}: the set of words sampled negatively for the word w_t ∈ S
S: sentence
v_w: source (or context) embedding
u_w: target embedding
q_n(w) := √f_w / ∑_{w_i∈𝒱} √f_{w_i}
f_w: the normalized frequency of w in the corpus
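A minimal sketch of both sampling steps, assuming the forms from the paper: q_p(w) = min{1, √(t/f_w) + t/f_w} with subsampling threshold t, and negatives drawn with q_n(w) ∝ √f_w. The constant t, the function names, and the toy frequencies are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def keep_probability(f_w, t=1e-5):
    """q_p(w): probability of keeping word w as a positive target.
    Form assumed from the paper (fastText-style subsampling); t is an
    illustrative hyper-parameter value."""
    return min(1.0, np.sqrt(t / f_w) + t / f_w)

def sample_negatives(freqs, k=5):
    """Draw k negatives with probability q_n(w) proportional to sqrt(f_w)."""
    words = list(freqs)
    p = np.sqrt([freqs[w] for w in words])
    p /= p.sum()
    return list(rng.choice(words, size=k, p=p))

# Toy normalized unigram frequencies (illustrative only).
freqs = {"the": 0.05, "cat": 0.001, "sat": 0.0008, "mat": 0.0005}
positives = [w for w in ["the", "cat", "sat"] if rng.random() < keep_probability(freqs[w])]
negatives = sample_negatives(freqs, k=5)
```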
Experimental Setting (1)
• Dataset:
  • the Toronto Book Corpus (70M sentences)
  • Wikipedia sentences and tweets
• Dropout:
  • For each sentence they use dropout on its list of n-grams R(S) ∖ U(S), where U(S) is the set of all unigrams contained in sentence S.
  • They find that dropping K n-grams (n > 1) per sentence gives superior results compared to dropping each token with some fixed probability (see the sketch after this list).
• Regularization:
  • L1 regularization applied to the word vectors.
• Optimizer:
  • SGD
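A minimal sketch of the K-n-gram dropout described above, under the assumption that unigrams U(S) are always kept and only higher-order n-grams in R(S) ∖ U(S) are dropped; the helper names and the value of K are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_ngrams(unigrams, higher_order, k=2):
    """Drop K randomly chosen n-grams (n > 1) from R(S) \\ U(S).
    Unigrams U(S) are always kept; k=2 is an arbitrary illustrative value."""
    kept = list(higher_order)
    for _ in range(min(k, len(kept))):
        kept.pop(rng.integers(len(kept)))
    return list(unigrams) + kept

toks = "the cat sat on the mat".split()
bigrams = ["_".join(toks[i:i + 2]) for i in range(len(toks) - 1)]
context_ngrams = dropout_ngrams(toks, bigrams, k=2)
```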
Experimental Setting (2)
Evaluation Tasks
• Transfer tasks:
  • Supervised
    • MR: movie review sentiment
    • CR: product reviews
    • SUBJ: subjectivity classification
    • MPQA: opinion polarity
    • TREC: question type classification
    • MSRP: paraphrase identification
  • Unsupervised
    • SICK: semantic relatedness task
    • STS: semantic textual similarity
Experimental Results (1)
Experimental Results (2)
Experimental Results (3)
Conclusion
• In this paper, they introduce a novel, computationally efficient, unsupervised, C-BOW-inspired method to train and infer sentence embeddings.
• On supervised evaluations, their method, on average, achieves better performance than all other unsupervised competitors with the exception of Skip-Thought [Kiros et al., 2015].
• However, their model is generalizable, extremely fast to train, simple to understand and easily interpretable, showing the relevance of simple and well-grounded representation models in contrast to models using deep architectures.
• Future work could focus on augmenting the model to exploit data with ordered sentences.