Dynamic Pooling and Unfolding Recursive 
Autoencoders for Paraphrase Detection 
R. Socher et al., 2011 
Presenter: Shun Yoshida
Purpose of This Paper 
Objective: to detect paraphrases 
S1 The judge also refused to postpone the trial date of 
Sept. 29. 
S2 Obus also denied a defense motion to postpone the 
September trial date. 
➔Identifying paraphrases is an important task for 
information retrieval, text summarization, 
machine translation evaluation, etc. 
Relevance to My Research: 
This could help me classify sentiment more precisely.
Word Representation 
In general, words are represented as vectors. 
1. One-hot representation 
This assigns an ID to each word individually; a word is the 
vector whose ID-th element is 1 and all others are 0: 
[ 0, 0, …, 1, 0, …, 0 ] 
Problems: 
• Very sparse 
• High dimension 
• Unable to measure the similarity 
between words 
Vocabulary: 1: apple, 2: book, ⋮, 200: zoo, ⋮
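As a minimal illustration of the one-hot representation and its similarity problem (a toy three-word vocabulary; the words and helper are illustrative, not from the paper):

```python
import numpy as np

def one_hot(word, vocab):
    """Return the one-hot vector for `word` under an ordered vocabulary."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

vocab = ["apple", "book", "zoo"]   # toy vocabulary; a real one has many thousands of entries
v = one_hot("book", vocab)         # [0., 1., 0.] — sparse and vocabulary-sized

# Any two distinct words are orthogonal, so their similarity is always zero:
sim = np.dot(one_hot("apple", vocab), one_hot("zoo", vocab))
```

The dot product between any two different words is 0, which is exactly the "unable to measure similarity" problem above.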
Word Representation 
2. Distributed representation (word embedding) 
This paper learns this representation as part of training. 
Merits: 
• Low dimension 
• Similar words have similar vectors 
Example: zoo → [ 1.5, 1.8, 0.3, 4.0 ] 
The vector represents semantic and syntactic information.
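A toy sketch of why similar vectors help: cosine similarity between hand-picked embeddings. The vectors and words below are illustrative assumptions, not learned values:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: near 1 for vectors pointing the same way."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hand-picked 4-d vectors standing in for learned embeddings:
zoo     = np.array([1.5, 1.8, 0.3, 4.0])
park    = np.array([1.4, 1.9, 0.2, 3.8])    # semantically close to "zoo"
invoice = np.array([-2.0, 0.1, 3.0, -1.0])  # unrelated word

closer = cosine(zoo, park) > cosine(zoo, invoice)   # True
```

With one-hot vectors every such comparison would be zero; with embeddings, related words score higher than unrelated ones.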
Autoencoder 
 A kind of neural network 
 The hidden layer has fewer units than the input layer 
 Trained to reconstruct its own input 
➔It can learn low-dimensional representations that 
capture the input's information well
Autoencoder 
W_d: decoding weights, W_e: encoding weights 
Viewed as a binary tree: 
Input: two children [c1; c2] ∈ R^(2n), Hidden: p ∈ R^n 
x ∈ R^n: word embedding 
(initialized by a neural language model) 
children to parent: p = f(W_e [c1; c2] + b_e) 
reconstruction: [c1'; c2'] = f(W_d p + b_d) 
reconstruction error: E_rec = ‖[c1; c2] − [c1'; c2']‖² 
(f: element-wise nonlinearity, e.g., tanh)
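The encode/decode steps can be sketched numerically. This is a minimal sketch assuming tanh as the nonlinearity and random, untrained weights; in the paper the weights are learned and x comes from a neural language model:

```python
import numpy as np

n = 4                                 # toy embedding dimension
rng = np.random.default_rng(0)
W_e = rng.normal(size=(n, 2 * n))     # encoding weights W_e
W_d = rng.normal(size=(2 * n, n))     # decoding weights W_d
b_e, b_d = np.zeros(n), np.zeros(2 * n)

def encode(c1, c2):
    """children to parent: p = tanh(W_e [c1; c2] + b_e)"""
    return np.tanh(W_e @ np.concatenate([c1, c2]) + b_e)

def decode(p):
    """reconstruction: [c1'; c2'] = tanh(W_d p + b_d)"""
    return np.tanh(W_d @ p + b_d)

def reconstruction_error(c1, c2):
    """squared distance between the children and their reconstruction"""
    children = np.concatenate([c1, c2])
    return float(np.sum((children - decode(encode(c1, c2))) ** 2))

c1, c2 = rng.normal(size=n), rng.normal(size=n)
p = encode(c1, c2)    # parent has the same dimension n as each child
```

Training would adjust W_e and W_d (and x) to shrink this error; here the random weights just show the shapes involved.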
Recursive Auto Encoders 
The child and parent vectors have the same dimension, 
so the same step can be repeated until the full tree is 
constructed, yielding a phrase vector at each internal 
node above the word embeddings. 
reconstruction error of the tree: the sum of E_rec over 
all internal nodes
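Because parent and child share the dimension n, the composition can be iterated. The sketch below composes left-to-right for simplicity, whereas the paper follows a parse tree; the tanh nonlinearity and random weights are assumptions:

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
W_e = rng.normal(size=(n, 2 * n))     # shared encoding weights

def compose(c1, c2):
    """One RAE step: two n-dim children -> one n-dim parent."""
    return np.tanh(W_e @ np.concatenate([c1, c2]))

# Four word embeddings composed into a single phrase vector.
words = [rng.normal(size=n) for _ in range(4)]
phrase = words[0]
for w in words[1:]:
    phrase = compose(phrase, w)       # same dimension at every level
```

Since every intermediate result is again n-dimensional, sentences of any length collapse to one fixed-size phrase vector.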
Unfolding RAE 
The unfolding RAE encodes each hidden layer so that it 
best reconstructs its entire subtree, down to the leaf 
nodes. 
Why Unfolding RAE? 
Problems of the RAE: 
• Both children get equal weight, though each child can 
represent a different number of words (e.g., one child 
may cover 3 words and the other only 1) 
• E_rec can be lowered trivially by shrinking the hidden 
layer's vectors 
➔The unfolding RAE solves these problems.
RAE Training 
Training minimizes the sum of the reconstruction errors 
over all nodes of all trees. 
E_rec(total) is a function of x (the word embeddings) 
and W_d, W_e (the network weights) 
➔Word embeddings and phrase vectors are obtained 
after training
Similarity Matrix 
After training, we compute the similarities (Euclidean 
distances) between all word and phrase vectors of the 
two sentences. 
These distances fill a similarity matrix S. 
S[3,4] represents the distance between node 4 of 
sentence 1 ("mice") and node 3 of sentence 2 ("mice") 
➔zero distance, since both nodes are the same word
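Filling S can be sketched as a pairwise Euclidean-distance computation over the two sentences' node vectors (toy 2-d vectors here; in the paper a sentence of n words contributes 2n−1 nodes):

```python
import numpy as np

def similarity_matrix(nodes_a, nodes_b):
    """Pairwise Euclidean distances between all node vectors
    (words and phrases) of two sentences."""
    diff = nodes_a[:, None, :] - nodes_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

a = np.array([[0.0, 1.0], [2.0, 2.0], [1.0, 0.0]])  # 3 nodes (2n-1 with n=2)
b = np.array([[0.0, 1.0], [3.0, 3.0]])              # 2 nodes
S = similarity_matrix(a, b)
# S[0, 0] is 0.0: the two nodes have identical vectors,
# like a word shared by both sentences.
```

The resulting matrix has shape (2n−1) × (2m−1), which is exactly the variable-size problem dynamic pooling addresses next.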
Why Dynamic Pooling? 
Classifying from the average distance or a histogram of 
the distances in S does not give good performance. 
➔We need to feed S itself into a classifier. 
Problem: 
The matrix dimensions vary with sentence length: 
S ∈ R^((2n−1)×(2m−1)) 
Solution: 
Map S into a matrix S_pool of fixed size: 
S_pool ∈ R^(n_p × n_p) 
➔Dynamic pooling 
Dynamic Pooling 
Example: 
n_p = 3 (2n−1 and 2m−1 are divisible by n_p) 
2n−1 = 3, 2m−1 = 9 
1. Produce an n_p × n_p grid; 
window size: ((2n−1)/n_p) × ((2m−1)/n_p) = 1×3 
2. Define each element of S_pool to be the minimum 
value within its grid cell 
(a small value means similar words or phrases appear in 
both sentences, so we take the minimum to keep this 
information)
Dynamic Pooling 
Example: 
n_p = 2 (2n−1 and 2m−1 are NOT divisible by n_p) 
2n−1 = 3, 2m−1 = 9 
1. Produce an n_p × n_p grid; 
base window size: ⌊(2n−1)/n_p⌋ × ⌊(2m−1)/n_p⌋ = 1×4 
2. Distribute the remaining rows/columns to the last 
grid cells, then take the minimum within each cell.
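Both pooling cases can be sketched in one routine: split S into an n_p × n_p grid, give leftover rows/columns to the last cells, and take the minimum in each cell. This is a hypothetical helper written for illustration, not the authors' code:

```python
import numpy as np

def dynamic_min_pool(S, n_p):
    """Min-pool a variable-size matrix S into a fixed n_p x n_p grid.
    Leftover rows/columns (non-divisible case) go to the last cells."""
    def boundaries(size):
        base, rem = divmod(size, n_p)
        # the last `rem` regions each get one extra row/column
        widths = [base] * (n_p - rem) + [base + 1] * rem
        return np.cumsum([0] + widths)

    rb, cb = boundaries(S.shape[0]), boundaries(S.shape[1])
    pooled = np.empty((n_p, n_p))
    for i in range(n_p):
        for j in range(n_p):
            # minimum keeps evidence of near-identical words/phrases
            pooled[i, j] = S[rb[i]:rb[i + 1], cb[j]:cb[j + 1]].min()
    return pooled

# The slide's non-divisible example: a 3 x 9 matrix pooled to 2 x 2,
# with window heights 1 then 2, and widths 4 then 5.
S = np.arange(27, dtype=float).reshape(3, 9)
S_pool = dynamic_min_pool(S, 2)
```

Whatever the sentence lengths, the output is always n_p × n_p, so S_pool can be fed to a fixed-input classifier.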
Experiments 
1. Do the autoencoders capture phrase information? 
➔The unfolding RAE is better.
Experiments 
2. Does the unfolding RAE really decode the leaf nodes? 
➔The unfolding RAE is better; 
it reconstructs phrases of up to five words very well.
Experiments 
3. How well does the proposed method detect 
paraphrases? 
➔The proposed method achieves state-of-the-art 
performance
Experiments 
4. Examples of classified data. 
The End

Editor's Notes

  1. What this paper wants to do: paraphrase detection. For example, S1 "The judge also refused to postpone the trial date of Sept. 29" and S2 "Obus also denied a defense motion to postpone the September trial date" are sentences with the same meaning, i.e., paraphrases, and we want to detect this. If we can, it is useful for search, text summarization, machine translation evaluation, and so on. For my own research, being able to detect paraphrases of negative sentences might allow more accurate positive/negative sentiment classification; that is the connection to my work.
  2. To handle words on a computer, we generally convert them into vectors. One conversion method is the one-hot representation: first build a vocabulary, give each word an ID, and represent the word as a vector whose ID-th element is nonzero and all other elements are zero. Problems: very sparse; high-dimensional, since the dimension equals the number of words registered in the vocabulary; and unknown words not in the dictionary cannot be handled.
  3. A representation that expresses abstract semantic and syntactic information as a vector. With this representation, words can be expressed in low dimensions. This paper performs paraphrase detection while simultaneously learning this representation. Also, semantically and syntactically similar words take similar vector values: for example, apple and orange are close in meaning (both fruits), so their vectors are fairly close, while apple and soccer get completely different vectors.
  4. A kind of neural network. The number of hidden-layer units is smaller than the number of input-layer units, and it is trained so that the output reproduces the input. This lets it learn a low-dimensional representation (word embedding) that captures the features well.
  5. The autoencoder takes word embeddings x as input; the initial values are word embeddings computed with a neural language model. Let x be n-dimensional. The 2n-dimensional concatenation of two words' embeddings is projected to n dimensions, and from there the 2n-dimensional input is reconstructed. Viewing the autoencoder as a binary tree, the bottom is the children and the top is the parent; the child-to-parent equation and the parent-to-children reconstruction equation are as shown. Since it reconstructs the input itself, the target vector is the input, so the reconstruction error is the squared difference between the target vector and the reconstructed vector.
  6. What we just focused on is the range enclosed in the blue box. Since both the input vectors and the parent vector are n-dimensional, the same operation can be repeated to build a binary tree over input sequences of any length. The reconstruction error of the whole sentence's binary tree is the sum of the errors at each position. As for the order in which the binary tree is built, the paper says a syntactic parser is used...
  7. The RAE explained earlier only reconstructs the two children it received as input; for example, the parent y2 with children x1 and y1 tries to reconstruct x1 and y1. The unfolding RAE reconstructs x1 and y1, and then further reconstructs x2 and x3 from y1. That is, instead of reconstructing phrase vectors, it tries to reconstruct all the word embeddings.
  8. One problem of the RAE is that each node can hold information about different numbers of words, yet they are treated with equal weight: in the figure, one node holds the information of 3 words and another only 1 word, but both are reconstructed with equal weight. Second, the reconstruction error is a squared error of vectors, and from the second layer up the model tends to reduce the error by shrinking the magnitude of the vectors themselves, i.e., making the hidden layer small. With unfolding, a node holding more information must reconstruct a larger subtree, so it naturally receives more weight, and since the hidden layer is decoded all the way down to the leaves, the hidden-layer-shrinking problem no longer occurs.
  9. One tree is built per training example, and the reconstruction errors at every node of every tree are summed. Training minimizes this total reconstruction error. Since the reconstruction error is a function of the word embeddings x and the phrase vectors y, learning by gradient descent yields word embeddings and phrase vectors that express semantic and syntactic information.
  10. After training, for the two sentences we want to test for paraphrase (figure left), we build trees by the same procedure, measure the similarity between every pair of vectors in the trees by Euclidean distance, and store these in the similarity matrix. The [3,2] entry of the matrix represents the Euclidean distance between node 4, "mice", of the left sentence and node 3, "mice", of the right sentence.
  11. Classifying from the average distance or a histogram of the similarity matrix does not give good performance, so we want to feed the values of S into a separate classifier. However, the dimension of the similarity matrix changes with the input sentences, so it cannot be fed to a classifier as-is. Instead, we map the similarity matrix into a fixed-dimensional S_pool and feed S_pool to the classifier. The classifier can be a neural network or the same one used by conventional methods, so here we explain how S_pool is built.
  12. Suppose the similarity matrix is 3×9 and we want to map it to 3×3. When n_p = 3 (2n−1 and 2m−1 are divisible by n_p), we split S into a 3×3 grid; the grid window size is given by the formula, and the minimum of each grid cell is stored in the corresponding position of S_pool. A small value of S means a small Euclidean distance, i.e., words or phrases with similar meaning exist, so we take the minimum to preserve this information.
  13. Suppose the similarity matrix is 3×9 and we want to map it to 2×2. We split S into a 2×2 grid; the window size is given by the formula, but leftover rows/columns appear. The remainder here is 1 for both rows and columns, so only the last grid's row and column sizes are increased by 1. Note: the top-left elements of the matrix store word-to-word (not phrase) similarities, so this places weight on preserving word-level similarity information; it assumes that word overlap is important for paraphrase detection.
  14. Experimental results. First: do the autoencoders really learn good phrase vectors? Table 1: for a phrase extracted from one sentence (left column), they extracted from another sentence the phrase whose vector is closest. The result was that the unfolding RAE looks good.
  15. Second result: the unfolding RAE is supposed to reconstruct the leaf nodes, i.e., the word vectors, but can it really? Table 2: it does quite well for phrases of five words or fewer, and reconstructs perfectly for three words or fewer.
  16. Finally, how effective is the proposed method for paraphrase detection? The proposed method outperformed conventional methods!