SlideShare una empresa de Scribd logo
1 de 14
Descargar para leer sin conexión
Simplifying Lexical Simplification:
Do We Need Simplified Corpora?
長岡技術科学大学 自然言語処理研究室
高橋寛治
Goran Glavas, Sanja Stajner
Proceedings of the 53rd Annual Meeting of the Association for
Computational Linguistics
and the 7th International Joint Conference
on Natural Language Processing (Short Papers), pages 63–68, Beijing,
China, July 26-31, 2015.
文献紹介 2016年3月3日
概要
•語彙平易化とは複雑な単語を簡単な同義語に置換
すること
•近年の研究はコーパスベース
•コーパスを利用せずに単語ベクトル表現を用いて
平易化を試みる
Domain-Specific	Paraphrase	Extraction 2
はじめに
•語彙平易化は読解を支援
Ø子供、学習者、失語症など(Petersenら2007)
•辞書や平易化コーパスを利用した研究が盛ん
Ø(Davlinら1998, De Belderら2010)
Ø資源がない言語では近年の手法を適用できない
•モチベーションとして単言語の普通のコーパスで
の平易化
Domain-Specific	Paraphrase	Extraction 3
関連研究
•ルールベース(より平易な同義語への置換)
Ø同義語(WordNetなど)の選び方が研究
•コーパスベース(Simple Wikipedia)
Øアライメントをとり言い換え
Ø編集履歴などを利用(Yatskarら2010)
•従来研究は辞書と平易化コーパスに頼る
Domain-Specific	Paraphrase	Extraction 4
提案手法:軽量リソースでの語彙平易化
•LIGHT-LS
•複雑な単語の平易な同義語を探す
Ø単語の複雑さ
Ø単語の意味的な類似度
•換言対象は1単語に絞る
Domain-Specific	Paraphrase	Extraction 5
平易な語の候補を選択
•分散表現の獲得にGloVeを利用
•意味的類似度はベクトル空間のコサイン類似度
•内容語(名詞、動詞、形容詞、副詞)から候補
Øある単語wに対して、類似している順に単語n語を候
補として取得
Domain-Specific	Paraphrase	Extraction 6
平易化に使う素性
•順位付けに用いる
Ø意味的類似度
uGloVeの単語間のコサイン角度を用いる
Ø文脈類似度
u元の単語の文脈の内容語とのコサイン類似度を比較する
Ø単語が有益か
Ø言語モデル
uN-gramで妥当性を判断
Domain-Specific	Paraphrase	Extraction 7
平易化アルゴリズム
Domain-Specific	Paraphrase	Extraction 8
評価
•自動評価および人手評価を行う
•既存手法と比較
ØHornら:教師あり(平易化コーパス利用)
ØBiranら:教師無し(平易化コーパス利用)
Domain-Specific	Paraphrase	Extraction 9
評価1
•置き換えタスクによる自動評価
•50人がつくった平易化文と比較
Domain-Specific	Paraphrase	Extraction 10
評価2
•人手評価、80文
Ø文法、平易化、意味の保持
Ø5段階評価
u5が良い
Domain-Specific	Paraphrase	Extraction 11
平易化例
Domain-Specific	Paraphrase	Extraction 12
とはいえ問題はある
•分布類似度が故
Ø「cool」の平易化候補が「warm」
•Water temperatures remained warm enough
for development.
Domain-Specific	Paraphrase	Extraction 13
まとめ
•LIGHT-LSという教師無しの語彙平易化手法を提
案
•平易化コーパスは不要で、大きなコーパスのみ必
用
•現在の平易化対象は単語のみだが、これから複単
語表現に取り組む
Domain-Specific	Paraphrase	Extraction 14

Más contenido relacionado

Más de Kanji Takahashi

20180718Eightニュースフィード活性化のための自然言語処理の取り組み
20180718Eightニュースフィード活性化のための自然言語処理の取り組み20180718Eightニュースフィード活性化のための自然言語処理の取り組み
20180718Eightニュースフィード活性化のための自然言語処理の取り組みKanji Takahashi
 
論文読み会 Creating Speech and Language Data With Amazon’s Mechanical Turk
論文読み会 Creating Speech and Language Data With Amazon’s Mechanical Turk論文読み会 Creating Speech and Language Data With Amazon’s Mechanical Turk
論文読み会 Creating Speech and Language Data With Amazon’s Mechanical TurkKanji Takahashi
 
論文読み会 Enriching Word Vectors with Subword Information
論文読み会 Enriching Word Vectors with Subword Information論文読み会 Enriching Word Vectors with Subword Information
論文読み会 Enriching Word Vectors with Subword InformationKanji Takahashi
 
第17回Machine Learning 15 minutes!:ビジネスの出会いを科学する
第17回Machine Learning 15 minutes!:ビジネスの出会いを科学する第17回Machine Learning 15 minutes!:ビジネスの出会いを科学する
第17回Machine Learning 15 minutes!:ビジネスの出会いを科学するKanji Takahashi
 
論文読み会 Data Augmentation for Low-Resource Neural Machine Translation
論文読み会 Data Augmentation for Low-Resource Neural Machine Translation論文読み会 Data Augmentation for Low-Resource Neural Machine Translation
論文読み会 Data Augmentation for Low-Resource Neural Machine TranslationKanji Takahashi
 
言語処理学会第23回年次大会参加報告
言語処理学会第23回年次大会参加報告言語処理学会第23回年次大会参加報告
言語処理学会第23回年次大会参加報告Kanji Takahashi
 
20170203The Effects of Data Size and Frequency Range on Distributional Semant...
20170203The Effects of Data Size and Frequency Range on Distributional Semant...20170203The Effects of Data Size and Frequency Range on Distributional Semant...
20170203The Effects of Data Size and Frequency Range on Distributional Semant...Kanji Takahashi
 
20161215Neural Machine Translation of Rare Words with Subword Units
20161215Neural Machine Translation of Rare Words with Subword Units20161215Neural Machine Translation of Rare Words with Subword Units
20161215Neural Machine Translation of Rare Words with Subword UnitsKanji Takahashi
 
Enriching Morphologically Poor Languages for Statistical Machine Translation
Enriching Morphologically Poor Languages for Statistical Machine TranslationEnriching Morphologically Poor Languages for Statistical Machine Translation
Enriching Morphologically Poor Languages for Statistical Machine TranslationKanji Takahashi
 
A Beam-Search Decoder for Normalization of Social Media Text with Application...
A Beam-Search Decoder for Normalization of Social Media Text with Application...A Beam-Search Decoder for Normalization of Social Media Text with Application...
A Beam-Search Decoder for Normalization of Social Media Text with Application...Kanji Takahashi
 
Reducing the Impact of Data Sparsity in Statistical Machine Translation
Reducing the Impact of Data Sparsity in Statistical Machine TranslationReducing the Impact of Data Sparsity in Statistical Machine Translation
Reducing the Impact of Data Sparsity in Statistical Machine TranslationKanji Takahashi
 
文献紹介:Morphological analysis for Statistical Machine Translation
文献紹介:Morphological analysis for Statistical Machine Translation文献紹介:Morphological analysis for Statistical Machine Translation
文献紹介:Morphological analysis for Statistical Machine TranslationKanji Takahashi
 
Distributed Representations of Words and Phrases and their Compositionally
Distributed Representations of Words and Phrases and their CompositionallyDistributed Representations of Words and Phrases and their Compositionally
Distributed Representations of Words and Phrases and their CompositionallyKanji Takahashi
 
Nlp2016参加報告(高橋)
Nlp2016参加報告(高橋)Nlp2016参加報告(高橋)
Nlp2016参加報告(高橋)Kanji Takahashi
 
Domain-spesific Paraphrase Extraction
Domain-spesific Paraphrase ExtractionDomain-spesific Paraphrase Extraction
Domain-spesific Paraphrase ExtractionKanji Takahashi
 
Vietnamese Word Segmentation with CRFs and SVMs: An Investigation
Vietnamese Word Segmentation with CRFs and SVMs: An InvestigationVietnamese Word Segmentation with CRFs and SVMs: An Investigation
Vietnamese Word Segmentation with CRFs and SVMs: An InvestigationKanji Takahashi
 
Improving vietnamese word segmentation and pos tagging using MEM with various...
Improving vietnamese word segmentation and pos tagging using MEM with various...Improving vietnamese word segmentation and pos tagging using MEM with various...
Improving vietnamese word segmentation and pos tagging using MEM with various...Kanji Takahashi
 
日本語機能表現の自動検出と統計的係り受け解析への応用
日本語機能表現の自動検出と統計的係り受け解析への応用日本語機能表現の自動検出と統計的係り受け解析への応用
日本語機能表現の自動検出と統計的係り受け解析への応用Kanji Takahashi
 
20150916How Far are We from Fully Automatic High Quality Grammatical Error Co...
20150916How Far are We from Fully Automatic High Quality Grammatical Error Co...20150916How Far are We from Fully Automatic High Quality Grammatical Error Co...
20150916How Far are We from Fully Automatic High Quality Grammatical Error Co...Kanji Takahashi
 
20150728So similar and yet incompatible: Toward automated identification of s...
20150728So similar and yet incompatible:Toward automated identification of s...20150728So similar and yet incompatible:Toward automated identification of s...
20150728So similar and yet incompatible: Toward automated identification of s...Kanji Takahashi
 

Más de Kanji Takahashi (20)

20180718Eightニュースフィード活性化のための自然言語処理の取り組み
20180718Eightニュースフィード活性化のための自然言語処理の取り組み20180718Eightニュースフィード活性化のための自然言語処理の取り組み
20180718Eightニュースフィード活性化のための自然言語処理の取り組み
 
論文読み会 Creating Speech and Language Data With Amazon’s Mechanical Turk
論文読み会 Creating Speech and Language Data With Amazon’s Mechanical Turk論文読み会 Creating Speech and Language Data With Amazon’s Mechanical Turk
論文読み会 Creating Speech and Language Data With Amazon’s Mechanical Turk
 
論文読み会 Enriching Word Vectors with Subword Information
論文読み会 Enriching Word Vectors with Subword Information論文読み会 Enriching Word Vectors with Subword Information
論文読み会 Enriching Word Vectors with Subword Information
 
第17回Machine Learning 15 minutes!:ビジネスの出会いを科学する
第17回Machine Learning 15 minutes!:ビジネスの出会いを科学する第17回Machine Learning 15 minutes!:ビジネスの出会いを科学する
第17回Machine Learning 15 minutes!:ビジネスの出会いを科学する
 
論文読み会 Data Augmentation for Low-Resource Neural Machine Translation
論文読み会 Data Augmentation for Low-Resource Neural Machine Translation論文読み会 Data Augmentation for Low-Resource Neural Machine Translation
論文読み会 Data Augmentation for Low-Resource Neural Machine Translation
 
言語処理学会第23回年次大会参加報告
言語処理学会第23回年次大会参加報告言語処理学会第23回年次大会参加報告
言語処理学会第23回年次大会参加報告
 
20170203The Effects of Data Size and Frequency Range on Distributional Semant...
20170203The Effects of Data Size and Frequency Range on Distributional Semant...20170203The Effects of Data Size and Frequency Range on Distributional Semant...
20170203The Effects of Data Size and Frequency Range on Distributional Semant...
 
20161215Neural Machine Translation of Rare Words with Subword Units
20161215Neural Machine Translation of Rare Words with Subword Units20161215Neural Machine Translation of Rare Words with Subword Units
20161215Neural Machine Translation of Rare Words with Subword Units
 
Enriching Morphologically Poor Languages for Statistical Machine Translation
Enriching Morphologically Poor Languages for Statistical Machine TranslationEnriching Morphologically Poor Languages for Statistical Machine Translation
Enriching Morphologically Poor Languages for Statistical Machine Translation
 
A Beam-Search Decoder for Normalization of Social Media Text with Application...
A Beam-Search Decoder for Normalization of Social Media Text with Application...A Beam-Search Decoder for Normalization of Social Media Text with Application...
A Beam-Search Decoder for Normalization of Social Media Text with Application...
 
Reducing the Impact of Data Sparsity in Statistical Machine Translation
Reducing the Impact of Data Sparsity in Statistical Machine TranslationReducing the Impact of Data Sparsity in Statistical Machine Translation
Reducing the Impact of Data Sparsity in Statistical Machine Translation
 
文献紹介:Morphological analysis for Statistical Machine Translation
文献紹介:Morphological analysis for Statistical Machine Translation文献紹介:Morphological analysis for Statistical Machine Translation
文献紹介:Morphological analysis for Statistical Machine Translation
 
Distributed Representations of Words and Phrases and their Compositionally
Distributed Representations of Words and Phrases and their CompositionallyDistributed Representations of Words and Phrases and their Compositionally
Distributed Representations of Words and Phrases and their Compositionally
 
Nlp2016参加報告(高橋)
Nlp2016参加報告(高橋)Nlp2016参加報告(高橋)
Nlp2016参加報告(高橋)
 
Domain-spesific Paraphrase Extraction
Domain-spesific Paraphrase ExtractionDomain-spesific Paraphrase Extraction
Domain-spesific Paraphrase Extraction
 
Vietnamese Word Segmentation with CRFs and SVMs: An Investigation
Vietnamese Word Segmentation with CRFs and SVMs: An InvestigationVietnamese Word Segmentation with CRFs and SVMs: An Investigation
Vietnamese Word Segmentation with CRFs and SVMs: An Investigation
 
Improving vietnamese word segmentation and pos tagging using MEM with various...
Improving vietnamese word segmentation and pos tagging using MEM with various...Improving vietnamese word segmentation and pos tagging using MEM with various...
Improving vietnamese word segmentation and pos tagging using MEM with various...
 
日本語機能表現の自動検出と統計的係り受け解析への応用
日本語機能表現の自動検出と統計的係り受け解析への応用日本語機能表現の自動検出と統計的係り受け解析への応用
日本語機能表現の自動検出と統計的係り受け解析への応用
 
20150916How Far are We from Fully Automatic High Quality Grammatical Error Co...
20150916How Far are We from Fully Automatic High Quality Grammatical Error Co...20150916How Far are We from Fully Automatic High Quality Grammatical Error Co...
20150916How Far are We from Fully Automatic High Quality Grammatical Error Co...
 
20150728So similar and yet incompatible: Toward automated identification of s...
20150728So similar and yet incompatible:Toward automated identification of s...20150728So similar and yet incompatible:Toward automated identification of s...
20150728So similar and yet incompatible: Toward automated identification of s...
 

文献紹介:Simplifying Lexical Simplification: Do We Need Simplified Corpora?