Modeling Missing Data in Distant Supervision for Information Extraction 
Alan Ritter (CMU) 
Luke Zettlemoyer (University of Washington)
Mausam (University of Washington)
Oren Etzioni(Vulcan Inc.) 
TACL, 1, 367-378, 2013. 
Presented by Naoaki Okazaki (Tohoku University)
2014-09-05 Modeling Missing Data in Distant Supervision 
Relation instance extraction
Input: "Steven Spielberg's film Saving Private Ryan is loosely based on the brothers' story."
Extractor output (film-director relation): Film = Saving Private Ryan, Director = Steven Spielberg
• Fully-supervised learning (Zhou+ 05, …)
  • Uses ACE corpora to build relation-instance classifiers
  • Suffers from the limited amount of training data
• Unsupervised information extraction (Banko+ 07, …)
  • Extracts relational patterns between entities and clusters the patterns into relations
  • Difficult to map clusters onto relations of interest
• Bootstrap learning (Brin 98, …)
  • Uses seed instances to extract a new set of relational patterns
  • Often suffers from low precision (semantic drift)
• Distant supervision (Mintz+ 09, …)
  • Combines the advantages of the above approaches
Distant supervision (Mintz+, 09)
Knowledge base (Person, Birthplace): (Edwin Hubble, Marshfield), …
Automatic annotation: "Astronomer Edwin Hubble was born in Marshfield, Missouri."
Feature extraction
Mintz et al. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
* Each row presents a single feature. Features from different sentences containing the same entity pair are concatenated.
Problem: an entity pair cannot have multiple relations
E.g., Founded(Jobs, Apple) and CEO-of(Jobs, Apple) are both true.
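The automatic annotation step can be sketched as follows. This is a minimal illustration, assuming plain substring matching of entity names; the toy knowledge base, the second sentence, and the function name `distant_label` are hypothetical and not taken from Mintz et al.

```python
# Toy knowledge base: (entity1, entity2) -> relation. Entries are illustrative.
KB = {("Edwin Hubble", "Marshfield"): "born-in"}

SENTENCES = [
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble attended the University of Chicago.",
]

def distant_label(kb, sentences):
    """Label every sentence that mentions both entities of a KB pair."""
    labeled = []
    for (e1, e2), relation in kb.items():
        for sentence in sentences:
            # Assumption: naive substring matching stands in for entity linking.
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, e1, e2, relation))
    return labeled

examples = distant_label(KB, SENTENCES)
for sentence, e1, e2, relation in examples:
    print(f"{relation}({e1}, {e2}): {sentence}")
```

Because each entity pair maps to a single relation here, the sketch also shows the limitation noted above: it cannot hold both Founded(Jobs, Apple) and CEO-of(Jobs, Apple).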
MultiR (Hoffmann+, 11)
Introduces latent variables $z_i$ to indicate the relation expressed by sentence $x_i$
For the entity pair (Steve Jobs, Apple):
  $\mathbf{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$
  $x_1$: "Steve Jobs was founder of Apple." ($z_1$ = Founder)
  $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple." ($z_2$ = Founder)
  $x_3$: "Steve Jobs is CEO of Apple." ($z_3$ = CEO-of)
$$p(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) = \frac{1}{Z_x} \prod_r \Phi^{\text{join}}(y_r, \mathbf{z}) \prod_i \Phi^{\text{extract}}(z_i, x_i)$$
$x_i$: a sentence containing the entity pair
$y_r \in \{0, 1\}$: 1 if the knowledge base includes the pair with relation $r$, 0 otherwise
$z_i \in R$: the relation expressed by sentence $x_i$
$$\Phi^{\text{extract}}(z_i, x_i) = \exp\Big(\sum_j \theta_j \phi_j(z_i, x_i)\Big)$$
(The same as Mintz+ 09)
$$\Phi^{\text{join}}(y_r, \mathbf{z}) = \mathbb{1}(\neg y_r \lor \exists i: r = z_i)$$
(Deterministic OR)
$\Phi^{\text{join}}$ ensures that a sentence $x_i$ expressing the relation $r$ exists if $r$ is true
Allows multiple relations for the same entity pair
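The two factors can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, features are passed as a name-to-value dictionary, and the per-sentence extract values are the numbers shown in the inference figures of this deck for the (Steve Jobs, Apple) example.

```python
import math

def phi_extract(weights, features):
    """exp(sum_j theta_j * phi_j(z_i, x_i)) for one (sentence, relation) pair."""
    return math.exp(sum(weights.get(name, 0.0) * value
                        for name, value in features.items()))

def phi_join(y_r, relation, z):
    """Indicator 1(not y_r OR some z_i equals the relation)."""
    return 1 if (not y_r) or (relation in z) else 0

def unnormalized_score(y, z, extract_values):
    """Product of all join and extract factors (numerator of p(y, z | x))."""
    score = 1.0
    for relation, y_r in y.items():
        score *= phi_join(y_r, relation, z)
    for i, z_i in enumerate(z):
        score *= extract_values[i][z_i]
    return score

# Extract values per sentence, reused from the example figure (assumption:
# the figure's numbers are treated directly as factor values).
EXTRACT = [
    {"founder": 16.0, "CEO-of": 9.0},
    {"founder": 11.0, "CEO-of": 6.0},
    {"founder": 8.0, "CEO-of": 7.0},
]
Y = {"founder": 1, "CEO-of": 1}

consistent = unnormalized_score(Y, ["founder", "founder", "CEO-of"], EXTRACT)
violating = unnormalized_score(Y, ["founder", "founder", "founder"], EXTRACT)
print(consistent, violating)
```

The second assignment scores zero because no sentence expresses CEO-of although $y_{\text{CEO-of}} = 1$, which is exactly the hard constraint the join factor encodes.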
MultiR: Training
Hoffmann et al. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
Loop over passes through the training data
  Loop over entity pairs in the KB
    Predict sentence-level and KB-level relations (ignoring the facts in the KB)
    Find an optimal assignment of sentence-level relations consistent with the facts in the KB
    Update feature weights similarly to the perceptron algorithm
The two prediction steps need two kinds of inference
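The training loop above can be sketched as a perceptron-style procedure. This is a rough illustration, not the authors' implementation: the bag-of-words feature map, the brute-force constrained inference, and the exact update rule (add features of the KB-consistent assignment, subtract features of the prediction) are assumptions made for the sketch.

```python
from collections import defaultdict
from itertools import product

RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]
SENTENCES = [
    "Steve Jobs was founder of Apple.",
    "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.",
    "Steve Jobs is CEO of Apple.",
]
# One training item: the sentences of an entity pair and its KB facts.
DATA = [(SENTENCES, {"founder", "CEO-of"})]

def features(sentence, relation):
    """Hypothetical feature map: one indicator per (word, relation) pair."""
    return {(word, relation): 1.0 for word in sentence.lower().split()}

def score(weights, sentence, relation):
    return sum(weights[f] * v for f, v in features(sentence, relation).items())

def predict(weights, sentences, relations):
    """Inference 1: best relation per sentence, aggregated into KB-level facts."""
    z = [max(relations, key=lambda r: score(weights, s, r)) for s in sentences]
    return z, set(z)

def constrained(weights, sentences, facts):
    """Inference 2 by brute force: best z that mentions exactly the KB facts."""
    best, best_score = None, float("-inf")
    for z in product(sorted(facts), repeat=len(sentences)):
        if set(z) != facts:
            continue
        total = sum(score(weights, s, r) for s, r in zip(sentences, z))
        if total > best_score:
            best, best_score = list(z), total
    return best  # None if there are fewer sentences than facts

def train(data, relations, passes=5):
    weights = defaultdict(float)
    for _ in range(passes):
        for sentences, facts in data:
            z_pred, y_pred = predict(weights, sentences, relations)
            if y_pred == facts:
                continue  # prediction already agrees with the KB
            z_gold = constrained(weights, sentences, facts)
            if z_gold is None:
                continue
            for s, zp, zg in zip(sentences, z_pred, z_gold):
                if zp == zg:
                    continue
                for f, v in features(s, zg).items():
                    weights[f] += v
                for f, v in features(s, zp).items():
                    weights[f] -= v
    return weights

weights = train(DATA, RELATIONS)
z_final, y_final = predict(weights, SENTENCES, RELATIONS)
print(z_final, y_final)
```

With a single entity pair the learned sentence-level labels only need to be consistent with the KB; they are not guaranteed to be the linguistically correct ones.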
MultiR: Inference 1: $\arg\max_{\mathbf{y}, \mathbf{z}} p(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$
For the entity pair (Steve Jobs, Apple), all of $\mathbf{y}$ and $\mathbf{z}$ are unknown (?)
  $x_1$: "Steve Jobs was founder of Apple."
  $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple."
  $x_3$: "Steve Jobs is CEO of Apple."
Scores of each relation for each sentence:
           born-in   founder   CEO-of   capital-of
  $x_1$    0.5       16.0      9.0      0.1
  $x_2$    8.0       11.0      6.0      0.1
  $x_3$    7.0       8.0       7.0      0.2
Predict a relation label for each sentence independently
Aggregate sentence-level predictions into global-level predictions
MultiR: Inference 1: $\arg\max_{\mathbf{y}, \mathbf{z}} p(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$ (result)
For the entity pair (Steve Jobs, Apple), with per-sentence scores for (born-in, founder, CEO-of, capital-of) of $x_1$: (0.5, 16.0, 9.0, 0.1), $x_2$: (8.0, 11.0, 6.0, 0.1), $x_3$: (7.0, 8.0, 7.0, 0.2):
  $z_1$ = founder, $z_2$ = founder, $z_3$ = founder
  $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 0$, $y_{\text{capital-of}} = 0$
Predict a relation label for each sentence independently
Aggregate sentence-level predictions into global-level predictions
Very easy to find!
Computational cost: $O(|R| \cdot |\mathbf{x}|)$
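The two steps of this inference can be sketched directly from the example. The score table is copied from the slide; the function name `infer_unconstrained` is hypothetical.

```python
RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]

# Rows are sentences x1..x3, columns follow RELATIONS (numbers from the slide).
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]

def infer_unconstrained(scores, relations):
    """Pick the best relation per sentence, then OR the picks into y."""
    z = []
    for row in scores:
        best_index = max(range(len(relations)), key=lambda k: row[k])
        z.append(relations[best_index])
    y = {r: int(r in z) for r in relations}
    return y, z

y, z = infer_unconstrained(SCORES, RELATIONS)
print(z)
print(y)
```

Each sentence is scanned once over all relations, which is where the linear cost in the number of relations times the number of sentences comes from.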
MultiR: Inference 2: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
For the entity pair (Steve Jobs, Apple), $\mathbf{y}$ is given by the KB and $\mathbf{z}$ is unknown (?)
  $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$
  $x_1$: "Steve Jobs was founder of Apple."
  $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple."
  $x_3$: "Steve Jobs is CEO of Apple."
Edge weights between $y_r$ and $z_i$:
                 $z_1$   $z_2$   $z_3$
  born-in        0.5     8       7
  founder        16      11      8
  CEO-of         9       6       7
  capital-of     0.1     0.1     0.2
Define an edge weight: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
A node with $y_r = 1$ must have at least one edge connecting to some $z_i$
Each node $z_i$ must have an edge connecting to some $y_r$
Find a set of edges that maximizes the sum of weights
MultiR: Inference 2: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$ (result)
For the entity pair (Steve Jobs, Apple), with $y_{\text{founder}} = 1$ and $y_{\text{CEO-of}} = 1$:
  Remaining edge weights for ($z_1$, $z_2$, $z_3$): founder (16, 11, 8), CEO-of (9, 6, 7)
  $z_1$ = founder, $z_2$ = founder, $z_3$ = CEO-of
Define an edge weight: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
A node with $y_r = 1$ must have at least one edge connecting to some $z_i$
Each node $z_i$ must have an edge connecting to some $y_r$
Find a set of edges that maximizes the sum of weights
Exact solution in polynomial time
In practice, an approximate solution by greedy search (assigning a $z_i$ to each node with $y_r = 1$) is sufficient
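The constrained assignment can be sketched in two ways: an exhaustive search that is exact for small inputs, and a greedy approximation. Both are illustrations; the greedy order (cover each true relation with its best unassigned sentence, then give leftover sentences their best true relation) is an assumption about how "assigning $z_i$ for each node $y_r = 1$" might be done, and the brute-force search is not the polynomial-time algorithm the slide refers to.

```python
from itertools import product

# Edge weights from the slide, restricted to relations with y_r = 1.
SCORES = [
    {"founder": 16.0, "CEO-of": 9.0},
    {"founder": 11.0, "CEO-of": 6.0},
    {"founder": 8.0, "CEO-of": 7.0},
]
TRUE_RELATIONS = ["founder", "CEO-of"]

def total_weight(scores, z):
    return sum(scores[i][r] for i, r in enumerate(z))

def exact_assignment(scores, true_relations):
    """Try every assignment in which each true relation is used at least once."""
    feasible = (z for z in product(true_relations, repeat=len(scores))
                if set(z) == set(true_relations))
    return list(max(feasible, key=lambda z: total_weight(scores, z)))

def greedy_assignment(scores, true_relations):
    """Assumes at least as many sentences as true relations."""
    z = [None] * len(scores)
    for r in true_relations:
        free = [i for i, z_i in enumerate(z) if z_i is None]
        z[max(free, key=lambda i: scores[i][r])] = r
    for i, z_i in enumerate(z):
        if z_i is None:
            z[i] = max(true_relations, key=lambda r: scores[i][r])
    return z

exact = exact_assignment(SCORES, TRUE_RELATIONS)
greedy = greedy_assignment(SCORES, TRUE_RELATIONS)
print(exact, total_weight(SCORES, exact))
print(greedy, total_weight(SCORES, greedy))
```

On this example both searches agree, but the greedy result can depend on the order in which the true relations are processed.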
Contribution of this work
• MultiR makes two assumptions (hard constraints):
  • If a fact is not found in the database, it cannot be mentioned in the text
  • If a fact is in the database, it must be mentioned in at least one sentence
• Relax MultiR to handle the situations where:
  • A fact is not mentioned in text (MIT)
  • A fact mentioned in text is missing in the database (MID)
• Side effect of this relaxation
  • Incorporates the tendency that the knowledge base is likely to include popular entities and relations
Distant Supervision with Data Not Missing at Random (DNMAR)
Introduce a layer of latent variables $t_r$ to handle missing cases
For the entity pair (Steve Jobs, Apple):
  $x_1$: "Steve Jobs was founder of Apple." ($z_1$ = Founder)
  $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple." ($z_2$ = Founder)
  $x_3$: "Steve Jobs visited Apple store…" ($z_3$ = visit)
              born-in   founder   CEO-of   visit
  $y_r$       0         1         1        0
  $t_r$       0         1         0        1
Introduce a new factor:
$$\phi^{\text{miss}}(y_r, t_r) = \begin{cases} -\alpha_{MIT} & (y_r = 1 \land t_r = 0) \quad \text{(missing in text)} \\ -\alpha_{MID} & (y_r = 0 \land t_r = 1) \quad \text{(missing in DB)} \\ 0 & \text{(otherwise)} \end{cases}$$
This relaxes the two hard constraints in MultiR into soft ones with penalty factors $-\alpha_{MIT}$ and $-\alpha_{MID}$
Training algorithm is the same as the one used in MultiR
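The new factor can be sketched on the example from this slide. The derivation of $t_r$ as "some sentence expresses $r$" is an assumption read off the figure, and the penalty values 2.0 and 3.0 are arbitrary placeholders.

```python
RELATIONS = ["born-in", "founder", "CEO-of", "visit"]

def phi_miss(y_r, t_r, alpha_mit, alpha_mid):
    """Soft penalty for disagreement between the KB (y_r) and the text (t_r)."""
    if y_r == 1 and t_r == 0:
        return -alpha_mit  # fact in the KB but missing in text
    if y_r == 0 and t_r == 1:
        return -alpha_mid  # fact in text but missing in the KB
    return 0.0

def text_level(z, relations):
    """Assumption: t_r = 1 exactly when some sentence is labeled with r."""
    return {r: int(r in z) for r in relations}

y = {"born-in": 0, "founder": 1, "CEO-of": 1, "visit": 0}
z = ["founder", "founder", "visit"]
t = text_level(z, RELATIONS)

penalty = sum(phi_miss(y[r], t[r], alpha_mit=2.0, alpha_mid=3.0)
              for r in RELATIONS)
print(t, penalty)
```

Here CEO-of incurs the missing-in-text penalty and visit incurs the missing-in-DB penalty, whereas under MultiR's hard constraints this assignment would be ruled out entirely.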
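Constrained inference: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
For the entity pair (Steve Jobs, Apple), $\mathbf{y}$ is given by the KB; $\mathbf{z}$ and $\mathbf{t}$ are unknown (?)
  $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{visit}} = 0$
  $x_1$: "Steve Jobs was founder of Apple."
  $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple."
  $x_3$: "Steve Jobs visited Apple store…"
$$\mathbf{z}^* = \arg\max_{\mathbf{z}} \sum_{i=1}^{n} \boldsymbol{\theta} \cdot \boldsymbol{\phi}(z_i, x_i) + \sum_r \Big( \alpha_{MIT} \cdot \mathbb{1}(y_r \land \exists i: r = z_i) - \alpha_{MID} \cdot \mathbb{1}(\neg y_r \land \exists i: r = z_i) \Big)$$
where $\boldsymbol{\theta} \cdot \boldsymbol{\phi}(z_i, x_i) = \log \Phi^{\text{extract}}(z_i, x_i)$
Inference became more challenging
A* search can find an exact solution, but does not scale to many variables
Present a greedy hill-climbing approach for the inference:
1. Initialize $z_i$ at random
2. Obtain the neighborhood of the current solution
3. Move to the neighbor yielding the highest score
4. Repeat this process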
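The four steps can be sketched as follows. Several things here are assumptions for illustration: the per-sentence scores reuse the numbers from the MultiR example, the neighborhood is "change the label of one sentence", the search stops when no neighbor improves the score, and the penalty values are placeholders.

```python
import random

# Hypothetical per-sentence scores (reusing the MultiR example numbers).
SCORES = [
    {"born-in": 0.5, "founder": 16.0, "CEO-of": 9.0, "capital-of": 0.1},
    {"born-in": 8.0, "founder": 11.0, "CEO-of": 6.0, "capital-of": 0.1},
    {"born-in": 7.0, "founder": 8.0, "CEO-of": 7.0, "capital-of": 0.2},
]
Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}

def objective(z, scores, y, alpha_mit, alpha_mid):
    """Sentence scores plus a reward/penalty for every relation mentioned in z."""
    total = sum(scores[i][r] for i, r in enumerate(z))
    for r in y:  # fixed iteration order keeps the sum reproducible
        if r in z:
            total += alpha_mit if y[r] else -alpha_mid
    return total

def hill_climb(scores, y, alpha_mit, alpha_mid, rng):
    relations = list(y)
    z = [rng.choice(relations) for _ in scores]      # 1. random initialization
    best = objective(z, scores, y, alpha_mit, alpha_mid)
    while True:
        best_neighbor, best_score = None, best
        for i in range(len(z)):                      # 2. enumerate neighbors
            for r in relations:
                if r == z[i]:
                    continue
                candidate = z[:i] + [r] + z[i + 1:]
                s = objective(candidate, scores, y, alpha_mit, alpha_mid)
                if s > best_score:
                    best_neighbor, best_score = candidate, s
        if best_neighbor is None:                    # local optimum reached
            return z, best
        z, best = best_neighbor, best_score          # 3. move, 4. repeat

z_star, score_star = hill_climb(SCORES, Y, 5.0, 5.0, random.Random(0))
print(z_star, score_star)
```

The loop terminates because every move strictly increases the score, but the result is only a local optimum; running it from several random initializations is a common way to reduce that risk.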
Incorporating popularity in KB
• We tune the penalty factors $\alpha_{MIT}$ and $\alpha_{MID}$ on a development set
• We can take into account how likely each fact is to be observed in the text and the knowledge base
  • Facts about Barack Obama are likely to exist
  • Facts about Naoaki Okazaki are unlikely to exist
• Control the penalty factor for each entity pair
  • Popularity of entities: $\alpha_{MID}^{(e_1, e_2)} = -\gamma \min(c(e_1), c(e_2))$
    • A larger penalty if the model predicts that a fact about a popular entity does not exist in the KB
  • Well-aligned relations: use three different values of $\alpha_{MIT}^{r}$
    • A larger penalty if a popular relation such as contains, place_lived, and nationality does not exist in text
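The entity-pair penalty can be evaluated as written on the slide. In this sketch the count table standing in for $c(e)$, the value of $\gamma$, and the function name are all hypothetical; the slide does not define how $c(e)$ is computed, and the sign follows the slide's formula literally.

```python
def alpha_mid_for_pair(e1, e2, counts, gamma):
    """-gamma * min(c(e1), c(e2)), following the formula on the slide."""
    return -gamma * min(counts[e1], counts[e2])

# Hypothetical popularity counts c(e); not real data.
COUNTS = {"Barack Obama": 1000, "Hawaii": 200, "Naoaki Okazaki": 2}

popular = alpha_mid_for_pair("Barack Obama", "Hawaii", COUNTS, gamma=0.5)
rare = alpha_mid_for_pair("Naoaki Okazaki", "Hawaii", COUNTS, gamma=0.5)
print(popular, rare)
```

Taking the minimum means a pair counts as popular only when both entities are, so the magnitude for the pair with a rarely seen entity stays small.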
Experiments
• Binary relation extraction
  • The standard setting (Riedel+, 10)
    • Knowledge base: Freebase relations
    • Text corpus: 1.8 million New York Times articles
  • Two kinds of evaluation
    • Sentence-level extractions using the dataset of (Hoffmann+, 11)
    • Holdout evaluation on Freebase knowledge
• Unary relation extraction (NE categorization)
  • Twitter NE categorization dataset (Ritter+, 11)
    • Knowledge base: Freebase (instances and their categories)
    • Text corpus: tweets
  • Holdout evaluation
Results 
17% increase in area under the curve. 
Incorporating popularity yielded 27% increase over the baseline. 
This evaluation underestimates precision because many facts correctly extracted from text are missing in the database.
DNMAR doubled the recall. 
Ritter et al. (2013) Modeling Missing Data in Distant Supervision for Information Extraction, TACL(1), 367-378. 
Conclusion
• Investigated the problem of missing data in distant supervision
• Presented an extension of MultiR to handle missing data
  • Could incorporate the popularity of facts to be included in the knowledge base and text
  • Presented a scalable inference algorithm based on greedy hill-climbing
• Demonstrated the effectiveness of the modeling
References
• Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
  • Slides and code
• Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
Chemistry 4th semester series (krishna).pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 

Modeling missing data in distant supervision for information extraction (Ritter+, TACL 2013)

  • 1. Modeling Missing Data in Distant Supervision for Information Extraction. Alan Ritter (CMU), Luke Zettlemoyer (University of Washington), Mausam (University of Washington), Oren Etzioni (Vulcan Inc.). TACL, 1, 367-378, 2013. Presented by Naoaki Okazaki (Tohoku University). 2014-09-05 Modeling Missing Data in Distant Supervision
  • 2. Relation instance extraction
    Example: "Steven Spielberg’s film Saving Private Ryan is loosely based on the brothers’ story." → Extractor → film-director relation (Film: Saving Private Ryan; Director: Steven Spielberg)
    • Fully supervised learning (Zhou+ 05, …)
      • Uses ACE corpora to build relation-instance classifiers
      • Suffers from the limited amount of training data
    • Unsupervised information extraction (Banko+ 07, …)
      • Extracts relational patterns between entities and clusters the patterns into relations
      • Difficult to map clusters to relations of interest
    • Bootstrap learning (Brin 98, …)
      • Uses seed instances to extract a new set of relational patterns
      • Often suffers from low precision (semantic drift)
    • Distant supervision (Mintz+ 09, …)
      • Combines the advantages of the above approaches
  • 3. Distant supervision (Mintz+, 09)
    Knowledge base (Person, Birthplace): (Edwin Hubble, Marshfield), …
    Automatic annotation: the sentence "Astronomer Edwin Hubble was born in Marshfield, Missouri." mentions both entities of a knowledge-base entry, so it is labeled as an instance of that relation.
    Feature extraction (feature table not reproduced in this transcript): each row presents a single feature. Features from different sentences containing the same entity pair are concatenated.
    Problem: an entity pair cannot have multiple relations, although, e.g., Founded(Jobs, Apple) and CEO-of(Jobs, Apple) are both true.
    Mintz et al. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.
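The automatic annotation step on this slide can be sketched as follows. This is a minimal illustration, not the implementation of Mintz et al.: the toy knowledge base `KB`, the helper `annotate`, and the use of plain substring matching to find entity mentions are all assumptions made for the example (a real pipeline would use a named-entity tagger).

```python
# Toy knowledge base: (entity1, entity2) -> relation. Hypothetical data.
KB = {("Edwin Hubble", "Marshfield"): "born-in"}


def annotate(sentences, kb):
    """Label each sentence that mentions both entities of a KB fact."""
    labeled = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            # Naive substring matching stands in for entity recognition.
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, e1, e2, relation))
    return labeled


sentences = [
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble studied galaxies.",
]
examples = annotate(sentences, KB)
for example in examples:
    print(example)
```

Only the first sentence is labeled, because the second one mentions just one of the two entities.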
  • 4. MultiR (Hoffmann+, 11)
    Introduces latent variables ($z_i$) to indicate the relation expressed by sentence $x_i$.
    Example for the entity pair (Steve Jobs, Apple):
      $\mathbf{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$
      $x_1$: "Steve Jobs was founder of Apple." ($z_1$ = Founder)
      $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple." ($z_2$ = Founder)
      $x_3$: "Steve Jobs is CEO of Apple." ($z_3$ = CEO-of)
    Model:
      $$p(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) = \frac{1}{Z_x} \prod_r \Phi^{\text{join}}(y_r, \mathbf{z}) \prod_i \Phi^{\text{extract}}(z_i, x_i)$$
      $$\Phi^{\text{extract}}(z_i, x_i) = \exp\Big(\sum_j \theta_j \phi_j(z_i, x_i)\Big)$$
      $$\Phi^{\text{join}}(y_r, \mathbf{z}) = \mathbb{1}(\neg y_r \vee \exists i: r = z_i)$$
    • $x_i$: a sentence containing the entity pair
    • $y_r \in \{0, 1\}$: 1 if the knowledge base includes the pair with relation $r$, 0 otherwise
    • $z_i \in R$: the relation expressed by sentence $x_i$
    • $\Phi^{\text{extract}}$: the same as (Mintz+ 09)
    • $\Phi^{\text{join}}$ (deterministic OR) ensures that a sentence $x_i$ expressing the relation $r$ exists if $r$ is true
    • Allows multiple relations for the same entity pair
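To make the factorization on this slide concrete, the sketch below evaluates the unnormalized product of the join and extract factors (everything except the partition function) for one entity pair. The per-sentence scores stand in for the weighted feature sums inside the extract factor and are hypothetical numbers; the function names are likewise chosen for this example.

```python
import math


def phi_join(y_r, r, z):
    """Join factor as written on the slide: 1 unless y_r = 1 and no sentence expresses r."""
    return 1.0 if (y_r == 0) or (r in z) else 0.0


def phi_extract(score):
    """Extract factor: exp of the weighted feature sum (passed in as `score`)."""
    return math.exp(score)


def unnormalized_score(y, z, scores):
    """Product of all join and extract factors, before dividing by Z_x.

    y: dict relation -> 0/1, z: one relation per sentence,
    scores[i][r]: hypothetical weighted feature sum for sentence i and relation r.
    """
    join = math.prod(phi_join(y_r, r, z) for r, y_r in y.items())
    extract = math.prod(phi_extract(scores[i][z_i]) for i, z_i in enumerate(z))
    return join * extract


y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}
scores = [
    {"founder": 2.0, "CEO-of": 1.0},
    {"founder": 1.5, "CEO-of": 0.5},
    {"founder": 0.5, "CEO-of": 1.0},
]
consistent = unnormalized_score(y, ["founder", "founder", "CEO-of"], scores)
violating = unnormalized_score(y, ["founder", "founder", "founder"], scores)
print(consistent, violating)
```

The second assignment scores zero because the CEO-of fact is marked true but no sentence is assigned to it, which is exactly the hard constraint the join factor encodes.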
  • 5. MultiR: Training
    Loop for passes over the training data
      Loop for entity pairs in the KB
        1. Predict sentence-level and KB-level relations (ignoring the facts in the KB)
        2. Find an optimal assignment of sentence-level relations consistent with the facts in the KB
        3. Update feature weights similarly to the perceptron algorithm
    Steps 1 and 2 require two kinds of inference.
    Hoffmann et al. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
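The training loop on this slide can be sketched as below. This is a simplified illustration rather than the MultiR implementation: the bag-of-words `features` map, the two-relation label set, the absence of a "no relation" label, and the simplified `constrained` inference (which restricts labels to KB relations but does not enforce coverage) are all assumptions made to keep the example short.

```python
from collections import defaultdict

RELATIONS = ["founder", "CEO-of"]  # hypothetical label set


def features(sentence, relation):
    """Hypothetical feature map: one indicator per (word, relation) pair."""
    return {(word, relation): 1.0 for word in sentence.lower().split()}


def score(theta, sentence, relation):
    return sum(theta[f] * v for f, v in features(sentence, relation).items())


def predict(theta, sentences):
    """Inference 1: best relation for each sentence, ignoring the KB."""
    return [max(RELATIONS, key=lambda r: score(theta, s, r)) for s in sentences]


def constrained(theta, sentences, kb_relations):
    """Simplified inference 2: best relation per sentence among KB relations."""
    allowed = [r for r in RELATIONS if r in kb_relations]
    return [max(allowed, key=lambda r: score(theta, s, r)) for s in sentences]


def train(data, epochs=3):
    """Perceptron-style updates over (sentences, KB relations) per entity pair."""
    theta = defaultdict(float)
    for _ in range(epochs):
        for sentences, kb_relations in data:
            z_pred = predict(theta, sentences)
            if set(z_pred) != set(kb_relations):  # KB-level prediction is wrong
                z_gold = constrained(theta, sentences, kb_relations)
                for s, zg, zp in zip(sentences, z_gold, z_pred):
                    for f, v in features(s, zg).items():
                        theta[f] += v
                    for f, v in features(s, zp).items():
                        theta[f] -= v
    return theta


data = [
    (["Wozniak founded Apple"], {"founder"}),
    (["Cook is CEO of Apple"], {"CEO-of"}),
]
theta = train(data)
print(predict(theta, ["Jobs founded Apple", "Jobs is CEO of Apple"]))
```

Weights change only when the KB-level prediction disagrees with the knowledge base, which is why both kinds of inference are needed inside the loop.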
  • 6. MultiR: Inference 1: $\arg\max_{\mathbf{y}, \mathbf{z}} p(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$
    For the entity pair (Steve Jobs, Apple), all of $\mathbf{y}$ ($y_{\text{born-in}}$, $y_{\text{founder}}$, $y_{\text{CEO-of}}$, $y_{\text{capital-of}}$) and $\mathbf{z}$ ($z_1$, $z_2$, $z_3$) are unknown.
    Sentence-level scores:
      $x_1$ "Steve Jobs was founder of Apple.": born-in 0.5, founder 16.0, CEO-of 9.0, capital-of 0.1
      $x_2$ "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.": born-in 8.0, founder 11.0, CEO-of 6.0, capital-of 0.1
      $x_3$ "Steve Jobs is CEO of Apple.": born-in 7.0, founder 8.0, CEO-of 7.0, capital-of 0.2
    1. Predict a relation label for each sentence independently
    2. Aggregate sentence-level predictions into global-level predictions
  • 7. MultiR: Inference 1: $\arg\max_{\mathbf{y}, \mathbf{z}} p(\mathbf{y}, \mathbf{z} \mid \mathbf{x})$ (result)
    Sentence-level scores for the entity pair (Steve Jobs, Apple):
      $x_1$ "Steve Jobs was founder of Apple.": born-in 0.5, founder 16.0, CEO-of 9.0, capital-of 0.1
      $x_2$ "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.": born-in 8.0, founder 11.0, CEO-of 6.0, capital-of 0.1
      $x_3$ "Steve Jobs is CEO of Apple.": born-in 7.0, founder 8.0, CEO-of 7.0, capital-of 0.2
    1. Predict a relation label for each sentence independently: $z_1 = z_2 = z_3 = \text{founder}$
    2. Aggregate sentence-level predictions into global-level predictions: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 0$, $y_{\text{capital-of}} = 0$
    Very easy to find! Computational cost: $O(|R| \cdot |\mathbf{x}|)$
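The two steps of this unconstrained inference can be sketched in a few lines. The score matrix below is one reading of the numbers shown on the slide (one row per sentence, one column per relation); that layout, and the function name `infer_unconstrained`, are assumptions of this example.

```python
RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]

# Rows: sentences x1..x3; columns: relations in the order above.
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]


def infer_unconstrained(scores, relations):
    """Pick the best relation per sentence, then OR the results into y."""
    z = []
    for row in scores:
        best = max(range(len(row)), key=lambda j: row[j])
        z.append(relations[best])
    y = {r: int(r in z) for r in relations}
    return y, z


y, z = infer_unconstrained(SCORES, RELATIONS)
print(z)
print(y)
```

Each sentence is scored against every relation exactly once, which is where the cost proportional to the number of relations times the number of sentences comes from.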
  • 8. MultiR: Inference 2: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
    For the entity pair (Steve Jobs, Apple), $\mathbf{y}$ is given ($y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$) and $\mathbf{z}$ ($z_1$, $z_2$, $z_3$) is unknown.
    Edge weights between relation nodes $y_r$ and sentence nodes $z_i$:
      $x_1$ "Steve Jobs was founder of Apple.": born-in 0.5, founder 16.0, CEO-of 9.0, capital-of 0.1
      $x_2$ "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.": born-in 8.0, founder 11.0, CEO-of 6.0, capital-of 0.1
      $x_3$ "Steve Jobs is CEO of Apple.": born-in 7.0, founder 8.0, CEO-of 7.0, capital-of 0.2
    • Define an edge weight: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
    • A node with $y_r = 1$ must have at least one edge connecting to some $z_i$
    • Each node $z_i$ must have one edge connecting to some $y_r$
    • Find a set of edges that maximizes the sum of weights
  • 9. MultiR: Inference 2: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$ (result)
    For the entity pair (Steve Jobs, Apple) with $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{capital-of}} = 0$, only the edges to founder and CEO-of remain:
      $x_1$ "Steve Jobs was founder of Apple.": founder 16, CEO-of 9
      $x_2$ "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple.": founder 11, CEO-of 6
      $x_3$ "Steve Jobs is CEO of Apple.": founder 8, CEO-of 7
    Result: $z_1 = \text{founder}$, $z_2 = \text{founder}$, $z_3 = \text{CEO-of}$
    • Define an edge weight: $w(y_r, z_i) = \Phi^{\text{extract}}(r, x_i)$
    • A node with $y_r = 1$ must have at least one edge connecting to some $z_i$
    • Each node $z_i$ must have one edge connecting to some $y_r$
    • Find a set of edges that maximizes the sum of weights
    • An exact solution can be found in polynomial time
    • In practice, an approximate solution by greedy search (assigning $z_i$ for each node $y_r = 1$) is sufficient
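A greedy approximation of this constrained inference might look like the sketch below. It is one possible greedy strategy, not necessarily the exact procedure used in MultiR: every sentence first takes its best relation among those with $y_r = 1$, and each still-uncovered relation is then given the sentence that loses the least score by switching. The score layout (rows are sentences, columns are relations) and the function name are assumptions of this example, which also assumes at least one relation has $y_r = 1$.

```python
from collections import Counter

RELATIONS = ["born-in", "founder", "CEO-of", "capital-of"]
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]
Y = {"born-in": 0, "founder": 1, "CEO-of": 1, "capital-of": 0}


def infer_constrained(scores, relations, y):
    """Greedy assignment of z that covers every relation with y_r = 1."""
    active = [j for j, r in enumerate(relations) if y[r] == 1]
    # Step 1: each sentence takes its best relation among the active ones.
    z = [max(active, key=lambda j: row[j]) for row in scores]
    # Step 2: cover relations that no sentence was assigned to.
    for j in active:
        if j in z:
            continue
        counts = Counter(z)
        # Only move sentences whose current relation stays covered afterwards.
        movable = [i for i, z_i in enumerate(z) if counts[z_i] > 1]
        if not movable:
            break  # more active relations than sentences can cover
        i = min(movable, key=lambda i: scores[i][z[i]] - scores[i][j])
        z[i] = j
    return [relations[j] for j in z]


z = infer_constrained(SCORES, RELATIONS, Y)
total = sum(SCORES[i][RELATIONS.index(r)] for i, r in enumerate(z))
print(z, total)
```

On this example the greedy result coincides with the assignment shown on the slide; in general a greedy pass is not guaranteed to be optimal.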
  • 10. Contribution of this work
    • MultiR makes two assumptions (hard constraints):
      • If a fact is not found in the database, it cannot be mentioned in the text
      • If a fact is in the database, it must be mentioned in at least one sentence
    • Relax MultiR to handle the situations where:
      • A fact is not mentioned in the text (MIT: missing in text)
      • A fact mentioned in the text is missing in the database (MID: missing in database)
    • Side effect of this relaxation
      • Incorporates the tendency that the knowledge base is likely to include popular entities and relations
  • 11. Distant Supervision with Data Not Missing at Random (DNMAR)
    Introduce a layer of latent variables ($t_r$) to handle missing cases.
    Example for the entity pair (Steve Jobs, Apple):
      $\mathbf{y}$: $y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{visit}} = 0$
      $\mathbf{t}$: $t_{\text{born-in}} = 0$, $t_{\text{founder}} = 1$, $t_{\text{CEO-of}} = 0$, $t_{\text{visit}} = 1$
      $x_1$: "Steve Jobs was founder of Apple." ($z_1$ = Founder)
      $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple." ($z_2$ = Founder)
      $x_3$: "Steve Jobs visited Apple store…" ($z_3$ = visit)
    Introduce a new factor:
      $$\phi^{\text{miss}}(y_r, t_r) = \begin{cases} -\alpha_{\text{MIT}} & (y_r = 1 \wedge t_r = 0) \quad \text{(missing in text)} \\ -\alpha_{\text{MID}} & (y_r = 0 \wedge t_r = 1) \quad \text{(missing in DB)} \\ 0 & \text{(otherwise)} \end{cases}$$
    This relaxes the two hard constraints in MultiR into soft ones with penalty factors $-\alpha_{\text{MIT}}$ and $-\alpha_{\text{MID}}$.
    The training algorithm is the same as the one used in MultiR.
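The soft-constraint factor on this slide is small enough to write out directly. In the sketch below, the penalty values 2.0 and 5.0 are hypothetical, chosen only to show how the two kinds of mismatch are charged on the slide's example.

```python
def phi_miss(y_r, t_r, alpha_mit, alpha_mid):
    """Penalty for a mismatch between the database (y_r) and the text (t_r)."""
    if y_r == 1 and t_r == 0:
        return -alpha_mit  # fact in the DB but missing in text
    if y_r == 0 and t_r == 1:
        return -alpha_mid  # fact in the text but missing in the DB
    return 0.0


# Example from the slide: relations born-in, founder, CEO-of, visit.
y = [0, 1, 1, 0]
t = [0, 1, 0, 1]
penalties = [phi_miss(y_r, t_r, alpha_mit=2.0, alpha_mid=5.0) for y_r, t_r in zip(y, t)]
total_penalty = sum(penalties)
print(penalties, total_penalty)
```

Here CEO-of is charged the missing-in-text penalty and visit the missing-in-DB penalty, while born-in and founder agree between text and database and cost nothing.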
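  • 12. Constrained inference: $\arg\max_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \mathbf{y})$
    For the entity pair (Steve Jobs, Apple), $\mathbf{y}$ is given ($y_{\text{born-in}} = 0$, $y_{\text{founder}} = 1$, $y_{\text{CEO-of}} = 1$, $y_{\text{visit}} = 0$), while $\mathbf{z}$ ($z_1$, $z_2$, $z_3$) and $\mathbf{t}$ are unknown.
      $x_1$: "Steve Jobs was founder of Apple."
      $x_2$: "Steve Jobs, Steve Wozniak and Ronald Wayne founded Apple."
      $x_3$: "Steve Jobs visited Apple store…"
    Objective (in log space, where $\boldsymbol{\theta} \cdot \boldsymbol{\phi}(z_i, x_i)$ is the exponent of $\Phi^{\text{extract}}$):
      $$\mathbf{z}^* = \arg\max_{\mathbf{z}} \sum_{i=1}^{n} \boldsymbol{\theta} \cdot \boldsymbol{\phi}(z_i, x_i) + \sum_r \Big( \alpha_{\text{MIT}} \cdot \mathbb{1}(y_r \wedge \exists i: r = z_i) - \alpha_{\text{MID}} \cdot \mathbb{1}(\neg y_r \wedge \exists i: r = z_i) \Big)$$
    • The inference became more challenging
    • A* search can find an exact solution, but does not scale to many variables
    • Present a greedy hill-climbing approach for the inference:
      1. Initialize $z_i$ at random
      2. Obtain the neighborhood of the current solution
      3. Move to the neighbor yielding the highest score
      4. Repeat this process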
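The four hill-climbing steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the neighborhood is taken to be "change the relation of one sentence", the objective rewards database facts that some sentence expresses and penalizes expressed relations that are absent from the database, and the scores and penalty values are hypothetical.

```python
import random

# Rows: sentences; columns: born-in, founder, CEO-of, visit (hypothetical scores).
SCORES = [
    [0.5, 16.0, 9.0, 0.1],
    [8.0, 11.0, 6.0, 0.1],
    [7.0, 8.0, 7.0, 0.2],
]
Y = [0, 1, 1, 0]  # database facts for the same four relations


def objective(z, scores, y, alpha_mit, alpha_mid):
    """Sentence scores plus soft-constraint terms for every expressed relation."""
    total = sum(scores[i][z_i] for i, z_i in enumerate(z))
    for r, y_r in enumerate(y):
        if r in z:
            total += alpha_mit if y_r else -alpha_mid
    return total


def hill_climb(scores, y, alpha_mit, alpha_mid, seed=0):
    rng = random.Random(seed)
    n_rel = len(y)
    z = [rng.randrange(n_rel) for _ in scores]  # 1. random initialization
    best = objective(z, scores, y, alpha_mit, alpha_mid)
    while True:
        # 2. neighbors differ from z in the relation of exactly one sentence
        neighbours = [z[:i] + [r] + z[i + 1:]
                      for i in range(len(z)) for r in range(n_rel) if r != z[i]]
        if not neighbours:
            return z, best
        # 3. move to the best neighbor, 4. repeat until nothing improves
        candidate = max(neighbours,
                        key=lambda c: objective(c, scores, y, alpha_mit, alpha_mid))
        candidate_score = objective(candidate, scores, y, alpha_mit, alpha_mid)
        if candidate_score <= best:
            return z, best
        z, best = candidate, candidate_score


z, best = hill_climb(SCORES, Y, alpha_mit=2.0, alpha_mid=5.0)
print(z, best)
```

Hill climbing only guarantees a local optimum; on this small example every starting point leads to the same assignment, but that will not hold in general.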
  • 13. Incorporating popularity in KB
    • We tune the penalty factors $\alpha_{\text{MIT}}$ and $\alpha_{\text{MID}}$ on a development set
    • We can take into account how likely each fact is to be observed in the text and the knowledge base
      • Facts about Barack Obama are likely to exist
      • Facts about Naoaki Okazaki are unlikely to exist
    • Control the penalty factor for each entity pair
      • Popularity of entities: $\alpha_{\text{MID}}^{(e_1, e_2)} = -\gamma \min(c(e_1), c(e_2))$
        • A larger penalty if the model predicts that a fact about a popular entity does not exist in the KB
      • Well-aligned relations: assign 3 kinds of values of $\alpha_{\text{MIT}}^{r}$
        • A larger penalty if a popular relation such as contains, place_lived, and nationality does not exist in the text
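The entity-popularity formula on this slide can be illustrated with a short sketch. The slide does not define $c(e)$; here it is assumed to be a count of how often an entity appears in the knowledge base, and the counts and the value of $\gamma$ are made up for the example. The sign follows the formula as written on the slide.

```python
def alpha_mid(e1, e2, counts, gamma):
    """Per-pair missing-in-DB factor: -gamma * min(c(e1), c(e2))."""
    return -gamma * min(counts[e1], counts[e2])


# Hypothetical popularity counts c(e).
counts = {"Barack Obama": 1000, "Hawaii": 500, "Naoaki Okazaki": 2, "Sendai": 300}

popular = alpha_mid("Barack Obama", "Hawaii", counts, gamma=0.5)
obscure = alpha_mid("Naoaki Okazaki", "Sendai", counts, gamma=0.5)
print(popular, obscure)
```

Taking the minimum means a pair counts as popular only when both entities are, so a single well-known entity does not inflate the factor for an otherwise obscure pair.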
  • 14. Experiments
    • Binary relation extraction
      • The standard setting (Riedel+, 10)
        • Knowledge base: Freebase relations
        • Text corpus: 1.8 million New York Times articles
      • Two kinds of evaluation
        • Sentence-level extractions using the dataset of (Hoffmann+, 11)
        • Holdout evaluation on Freebase knowledge
    • Unary relation extraction (NE categorization)
      • Twitter NE categorization dataset (Ritter+, 11)
        • Knowledge base: Freebase (instances and their categories)
        • Text corpus: tweets
      • Holdout evaluation
  • 15. Results
    • 17% increase in area under the curve
    • Incorporating popularity yielded a 27% increase over the baseline
    • This evaluation underestimates precision because many facts correctly extracted from text are missing in the database
    • DNMAR doubled the recall
    Ritter et al. (2013) Modeling Missing Data in Distant Supervision for Information Extraction, TACL(1), 367-378.
  • 16. Conclusion
    • Investigated the problem of missing data in distant supervision
    • Presented an extension of MultiR to handle missing data
      • Could incorporate the popularity of facts to be included in the knowledge base and text
      • Presented a scalable inference algorithm based on greedy hill-climbing
    • Demonstrated the effectiveness of the modeling
  • 17. References
    • Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. (2011) Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. ACL-2011, pages 541–550.
      • Slides and codes
    • Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. (2009) Distant supervision for relation extraction without labeled data. ACL-2009, pages 1003–1011.