[Freedman+ EMNLP11] Extreme Extraction – Machine Reading in a Week

23 Dec 2011
Nakatani Shuyo @ Cybozu labs, Inc
twitter : @shuyo
Abstract
• Target:
  – Rapid construction of concept and relation
    extraction system
• Method:
  – Extend an existing ACE system to new relations
  – in a short time with minimal training data
     • in a week (<50 person hours) with <20 example pairs
  – Evaluate with a question answering task
Phases
1. Ontology and resources
2. Extending system for new ontology
3. Extracting relations
4. Evaluation
1. Ontology and resources
• possibleTreatment( Substance, Condition )
   – SSRIs(S) are effective treatments for depression(C)
• expectedDateOnMarket( Substance , Date )
   – More drugs for type 2(S) expected on market soon(D)
• responsibleForTreatment( Substance, Agent )
   – Officials(A) Responsible for Treatment of War Dead(S)
• studiesDisease( Agent , Condition ) (not sure)
   – cancer(C) researcher Dr. Henri Joyeux(A)
• hasSideEffect( Substance, Condition )
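The five relations above can be written down as a small typed schema. This is only a sketch: the class name `RelationType`, the dictionary `ONTOLOGY`, and the helper `check_args` are hypothetical names for illustration, not anything from [Freedman+ 11].

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    """A binary relation with the expected type of each argument."""
    name: str
    arg_types: tuple[str, str]


# Relation signatures as listed on the slide.
ONTOLOGY = {
    r.name: r
    for r in [
        RelationType("possibleTreatment", ("Substance", "Condition")),
        RelationType("expectedDateOnMarket", ("Substance", "Date")),
        RelationType("responsibleForTreatment", ("Substance", "Agent")),
        RelationType("studiesDisease", ("Agent", "Condition")),
        RelationType("hasSideEffect", ("Substance", "Condition")),
    ]
}


def check_args(relation: str, arg1_type: str, arg2_type: str) -> bool:
    """Return True if the argument types match the relation's signature."""
    return ONTOLOGY[relation].arg_types == (arg1_type, arg2_type)


print(check_args("possibleTreatment", "Substance", "Condition"))
```

A type check like this is one cheap way to reject candidate relation instances whose arguments were tagged with the wrong class.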
2. Extending system for new ontology
• Add new relation/class detectors into “our”
  extraction system for ACE task
  – Details of the system are not clear...
     • Class detectors with unsupervised word clustering
     • Bootstrap relation learner with a template and seeds
     • Pattern learning for relation extraction

• Annotate words for 4 classes
• Coreference
Bootstrap relation learner
• DAP(Double-Anchored Pattern) (Kozareva+ 08)
  – Web search with a query based on “<CLASS>
    such as <SEED> and *”
  – Add words at the position “*” in snippet into the
    class member as new seeds
  – Repeat “the bootstrapping loop” while seeds are
    available
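The loop described above can be sketched roughly as follows. This is a simplified illustration, not the method of (Kozareva+ 08) as published: any candidate ranking or filtering is omitted, and `find_candidates` is a hypothetical callable standing in for "run the web query and read the words at the `*` position". The toy data is made up.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def bootstrap_class(
    cls: str,
    seed: str,
    find_candidates: Callable[[str, str], Iterable[str]],
    max_queries: int = 50,
) -> set[str]:
    """Grow a set of class members from a single seed."""
    members = {seed}
    queue = deque([seed])  # seeds that have not been queried yet
    queries = 0
    while queue and queries < max_queries:
        current = queue.popleft()
        queries += 1
        for candidate in find_candidates(cls, current):
            if candidate not in members:
                members.add(candidate)
                queue.append(candidate)  # new member becomes a new seed
    return members


# Toy stand-in for web search plus snippet parsing (hypothetical data).
TOY_WEB = {
    "cold": ["flu", "pneumonia"],
    "flu": ["cold", "measles"],
}


def toy_find(cls: str, seed: str) -> list[str]:
    return TOY_WEB.get(seed, [])


print(sorted(bootstrap_class("disease", "cold", toy_find)))
```

The `max_queries` cap is an added safeguard for the sketch; the slide only says the loop repeats while seeds are available.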
Relation detection with DAP
• CLASS = disease / SEED = cold
• Web search = “disease such as cold and”
  – disease such as cold and flu (9). ...
  – disease such as cold and heat, external ...
  – disease such as cold and pneumonia. ...
  – disease (such as cold and hot diseases), ...
  – disease such as cold and flu viruses. ...
  – disease such as cold and food poisoning. ...
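As a rough illustration of reading the `*` position out of snippets like these, the sketch below takes only the first word after "and". The regular expression and the function name `first_word_after_pattern` are assumptions made for this example; how the original system parses snippets is not described on the slides.

```python
import re
from collections import Counter

# The six snippets from the slide.
SNIPPETS = [
    "disease such as cold and flu (9). ...",
    "disease such as cold and heat, external ...",
    "disease such as cold and pneumonia. ...",
    "disease (such as cold and hot diseases), ...",
    "disease such as cold and flu viruses. ...",
    "disease such as cold and food poisoning. ...",
]


def first_word_after_pattern(snippet, cls, seed):
    """Return the first word following '<cls> such as <seed> and', or None."""
    pattern = rf"\b{re.escape(cls)}\W*such as {re.escape(seed)} and ([A-Za-z][A-Za-z-]*)"
    match = re.search(pattern, snippet, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


words = [first_word_after_pattern(s, "disease", "cold") for s in SNIPPETS]
counts = Counter(w for w in words if w is not None)
print(counts.most_common())
```

Even on this tiny sample the naive rule lets through "heat", "hot", and "food", which suggests why candidates need some filtering before they are accepted as class members.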
Four classes to annotate
• Substance-Name
  – medicine name
• Substance-Description
  – e.g. “new drugs”
• Condition-Name
  – name of disease
• Condition-Description
  – e.g. “the illness”
Annotation
• Name tagging with active learning(Miller+ 04)
  – Unsupervised word clustering on binary tree
    (Brown+ 90)
  – Tagging with clustering information
     • Averaged Perceptron (Collins 02)

  – Request annotation for selected sentence based on
    “confidence score”
     • score = (highest perceptron score) - (second one)

                                       !?
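The confidence score above is a margin between the two best-scoring analyses. A minimal sketch, assuming each unlabeled sentence comes with a list of scores for its candidate taggings and that the sentences with the smallest margin are the ones sent for annotation (the usual active-learning choice; the slide does not spell this out). The sentences and scores are made up.

```python
from __future__ import annotations


def margin(scores: list[float]) -> float:
    """Highest score minus second-highest score (needs at least two scores)."""
    top, second = sorted(scores, reverse=True)[:2]
    return top - second


def pick_for_annotation(pool: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Return the k sentences whose best tagging is least separated from the runner-up."""
    ranked = sorted(pool, key=lambda item: margin(item[1]))
    return [sentence for sentence, _ in ranked[:k]]


# Hypothetical pool: (sentence, scores of its candidate taggings).
POOL = [
    ("SSRIs are effective treatments for depression", [4.0, 1.0, 0.5]),
    ("new drugs for the illness", [2.1, 2.0, 0.3]),
    ("cancer researcher Dr. Henri Joyeux", [3.0, 1.5]),
]

print(pick_for_annotation(POOL, k=1))
```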
Results of Class Detection
[Results shown on slide, from [Freedman+ 11]]
What’s GS (Gold Standard)?
• substances & conditions
   – -Name / -Description respectively
• without/with lists of known substances and conditions
Coreference
• It took the most time (20 of 43 hours)
• But the details are not clear...
  – domain independent heuristics
  – appositive linking
3. Extracting relations
• Learned Patterns vs. Handwritten Patterns
  [Comparison shown on slides, from [Freedman+ 11]]
4. Evaluation
• Question Answering with extracted
  information


• Query examples
  – Find possible treatments for diabetes
  – What is expected date to market for Abilify?
Answer Example
• ACME produces a wide range of drugs
  including treatments for malaria and
  athletes foot
  – responsibleForTreatment(drugs, ACME)
  – possibleTreatment(drugs, malaria)
  – possibleTreatment(drugs, athletes foot)
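To make the link between extracted relations and question answering concrete, here is a small sketch that stores the three relation instances above and answers a query of the form "find possible treatments for X". The names `FACTS`, `build_index`, and `first_args` are hypothetical; the evaluation setup in [Freedman+ 11] is not described at this level on the slides.

```python
from collections import defaultdict

# Relation instances from the ACME example: (relation, arg1, arg2).
FACTS = [
    ("responsibleForTreatment", "drugs", "ACME"),
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athletes foot"),
]


def build_index(facts):
    """Index first arguments by (relation, second argument)."""
    index = defaultdict(set)
    for relation, arg1, arg2 in facts:
        index[(relation, arg2)].add(arg1)
    return index


def first_args(index, relation, second_arg):
    """Answer 'which arg1 stands in <relation> to <second_arg>?'."""
    return sorted(index.get((relation, second_arg), set()))


index = build_index(FACTS)
# "Find possible treatments for malaria"
print(first_args(index, "possibleTreatment", "malaria"))
```

Note that the answer here is the description "drugs" rather than a drug name, which hints at why coreference matters for producing useful answers.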
[Results shown on slide, from [Freedman+ 11]]
• useful = answering complex query
When non-useful answers are removed
[Results shown on slide, from [Freedman+ 11]]
• annotator’s recall (A)
• combining both (C)
• using only handwritten rules (H, HW)
• using only learned patterns (L)
[Results shown on slide, from [Freedman+ 11]]
Discussion
[Shown on slide, from [Freedman+ 11]]
Conclusions
• The combination system can achieve
  F1 of 0.51 in a new domain in a week.
• It requires very little training data.
• The effectiveness of learning algorithms is
  still not competitive with handwritten
  patterns.
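For reference, F1 is the harmonic mean of precision and recall. The helper below is a generic sketch; the precision and recall values in the usage line are placeholders, not figures from [Freedman+ 11].

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder values for illustration only.
print(f1_score(0.5, 0.5))
```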
References
• [Freedman+ 11] Extreme Extraction – Machine Reading in a Week
• [Kozareva+ 08] Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs
• [Miller+ 04] Name Tagging with Word Clusters and Discriminative Training
   – [Brown+ 90] Class-Based n-gram Models of Natural Language
   – [Collins 02] Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
