The Web is not a PERSON, Berners-Lee is not an ORGANIZATION, and
African-Americans are not LOCATIONS:
An Analysis of the Performance of Named-Entity Recognition
Robert Krovetz (Lexicalresearch.com), Paul Deane, Nitin Madnani (ETS)

A Review by Richard Littauer (UdS)
The Background
 Named-Entity Recognition (NER) is
  normally judged in the context of
  Information Extraction (IE)
 Various competitions
 Recently:
    ◦ non-English languages
    ◦ improving unsupervised learning methods
The Background
   “There are no well-established
    standards for evaluation of NER.”
    ◦ Criteria for NER systems change between
      competitions
    ◦ Proprietary software
The Background
   KDM (Krovetz, Deane, Madnani) wanted to identify
    multiword expressions (MWEs)…
      … but false positives and tagging
      inconsistencies stopped this.

 IE derives Recall and Precision from
  Information Retrieval
 NER is just a small part of this, so is
  rarely evaluated independently
The Background
   So, they want to test NER systems,
    and provide a unit test based on the
    problems encountered
Evaluation
Compared three NER taggers:
 Stanford:
    ◦ CRF, 100m training corpus;
   University of Illinois (LBJ):
    ◦ Regularized averaged perceptron, Reuters
      1996 News Corpus;
   BBN IdentiFinder (IdentiFinder):
    ◦ HMMs, commercial
Evaluation
 Agreement on Classification
 Ambiguity in Discourse


 Stanford vs. LBJ on internal ETS
  425m corpus
 All three on American National Corpus
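
The agreement comparison can be sketched roughly as follows. This is an illustration only, not the authors' code: the `agreement` function, the dictionary format and the toy labels are all assumptions made for the example.

```python
def agreement(tags_a, tags_b):
    """Compare two taggers' output, each given as {entity: label}.

    Returns the entities both taggers found, the subset they labelled
    identically, and the agreement rate over the shared entities.
    """
    shared = set(tags_a) & set(tags_b)
    same = {e for e in shared if tags_a[e] == tags_b[e]}
    rate = len(same) / len(shared) if shared else 0.0
    return shared, same, rate


# Hypothetical toy output, not real Stanford or LBJ results.
tagger_one = {"Berners-Lee": "PERSON", "Web": "MISC", "Amherst": "LOCATION"}
tagger_two = {"Berners-Lee": "ORGANIZATION", "Web": "MISC", "Clinton": "PERSON"}

shared, same, rate = agreement(tagger_one, tagger_two)
print(sorted(shared), sorted(same), rate)
```

Entities found by only one tagger are left out of the rate here; counting them as disagreements would be an equally reasonable choice.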
Stanford vs. LBJ
 NER reported as 85-95% accurate.
 Similar number of entities for both: 1.95m for
  Stanford, 1.8m for LBJ (7.6%
  difference)
 However, errors:
Stanford vs. LBJ
   Agreement:
Stanford vs. LBJ
   Ambiguity:
Stanford vs. LBJ vs.
IdentiFinder
   Agreement:
Stanford vs. LBJ vs.
IdentiFinder
   Differences:
    ◦ How they are tokenized
    ◦ Number of entities recognized overall
Stanford vs. LBJ vs.
IdentiFinder
   Ambiguity:
Unit Test
   Created two documents that can be
    used as tests
    ◦ Different cases for true positives of
      PERSON, LOCATION, ORGANIZATION
    ◦ Entirely upper case not NE (Ex.
      AAARGH)
    ◦ Punctuated terms not NE
    ◦ Terms with Initials
    ◦ Acronyms (some expanded, some not)
    ◦ Last names in close proximity to first
      names
    ◦ Terms with prepositions (Mass. Inst. of
      Tech.)
    ◦ Terms with location and organization
      (Amherst College)

   Provided freely online.
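
A unit test of this kind could be driven by a small harness like the one below. It is a sketch under stated assumptions: `toy_tagger` is a hypothetical stand-in for a real NER tagger, and the test sentences are invented for illustration, not taken from the KDM test documents.

```python
def toy_tagger(text):
    """Hypothetical stand-in for a real tagger: a tiny lookup table."""
    gazetteer = {"Amherst College": "ORGANIZATION", "Berners-Lee": "PERSON"}
    return {name: label for name, label in gazetteer.items() if name in text}


# (text, expected {entity: label}) pairs modelled on the categories above.
# The sentences are made up for this sketch.
CASES = [
    ("AAARGH, he said.", {}),  # entirely upper case, not a named entity
    ("She studied at Amherst College.", {"Amherst College": "ORGANIZATION"}),
    ("Berners-Lee invented the Web.", {"Berners-Lee": "PERSON"}),
]


def run_cases(tagger, cases):
    """Return (text, expected, actual) for every case the tagger gets wrong."""
    failures = []
    for text, expected in cases:
        actual = tagger(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


failures = run_cases(toy_tagger, CASES)
print(f"{len(failures)} failing case(s)")
```

Swapping `toy_tagger` for a wrapper around a real tagger would let the same cases be run against each system.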
One NE Tag per Discourse
 Unusual for multiple occurrences of a
  token in a document to be different
  entities
 True for homonyms
 An exception: Location + sports team
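
Checking a tagged document against the one-tag-per-discourse assumption can be sketched like this. The `(token, label)` pair format and the `ambiguous_entities` name are assumptions for the example, not part of any of the taggers discussed.

```python
from collections import defaultdict


def ambiguous_entities(doc_tags):
    """Find tokens that received more than one label within one document.

    doc_tags is a list of (token, label) pairs for a single document.
    """
    labels = defaultdict(set)
    for token, label in doc_tags:
        labels[token].add(label)
    return {tok: sorted(labs) for tok, labs in labels.items() if len(labs) > 1}


# Hypothetical tagger output for one document.
doc = [("Clinton", "PERSON"), ("Clinton", "LOCATION"), ("ETS", "ORGANIZATION")]
conflicts = ambiguous_entities(doc)
print(conflicts)
```

A legitimate exception such as a location name that also denotes a sports team would show up in `conflicts` as well, so the output needs a manual look rather than automatic correction.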
One NE Tag per Discourse
 Stanford, LBJ have features for non-local
  dependencies (NLD) to help with this.
 KDM: Two other uses for NLD:
    ◦ Source of error in evaluation
    ◦ A way to identify semantically related
      entities

   These should be treated as
    exceptions
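
The basic idea of resolving tagging differences by how often a token receives each label can be sketched as a per-document majority vote. This is a deliberate simplification and not how Stanford or LBJ implement their non-local features; the function name and data format are assumptions.

```python
from collections import Counter


def majority_label(doc_tags):
    """Pick the most frequent label per token within one document.

    doc_tags is a list of (token, label) pairs. Ties fall to whichever
    label was seen first, which is an arbitrary choice in this sketch.
    """
    counts = {}
    for token, label in doc_tags:
        counts.setdefault(token, Counter())[label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


# Hypothetical tagger output for one document.
doc = [("Clinton", "PERSON"), ("Clinton", "PERSON"), ("Clinton", "LOCATION")]
resolved = majority_label(doc)
print(resolved)
```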
Discussion
 There are guidelines for NER – but we
  need standards.
 The community should focus on
  PERSON, ORGANIZATION,
  LOCATION, and MISC.
    ◦ Harder to deal with than Dates, Times.
    ◦ Disagreement between taggers.
    ◦ MISC is necessary.
    ◦ These have important value elsewhere.
Discussion
   To improve intrinsic evaluation for
    NER:
    1. Create test sets for diverse domains.
    2. Use standardized sets for different
       phenomena.
    3. Report accuracy for PERSON, ORGANIZATION,
       and LOCATION separately.
    4. Establish uncertainty in the tagging
       system.
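
Reporting accuracy separately per class (point 3) might look like the sketch below. The gold and predicted labels are invented, and the `per_class_accuracy` helper is an assumption for illustration; it is not taken from the paper.

```python
from collections import defaultdict


def per_class_accuracy(gold, predicted):
    """Accuracy broken down by gold label, for aligned label sequences."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical labels for four entities.
gold = ["PERSON", "PERSON", "ORGANIZATION", "LOCATION"]
pred = ["PERSON", "ORGANIZATION", "ORGANIZATION", "PERSON"]

scores = per_class_accuracy(gold, pred)
for label, acc in sorted(scores.items()):
    print(f"{label}: {acc:.2f}")
```

A single pooled figure for this toy data would be 0.50, which hides the fact that one class is always right and another never is.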
Conclusion
 90% accuracy not real.
 We need to use only entities that are
  agreed on by multiple taggers.
 Even in cases where they both
  disagree (Hint: Future work.)

   Unit test downloadable.
Cheers/PERSON


Richard/ORGANISATION thanks the
Mword Class/LOCATION for listening to
his talk about Berners-Lee/MISC

Named Entity Recognition - ACL 2011 Presentation


Editor's Notes

  1. NER: The aim is to recognize and classify different types of entities (names, organizations, locations, dates, etc.)
  2. Not sure why they focused on competitions, to be honest. But they mention the Message Understanding Conference, and CoNLL.
  3. They give two possible reasons for this:
  4. Part of the problem is that
  5. No Gold Standards for any of these. So, they compared on two levels
  6. How well do they work on PERSON, ORGANIZATION, and LOCATION? How much do they agree? What mistakes do they make?
  7. How frequently does each tagger produce multiple classifications for the same entity in a single document? Clinton as a person, and place, for instance.
  8. ANC tagged for IdentiFinder already.
  9. However, this was often not consistent
  10. IdentiFinder got many more ORGANIZATION entities than the others. It also uses an extra class, Geo-Political Entity.
  11. Existing taggers treat the non-local dependencies as a way of dealing with the sparse data problem, and as a way to resolve tagging differences by looking at how often one token is classified as one type versus another.
  12. 1. They didn’t do this. 2. And actually use them, not just one of them. 3. Report accuracy rates separately for the three major classes. Accuracy rates should be further broken down according to the items in the unit test that are designed to assess mistakes: orthography, acronym processing, frequent false positives, and knowledge-based classification. They go on to say that ANC is doing it right, but is too small, hence their ETS corpus.