Information Extraction Grammars

Formal grammars are extensively used to represent patterns in Information Extraction, but they do not permit the use of several types of features. Finite-state transducers, which are based on regular grammars, solve this issue, but they have other disadvantages, such as limited expressiveness and a rigid matching priority. As an alternative, we propose Information Extraction Grammars. This model, grounded in formal language theory, permits the use of several features, solves some of the problems of finite-state transducers, and has the same computational complexity in recognition as formal grammars, whether they describe regular or context-free languages.

ECIR 2015 Vienna, March 30th
Mónica Marrero
National Supercomputing Center, Spain
Julián Urbano
Universitat Pompeu Fabra, Spain
Problem: Grammar-based Named Entity (NE) Recognition Patterns
• Features: part of speech, case, gazetteers, stem, etc.
• (Semi-)automatic learning method
• Candidate models, human-readable and based on standards: Regular Expressions, Cascade Grammars, Context-Free Grammars
• Questions asked of each model (regular, cascade, context-free):
- More than one feature?
- Natural/markup language expressiveness?
- Avoid extra ambiguity?
• Example entity types: NE: Person, NE: Time, NE: Location
Information Extraction systems should be capable of adapting to different entities and domains.
How can we decide what is the best model for a Named Entity Recognition system?
Proposal: Information Extraction Grammars for Named Entity Recognition
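Formally, $IEG = (\mathcal{V}, S, \Sigma, \mathcal{P}, \mathcal{C})$
$\mathcal{V}$: set of non-terminals
$S \in \mathcal{V}$: initial symbol
$\Sigma$: input alphabet
$\mathcal{P}$: set of production rules
$\mathcal{C}$: set of condition sets assigned to non-terminals, expressed as function-value pairs $(f, y)$
All derivations must meet:
$$A \overset{*}{\underset{IEG}{\Rightarrow}} \omega \;:=\; A \overset{*}{\underset{G}{\Rightarrow}} \omega \ \text{ and } \ \forall (f, y) \in \mathcal{C}(A) : f(\omega) = y$$
where $G$ is a context-free grammar.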
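The condition part of the definition can be illustrated with a minimal Python sketch. It only models $\mathcal{C}$ and the check $f(\omega) = y$; derivability in $G$ is assumed to be decided by an ordinary parser. The class name `IEG`, the method `conditions_hold`, and the feature function `case_of` are hypothetical names chosen for illustration, not taken from the slides.

```python
import string
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Feature = Callable[[str], object]    # f: maps a derived string to a feature value
Condition = Tuple[Feature, object]   # one function-value pair (f, y)


@dataclass
class IEG:
    """Container for the five components (V, S, Sigma, P, C)."""
    nonterminals: Set[str]
    start: str
    alphabet: Set[str]
    productions: Dict[str, List[Tuple[str, ...]]]
    conditions: Dict[str, List[Condition]] = field(default_factory=dict)

    def conditions_hold(self, symbol: str, omega: str) -> bool:
        """True when f(omega) == y for every (f, y) in C(symbol).

        A non-terminal without conditions behaves as in a plain grammar.
        """
        return all(f(omega) == y for f, y in self.conditions.get(symbol, []))


def case_of(text: str) -> str:
    """Hypothetical feature: 'upper' if the text starts with a capital letter."""
    return "upper" if text[:1].isupper() else "lower"


# A tiny illustrative grammar: S -> F, F -> T, with one condition on F.
g = IEG(
    nonterminals={"S", "F", "T"},
    start="S",
    alphabet=set(string.ascii_letters + string.digits),
    productions={"S": [("F",)], "F": [("T",)]},
    conditions={"F": [(case_of, "upper")]},
)

print(g.conditions_hold("F", "Lisa"))  # the pair (Case, upper) is satisfied
print(g.conditions_hold("F", "lisa"))  # the pair (Case, upper) is violated
```

A full recognizer would call `conditions_hold` each time the parser completes a derivation for a non-terminal, rejecting the derivation when it returns `False`.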
IEG for the recognition of full person names using First/Last name gazetteers
$S \to F\,L\,L$, $S \to F\,L$, $S \to F$
$F \to T$, $L \to T$, $T \to$ [a-zA-Z0-9]+
$\mathcal{C}(F) = \{(FirstGaz, true), (Case, upper), (POS, NP)\}$
$\mathcal{C}(L) = \{(LastGaz, true), (Case, upper), (POS, NP)\}$
Lisa Brown Smith will present at 4 pm in Foyer room
Similar to synthesized attributes in S-attributed grammars, but in this case the values of the attributes are given upfront and they are used to constrain the parsing.
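The person-name grammar can be tried on the example sentence with a small Python sketch. Several parts are assumptions made only for illustration: the toy gazetteers, the `POS_TAGS` lookup standing in for a real part-of-speech tagger, the use of a last-name gazetteer for $L$ (following the slide title's "First/Last name gazetteers"), and the longest-rule-first scanning strategy, which is not described in the slides.

```python
import re

FIRST_NAMES = {"Lisa"}            # toy first-name gazetteer (assumption)
LAST_NAMES = {"Brown", "Smith"}   # toy last-name gazetteer (assumption)
# Stand-in for a part-of-speech tagger (assumption): NP marks proper nouns.
POS_TAGS = {"Lisa": "NP", "Brown": "NP", "Smith": "NP", "Foyer": "NP"}

TOKEN = re.compile(r"[a-zA-Z0-9]+")   # T -> [a-zA-Z0-9]+

# Feature functions f; their values are known upfront for every token.
FEATURES = {
    "FirstGaz": lambda w: w in FIRST_NAMES,
    "LastGaz": lambda w: w in LAST_NAMES,
    "Case": lambda w: "upper" if w[:1].isupper() else "lower",
    "POS": lambda w: POS_TAGS.get(w, "OTHER"),
}

# Condition sets C(F) and C(L) as lists of (feature name, required value).
CONDITIONS = {
    "F": [("FirstGaz", True), ("Case", "upper"), ("POS", "NP")],
    "L": [("LastGaz", True), ("Case", "upper"), ("POS", "NP")],
}

# S -> F L L | F L | F, tried longest first (a choice made for this sketch).
RULES = [("F", "L", "L"), ("F", "L"), ("F",)]


def derives(symbol, word):
    """symbol =>* word: word matches T and meets every condition of symbol."""
    return TOKEN.fullmatch(word) is not None and all(
        FEATURES[name](word) == value for name, value in CONDITIONS[symbol]
    )


def find_person_names(text):
    """Scan left to right and collect token spans derivable from S."""
    tokens = TOKEN.findall(text)
    names, i = [], 0
    while i < len(tokens):
        for rule in RULES:
            span = tokens[i:i + len(rule)]
            if len(span) == len(rule) and all(
                derives(sym, word) for sym, word in zip(rule, span)
            ):
                names.append(" ".join(span))
                i += len(rule)
                break
        else:
            i += 1   # no rule matched at this position
    return names


print(find_person_names("Lisa Brown Smith will present at 4 pm in Foyer room"))
# prints ['Lisa Brown Smith']
```

"Foyer" is capitalized and tagged NP in this toy setup, yet it is rejected because it fails the gazetteer condition, which shows how the conditions constrain an otherwise purely lexical rule.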
Computational Complexity
• Regular languages
- Regular Expression: $O(ns^2)$
- Cascade Grammar: $O(mns^2)$
- IEG: $O(n(tm + s^2))$
• Context-free languages
- Context-Free Grammar: $O(n^3)$
- IEG: $O(n^3)$
Sizes: $n$ = input, $m$ = features, $s$ = states in the automata, $t$ = non-terminals with associated conditions
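To get a feel for how the three regular-language bounds compare, the terms inside the big-O expressions can be evaluated for sample sizes. The sketch below is only arithmetic on those terms: the function names and the sample values of n, m, s and t are arbitrary assumptions, constant factors are ignored, and the results are not measured running times.

```python
def regex_term(n, s):
    """Term inside O(n s^2) for a regular expression."""
    return n * s ** 2


def cascade_term(n, m, s):
    """Term inside O(m n s^2) for a cascade grammar."""
    return m * n * s ** 2


def ieg_regular_term(n, m, s, t):
    """Term inside O(n (t m + s^2)) for an IEG over a regular language."""
    return n * (t * m + s ** 2)


# Arbitrary sample sizes (assumptions): input length, features,
# automaton states, and non-terminals with conditions.
n, m, s, t = 1000, 3, 10, 2

print(regex_term(n, s))              # 100000
print(cascade_term(n, m, s))         # 300000
print(ieg_regular_term(n, m, s, t))  # 106000
```

With these sample values the IEG term stays close to the regular-expression term, because the feature cost t·m is added to s² rather than multiplied by it as in the cascade bound.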
Summary and Future Work
• Information Extraction Grammars
- Based on standards
- Expressiveness of context-free grammars
- Support for custom features
- Competitive complexity using standard recognition methods
• Contributes to the flexibility of Information Extraction tools that can work independently of the kind of features and the expressiveness of the language to recognize
• Future work: optimization of the recognition methods and use of probabilities in the conditions
