Information Extraction Grammars

Formal grammars are extensively used to represent patterns in Information Extraction, but they do not permit the use of several types of features. Finite-state transducers, which are based on regular grammars, solve this issue, but they have other disadvantages, such as limited expressiveness and rigid matching priority. As an alternative, we propose Information Extraction Grammars. This model, grounded in formal language theory, permits the use of several features, solves some of the problems of finite-state transducers, and has the same computational complexity in recognition as formal grammars, whether they describe regular or context-free languages.
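To illustrate what "several types of features" means for a single pattern element, here is a minimal sketch. It assumes that a condition is a (feature name, expected value) pair, tested on the token span matched by one grammar symbol. The feature functions, the gazetteer contents, and the toy POS lookup are hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Sequence[str]
Condition = Tuple[str, object]

# Hypothetical resources standing in for real gazetteers and a POS tagger.
FIRST_NAMES = {"Lisa", "Mary"}
PROPER_NOUNS = {"Lisa", "Mary", "Brown", "Smith"}

# Each (hypothetical) feature maps a token span to a value.
FEATURES: Dict[str, Callable[[Span], object]] = {
    "FirstGaz": lambda span: " ".join(span) in FIRST_NAMES,
    # "upper" here means every token starts with a capital letter.
    "Case": lambda span: "upper" if all(t[:1].isupper() for t in span) else "lower",
    "POS": lambda span: "NP" if all(t in PROPER_NOUNS for t in span) else "other",
}


def satisfies(span: Span, conditions: List[Condition]) -> bool:
    """True iff f(span) == y holds for every (f, y) pair in the condition set."""
    return all(FEATURES[name](span) == value for name, value in conditions)


# One pattern element constrained by three features at once.
FIRST_NAME_CONDITIONS: List[Condition] = [("FirstGaz", True), ("Case", "upper"), ("POS", "NP")]

print(satisfies(["Lisa"], FIRST_NAME_CONDITIONS))   # True
print(satisfies(["lisa"], FIRST_NAME_CONDITIONS))   # False: gazetteer and case fail
print(satisfies(["Brown"], FIRST_NAME_CONDITIONS))  # False: not a first name
```

A plain formal grammar over a single token stream can only test the surface string of each token. Here, the same element must also pass a gazetteer test, a case test, and a part-of-speech test.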

ECIR 2015, Vienna, March 30th
Mónica Marrero, National Supercomputing Center, Spain
Julián Urbano, Universitat Pompeu Fabra, Spain

Problem: Grammar-based Named Entity (NE) Recognition

Information Extraction systems should be capable of adapting to different entities and domains. A grammar-based NE recognizer involves:
• Patterns
• Features: part of speech, case, gazetteers, stem, etc.
• A (semi-)automatic learning method

Example entity types: Person, Time, Location.

How can we decide which model is best for a Named Entity Recognition system? Regular expressions, cascade grammars and context-free grammars are all human-readable and based on standards. The poster compares them (regular, cascade, context-free) on three questions:
• Can a pattern use more than one feature?
• Is it expressive enough for natural and markup languages?
• Does it avoid extra ambiguity?

Proposal: Information Extraction Grammars for Named Entity Recognition

Formally, $IEG = (\mathcal{V}, S, \Sigma, \mathcal{P}, \mathcal{C})$, where:
• $\mathcal{V}$: set of non-terminals
• $S \in \mathcal{V}$: initial symbol
• $\Sigma$: input alphabet
• $\mathcal{P}$: set of production rules
• $\mathcal{C}$: set of condition sets assigned to non-terminals, expressed as function-value pairs $(f, y)$

Let $G = (\mathcal{V}, S, \Sigma, \mathcal{P})$ be the underlying context-free grammar. All derivations must meet:

$$A \overset{*}{\Rightarrow}_{IEG} \omega \;:\Longleftrightarrow\; A \overset{*}{\Rightarrow}_{G} \omega \;\wedge\; \forall (f, y) \in \mathcal{C}(A) : f(\omega) = y$$

Example: an IEG that recognizes full person names using first-name and last-name gazetteers:
S → F L L
S → F L
S → F
F → T
L → T
T → [a-zA-Z0-9]+
$\mathcal{C}(F) = \{(FirstGaz, true), (Case, upper), (POS, NP)\}$
$\mathcal{C}(L) = \{(LastGaz, true), (Case, upper), (POS, NP)\}$

Example input: "Lisa Brown Smith will present at 4 pm in Foyer room"

The conditions are similar to synthesized attributes in S-attributed grammars. However, here the attribute values are given upfront, and they are used to constrain the parsing.

Computational Complexity
Sizes: n: input; m: features; s: states in the automata; t: non-terminals with associated conditions.
• Regular languages
  - Regular expression: $O(ns^2)$
  - Cascade grammar: $O(mns^2)$
  - IEG: $O(n(tm + s^2))$
• Context-free languages
  - Context-free grammar: $O(n^3)$
  - IEG: $O(n^3)$

Summary and Future Work
• Information Extraction Grammars:
  - Based on standards
  - Expressiveness of context-free grammars
  - Support for custom features
  - Competitive complexity using standard recognition methods
• They contribute to the flexibility of Information Extraction tools, which can then work independently of the kind of features and the expressiveness of the language to recognize
• Future work: optimization of the recognition methods and use of probabilities in the conditions
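The full-person-name example can be exercised end to end with a minimal, hypothetical Python recognizer. This is not the authors' implementation. The gazetteer contents, the toy POS lookup, whitespace tokenization, and the left-to-right longest-match scan are all assumptions made for illustration. The last-name non-terminal L is checked against a last-name gazetteer, as the poster's "First/Last name gazetteers" indicates. The recognizer tests membership in the underlying grammar G with a memoized top-down search over token spans. It accepts a non-terminal's span only if every (f, y) pair in that non-terminal's condition set holds, which mirrors the IEG derivation rule.

```python
import re
from functools import lru_cache
from typing import Dict, List, Sequence, Tuple

# Productions of the example: S -> F L L | F L | F ; F -> T ; L -> T.
PRODUCTIONS: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("F", "L", "L"), ("F", "L"), ("F",)],
    "F": [("T",)],
    "L": [("T",)],
}
# T -> [a-zA-Z0-9]+ ; assumption: whitespace tokenization, so T covers one token.
TOKEN_RE = re.compile(r"[a-zA-Z0-9]+")

# Hypothetical gazetteers and POS lookup standing in for real feature extractors.
FIRST_NAMES = {"Lisa"}
LAST_NAMES = {"Brown", "Smith"}
PROPER_NOUNS = {"Lisa", "Brown", "Smith", "Foyer"}

FEATURES = {
    "FirstGaz": lambda span: " ".join(span) in FIRST_NAMES,
    "LastGaz": lambda span: " ".join(span) in LAST_NAMES,
    "Case": lambda span: "upper" if all(t[:1].isupper() for t in span) else "lower",
    "POS": lambda span: "NP" if all(t in PROPER_NOUNS for t in span) else "other",
}

# Condition sets C(A): (feature, value) pairs the derived span must satisfy.
CONDITIONS = {
    "F": [("FirstGaz", True), ("Case", "upper"), ("POS", "NP")],
    "L": [("LastGaz", True), ("Case", "upper"), ("POS", "NP")],
}


def recognizes(symbol: str, tokens: Sequence[str]) -> bool:
    """True iff `symbol` derives exactly `tokens` with every condition satisfied."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def derives(sym: str, i: int, j: int) -> bool:
        if sym == "T":  # terminal pattern: exactly one token matching the regex
            return j - i == 1 and TOKEN_RE.fullmatch(toks[i]) is not None
        if not any(sequence(rhs, 0, i, j) for rhs in PRODUCTIONS[sym]):
            return False
        # IEG constraint: f(omega) == y for every (f, y) in C(sym)
        span = toks[i:j]
        return all(FEATURES[f](span) == y for f, y in CONDITIONS.get(sym, []))

    @lru_cache(maxsize=None)
    def sequence(rhs: Tuple[str, ...], k: int, i: int, j: int) -> bool:
        # Can rhs[k:] derive toks[i:j]? No epsilon rules: each symbol takes >= 1 token.
        if k == len(rhs):
            return i == j
        return any(derives(rhs[k], i, m) and sequence(rhs, k + 1, m, j)
                   for m in range(i + 1, j + 1))

    return len(toks) > 0 and derives(symbol, 0, len(toks))


def find_person_names(sentence: str) -> List[str]:
    """Scan left to right, keeping the longest span derivable from S at each position."""
    tokens = sentence.split()
    found, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            if recognizes("S", tokens[i:j]):
                found.append(" ".join(tokens[i:j]))
                i = j
                break
        else:
            i += 1
    return found


print(find_person_names("Lisa Brown Smith will present at 4 pm in Foyer room"))
# ['Lisa Brown Smith']
```

The memoized search is chosen only for brevity. It is not the recognition method behind the poster's complexity bounds, which refer to standard recognizers for regular and context-free languages. For general context-free IEGs, a chart parser such as CYK or Earley, with conditions checked when a non-terminal's span is completed, would be the more conventional choice.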