SlideShare una empresa de Scribd logo
1 de 25
Descargar para leer sin conexión
Codeco: A Grammar Notation for Controlled
Natural Language in Predictive Editors
Tobias Kuhn
Department of Informatics
University of Zurich
Second Workshop on Controlled Natural Language
15 September 2010
Marettimo (Italy)
Introduction
• Problem: Existing grammar frameworks do not work out
particularly well for CNLs.
• Reason:
• CNLs have essential differences to other languages (natural and
formal ones)
• To solve the writability problem, CNLs have to be embedded in
special tools with very specific requirements.
• Error messages and suggestions
• Predictive editors
• Language generation
• Solution: A new grammar notation that is dedicated to CNLs
and predictive editors.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 2 / 25
CNL Grammar Requirements
Concreteness: CNL grammars should be fully formalized and
interpretable by computers.
Declarativeness: CNL grammars should not depend on a concrete
algorithm or implementation.
Lookahead Features: CNL grammars should allow for the retrieval
of possible next tokens for a partial text.
Anaphoric References: CNL grammars should allow for the
definition of nonlocal structures like anaphoric
references.
Implementability: CNL grammars should be easy to implement in
different programming languages.
Expressivity: CNL grammars should be sufficiently expressive to
express CNLs.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 3 / 25
Lookahead Features
Predictive editors need to know which words can follow a partial text:
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 4 / 25
Anaphoric References
• Anaphoric references in CNLs are resolvable in a deterministic
way:
A country contains an area that is not controlled by the country.
If a person X is a relative of a person Y then the person Y is a
relative of the person X.
John protects himself and Mary helps him.
• Anaphoric references that cannot be resolved should be
disallowed:
* Every area is controlled by it.
* The person X is a relative of the person Y.
• Scopes have to be considered too:
Every man protects a house from every enemy and does not
destroy ...
... himself.
... the house.
* ... the enemy.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 5 / 25
Existing Grammar Frameworks
• Grammar Frameworks for Natural Languages
• Head-Driven Phrase Structure Grammars
• Lexical-Functional Grammars
• Tree-Adjoining Grammars
• Combinatory Categorial Grammars
• Dependency Grammars
• ... and many more
• Backus-Naur Form (BNF)
• Parser Generators
• Yacc
• GNU Bison
• ANTLR
• Definite Clause Grammars (DCG)
• Grammatical Framework (GF)
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 6 / 25
How Existing Grammar Frameworks Fulfill our
Requirements for CNL Grammars
NL BNF PG DCG GF
Concreteness + + + + +
Declarativeness +/– + – (+) +
Lookahead Features – + (+) (+) +
Anaphoric References (+) – – (+) –
Implementability – + – – ?
Expressivity + – + + +
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 7 / 25
The Codeco Notation
Codeco = “Concrete and Declarative Grammar Notation for
Controlled Natural Languages”
• Formal and Declarative
• Easy to implement in different programming languages.
• Expressive enough for common CNLs.
• Lookahead features can be implemented in a practical and
efficient way.
• Deterministic anaphoric references can be defined in an adequate
and simple way.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 8 / 25
Grammar Rules in Codeco
• Grammatical categories with flat feature structures
• Category Types:
• non-terminal (e.g. vp)
• pre-terminal (e.g. noun)
• terminal (e.g. [ does not ])
• Grammar Rule Examples:
• vp num: Num
neg: Neg
:
−→ v
num: Num
neg: Neg
type: tr
np case: acc
• v neg: +
type: Type
:
−→ [ does not ] verb type: Type
• np noun: Noun
:
−→ [ a ] noun text: Noun
• noun text: woman
gender: fem
→ [ woman ]
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 9 / 25
Forward and Backward References
The special categories “>” and “<” can be used to establish
nonlocal dependencies, e.g. for anaphoric references:
np
:
−→ det def: – noun text: Noun > type: noun
noun: Noun
ref
:
−→ det def: + noun text: Noun < type: noun
noun: Noun
s
vp
vp
np
ref
noun
area
det
the
v
tv
control
aux
does not
conj
and
vp
np
noun
area
det
an
v
tv
contains
np
>noun
country
det
A
>
<
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 10 / 25
Scopes
• Opening of scopes:
• Scopes in (controlled) English usually open at the position of the
scope triggering structure, or nearby.
• Scope opener category “ ” in Codeco:
quant exist: –
:
−→ [ every ]
• Closing of scopes:
• Scopes in (controlled) English usually close at the end of certain
structures like verb phrases, relative clauses, and sentences.
• Scope-closing rules “
∼
−→” in Codeco:
vp num: Num
∼
−−→ v num: Num
type: tr
np case: acc
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 11 / 25
Position Operators
• How to define reflexive pronouns like “herself ” that can only
attach to the subject?
• With the position operator “#”, position identifiers can be
assigned:
np id: Id
:
−→ # Id prop gender: G >
id: Id
gender: G
type: prop
ref subj: Subj
:
−→ [ herself ] < id: Subj
gender: fem
s
vp
np
ref
herself
v
tv
helps
np
prop
Maryp0 p1 p2 p3
#
#
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 12 / 25
Negative Backward References
• How to define that the same variable can be introduced only
once?
*A person X knows a person X.
• Negative backward references “ ” succeed only if no matching
antecedent is accessible:
newvar
:
−→ var text: V
type: var
var: V
> type: var
var: V
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 13 / 25
Complex Backward References
• How to define pronouns like “him” that cannot attach to the
subject?
*John helps him.
• Complex backward references “<+... −...” have one or more
positive feature structure “+” and zero or more negative
ones “−”.
• They succeed if there is an antecedent that matches one of the
positive feature structures but none of the negative ones:
ref subj: Subj
case: acc
:
−→ [ him ] <+ human: +
gender: masc
− id: Subj
• A more complicated (but probably less useful) example:
ref subj: Subj
:
−→ [ this ] <+ hasvar: –
human: –
type: relation − id: Subj type: prop
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 14 / 25
Strong Forward References
• How to define that propernames like “Bill” are always accessible?
*Mary does not love a man. Mary hates him.
Mary does not love Bill. Mary hates him.
• Strong forward references “ ” are always accessible:
np id: Id
:
−→ prop human: H
gender: G



id: Id
human: H
gender: G
type: prop



Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 15 / 25
Reference Resolution: Accessibility
Forward references are only accessible if they are not within a scope
that has already been closed before the position of the backward
reference:
s ∼
vp
vp ∼
np
ref
...
v
tv
destroy
aux
does not
conj
and
vp
pp
np
>n
enemy
det
every
prep
from
np
n
house
det
a
v
tv
protects
np
n
man
det
Every
∼
( ( ()
>
>
<
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 16 / 25
Reference Resolution: Accessibility
Strong forward references are always accessible:
s ∼
vp
vp ∼
np
ref
...
v
tv
likes
conj
and
vp
np
pp
np
prop
Bill
prep
of
>n
friend
det
every
v
tv
knows
np
prop
Mary
∼
( )
<
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 17 / 25
Reference Resolution: Proximity
If a backward reference matches more than one forward reference
then the closest one is taken:
s
s
...
...
np
ref
pn
it
conj
then
s
vp
np
n
error
det
an
v
tv
causes
np
pp
np
n
machine
det
a
prep
of
n
part
det
a
conj
If
>
>
> <
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 18 / 25
Possible Extensions
• Semantics (e.g. with λ-DRSs)
• General feature structures (instead of flat ones)
• ...
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 19 / 25
Parsers for Codeco
Two parsers with different parsing approaches exist:
• Transformation into Prolog DCG
• fast (1.5 ms per sentence)
• no lookahead features
• ideal for regression tests and parsing of large texts in batch mode
• Execution in a chart parser (Earley parser) under Java
• slower, but still reasonably fast (130 ms per sentence)
• lookahead features
• ideal for predictive editors in Java
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 20 / 25
ACE in Codeco
• Large subset of ACE in Codeco
• Includes: countable nouns, proper names, intransitive and
transitive verbs, adjectives, adverbs, prepositions, plurals,
negation, comparative and superlative adjectives and adverbs,
of-phrases, relative clauses, modality, numerical quantifiers,
coordination of sentences / verb phrases / relative clauses,
conditional sentences, questions, and anaphoric references (simple
definite noun phrases, variables, and reflexive and irreflexive
pronouns)
• Excludes: Mass nouns, measurement nouns, ditransitive verbs,
numbers and strings as noun phrases, sentences as verb phrase
complements, Saxon genitive, possessive pronouns, noun phrase
coordination, and commands
• 164 grammar rules
• Used in the ACE Editor:
http://attempto.ifi.uzh.ch/webapps/aceeditor/
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 21 / 25
Evaluation of ACE Codeco
Exhaustive Language Generation:
• Evaluation subgrammar with 97 grammar rules
• Minimal lexicon
• 2’250’869 sentences with 3–10 tokens:
sentence length number of sentences growth factor
3 6
4 87 14.50
5 385 4.43
6 1’959 5.09
7 11’803 6.03
8 64’691 5.48
9 342’863 5.30
10 1’829’075 5.33
3–10 2’250’869
• All are accepted by the ACE parser
→ ACE Codeco is a subset of ACE
• None is generated more than once
→ ACE Codeco is unambiguous
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 22 / 25
Evaluation of the Codeco notation and its
implementations
Prolog DCG representation versus Java Earley parser:
• Equivalence of the Implementations:
Generate the same set of sentences up to 8 tokens
→ The two implementations process Codeco in the same way
• Performance Tests:
task grammar implementation seconds/sentence
generation ACE Codeco eval. subset Prolog DCG 0.00286
generation ACE Codeco eval. subset Java Earley parser 0.0730
parsing ACE Codeco eval. subset Prolog DCG 0.000360
parsing ACE Codeco eval. subset Java Earley parser 0.0276
parsing full ACE Codeco Prolog DCG 0.00146
parsing full ACE Codeco Java Earley parser 0.134
parsing full ACE APE 0.0161
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 23 / 25
Conclusions
Codeco ...
• ... fulfills our requirements for CNLs in predictive editors.
• ... is suitable to describe a large subset of ACE.
• ... allows for automatic tests.
• ... stands for a principled and engineering focused approach to
CNL.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 24 / 25
Thank you for your attention!
Questions & Discussion
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 25 / 25

Más contenido relacionado

La actualidad más candente

OUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingOUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingFlorian Leitner
 
A statistical approach to machine translation
A statistical approach to machine translationA statistical approach to machine translation
A statistical approach to machine translationHiroshi Matsumoto
 
Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Florian Leitner
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesAntonio Toral
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingHemantha Kulathilake
 
Lecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsLecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsCS, NcState
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 
Class9
 Class9 Class9
Class9issbp
 

La actualidad más candente (12)

OUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingOUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String Processing
 
A statistical approach to machine translation
A statistical approach to machine translationA statistical approach to machine translation
A statistical approach to machine translation
 
Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)
 
AINL 2016: Eyecioglu
AINL 2016: EyeciogluAINL 2016: Eyecioglu
AINL 2016: Eyecioglu
 
Dragos Munteanu (SDL) at the Industry Leaders Forum 2015
Dragos Munteanu (SDL) at the Industry Leaders Forum 2015Dragos Munteanu (SDL) at the Industry Leaders Forum 2015
Dragos Munteanu (SDL) at the Industry Leaders Forum 2015
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
 
Language models
Language modelsLanguage models
Language models
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological Parsing
 
Lecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsLecture 7: Definite Clause Grammars
Lecture 7: Definite Clause Grammars
 
NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Class9
 Class9 Class9
Class9
 

Destacado

Semantic Publishing and Nanopublications
Semantic Publishing and NanopublicationsSemantic Publishing and Nanopublications
Semantic Publishing and NanopublicationsTobias Kuhn
 
Image Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsImage Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsTobias Kuhn
 
The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer Tobias Kuhn
 
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...Tobias Kuhn
 

Destacado (10)

Semantic Publishing and Nanopublications
Semantic Publishing and NanopublicationsSemantic Publishing and Nanopublications
Semantic Publishing and Nanopublications
 
Shreya
ShreyaShreya
Shreya
 
Nanopubs
NanopubsNanopubs
Nanopubs
 
Image Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsImage Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical Publications
 
Task week 3 design
Task week 3 designTask week 3 design
Task week 3 design
 
Clares work
Clares workClares work
Clares work
 
Vacantions
VacantionsVacantions
Vacantions
 
Kyle
KyleKyle
Kyle
 
The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer
 
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
 

Similar a Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors

1908 working memory
1908 working memory1908 working memory
1908 working memoryWarNik Chow
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...Seth Grimes
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioDeep Learning Italia
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLPSatyam Saxena
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLPAnuj Gupta
 
Automated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion TranscriptsAutomated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion TranscriptsVitomir Kovanovic
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
Controlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationControlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationTobias Kuhn
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPHendrik D'Oosterlinck
 
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptxENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptxSyedNadeemAbbas6
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...Tobias Kuhn
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information RetrievalRoelof Pieters
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free GrammarsMarina Santini
 

Similar a Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors (20)

1908 working memory
1908 working memory1908 working memory
1908 working memory
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglio
 
Corpus linguistics
Corpus linguisticsCorpus linguistics
Corpus linguistics
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLP
 
Automated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion TranscriptsAutomated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion Transcripts
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Controlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationControlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for Standardization
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLP
 
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptxENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
AINL 2016: Nikolenko
AINL 2016: NikolenkoAINL 2016: Nikolenko
AINL 2016: Nikolenko
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 

Más de Tobias Kuhn

Nanopublications and Decentralized Publishing
Nanopublications and Decentralized PublishingNanopublications and Decentralized Publishing
Nanopublications and Decentralized PublishingTobias Kuhn
 
Linked Data Publishing with Nanopublications
Linked Data Publishing with NanopublicationsLinked Data Publishing with Nanopublications
Linked Data Publishing with NanopublicationsTobias Kuhn
 
Genuine semantic publishing
Genuine semantic publishingGenuine semantic publishing
Genuine semantic publishingTobias Kuhn
 
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataA Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataTobias Kuhn
 
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...Tobias Kuhn
 
nanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublicationsnanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for NanopublicationsTobias Kuhn
 
Scientific Data Publishing
Scientific Data PublishingScientific Data Publishing
Scientific Data PublishingTobias Kuhn
 
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...Tobias Kuhn
 
Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?Tobias Kuhn
 
Data Publishing and Post-Publication Reviews
Data Publishing and Post-Publication ReviewsData Publishing and Post-Publication Reviews
Data Publishing and Post-Publication ReviewsTobias Kuhn
 
Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Tobias Kuhn
 
Meme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation NetworksMeme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation NetworksTobias Kuhn
 
A Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural LanguageA Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural LanguageTobias Kuhn
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureTobias Kuhn
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureTobias Kuhn
 
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...Tobias Kuhn
 
Automatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen WikiAutomatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen WikiTobias Kuhn
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...Tobias Kuhn
 
AceRules: Executing Rules in Controlled Natural Language
AceRules: Executing Rules in Controlled Natural LanguageAceRules: Executing Rules in Controlled Natural Language
AceRules: Executing Rules in Controlled Natural LanguageTobias Kuhn
 
AceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic WikiAceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic WikiTobias Kuhn
 

Más de Tobias Kuhn (20)

Nanopublications and Decentralized Publishing
Nanopublications and Decentralized PublishingNanopublications and Decentralized Publishing
Nanopublications and Decentralized Publishing
 
Linked Data Publishing with Nanopublications
Linked Data Publishing with NanopublicationsLinked Data Publishing with Nanopublications
Linked Data Publishing with Nanopublications
 
Genuine semantic publishing
Genuine semantic publishingGenuine semantic publishing
Genuine semantic publishing
 
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataA Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
 
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
 
nanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublicationsnanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublications
 
Scientific Data Publishing
Scientific Data PublishingScientific Data Publishing
Scientific Data Publishing
 
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
 
Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?
 
Data Publishing and Post-Publication Reviews
Data Publishing and Post-Publication ReviewsData Publishing and Post-Publication Reviews
Data Publishing and Post-Publication Reviews
 
Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications
 
Meme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation NetworksMeme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation Networks
 
A Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural LanguageA Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural Language
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific Literature
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific Literature
 
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
 
Automatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen WikiAutomatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen Wiki
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
 
AceRules: Executing Rules in Controlled Natural Language
AceRules: Executing Rules in Controlled Natural LanguageAceRules: Executing Rules in Controlled Natural Language
AceRules: Executing Rules in Controlled Natural Language
 
AceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic WikiAceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic Wiki
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors

  • 1. Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors Tobias Kuhn Department of Informatics University of Zurich Second Workshop on Controlled Natural Language 15 September 2010 Marettimo (Italy)
  • 2. Introduction • Problem: Existing grammar frameworks do not work out particularly well for CNLs. • Reason: • CNLs have essential differences to other languages (natural and formal ones) • To solve the writability problem, CNLs have to be embedded in special tools with very specific requirements. • Error messages and suggestions • Predictive editors • Language generation • Solution: A new grammar notation that is dedicated to CNLs and predictive editors. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 2 / 25
  • 3. CNL Grammar Requirements Concreteness: CNL grammars should be fully formalized and interpretable by computers. Declarativeness: CNL grammars should not depend on a concrete algorithm or implementation. Lookahead Features: CNL grammars should allow for the retrieval of possible next tokens for a partial text. Anaphoric References: CNL grammars should allow for the definition of nonlocal structures like anaphoric references. Implementability: CNL grammars should be easy to implement in different programming languages. Expressivity: CNL grammars should be sufficiently expressive to express CNLs. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 3 / 25
  • 4. Lookahead Features Predictive editors need to know which words can follow a partial text: Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 4 / 25
  • 5. Anaphoric References • Anaphoric references in CNLs are resolvable in a deterministic way: A country contains an area that is not controlled by the country. If a person X is a relative of a person Y then the person Y is a relative of the person X. John protects himself and Mary helps him. • Anaphoric references that cannot be resolved should be disallowed: * Every area is controlled by it. * The person X is a relative of the person Y. • Scopes have to be considered too: Every man protects a house from every enemy and does not destroy ... ... himself. ... the house. * ... the enemy. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 5 / 25
  • 6. Existing Grammar Frameworks • Grammar Frameworks for Natural Languages • Head-Driven Phrase Structure Grammars • Lexical-Functional Grammars • Tree-Adjoining Grammars • Combinatory Categorial Grammars • Dependency Grammars • ... and many more • Backus-Naur Form (BNF) • Parser Generators • Yacc • GNU Bison • ANTLR • Definite Clause Grammars (DCG) • Grammatical Framework (GF) Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 6 / 25
  • 7. How Existing Grammar Frameworks Fulfill our Requirements for CNL Grammars NL BNF PG DCG GF Concreteness + + + + + Declarativeness +/– + – (+) + Lookahead Features – + (+) (+) + Anaphoric References (+) – – (+) – Implementability – + – – ? Expressivity + – + + + Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 7 / 25
  • 8. The Codeco Notation Codeco = “Concrete and Declarative Grammar Notation for Controlled Natural Languages” • Formal and Declarative • Easy to implement in different programming languages. • Expressive enough for common CNLs. • Lookahead features can be implemented in a practical and efficient way. • Deterministic anaphoric references can be defined in an adequate and simple way. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 8 / 25
  • 9. Grammar Rules in Codeco • Grammatical categories with flat feature structures • Category Types: • non-terminal (e.g. vp) • pre-terminal (e.g. noun) • terminal (e.g. [ does not ]) • Grammar Rule Examples: • vp num: Num neg: Neg : −→ v num: Num neg: Neg type: tr np case: acc • v neg: + type: Type : −→ [ does not ] verb type: Type • np noun: Noun : −→ [ a ] noun text: Noun • noun text: woman gender: fem → [ woman ] Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 9 / 25
  • 10. Forward and Backward References The special categories “>” and “<” can be used to establish nonlocal dependencies, e.g. for anaphoric references: np : −→ det def: – noun text: Noun > type: noun noun: Noun ref : −→ det def: + noun text: Noun < type: noun noun: Noun s vp vp np ref noun area det the v tv control aux does not conj and vp np noun area det an v tv contains np >noun country det A > < Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 10 / 25
  • 11. Scopes • Opening of scopes: • Scopes in (controlled) English usually open at the position of the scope triggering structure, or nearby. • Scope opener category “ ” in Codeco: quant exist: – : −→ [ every ] • Closing of scopes: • Scopes in (controlled) English usually close at the end of certain structures like verb phrases, relative clauses, and sentences. • Scope-closing rules “ ∼ −→” in Codeco: vp num: Num ∼ −−→ v num: Num type: tr np case: acc Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 11 / 25
  • 12. Position Operators • How to define reflexive pronouns like “herself ” that can only attach to the subject? • With the position operator “#”, position identifiers can be assigned: np id: Id : −→ # Id prop gender: G > id: Id gender: G type: prop ref subj: Subj : −→ [ herself ] < id: Subj gender: fem s vp np ref herself v tv helps np prop Maryp0 p1 p2 p3 # # Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 12 / 25
  • 13. Negative Backward References • How to define that the same variable can be introduced only once? *A person X knows a person X. • Negative backward references “ ” succeed only if no matching antecedent is accessible: newvar : −→ var text: V type: var var: V > type: var var: V Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 13 / 25
  • 14. Complex Backward References • How to define pronouns like “him” that cannot attach to the subject? *John helps him. • Complex backward references “<+... −...” have one or more positive feature structure “+” and zero or more negative ones “−”. • They succeed if there is an antecedent that matches one of the positive feature structures but none of the negative ones: ref subj: Subj case: acc : −→ [ him ] <+ human: + gender: masc − id: Subj • A more complicated (but probably less useful) example: ref subj: Subj : −→ [ this ] <+ hasvar: – human: – type: relation − id: Subj type: prop Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 14 / 25
  • 15. Strong Forward References • How to define that propernames like “Bill” are always accessible? *Mary does not love a man. Mary hates him. Mary does not love Bill. Mary hates him. • Strong forward references “ ” are always accessible: np id: Id : −→ prop human: H gender: G    id: Id human: H gender: G type: prop    Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 15 / 25
  • 16. Reference Resolution: Accessibility Forward references are only accessible if they are not within a scope that has already been closed before the position of the backward reference: s ∼ vp vp ∼ np ref ... v tv destroy aux does not conj and vp pp np >n enemy det every prep from np n house det a v tv protects np n man det Every ∼ ( ( () > > < Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 16 / 25
  • 17. Reference Resolution: Accessibility Strong forward references are always accessible: s ∼ vp vp ∼ np ref ... v tv likes conj and vp np pp np prop Bill prep of >n friend det every v tv knows np prop Mary ∼ ( ) < Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 17 / 25
  • 18. Reference Resolution: Proximity If a backward reference matches more than one forward reference then the closest one is taken: s s ... ... np ref pn it conj then s vp np n error det an v tv causes np pp np n machine det a prep of n part det a conj If > > > < Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 18 / 25
  • 19. Possible Extensions • Semantics (e.g. with λ-DRSs) • General feature structures (instead of flat ones) • ... Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 19 / 25
  • 20. Parsers for Codeco Two parsers with different parsing approaches exist: • Transformation into Prolog DCG • fast (1.5 ms per sentence) • no lookahead features • ideal for regression tests and parsing of large texts in batch mode • Execution in a chart parser (Earley parser) under Java • slower, but still reasonably fast (130 ms per sentence) • lookahead features • ideal for predictive editors in Java Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 20 / 25
  • 21. ACE in Codeco • Large subset of ACE in Codeco • Includes: countable nouns, proper names, intransitive and transitive verbs, adjectives, adverbs, prepositions, plurals, negation, comparative and superlative adjectives and adverbs, of-phrases, relative clauses, modality, numerical quantifiers, coordination of sentences / verb phrases / relative clauses, conditional sentences, questions, and anaphoric references (simple definite noun phrases, variables, and reflexive and irreflexive pronouns) • Excludes: Mass nouns, measurement nouns, ditransitive verbs, numbers and strings as noun phrases, sentences as verb phrase complements, Saxon genitive, possessive pronouns, noun phrase coordination, and commands • 164 grammar rules • Used in the ACE Editor: http://attempto.ifi.uzh.ch/webapps/aceeditor/ Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 21 / 25
  • 22. Evaluation of ACE Codeco Exhaustive Language Generation: • Evaluation subgrammar with 97 grammar rules • Minimal lexicon • 2’250’869 sentences with 3–10 tokens: sentence length number of sentences growth factor 3 6 4 87 14.50 5 385 4.43 6 1’959 5.09 7 11’803 6.03 8 64’691 5.48 9 342’863 5.30 10 1’829’075 5.33 3–10 2’250’869 • All are accepted by the ACE parser → ACE Codeco is a subset of ACE • None is generated more than once → ACE Codeco is unambiguous Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 22 / 25
  • 23. Evaluation of the Codeco notation and its implementations Prolog DCG representation versus Java Earley parser: • Equivalence of the Implementations: Generate the same set of sentences up to 8 tokens → The two implementations process Codeco in the same way • Performance Tests: task grammar implementation seconds/sentence generation ACE Codeco eval. subset Prolog DCG 0.00286 generation ACE Codeco eval. subset Java Earley parser 0.0730 parsing ACE Codeco eval. subset Prolog DCG 0.000360 parsing ACE Codeco eval. subset Java Earley parser 0.0276 parsing full ACE Codeco Prolog DCG 0.00146 parsing full ACE Codeco Java Earley parser 0.134 parsing full ACE APE 0.0161 Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 23 / 25
  • 24. Conclusions Codeco ... • ... fulfills our requirements for CNLs in predictive editors. • ... is suitable to describe a large subset of ACE. • ... allows for automatic tests. • ... stands for a principled and engineering focused approach to CNL. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 24 / 25
  • 25. Thank you for your attention! Questions & Discussion Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 25 / 25