Traditional vs. Modern Parenting: Unveiling the Pros and Cons for Your Child’...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammatical Framework
1. A Multilingual Semantic Wiki based on
Attempto Controlled English and
Grammatical Framework
Tobias Kuhn
Chair of Sociology, in particular of Modeling and Simulation, ETH Zurich,
Switzerland
UNISA, Pretoria (South Africa)
5 June 2013
2. About This Talk
This talk is mainly based on the following paper:
Kaarel Kaljurand and Tobias Kuhn. A Multilingual Semantic Wiki
Based on Attempto Controlled English and Grammatical Framework.
In Proceedings of the 10th Extended Semantic Web Conference
(ESWC). 2013.
It can be downloaded here:
http://purl.org/tkuhn/eswc2013acewikigf
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
2 / 31
3. Imagine ...
... that Wikipedia can check consistency and
answer questions about the contained
knowledge, and
...that all content is instantly available in all
languages!
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
3 / 31
4. • AceWiki is a semantic wiki
• Articles are written in Attempto Controlled English (ACE)
• These sentences are internally translated into the Semantic Web
language OWL
• An OWL reasoner is built in to answer questions and detect
inconsistencies
• Special editor for writing ACE statements
• Has been extended to support multilinguality
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
4 / 31
6. Attempto Controlled English (ACE)
Subset of natural English:
• Conjunction, disjunction, negation, if-then, ...
• Anaphoric references: pronouns, definite noun phrases, variables
• Quantifiers: every, no, at least 3, ...
• Content words: proper names, nouns, verbs, adjectives, ...
Grammar is fixed, but users can change content words.
Deterministic ambiguity handling:
• Anaphora resolution (France borders Spain and it borders
Portugal.)
• Quantifier scope (Every country borders a country.)
• Attachment (Every EU-country borders a country that is a
EU-country and is a NATO-country.)
Well-defined translations to and from first-order logic, OWL, ...
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
6 / 31
8. Consistency Checking
Consistency of the knowledge base is very important, because it is a
prerequisite for all other reasoning tasks.
AceWiki ensures consistency by checking every new statement:
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
8 / 31
11. ACE Reasoning via Translation to OWL
Every country that does not border a sea is a landlocked-country.
SubClassOf(
ObjectIntersectionOf(
:country
ObjectComplementOf(
ObjectSomeValuesFrom(
:border
:sea
)
)
)
:landlocked-country
)
Which country is a landlocked-country?
ObjectIntersectionOf(
:country
:landlocked-country
)
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
11 / 31
12. Expressiveness versus Efficiency
Trade-off: Expressiveness/Complexity ⇔ Decidability/Efficiency
• First-order logic: expressive, undecidable, very inefficient
• Description Logics, OWL: less expressive, decidable, inefficient
• OWL Profiles: even less expressive, decidable, efficient
AceWiki can use full OWL or an OWL profile for reasoning.
Sentences that are more complex get a red triangle and are ignored
for reasoning:
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
12 / 31
13. Evaluation
Two small usability experiments with earlier versions of AceWiki:
• Altogether 26 untrained participants
• Task: Collaborative creation of a knowledge base
Results:
• 78%-81% of the sentences were correct and sensible
• 61%-70% of them were complex (containing negations,
implications, disjunctions or number restrictions)
• Creation of a correct sentence every 5–6 minutes
• Definition of a new word every 5–7 minutes
→ Even untrained users can effectively use AceWiki
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
13 / 31
14. Multilingual AceWiki: AceWiki-GF
General ideas:
• Make wiki content available in different languages
• Automatically translated content using rule-based machine
translation: Grammatical Framework (GF)
• Language switching like in Wikipedia
• Localization of the user interface
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
14 / 31
15. Grammatical Framework (GF)
GF is a framework for multilingual grammar engineering:
• Rule-based (i.e. not statistical)
• Functional programming language optimized to handle natural
language
• Resource Grammar Library implementing common morphological
and syntactic structures
• Mildly context sensitive
• Bidirectional translations: concrete language ⇔ abstract syntax
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
15 / 31
16. GF grammars and translations
GF grammars consist of:
• One language-neutral abstract syntax
• Multiple concrete syntaxes that implement the given abstract
categories, specifying words, word order, agreement, etc.
Example
border : Country -> Country -> Relation
English: border x y = x!Nom + "borders" + y!Nom
Estonian: border x y = x!Gen + "naaber on" + y!Nom
GF translations consist of:
• First, parse a string in the original language to a tree (or trees)
in the abstract syntax
• Then, linearize these trees as strings in the target language
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
16 / 31
18. GF Resource Grammar Library (RGL)
• Morphology and syntax for ∼30 languages via language-neutral
API
• Developers do not need detailed knowledge of the languages
that they want to support in their application
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
18 / 31
19. Implementation of AceWiki-GF
Integration of ACE with GF (ACE-in-GF):
• Implemented a multilingual grammar of ACE in the GF
framework
• Covered the languages supported by the GF resource grammar
• Not fine-tuned to any particular language (apart from ACE)
Integration of AceWiki with GF (AceWiki-GF):
• Implemented connection to GF tools (GF Webservice / Cloud
Service)
• Added support for the management of multilinguality, ambiguity,
grammar
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
19 / 31
21. ACE-in-GF
An ACE grammar implemented in GF adds multiple natural languages
as front-ends to ACE. As a result, these languages can be mapped to
and from various formal languages already supported by ACE.
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
21 / 31
22. ACE-in-GF: Example
German: Jedes Land, das nicht an ein Meer grenzt, ist ein
Binnenland.
ACE-in-GF tree:
baseText (sText (s (vpS (everyNP (relCN (cn_as_VarCN country_CN)
(neg_predRS which_RP (v2VP border_V2 (thereNP_as_NP
(aNP (cn_as_VarCN sea_CN))))))) (npVP (thereNP_as_NP
(aNP (cn_as_VarCN landlocked_country_CN)))))))
ACE: Every country that does not border a sea is a
landlocked-country.
OWL:
SubClassOf(
ObjectIntersectionOf(
:country
ObjectComplementOf(
ObjectSomeValuesFrom( :border :sea )
)
)
:landlocked-country
)
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
22 / 31
23. ACE-in-GF: Implementation
Implementation of the ACE syntax:
• Extension of Angelov and Ranta (CNL 2009)
• Focus on the subset of ACE that can be mapped to OWL
• Almost 100% coverage at almost 0% ambiguity
Support most RGL languages:
• Bulgarian, Catalan, Chinese, Danish, Dutch, English, Finnish,
French, German, Greek, Hindi, Italian, Latvian, Norwegian,
Polish, Romanian, Russian, Spanish, Swedish, Thai, Urdu
• RGL-based design provides automatic increase in quality and
language-coverage over time
Status
• Some precision problems, e.g. with anaphoric references
• Ambiguity and coverage problems in some languages
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
23 / 31
25. Evaluation of ACE-in-GF
Design
• Generated ∼100 ACE sentences/questions and automatically
translated them to all the languages
• Full coverage of all the grammar functions
• Large coverage of OWL axiom structures (subclass, range,
domain, transitivity, ...)
• Measured translation accuracy from ACE to other languages
• Used Google Translate as the baseline
• 20 human evaluators (2 per language) as the gold standard
Results
• Participants preferred ACE-in-GF translations to Google
translations and post-edited them less
• Many edits were stylistic
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
25 / 31
27. Evaluation of AceWiki-GF
Hypothesis: A group of users reaches almost the same level of agreement
on the content of an article presented to them in different languages as
when the article is presented to all of them in the same language.
Design
• Based on a 500-word lexicon on European geography in three
languages: English, German and Spanish
• 30 participants accessed AceWiki-GF and wrote sentences in
their language (10 participants for each language)
• They had to enter true and false sentences and tag them as such
• In a post-editing task, each participant checked the output of
two other participants: one translated from another language
and one written in the same language (true/false tags were
removed and sentences shuffled)
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
27 / 31
28. Evaluation of AceWiki-GF: Results
• 30 participants spent on average 37 minutes using AceWiki-GF,
creating in total 316 sentences
• Definition of agreement level: (Tk + Fd )/S
S is the total number of sentences, Tk the number of sentences marked as
true and kept, and Fd the ones marked as false and deleted
• Agreement level with and without translation: 84.0% and 82.2%
(difference is not significant)
• Assumption: translation introduces a constant translation error
rate r that has the effect that the agreement level is (1 − r ) × a
instead of a
• New hypothesis: The translation error rate is less than 5%.
• p-value with one-tailed Wilcoxon signed rank test: 0.046
• With AceWiki-GF, translation error rate is less than 5%!
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
28 / 31
29. Evaluation of AceWiki-GF: Feedback
Questionnaire for the participants contained these questions:
1
Was AceWiki Geography easy or difficult to use in general?
2
Was the sentence editor easy or difficult to use?
3
Was creating true and false statements easy or difficult to
perform?
Possible answers: “very difficult” (0), “difficult” (1), “medium” (2),
“easy” (3), and “very easy” (4)
Results:
1
Average: 2.93 (∼“easy”)
2
Average: 2.77 (∼“easy”)
3
Average: 2.70 (∼“easy”)
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
29 / 31
30. Links
ACE parser (APE) source code: https://github.com/Attempto/APE
ACE-in-GF source code: http://github.com/Attempto/ACE-in-GF
AceWiki and AceWikiGF
• Source code: http://github.com/AceWiki/AceWiki
• Demos (non-GF): http://attempto.ifi.uzh.ch/acewiki/
• Demos (GF): http://attempto.ifi.uzh.ch/acewiki-gf/
MOLTO project web site: http://www.molto-project.eu
Attempto project web site: http://attempto.ifi.uzh.ch
Grammatical Framework project
• Web site: http://www.grammaticalframework.org
• GF Summer School, August 2013 in Germany:
http://school.grammaticalframework.org/2013/
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
30 / 31
31. Thank you for your Attention!
Questions?
Tobias Kuhn, ETH Zurich
A Multilingual Semantic Wiki
31 / 31