ISWC 2017, Vienna, Austria
AMUSE: Multilingual Semantic Parsing for
Question Answering over Linked Data
Sherzod Hakimov, Soufian Jebbara & Philipp Cimiano
Semantic Computing Group
CITEC, Bielefeld University
Virtual Assistants
Siri
Google Now
Alexa
Cortana
Problem Definition - Question Answering
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?x WHERE {
dbr:Wikipedia dbo:author ?x .
}
Who created Wikipedia?
Wer hat Wikipedia gegründet?
¿Quién creó Wikipedia?
QALD-7 Multilingual Question Answering dataset (ESWC 2017)
8 languages
215 training instances
44 test instances
Motivation
Who created Wikipedia? Wer hat Wikipedia gegründet? ¿Quién creó Wikipedia?
Universal Dependencies v2: cross-linguistically consistent dependency annotation for 50 languages
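To make the motivation concrete, here is a small Python sketch. The UD-style analyses are written by hand for illustration (an assumption, not parser output), and the hypothetical `skeleton` function sets aside auxiliary and punctuation attachments. Under these assumptions, the English, German and Spanish questions share the same subject/object structure around the root verb.

```python
# Hand-written UD v2-style analyses of the three example questions.
# Illustrative assumption only: these are not the output of a real parser.
# Each token is (form, deprel, head_form); the root token has head None.
# Heads are identified by form, which is enough here since no form repeats.
PARSES = {
    "en": [("Who", "nsubj", "created"), ("created", "root", None),
           ("Wikipedia", "obj", "created"), ("?", "punct", "created")],
    "de": [("Wer", "nsubj", "gegründet"), ("hat", "aux", "gegründet"),
           ("Wikipedia", "obj", "gegründet"), ("gegründet", "root", None),
           ("?", "punct", "gegründet")],
    "es": [("¿", "punct", "creó"), ("Quién", "nsubj", "creó"),
           ("creó", "root", None), ("Wikipedia", "obj", "creó"),
           ("?", "punct", "creó")],
}

# Relations treated as function-word attachments and ignored (assumption).
IGNORED_RELATIONS = {"aux", "punct"}


def skeleton(parse):
    """Sorted relations of the content words attached directly to the root."""
    root = next(form for form, rel, head in parse if rel == "root")
    return sorted(rel for form, rel, head in parse
                  if head == root and rel not in IGNORED_RELATIONS)


for lang, parse in PARSES.items():
    print(lang, skeleton(parse))  # each line ends with ['nsubj', 'obj']
```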
Knowledge Base - DBpedia
• 2016-04 release
• 125 languages
• 754 classes
• 1,103 object properties
• 1,608 datatype properties
Preliminaries
Logical Form
Semantic Composition using Dependency Parse Tree
Factor Graphs
Logical Form - DUDES
• Dependency-based Underspecified Discourse Representation Structures (Cimiano [1])
• A formalism for specifying meaning representations
• Flexible semantic composition with respect to the order of application
• Built on semantic dependencies, which makes it well suited to dependency-based syntactic analysis
[1] Cimiano, P.: Flexible Semantic Composition with DUDES. In: Proceedings of the Eighth International Conference on Computational Semantics, pp. 272-276. Association for Computational Linguistics (2009)
DUDES
A DUDES is a 5-tuple (v, vs, l, drs, slots), where:
v: the main variable
vs: a (possibly empty) set of projection variables
l: the label of the main DRS
drs: a DRS (Discourse Representation Structure)
slots: a (possibly empty) set of semantic dependencies
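A minimal Python sketch of the five DUDES components as a data structure. The encodings (DRS conditions as tuples, slots as (anchor variable, dependency label) pairs, the default label "l1") and the example entries are illustrative assumptions, not the representation used in AMUSE.

```python
from dataclasses import dataclass, field


@dataclass
class DUDES:
    """A DUDES (v, vs, l, drs, slots); the concrete encodings are assumptions."""
    v: str                                      # main variable
    vs: set = field(default_factory=set)        # projection variables (possibly empty)
    l: str = "l1"                               # label of the main DRS
    drs: list = field(default_factory=list)     # conditions, e.g. ("dbo:author", "y", "x")
    slots: list = field(default_factory=list)   # (anchor variable, dependency label) pairs

    def is_saturated(self):
        """True once every semantic dependency (slot) has been filled."""
        return not self.slots


# Hypothetical lexical entries for "Who created Wikipedia?".
who = DUDES(v="x", vs={"x"})                                  # wh-word: projected variable
wikipedia = DUDES(v="w", drs=[("=", "w", "dbr:Wikipedia")])   # named entity
created = DUDES(v="x", drs=[("dbo:author", "y", "x")],
                slots=[("y", "obj"), ("x", "nsubj")])         # verb with two open slots

print(created.is_saturated(), wikipedia.is_saturated())  # False True
```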
Semantic Composition with DUDES
Who created Wikipedia?
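The composition for this question can be sketched in Python as follows. Everything here is a simplified assumption: the DRS label is dropped, slots are (anchor variable, dependency label) pairs, filling a slot unifies its anchor with the argument's main variable, and `apply` and `to_sparql` are hypothetical helpers. Applying the two arguments in either order yields the same query, which illustrates composition that is flexible with respect to the order of application.

```python
from dataclasses import dataclass, field


@dataclass
class DUDES:
    """Simplified DUDES: the DRS label l is omitted for brevity."""
    v: str                                      # main variable
    vs: set = field(default_factory=set)        # projection variables
    drs: list = field(default_factory=list)     # (predicate, arg1, arg2) conditions
    slots: list = field(default_factory=list)   # (anchor variable, dependency label)


def rename(terms, old, new):
    """Replace variable `old` by `new` in a tuple of terms."""
    return tuple(new if t == old else t for t in terms)


def apply(head, label, arg):
    """Fill the slot of `head` labelled `label` with `arg` by unifying the
    slot's anchor variable with arg's main variable and merging the rest.
    Assumes the variables of different entries are already distinct."""
    anchor = next(var for var, lab in head.slots if lab == label)
    return DUDES(
        v=arg.v if head.v == anchor else head.v,
        vs={arg.v if x == anchor else x for x in head.vs} | arg.vs,
        drs=[rename(c, anchor, arg.v) for c in head.drs] + arg.drs,
        slots=[rename(s, anchor, arg.v) for s in head.slots if s[1] != label] + arg.slots,
    )


def to_sparql(d):
    """Render a saturated DUDES as SPARQL, resolving ("=", var, uri) bindings."""
    bound = {var: uri for pred, var, uri in d.drs if pred == "="}
    triples = [f"{bound.get(s, '?' + s)} {p} {bound.get(o, '?' + o)} ."
               for p, s, o in d.drs if p != "="]
    select = " ".join("?" + v for v in sorted(d.vs))
    return f"SELECT {select} WHERE {{ {' '.join(triples)} }}"


# Hypothetical lexical entries.
who = DUDES("x", vs={"x"})
wikipedia = DUDES("w", drs=[("=", "w", "dbr:Wikipedia")])
created = DUDES("x", drs=[("dbo:author", "y", "x")], slots=[("y", "obj"), ("x", "nsubj")])

q1 = to_sparql(apply(apply(created, "obj", wikipedia), "nsubj", who))
q2 = to_sparql(apply(apply(created, "nsubj", who), "obj", wikipedia))
print(q1)  # SELECT ?x WHERE { dbr:Wikipedia dbo:author ?x . }
```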
Approach
- Semantic parsing using the dependency parse tree
- Language-independent pipeline
- Model based on factor graphs
- SampleRank to optimise feature weights
- Inference based on MCMC (Markov Chain Monte Carlo) sampling
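SampleRank learns the weights from pairs of neighbouring states. Below is a minimal Python sketch of the basic SampleRank update (without a margin), using hypothetical feature names; the feature set and learning rate used in AMUSE may differ.

```python
def dot(theta, features):
    """Model score <theta, f(y)> for a sparse feature dict."""
    return sum(theta.get(name, 0.0) * value for name, value in features.items())


def samplerank_update(theta, f_a, f_b, obj_a, obj_b, alpha=0.1):
    """One SampleRank step on a pair of neighbouring states a and b.

    If the objective prefers one state but the model does not score it
    strictly higher, move theta towards that state's features."""
    if obj_a == obj_b:
        return theta                                  # no preference, no update
    better, worse = (f_a, f_b) if obj_a > obj_b else (f_b, f_a)
    if dot(theta, better) <= dot(theta, worse):       # ranking violation
        for name in set(better) | set(worse):
            delta = better.get(name, 0.0) - worse.get(name, 0.0)
            theta[name] = theta.get(name, 0.0) + alpha * delta
    return theta


# Hypothetical features for two states that link "created" differently.
f_author = {"lemma=create,prop=dbo:author": 1.0}
f_writer = {"lemma=create,prop=dbo:writer": 1.0}

theta = {}
samplerank_update(theta, f_author, f_writer, obj_a=1.0, obj_b=0.0)
print(theta["lemma=create,prop=dbo:author"])  # 0.1
```

A second call with the same pair leaves theta unchanged, because the model already ranks the better state higher.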
Inference
[Figure: starting from an initial state, m sampling steps produce a sampled state]
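The sampling loop can be illustrated with a toy Python sketch. At each of m steps, a hypothetical `explore` function proposes neighbouring states, and the next state is drawn with probability proportional to exp(score / T). The search space, scoring function and temperature are illustrative assumptions. This is a simplified sampling-based search, not a full Metropolis-Hastings sampler; in the approach described here, states are derived from the dependency parse tree and scored by the factor graph model.

```python
import math
import random


def sample_next(proposals, score, temperature, rng):
    """Draw one proposal with probability proportional to exp(score / T)."""
    scores = [score(s) for s in proposals]
    top = max(scores)                                   # subtract max for stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(proposals, weights=weights, k=1)[0]


def run_inference(initial, explore, score, m, temperature=1.0, seed=0):
    """Run m sampling steps from `initial`; return the best-scoring state seen."""
    rng = random.Random(seed)
    state = best = initial
    for _ in range(m):
        proposals = explore(state)                      # neighbouring states
        if not proposals:
            break
        state = sample_next(proposals, score, temperature, rng)
        if score(state) > score(best):
            best = state
    return best


# Toy search space (hypothetical): states are integers, neighbours differ
# by one, and the model score peaks at state 5.
best = run_inference(0, lambda n: [n - 1, n + 1], lambda n: -abs(n - 5),
                     m=50, temperature=0.01)
print(best)  # 5
```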
Inference
Two strategies to explore the search space:
1) Linking to Knowledge Base (L2KB)
• objective: compare the set of predicted URIs with the expected set of URIs
2) Query Construction (QC)
• objective: compare the constructed query with the expected query
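Both objectives can be approximated with a set-based F1 score, as in the Python sketch below. The L2KB objective compares URI sets directly. The slides do not specify how queries are compared, so treating a query as a set of triple patterns for QC is an assumption made for illustration.

```python
def f1(predicted, expected):
    """Set-based F1 between predicted and expected items."""
    predicted, expected = set(predicted), set(expected)
    if not predicted and not expected:
        return 1.0
    overlap = len(predicted & expected)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)


def l2kb_objective(state_uris, gold_uris):
    """Compare the URIs assigned in a state with the expected URIs."""
    return f1(state_uris, gold_uris)


def qc_objective(state_triples, gold_triples):
    """Assumption: queries are compared as sets of triple patterns."""
    return f1(state_triples, gold_triples)


gold = {("dbr:Wikipedia", "dbo:author", "?x")}
print(l2kb_objective({"dbo:writer", "dbr:Wikipedia"},
                     {"dbo:author", "dbr:Wikipedia"}))       # 0.5
print(qc_objective({("?x", "dbo:author", "dbr:Wikipedia")}, gold))  # 0.0
```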
Linking to Knowledge Base (L2KB)
Explore the edges of the dependency parse tree and assign knowledge base IDs based on the lemmas of the nodes
Check both triple patterns for the assigned property:
• Slot 1: dbr:Wikipedia dbo:author ?x
• Slot 2: ?x dbo:author dbr:Wikipedia
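The slot check can be sketched in Python: build both triple patterns for the linked property and keep the slot whose pattern has matches in the knowledge base. The toy knowledge base and the helper names are hypothetical stand-ins for queries against DBpedia.

```python
# Illustrative toy knowledge base (hypothetical stand-in for DBpedia).
KB = {
    ("dbr:Wikipedia", "dbo:author", "dbr:Jimmy_Wales"),
    ("dbr:Wikipedia", "dbo:author", "dbr:Larry_Sanger"),
}


def triple_patterns(prop, entity, var="?x"):
    """Slot 1: entity in subject position; slot 2: entity in object position."""
    return {1: (entity, prop, var), 2: (var, prop, entity)}


def matches(pattern, kb):
    """Triples in kb compatible with pattern; terms starting with '?' match anything."""
    return [t for t in kb
            if all(p.startswith("?") or p == v for p, v in zip(pattern, t))]


def choose_slot(prop, entity, kb):
    """Return the first slot whose triple pattern has results, or None."""
    for slot, pattern in triple_patterns(prop, entity).items():
        if matches(pattern, kb):
            return slot
    return None


print(choose_slot("dbo:author", "dbr:Wikipedia", KB))  # 1
```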
Addressing the Lexical Gap
Who is the writer of The Hunger Games?   writer → dbo:author
Give me all movies with Tom Cruise.   movies → rdf:type dbo:Film
Addressing the Lexical Gap
DBpedia Labels
• rdfs:label values translated from English into German and Spanish using dict.cc
MATOLL [1]
• English, German and Spanish lexica
Word Embeddings [2]
• Skip-gram model with 100 dimensions trained on Wikipedia text for 3 languages
• Cosine similarity between the rdfs:label and the mention text
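[1] Walter, S., Unger, C., Cimiano, P.: DBlexipedia: A Nucleus for a Multilingual Lexical Semantic Web. In: Proceedings of the 3rd International Workshop on NLP and DBpedia, co-located with ISWC 2015 (2015)
[2] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed Representations of Words and Phrases and their Compositionality. In: Advances in Neural Information Processing Systems (2013)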
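A minimal Python sketch of the embedding-based similarity between a mention and rdfs:label values. The 3-dimensional vectors are made-up stand-ins for the 100-dimensional skip-gram vectors, averaging word vectors for multi-word texts is an assumption, and `rank_candidates` and the label table are hypothetical.

```python
import math

# Made-up toy vectors standing in for 100-dimensional skip-gram embeddings.
EMBEDDINGS = {
    "writer": [0.9, 0.1, 0.0],
    "author": [0.8, 0.2, 0.1],
    "film":   [0.0, 0.9, 0.3],
    "movie":  [0.1, 0.8, 0.4],
}


def embed(text):
    """Average the vectors of the known words in text (None if none are known)."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(mention, labels):
    """Rank KB entries by cosine similarity between mention and rdfs:label."""
    m = embed(mention)
    scored = []
    for uri, label in labels.items():
        l = embed(label)
        if m is not None and l is not None:
            scored.append((cosine(m, l), uri))
    return [uri for score, uri in sorted(scored, reverse=True)]


LABELS = {"dbo:author": "author", "dbo:Film": "film"}   # hypothetical label table
print(rank_candidates("writer", LABELS))  # ['dbo:author', 'dbo:Film']
```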
Evaluation - Lexicon
- Gold standard: manually annotated lexicon (DBpedia ontology properties and classes only)
[Table: lexicon evaluation results for English, German and Spanish]
Evaluation - Question Answering
- Model trained and tested on QALD-6
- L2KB and QC evaluated separately
[Table: question answering results]
DBlex: MATOLL lexica; Dict: manually created lexica
System Errors
• Property (48%)
Who wrote the song Hotel California? - dbo:musicalArtist is predicted instead of dbo:writer
• Resource (30%)
When did the Boston Tea Party take place? - the resource was not found
• Query Type (12%)
Where does Piccadilly start? - the model wrongly infers that this is an ASK query
• Slot (10%)
How many people live in Poland? - Poland is inferred to fill the 2nd slot of dbo:populationTotal instead of the 1st
Conclusion
- Multilingual semantic parsing approach based on factor graphs
- The model generalises well even when trained on only 161 instances
- Language-independent
Future Directions
- Improve results by adding further inference layers, e.g. query type classification
- Apply different ranking functions, e.g. regression or pairwise state comparison
- Add more lexical knowledge from other sources
- Paraphrase the training questions to learn from more data
Thanks!
@sherzodhakimov
shakimov AT techfak.uni-bielefeld.de
Factor Graphs
Formal Definition:
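• A factor graph G consists of variables V and factors Ψ. Variables can be subdivided into observed variables x_i and hidden variables y_i.
• A factor Ψ_i connects subsets of observed and hidden variables and computes a scalar score based on a feature vector f_i(x_i, y_i) and a set of parameters θ_i:
$\Psi_i(x_i, y_i) = \exp\left(\langle f_i(x_i, y_i), \theta_i \rangle\right)$
● The probability of y for a given input x can be calculated as:
$p(y \mid x; \theta) = \frac{1}{Z(x)} \prod_{\Psi_i \in G} \Psi_i(x_i, y_i)$, where $Z(x) = \sum_{y'} \prod_{\Psi_i \in G} \Psi_i(x_i, y'_i)$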
[1] Hakimov, S., ter Horst, H., Jebbara, S., Hartung, M., Cimiano, P.: Combining Textual and Graph-based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models.
[2] Kschischang, F.R., Frey, B.J., Loeliger, H.A.: Factor Graphs and the Sum-Product Algorithm. IEEE Transactions on Information Theory (2001)
[3] ter Horst, H., Hartung, M., Cimiano, P.: Joint Entity Recognition and Linking in Technical Domains Using Undirected Probabilistic Graphical Models. In: LDK 2017 (2017)
What is the goal?
Find the y that maximizes the posterior distribution p(y|x; θ):
$\hat{y} = \arg\max_{y} p(y \mid x; \theta)$
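A toy Python sketch of the factor scores, the posterior and the argmax. It assumes a single parameter dictionary shared by all factors (with factor-specific feature names) rather than separate θ_i, and it computes Z(x) by enumerating the candidates, which is only feasible for tiny output spaces. The factor and feature names are hypothetical.

```python
import math


def factor_score(features, theta):
    """Psi_i(x_i, y_i) = exp(<f_i(x_i, y_i), theta_i>) for sparse features."""
    return math.exp(sum(theta.get(k, 0.0) * v for k, v in features.items()))


def posterior(factors, x, candidates, theta):
    """p(y | x; theta) = prod_i Psi_i / Z(x), with Z(x) summed over an
    explicitly enumerated candidate set (feasible only for toy problems)."""
    unnorm = {}
    for y in candidates:
        prod = 1.0
        for feature_fn in factors:              # one feature function per factor
            prod *= factor_score(feature_fn(x, y), theta)
        unnorm[y] = prod
    z = sum(unnorm.values())
    return {y: s / z for y, s in unnorm.items()}


# Hypothetical single factor pairing the input lemma with a candidate property.
factors = [lambda x, y: {f"{x}&{y}": 1.0}]
theta = {"create&dbo:author": 1.0}
p = posterior(factors, "create", ["dbo:author", "dbo:writer"], theta)
y_hat = max(p, key=p.get)                       # argmax_y p(y | x; theta)
print(y_hat, round(p[y_hat], 3))                # dbo:author 0.731
```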