Formalising the Swedish Constructicon
in Grammatical Framework
Normunds Grūzītis¹,³, Dana Dannélls², Benjamin Lyngfelt², Aarne Ranta¹
¹ University of Gothenburg, Department of Computer Science and Engineering
² University of Gothenburg, Department of Swedish
³ University of Latvia, Institute of Mathematics and Computer Science
ACL/IJCNLP Workshop on Grammar Engineering Across Frameworks
Beijing, China, July 30, 2015
Constructicon
• A collection of conventionalized (learned) pairings of form and meaning
(or function), typically based on principles of Construction Grammar, CxG
(e.g. Fillmore et al. 1988, Goldberg 1995)
– Semantics is associated directly with the surface form
– vs. Lexical units in a dictionary: pairings of word and meaning (frame)
• Including fixed multi-word units
• Each construction (cx) contains at least one variable element
– Often at least one fixed element as well
– Thus, “somewhere” in-between the syntax and the lexicon
• An example from Berkeley Constructicon: “make one’s way”
– Structure: {Motion verb [Verb] [PossNP]}
– Frame: MOTION
• [Theme They] {hacked their way} [Source out] [Goal into the open].
• [Theme We] {sang our way} [Path across Europe].
Constructicons
• Berkeley Constructicon (BCxn) for English
– A pilot project (around 70 cx), linked to Berkeley FrameNet
• Swedish Constructicon (SweCcn)
– An ongoing project (nearly 400 cx so far), partially linked to FrameNet
• ToDo: links to BCxn
• Brazilian Portuguese Constructicon
– An ongoing project
• ...
• A multilingual (interlingual) constructicon would allow for non-compositional
translation in a compositional way
– Constructions with a referential meaning may be linked via FrameNet frames,
while those with a more abstract grammatical function may be related in terms
of their grammatical properties
[Bäckström L., Lyngfelt B., Sköldberg E. (2014) Towards interlingual constructicography]
http://spraakbanken.gu.se/eng/sweccn
SweCcn
• Partially schematic multi-word units/expressions
• Particularly addresses constructions of relevance for second-language
learning, but also covers argument structure constructions
• Descriptions are manually derived from corpus examples
• Construction elements (CE):
– Internal CEs are a part of the cx
– External CEs are a part of the valency of the cx
– Described in more detail by attribute-value matrices specifying
their syntactic and semantic features
• Free-text definitions are a central part of cx descriptions
– ‘eat himself full’ vs. ‘feel himself tired’
(äta sig mätt vs. känna sig trött)
SweCcn → GF
• Task: convert the semi-formal SweCcn into a computational CxG
– Test Grammatical Framework (GF) as a framework for implementing CxG
• Why GF?
– GF makes no formal distinction between lexical and syntactic functions,
which fits the nature of constructicons
– The potential support for multilinguality
– Based on GF Resource Grammar Library (RGL) / an extension to RGL
– An extension to a FrameNet-based grammar and lexicon in GF
• Goals:
– From the linguistic point of view:
• Improve insights into the interaction between the lexicon and the grammar
• Allow for testing the linguistic descriptions of constructions
– From the language technology point of view:
• Facilitate the language processing in both mono- and multilingual settings
– e.g. Information Extraction, Machine Translation
Conversion steps
• Preprocessing:
– Automatic normalization and consistency checking
– Automatic rewriting of the original structures in case of optional CEs and
alternative types of CEs, so that each combination has a separate GF function
• Does not apply to alternative LUs: these are either free variants, or the cx should
be split into alternative constructions, or the CE should be made more general
– Automatic conversion of SweCcn categories to RGL categories
• May result in more rewriting
• Automatic generation of the abstract syntax
• Automatic generation of the concrete syntax
– By systematically applying the high-level RGL constructors
• And limited low-level means
• Manual verification and completion (ToDo)
– Requires good knowledge of the language and linguistic intuition for it
Preprocessing examples
• behöva NP_1 till NP_2|VP →
behöva_V NP_1 till_Prep NP_2 | behöva_V NP_1 till_Prep VP
• snacka|prata|tala NP_indef → (~synonyms of “to talk”)
snacka_V|prata_V|tala_V aSg_Det CN |
snacka_V|prata_V|tala_V aPl_Det CN |
snacka_V|prata_V|tala_V CN
• V av Pn_refl (NP) →
V av_Prep refl_Pron NP | V av_Prep refl_Pron
• N|Adj+städa → (compounds)
N + städa_V | A + städa_V
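The rewriting of optional CEs and alternative CE types can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the function name `expand`, the `CATEGORIES` set and the underscore notation for subscripts (NP_1 for NP1) are assumptions made for the example. Alternative LUs are deliberately left unexpanded, and compounds such as N|Adj+städa are not handled.

```python
from itertools import product

# Hypothetical set of category labels; any other token counts as a lexical unit (LU).
CATEGORIES = {"NP", "VP", "V", "N", "Adj", "Pn"}


def is_category(label: str) -> bool:
    """True for labels such as 'NP', 'NP_2' or 'Pn_refl'."""
    return label.split("_")[0] in CATEGORIES


def expand(structure: str) -> list[str]:
    """Rewrite optional CEs '(X)' and alternative CE types 'X|Y' into separate structures."""
    choices = []
    for token in structure.split():
        if token.startswith("(") and token.endswith(")"):
            # Optional CE: one variant with the element, one without it.
            choices.append([token[1:-1], None])
        elif "|" in token and all(is_category(alt) for alt in token.split("|")):
            # Alternative CE types: one variant per type.
            choices.append(token.split("|"))
        else:
            # Fixed element, single CE, or alternative LUs (kept as they are).
            choices.append([token])
    return [" ".join(t for t in combo if t is not None) for combo in product(*choices)]


print(expand("behöva NP_1 till NP_2|VP"))
print(expand("V av Pn_refl (NP)"))
print(expand("snacka|prata|tala NP_indef"))
```

Each returned structure would then correspond to one GF function; the later conversion of SweCcn categories to RGL categories (for example NP_indef into determiner plus CN variants) is a separate step that this sketch leaves out.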
Abstract syntax
• Each construction is represented by one or more functions
depending on how many alternative structures are produced in the
preprocessing steps
• Each function takes one or more arguments that correspond to the
variable CEs of the respective alternative construction
• behöva_något_till_något_VP1 : NP -> NP -> VP
behöva_något_till_något_VP2 : NP -> VP -> VP
• snacka_NP1 : CN -> VP
snacka_NP2 : CN -> VP
snacka_NP3 : CN -> VP
• verba_av_sig_transitiv1 : V -> NP -> VP
verba_av_sig_transitiv2 : V -> VP
• x_städa1 : N -> VP
x_städa2 : A -> VP
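As a rough illustration of how such signatures could be derived, the Python sketch below collects the variable CEs of a preprocessed structure and joins them into a function type. The helper name `signature`, the `RGL_CATS` set and the rule "a token whose part before the underscore is a category is a variable CE" are assumptions for this example only.

```python
# Hypothetical subset of RGL categories that can appear as variable CEs.
RGL_CATS = {"NP", "VP", "V", "CN", "N", "A"}


def signature(name: str, index: int, elements: str, result: str = "VP") -> str:
    """Build an abstract-syntax signature from a preprocessed element list."""
    args = []
    for token in elements.split():
        # 'NP_1' -> 'NP' (variable CE); 'behöva_V' -> 'behöva' (fixed, skipped).
        category = token.split("_")[0]
        if category in RGL_CATS:
            args.append(category)
    return f"{name}{index} : " + " -> ".join(args + [result])


print(signature("behöva_något_till_något_VP", 1, "behöva_V NP_1 till_Prep NP_2"))
print(signature("verba_av_sig_transitiv", 2, "V av_Prep refl_Pron"))
print(signature("x_städa", 2, "A + städa_V"))
```

Fixed elements contribute no argument, so only the variable CEs determine the arity of each function.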
Concrete syntax
Construction                    Elements                       Pattern
behöva_något_till_något_VP_1    behöva_V NP_1 till_Prep NP_2   {V} NP {Prep} NP
behöva_något_till_något_VP_2    behöva_V NP_1 till_Prep VP     {V} NP {Prep} VP
Code template (one per pattern)
1. mkVP (mkVP (mkV2 mkV) NP) (mkAdv mkPrep NP)
2. The parser failed at token VP
• Many constructions can be implemented by systematically applying
the high-level RGL constructors
– A parsing problem: which constructors in which order?
[Figure: the patterns are parsed with a simple GF grammar over the GF RGL API]
Final code (by automatic post-processing)
lin behöva_något_till_något_VP_1 np_1 np_2 = mkVP
(mkVP (mkV2 (mkV "behöver")) np_1)
(SyntaxSwe.mkAdv (mkPrep "till") np_2) ;
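The two mechanical steps around the code template, abstracting the elements to a pattern and post-processing the template into final code, might look like the Python sketch below. It is only an illustration: `pattern_of`, `fill_template`, `VARIABLE_SLOTS` and the way word forms and qualified names are passed in are assumptions, and the slides do not say how the actual post-processor chooses the word form "behöver".

```python
import re

# Hypothetical placeholder names that stand for variable CEs.
VARIABLE_SLOTS = {"NP", "VP", "CN", "A", "N", "V"}


def pattern_of(elements: str) -> str:
    """Abstract fixed CEs to their category: 'behöva_V NP_1' -> '{V} NP'."""
    parts = []
    for token in elements.split():
        head, _, tail = token.partition("_")
        # Variable CEs start with a category (NP_1, VP); fixed CEs with a lemma (behöva_V).
        parts.append(head if head in VARIABLE_SLOTS else "{" + tail + "}")
    return " ".join(parts)


def fill_template(template, lexical, variables, qualified=None):
    """Insert word forms and argument names into a code template, left to right."""
    lexical = {ctor: list(forms) for ctor, forms in lexical.items()}
    variables = list(variables)
    qualified = qualified or {}

    def substitute(match):
        token = match.group(0)
        if token in lexical:
            # Lexical constructor, e.g. mkV -> (mkV "behöver")
            return f'({token} "{lexical[token].pop(0)}")'
        if token in VARIABLE_SLOTS:
            # Variable CE, e.g. NP -> np_1
            return variables.pop(0)
        # Other constructors, optionally module-qualified.
        return qualified.get(token, token)

    return re.sub(r"\w+", substitute, template)


print(pattern_of("behöva_V NP_1 till_Prep NP_2"))
print(fill_template(
    "mkVP (mkVP (mkV2 mkV) NP) (mkAdv mkPrep NP)",
    lexical={"mkV": ["behöver"], "mkPrep": ["till"]},
    variables=["np_1", "np_2"],
    qualified={"mkAdv": "SyntaxSwe.mkAdv"},
))
```

With these inputs the second call reproduces the right-hand side of the `lin` rule shown on the slide.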
Code-generating grammar
[Figure: a simplified fragment of the abstract syntax]
[Figure: a simplified fragment of the concrete syntax]
parse -cat=VP "{V} {Prep} NP"
mkVP__V2_NP (mkV2__V (partV _mkV___V (toStr__Prep _mkPrep_))) _NP_
mkVP__V2_NP (mkV2__V_Prep _mkV___V _mkPrep_) _NP_
mkVP__VP_Adv (mkVP__V _mkV___V) (mkAdv _mkPrep_ _NP_)
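The idea of treating constructor selection as a parsing problem can be imitated with a small chart parser over typed constructors. The Python sketch below is not the GF grammar from the slides: the rule tables are a hand-picked, hypothetical fragment named after the constructors in the output above, and the particle-verb reading (partV) is left out.

```python
from collections import defaultdict

# Hypothetical fragment of the constructor inventory.
UNARY = [("mkVP__V", "V", "VP"), ("mkV2__V", "V", "V2")]
BINARY = [
    ("mkV2__V_Prep", "V", "Prep", "V2"),
    ("mkVP__V2_NP", "V2", "NP", "VP"),
    ("mkAdv", "Prep", "NP", "Adv"),
    ("mkVP__VP_Adv", "VP", "Adv", "VP"),
]
# Pattern tokens mapped to (type, leaf name).
LEAVES = {
    "{V}": ("V", "_mkV___V"),
    "{Prep}": ("Prep", "_mkPrep_"),
    "NP": ("NP", "_NP_"),
    "VP": ("VP", "_VP_"),
}


def wrap(tree: str) -> str:
    """Parenthesise a tree when it is used as an argument."""
    return f"({tree})" if " " in tree else tree


def close_unary(cell) -> None:
    """Apply each unary constructor once to the trees in a chart cell."""
    for name, arg, result in UNARY:
        for tree in list(cell[arg]):
            cell[result].append(f"{name} {wrap(tree)}")


def parse(pattern: str, cat: str = "VP") -> list[str]:
    """Return all constructor trees of type `cat` that cover the whole pattern."""
    tokens = pattern.split()
    n = len(tokens)
    chart = defaultdict(lambda: defaultdict(list))  # (start, end) -> type -> trees
    for i, token in enumerate(tokens):
        typ, leaf = LEAVES[token]
        chart[i, i + 1][typ].append(leaf)
        close_unary(chart[i, i + 1])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for name, left_type, right_type, result in BINARY:
                    for left in chart[i, k][left_type]:
                        for right in chart[k, j][right_type]:
                            chart[i, j][result].append(f"{name} {wrap(left)} {wrap(right)}")
            close_unary(chart[i, j])
    return chart[0, n][cat]


for tree in parse("{V} {Prep} NP"):
    print(tree)
```

For the pattern `{V} {Prep} NP` this toy fragment finds the two non-particle analyses listed above. For `{V} NP {Prep} VP` it returns no tree, because no rule in the fragment combines a preposition with a VP.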
Running examples
• parse "jag behöver något till något"
– PredVP (UsePron i_Pron)
(behöva_något_till_något_1 (DetNP someSg_Det) (DetNP someSg_Det))
– PredVP (UsePron i_Pron)
(behöva_något_till_något_1 (DetNP someSg_Det) something_NP)
– PredVP (UsePron i_Pron)
(behöva_något_till_något_1 something_NP (DetNP someSg_Det))
– PredVP (UsePron i_Pron)
(behöva_något_till_något_1 something_NP something_NP)
• parse "han äter sig mätt"
– PredVP (UsePron he_Pron)
(reflexiv_resultativ aeta_vb_1_1_V (PositA maett_av_1_1_A))
– PredVP (UsePron he_Pron)
(AdvVP (SI_refl aeta_vb_1_1_V) (PositAdvAdj maett_av_1_1_A))
– PredVP (UsePron he_Pron)
(AdvVP (reciprok_refl aeta_vb_1_1_V) (PositAdvAdj maett_av_1_1_A))
– PredVP (UsePron he_Pron)
(AdvVP (trans_refl aeta_vb_1_1_V) (PositAdvAdj maett_av_1_1_A))
– PredVP (UsePron he_Pron)
(V_refl_rörelse aeta_vb_1_1_V (PositAdvAdj maett_av_1_1_A))
Results
• In the current experiment, we have considered only the 96 VP
constructions, which resulted in 127 functions
– VP constructions dominate SweCcn and have the most complex internal structure
• Given the 127 functions, we have automatically generated the
implementation for 98 functions (77%), achieving 70–90% accuracy
– There is clear room for improvement
• Manual completion postponed because of the active development of
SweCcn (changes → synchronization)
• https://github.com/GrammaticalFramework/gf-contrib (SweCcn)
• A methodology on how to systematically formalise the semi-formal
representation of SweCcn in GF, showing that a GF construction grammar
can be, to a large extent, acquired automatically
• Consequence: feedback to SweCcn developers on how to improve the
annotation consistency and adequacy of the original construction resource
