Discourse Annotation for
Arabic
Arwa Al-Zammam, Ruba Al-Homaid, Eman Al-Badr
Supervisor: Amal Al-Saif
Natural Language Processing - CS465
11-6-1434 H
Outline
• Leeds Arabic Discourse Treebank
• Discourse Annotation
• Arabic language characteristics
• Discourse relations
• Characteristics of Modern Standard Arabic
• Arabic Discourse Connectives
• Agreement Studies
• Discourse Connective Recognition
• Result of Discourse Connective Recognition
• Discourse Relation Recognition
• Result of Discourse Relation Recognition
• Conclusion
Leeds Arabic Discourse Treebank
• The Leeds Arabic Discourse Treebank (LADTB v1) is the first
discourse treebank for Modern Standard Arabic (MSA).
• LADTB follows annotation principles similar to those of the PDTB
project for English and of the Turkish, Hindi and Chinese discourse treebanks.
• LADTB was built to be a gold standard for automatic
discourse processing studies.
Discourse Annotation
• Discourse relations between textual units, such as CAUSAL or
CONTRAST, play an important role in producing a coherent
discourse.
• Discourse connectives are defined as lexical expressions that
relate two text segments (arguments) expressing abstract
entities such as events, beliefs, facts or propositions
(for example /lkn/ 'but', /Aw/ 'or').
Relation labels shown: Contrast, Causal
Discourse Annotation
• Applications using discourse annotation:
• Automatic summarization
• Question answering
• Sentiment analysis
• Readability assessment
Arabic Language Characteristics
• Arabic discourse connectives are ambiguous.
• Explicit discourse connectives.
• The variety of Arabic discourse connectives.
• The annotation principles designed to annotate discourse
connectives in English in the PDTB2 can be applied to
reliably annotate discourse connectives in Arabic newswire.
• Machine learning models can be used to identify discourse
connectives and relations in Arabic newswire.
• Supervised machine learning models can identify Arabic
discourse connectives and their relations with high reliability.
Discourse Relations
• Explicit discourse relations:
[He took my photo,]Arg1 [while]DC [I was having dinner]Arg2
• Implicit discourse relations:
[He has to stay in bed.]Arg1 [He has the flu.]Arg2
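The two bracketed examples above can be held in a small data structure. This is only a minimal sketch: the class name `DiscourseRelation` and its fields are illustrative assumptions, not the LADTB or PDTB file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscourseRelation:
    """One annotated relation; class and field names are hypothetical."""
    arg1: str
    arg2: str
    connective: Optional[str] = None  # None means no overt connective

    @property
    def is_explicit(self) -> bool:
        # A relation is explicit when a discourse connective (DC) signals it.
        return self.connective is not None


explicit = DiscourseRelation("He took my photo,", "I was having dinner", "while")
implicit = DiscourseRelation("He has to stay in bed.", "He has the flu.")
print(explicit.is_explicit, implicit.is_explicit)
```

Storing the connective as an optional field keeps explicit and implicit relations in one type, so later processing can branch on `is_explicit`.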
Characteristics of Modern Standard Arabic
Al-maSdar noun:
English       Al-maSdar noun   Morph. pattern   Root
swimming                                        Sbh
reflection                                      Eks
experiment                                      Jrb
war                                             Hrb
defence                                         Dfe
• Word order in Arabic (verb-subject-object).
• Punctuation in Arabic.
Arabic Discourse Connectives
• Conjunctions (/lkn/ 'but', /Aw/ 'or', /w/ 'and')
• Adverbials (/TAlmA.. f../ 'as long as')
• Prepositional phrases: prepositions can also link discourse segments
when one or both arguments are al-maSdar nouns.
• Some nouns, such as /ntyjp/ 'result', /ks.yp/ 'fear' and
/bqyp/ 'desire', are used as discourse connectives in Arabic.
Discourse connectives in Arabic may occur:
• Individually, such as /lkn/ 'however'.
• In conjunction with other connectives using the coordinating conjunction
/w/ 'and', such as /lkn w qbl/ 'however and before'.
• As multiple connectives without a conjunction, such as /AlA bEd/
'except after'.
Agreement Studies
• Task 1:
measures whether annotators agree on the binary decision of
whether an item constitutes a discourse connective in context.
• Task 2:
measures whether annotators agree on which discourse relation an
identified connective expresses.
Agreement was measured for the distinction between discourse and
non-discourse usage, for relation assignment and for argument
assignment:
agr(ann1 || ann2) = |ann1 matching ann2| / |ann1|
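The agreement measure above is directional, so agr(ann1 || ann2) and agr(ann2 || ann1) can differ. The sketch below illustrates this under two assumptions that are not stated in the slides: annotations are represented as hashable items, and "matching" means exact equality. The function name and the toy data are hypothetical.

```python
def directed_agreement(ann1, ann2):
    """agr(ann1 || ann2) = |ann1 matching ann2| / |ann1|."""
    items1 = set(ann1)
    if not items1:
        raise ValueError("ann1 contains no annotations")
    # Assumption: two annotations match only if they are identical.
    matching = items1 & set(ann2)
    return len(matching) / len(items1)


# Hypothetical annotations: (token position, decision) pairs.
ann1 = [(3, "conn"), (10, "conn"), (17, "conn"), (25, "conn")]
ann2 = [(3, "conn"), (10, "conn"), (25, "conn"), (31, "conn"), (40, "conn")]

print(directed_agreement(ann1, ann2))  # 3 of 4 items match -> 0.75
print(directed_agreement(ann2, ann1))  # 3 of 5 items match -> 0.6
```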
Discourse Connective Recognition
• Surface features (SConn).
• Lexical features of surrounding words (Lex).
Arg1 DC Arg2
• Part-of-speech features (POS).
• Syntactic category of related phrases (Syn).
Example of non-discourse usage of /w/ 'and': /ālmdrsh
kbyrh w ǧmylh/ 'the school is very large and beautiful'.
• Al-maSdar feature.
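As a rough illustration of the first two feature groups, the sketch below extracts a few surface and lexical features for a candidate connective in a whitespace-tokenized, transliterated sentence. The feature names, the window size and the sentence-boundary markers are assumptions made for this example; the slides do not list the exact features behind SConn and Lex.

```python
def surface_lexical_features(tokens, i, window=1):
    """Features for the candidate connective at index i (names are hypothetical)."""
    conn = tokens[i]
    if i == 0:
        position = "initial"
    elif i == len(tokens) - 1:
        position = "final"
    else:
        position = "medial"
    # Surface features: the candidate string itself and where it occurs.
    feats = {"conn": conn, "position": position, "conn_length": len(conn)}
    # Lexical features: neighbouring words, padded at sentence boundaries.
    for k in range(1, window + 1):
        feats[f"prev_{k}"] = tokens[i - k] if i - k >= 0 else "<s>"
        feats[f"next_{k}"] = tokens[i + k] if i + k < len(tokens) else "</s>"
    return feats


# The non-discourse /w/ 'and' example, which joins two adjectives.
tokens = "ālmdrsh kbyrh w ǧmylh".split()
print(surface_lexical_features(tokens, 2))
```

A dictionary of named features like this can be passed to most classifier toolkits after one-hot encoding.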
Result of Discourse Connective Recognition
Ref   Features                                    Acc. (%)   K
      Baseline (not conn)                         68.9       0
M1    Conn only                                   75.7       0.48
Tokenization by white space + auto tagger
M2    Conn + SConn + Lex                          85.6       0.62
M3    Conn + SConn + Lex + POS                    87.6       0.69
M4    Conn + SConn + Lex + POS + Masdar           88.5       0.70
ATB-based features
M5    Conn + SConn + Lex                          86.2       0.65
M6    Conn + SConn + Lex + Syn/POS                91.2       0.79
M7    Conn + SConn + Lex + Syn/POS + Masdar       92.4       0.82
M8    Conn + SConn + Syn                          91.2       0.79
M9    SConn + Lex + Syn + Masdar                  91.2       0.79
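The two score columns can be read as accuracy and a chance-corrected agreement score. The sketch below assumes that K is Cohen's kappa between predicted and gold labels; the slides only label the column "K". The function names and the toy labels are hypothetical and do not reproduce the reported numbers.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(gold, pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(gold)
    p_o = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Expected agreement if labels were assigned independently.
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # kappa is undefined here; 0 is this sketch's choice
    return (p_o - p_e) / (1 - p_e)


# Toy data: 3 discourse connectives and 7 non-connective candidates.
gold = ["conn"] * 3 + ["not"] * 7
baseline = ["not"] * 10  # always predicts the majority class
model = ["conn", "conn", "not"] + ["not"] * 6 + ["conn"]

print(accuracy(gold, baseline), cohen_kappa(gold, baseline))
print(accuracy(gold, model), round(cohen_kappa(gold, model), 3))
```

A majority-class baseline can have a fairly high accuracy while its kappa is 0, which is why both columns are reported.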
Discourse Relation Recognition
• Words and POS of arguments.
• Masdar.
• Tense and Negation.
• Length, Distance and Order Features.
• Production Rules.
Result of Discourse Relation Recognition
Ref   Features                                              Acc. (%)   K
All connectives (6039)
      Baseline (CONJUNCTION)                                52.5       0
M1    Conn only (1)                                         77.2       0.60
M2    Conn + Conn_f + Arg_f (37)                            78.8       0.66
M3    Conn + Conn_f + Arg_f + Production rules (1237)       78.3       0.65
Excluding wa at BOP (3813)
      Baseline (CONJUNCTION)                                35         0
M1    Conn only (1)                                         74.3       0.65
M2    Conn + Conn_f + Arg_f (37)                            77         0.69
M3    Conn + Conn_f + Arg_f + Production rules (1237)       76.7       0.69
Result of Discourse Relation Recognition (continued)
Ref   Features                         Acc. (%)   K
All connectives (6039)
      Baseline (EXPANSION)             62.4       0
M1    Conn only (1)                    88.7       0.78
M2    Conn + Conn_f + Arg_f (37)       88.7       0.78
Excluding wa at BOP (3813)
      Baseline (EXPANSION)             41.8       0
M1    Conn only (1)                    82.7       0.74
M2    Conn + Conn_f + Arg_f (37)       83.5       0.75
Conclusion:
We discussed Arabic discourse annotation, covering
discourse connectives and relations. We also presented
the characteristics of the Arabic language that are relevant
to this topic, along with the recognition results.
