SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
Contributions Dataset Models Experiments Conclusion
Weak Semi-Markov CRFs for NP Chunking in
Informal Text
Aldrian Obaja Muis and Wei Lu
Singapore University of Technology and Design
Contributions Dataset Models Experiments Conclusion
Paper Contributions
In this paper, we contributed:
1 Noun phrase-annotated SMS corpus1
1
Tao Chen and Min-Yen Kan (2013). “Creating a live, public short message
service corpus: the NUS SMS corpus”. In: Language Resources and
Evaluation. Vol. 47. Springer Netherlands, pp. 299–335.
2 / 13
Contributions Dataset Models Experiments Conclusion
Paper Contributions
In this paper, we contributed:
1 Noun phrase-annotated SMS corpus1
2 Weak semi-Markov CRF
1
Tao Chen and Min-Yen Kan (2013). “Creating a live, public short message
service corpus: the NUS SMS corpus”. In: Language Resources and
Evaluation. Vol. 47. Springer Netherlands, pp. 299–335.
2 / 13
Contributions Dataset Models Experiments Conclusion
NP-annotated SMS Corpus
3 / 13
Contributions Dataset Models Experiments Conclusion
NP-annotated SMS Corpus
We used Brat Rapid Annotation Tool (BRAT)2 for annotations,
recruiting undergraduate students to annotate the noun phrases.
2
http://brat.nlplab.org/
4 / 13
Contributions Dataset Models Experiments Conclusion
NP-annotated SMS Corpus
We used Brat Rapid Annotation Tool (BRAT)2 for annotations,
recruiting undergraduate students to annotate the noun phrases.
Examples:
2
http://brat.nlplab.org/
4 / 13
Contributions Dataset Models Experiments Conclusion
NP-annotated SMS Corpus
We used Brat Rapid Annotation Tool (BRAT)2 for annotations,
recruiting undergraduate students to annotate the noun phrases.
Examples:
2
http://brat.nlplab.org/
4 / 13
Contributions Dataset Models Experiments Conclusion
Annotations Statistics
64 annotators
5 / 13
Contributions Dataset Models Experiments Conclusion
Annotations Statistics
64 annotators
26,500 SMS messages
5 / 13
Contributions Dataset Models Experiments Conclusion
Annotations Statistics
64 annotators
26,500 SMS messages
76,490 noun phrases
5 / 13
Contributions Dataset Models Experiments Conclusion
Annotations Statistics
64 annotators
26,500 SMS messages
76,490 noun phrases
359,009 tokens
5 / 13
Contributions Dataset Models Experiments Conclusion
Models
6 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
7 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
7 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
7 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
N N N
O O O
said Dr Teh
Fig. 2: Semi-CRF: O(nL |Y|
2
)
7 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
N N N
O O O
said Dr Teh
Fig. 2: Semi-CRF: O(nL |Y|
2
)
7 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
N N N
O O O
said Dr Teh
Fig. 2: Semi-CRF: O(nL |Y|
2
)
N N NN N N
O O OO O O
said Dr Teh
Fig. 3: Weak Semi-CRF: O(n |Y|
2
+ nL |Y|)
7 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
N N N
O O O
said Dr Teh
Fig. 2: Semi-CRF: O(nL |Y|
2
)
N N NN N N
O O OO O O
said Dr Teh
Fig. 3: Weak Semi-CRF: O(n |Y|
2
+ nL |Y|)
7 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
N N N
O O O
said Dr Teh
Fig. 2: Semi-CRF: O(nL |Y|
2
)
N N NN N N
O O OO O O
said Dr Teh
Fig. 3: Weak Semi-CRF: O(n |Y|
2
+ nL |Y|)
7 / 13
Contributions Dataset Models Experiments Conclusion
Models Comparison
n : # words in the sentence, |Y| : # labels, L : max segment length
B B B
I I I
O O O
said Dr Teh
Fig. 1: Linear CRF: O(n|Y|2
)
N N N
O O O
said Dr Teh
Fig. 2: Semi-CRF: O(nL |Y|
2
)
N N NN N N
O O OO O O
said Dr Teh
Fig. 3: Weak Semi-CRF: O(n |Y|
2
+ nL |Y|)
7 / 13
Contributions Dataset Models Experiments Conclusion
Empirical Verification
8 / 13
Contributions Dataset Models Experiments Conclusion
F1-Score
Basic features +affixes All features
50
60
70
80
71.19
72.49 72.68
74.37 74.69 74.5874.39 74.60 74.31
F1-Score(%)
Linear CRF Semi-CRF Weak Semi-CRF
9 / 13
Contributions Dataset Models Experiments Conclusion
Training Speed
5,000 10,000 15,000 20,000
0.5
1
1.5
2
# training instances (SMS)
Avg.timeperiteration(s)
Linear-CRF
Semi-CRF
Weak Semi-CRF
10 / 13
Contributions Dataset Models Experiments Conclusion
Conclusion
11 / 13
Contributions Dataset Models Experiments Conclusion
Conclusion
We have created a new NP-annotated dataset on informal text
12 / 13
Contributions Dataset Models Experiments Conclusion
Conclusion
We have created a new NP-annotated dataset on informal text
We can split the decisions of selecting segment length and
segment type to improve the training time, while maintaining
similar accuracy
12 / 13
Contributions Dataset Models Experiments Conclusion
Thank You
Code and data available at:
http://statnlp.org/research/ie/
Aldrian Obaja Muis and Wei Lu
Singapore University of Technology and Design
13 / 13

Más contenido relacionado

Similar a Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text

OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
mathsjournal
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
mathsjournal
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
mathsjournal
 
Drosophila Three-Point Test Cross Lab Write-Up Instructions.docx
Drosophila Three-Point Test Cross Lab Write-Up Instructions.docxDrosophila Three-Point Test Cross Lab Write-Up Instructions.docx
Drosophila Three-Point Test Cross Lab Write-Up Instructions.docx
harold7fisher61282
 
A combined-conventional-and-differential-evolution-method-for-model-order-red...
A combined-conventional-and-differential-evolution-method-for-model-order-red...A combined-conventional-and-differential-evolution-method-for-model-order-red...
A combined-conventional-and-differential-evolution-method-for-model-order-red...
Cemal Ardil
 

Similar a Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text (20)

L3 v2
L3 v2L3 v2
L3 v2
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accent
 
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESNAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector Quantization
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
 
Automatic Profiling Of Learner Texts
Automatic Profiling Of Learner TextsAutomatic Profiling Of Learner Texts
Automatic Profiling Of Learner Texts
 
Comparative performance analysis of channel normalization techniques
Comparative performance analysis of channel normalization techniquesComparative performance analysis of channel normalization techniques
Comparative performance analysis of channel normalization techniques
 
2019 acl bio_nlp_nli_surf_poster
2019 acl bio_nlp_nli_surf_poster2019 acl bio_nlp_nli_surf_poster
2019 acl bio_nlp_nli_surf_poster
 
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
Drosophila Three-Point Test Cross Lab Write-Up Instructions.docx
Drosophila Three-Point Test Cross Lab Write-Up Instructions.docxDrosophila Three-Point Test Cross Lab Write-Up Instructions.docx
Drosophila Three-Point Test Cross Lab Write-Up Instructions.docx
 
A combined-conventional-and-differential-evolution-method-for-model-order-red...
A combined-conventional-and-differential-evolution-method-for-model-order-red...A combined-conventional-and-differential-evolution-method-for-model-order-red...
A combined-conventional-and-differential-evolution-method-for-model-order-red...
 
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLSSBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
SBML FOR OPTIMIZING DECISION SUPPORT'S TOOLS
 
Genetic Programming for Generating Prototypes in Classification Problems
Genetic Programming for Generating Prototypes in Classification ProblemsGenetic Programming for Generating Prototypes in Classification Problems
Genetic Programming for Generating Prototypes in Classification Problems
 
(2013) A Trade-off Between Number of Impressions and Number of Interaction At...
(2013) A Trade-off Between Number of Impressions and Number of Interaction At...(2013) A Trade-off Between Number of Impressions and Number of Interaction At...
(2013) A Trade-off Between Number of Impressions and Number of Interaction At...
 
2-IJCSE-00536
2-IJCSE-005362-IJCSE-00536
2-IJCSE-00536
 

Más de Association for Computational Linguistics

Más de Association for Computational Linguistics (20)

Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Último

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 

Último (20)

General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text

  • 1. Contributions Dataset Models Experiments Conclusion Weak Semi-Markov CRFs for NP Chunking in Informal Text Aldrian Obaja Muis and Wei Lu Singapore University of Technology and Design
  • 2. Contributions Dataset Models Experiments Conclusion Paper Contributions In this paper, we contributed: 1 Noun phrase-annotated SMS corpus1 1 Tao Chen and Min-Yen Kan (2013). “Creating a live, public short message service corpus: the NUS SMS corpus”. In: Language Resources and Evaluation. Vol. 47. Springer Netherlands, pp. 299–335. 2 / 13
  • 3. Contributions Dataset Models Experiments Conclusion Paper Contributions In this paper, we contributed: 1 Noun phrase-annotated SMS corpus1 2 Weak semi-Markov CRF 1 Tao Chen and Min-Yen Kan (2013). “Creating a live, public short message service corpus: the NUS SMS corpus”. In: Language Resources and Evaluation. Vol. 47. Springer Netherlands, pp. 299–335. 2 / 13
  • 4. Contributions Dataset Models Experiments Conclusion NP-annotated SMS Corpus 3 / 13
  • 5. Contributions Dataset Models Experiments Conclusion NP-annotated SMS Corpus We used Brat Rapid Annotation Tool (BRAT)2 for annotations, recruiting undergraduate students to annotate the noun phrases. 2 http://brat.nlplab.org/ 4 / 13
  • 6. Contributions Dataset Models Experiments Conclusion NP-annotated SMS Corpus We used Brat Rapid Annotation Tool (BRAT)2 for annotations, recruiting undergraduate students to annotate the noun phrases. Examples: 2 http://brat.nlplab.org/ 4 / 13
  • 7. Contributions Dataset Models Experiments Conclusion NP-annotated SMS Corpus We used Brat Rapid Annotation Tool (BRAT)2 for annotations, recruiting undergraduate students to annotate the noun phrases. Examples: 2 http://brat.nlplab.org/ 4 / 13
  • 8. Contributions Dataset Models Experiments Conclusion Annotations Statistics 64 annotators 5 / 13
  • 9. Contributions Dataset Models Experiments Conclusion Annotations Statistics 64 annotators 26,500 SMS messages 5 / 13
  • 10. Contributions Dataset Models Experiments Conclusion Annotations Statistics 64 annotators 26,500 SMS messages 76,490 noun phrases 5 / 13
  • 11. Contributions Dataset Models Experiments Conclusion Annotations Statistics 64 annotators 26,500 SMS messages 76,490 noun phrases 359,009 tokens 5 / 13
  • 12. Contributions Dataset Models Experiments Conclusion Models 6 / 13
  • 13. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) 7 / 13
  • 14. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) 7 / 13
  • 15. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) 7 / 13
  • 16. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) 7 / 13
  • 17. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) 7 / 13
  • 18. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) N N NN N N O O OO O O said Dr Teh Fig. 3: Weak Semi-CRF: O(n |Y| 2 + nL |Y|) 7 / 13
  • 19. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) N N NN N N O O OO O O said Dr Teh Fig. 3: Weak Semi-CRF: O(n |Y| 2 + nL |Y|) 7 / 13
  • 20. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) N N NN N N O O OO O O said Dr Teh Fig. 3: Weak Semi-CRF: O(n |Y| 2 + nL |Y|) 7 / 13
  • 21. Contributions Dataset Models Experiments Conclusion Models Comparison n : # words in the sentence, |Y| : # labels, L : max segment length B B B I I I O O O said Dr Teh Fig. 1: Linear CRF: O(n|Y|2 ) N N N O O O said Dr Teh Fig. 2: Semi-CRF: O(nL |Y| 2 ) N N NN N N O O OO O O said Dr Teh Fig. 3: Weak Semi-CRF: O(n |Y| 2 + nL |Y|) 7 / 13
  • 22. Contributions Dataset Models Experiments Conclusion Empirical Verification 8 / 13
  • 23. Contributions Dataset Models Experiments Conclusion F1-Score Basic features +affixes All features 50 60 70 80 71.19 72.49 72.68 74.37 74.69 74.5874.39 74.60 74.31 F1-Score(%) Linear CRF Semi-CRF Weak Semi-CRF 9 / 13
  • 24. Contributions Dataset Models Experiments Conclusion Training Speed 5,000 10,000 15,000 20,000 0.5 1 1.5 2 # training instances (SMS) Avg.timeperiteration(s) Linear-CRF Semi-CRF Weak Semi-CRF 10 / 13
  • 25. Contributions Dataset Models Experiments Conclusion Conclusion 11 / 13
  • 26. Contributions Dataset Models Experiments Conclusion Conclusion We have created a new NP-annotated dataset on informal text 12 / 13
  • 27. Contributions Dataset Models Experiments Conclusion Conclusion We have created a new NP-annotated dataset on informal text We can split the decisions of selecting segment length and segment type to improve the training time, while maintaining similar accuracy 12 / 13
  • 28. Contributions Dataset Models Experiments Conclusion Thank You Code and data available at: http://statnlp.org/research/ie/ Aldrian Obaja Muis and Wei Lu Singapore University of Technology and Design 13 / 13