SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
A Low Dimensionality Representation
for Language Variety Identification
17th
International Conference on Intelligent Text Processing and Computational Linguistics
April 3–9, 2016 • Konya, Turkey
Francisco Rangel1,2
, Marc Franco-Salvador1
, Paolo Rosso1
francisco.rangel@autoritas.es, mfranco@prhlt.upv.es, prosso@dsic.upv.es
1
Universitat Politècnica de València, Spain
2
Autoritas Consulting, Spain
Language variety identification aims to detect linguistic variations in
order to classify different varieties of the same language.
Language variety identification may be considered an author profiling
task, besides a classification one, because the cultural idiosyncrasies
may influence the way users use the language (e.g. different expressions,
vocabulary…).
Introduction
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
An example
The same sentence in different varieties of Spanish:
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Related Work
Tasks on language variety identification:
- Workshop on Language Technology for Closely Related Languages and
Language Variants at EMNLP 2014
- VarDial Workshop - Applying NLP Tools to Similar Languages, Varieties
and Dialects at COLING 2014
- T4VarDial - Joint Workshop on Language Technology for Closely Related
Languages, Varieties and Dialects (DSL) shared task (Zampieri et al.,
2014 and 2015) at RANLP 2015
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Related Work
Authors Varieties Media Features Algorithm Evaluation Accuracy
Zampieri and
Gebre (2012)
Portuguese News word and
character n-
grams
Probability
distributions
with log-
likelihood
50-50 split ~90%
Sadat et al.
(2014)
Arabic Blogs
Fora
character n-
grams
Support
Vector
Machines
10-fold cross-
validation
70-80%
Maier and Gómez-
Rodríguez (2014)
Spanish Twitter character n-
grams; LZW;
syllable-based
language
models
Meta-learning cross-validation 60-70%
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
To discriminate between different varieties of the same language, but with
the following differences:
- We focus on different varieties of Spanish, although we tested our
approach also with a different set of languages.
- Instead of n-gram based representations, we propose a low
dimensionality representation which is helpful when dealing with big
data in social media.
- We evaluate the proposed method with an independent test set
generated from different authors in order to reduce possible overfitting.
- We make available our dataset to the research community. (https:
//github.com/autoritas/RD-Lab/tree/master/data/HispaBlogs)
Objective
Step 1. Term-frequency - inverse document frequency (tf-idf) matrix:
Step 2. Class-dependent term weighting:
Step 3. Class-dependent document representation:
Low dimensionality representation (LDR)
- Each column is a vocabulary term t
- Each row is a document d
- wij is the tf-idf weight of the term j in the document i
- represents the assigned class c to the document i
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Low dimensionality representation (LDR)
Meaning of the measures
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Alternative representations
- We use the common state-of-the-art representations based on n-grams. We
iterated n from 1 to 10, and selected the 1000, 5000 and 10000 most
frequent n-grams. The best results were obtained with:
- character 4-grams; the 10,000 most frequent
- word 1-gram (bag-of-words); the 10,000 most frequent
- word 2-grams; the 10,000 highest tf-idf
- Two variations of the continuous Skip-gram model (Mikolov et al.):
- Skip-grams
- Sentence Vectors
Maximizing the average of the log probability: Using the negative sampling estimator:
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Hispablogs dataset
Hispablogs dataset
https://github.com/autoritas/RD-Lab/tree/master/data/HispaBlogs
- - Completely
independent authors
between training and
test sets
- - Manually collected
by social media
experts of Autoritas
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Machine learning algorithms comparison
Accuracy results with different machine learning algorithms
Significance of the results wrt. the two
systems with the highest performance
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Preprocessing impact
Accuracy obtained after removing words with frequency equal or lower than n
(a) Continuous scale (b) Non-continuous scale
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Preprocessing impact
Number of words after removing those with frequency equal or lower than n, and
some examples of very infrequent words.
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Classification results
Accuracy results per representation
*
**
*
**
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Error analysis
Confusion matrix of the 5-class
classification
F1 values for identification as the
corresponding language variety vs. others
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Most discriminating features
Features sorted by Information Gain
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Most discriminating features
Accuracy obtained with different combinations of features
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Complexity of obtaining the features:
Number of features:
Cost analysis
Representation # Features
LDR 30
Skip-gram 300
SenVec 300
BOW 10,000
Char 4-grams 10,000
tf-idf 2-grams 10,000
l: number of varieties
n: number of terms of the document
m: number of terms in the document that
coincides with some term in the vocabulary
n m & l<<n
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Results obtained with
the development set of
the DSLCC corpus
from the
Discriminating
between Similar
Languages task
(2015)
NOTE: Significant results in bold
Robustness
Conclusions and future work
LDR outperforms common state-of-the-art representations by 35%
increase in accuracy.
LDR obtains competitive results compared with two distributed
representation-based approaches that employed the popular continuous
Skip-gram model.
LDR remains competitive with different languages and media (DSLCC).
The dimensionality reduction is from thousands to only 6 features per
language variety. This allows to deal with big data in social media.
We have applied LDR to age and gender identification and we plan to
apply LDR to personality recognition.
- Introduction
- Related work
- Low Dimensionality Representation
- Evaluation framework
- Results and discussion
- Conclusions and future work
Thank you very much!
Francisco Rangel1,2
, Marc Franco-Salvador1
, Paolo Rosso1
francisco.rangel@autoritas.es, mfranco@prhlt.upv.es, prosso@dsic.upv.es
Interested in digital text forensics (author profiling, authorship
identification, author obfuscation)?
Do not hesitate and participate in the PAN laboratory!!
http://pan.webis.de/
1
Universitat Politècnica de València, Spain
2
Autoritas Consulting, Spain

Más contenido relacionado

La actualidad más candente

Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaTraian Rebedea
 
"Thinking in English" information structures task array
"Thinking in English" information structures task array"Thinking in English" information structures task array
"Thinking in English" information structures task arrayLawrie Hunter
 
Publish perish as an instruction-end learning opportunity
Publish perish as an instruction-end learning opportunityPublish perish as an instruction-end learning opportunity
Publish perish as an instruction-end learning opportunityLawrie Hunter
 
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological InflectionSIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological InflectionKaterina Vylomova
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingSeonghyun Kim
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational SemanticsMarina Santini
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector spaceAbdullah Khan Zehady
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Facultad de Informática UCM
 
Invisible structures of technical writing
Invisible structures of technical writingInvisible structures of technical writing
Invisible structures of technical writingLawrie Hunter
 
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Marcin Junczys-Dowmunt
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationSeonghyun Kim
 
Introduction to Ontology Engineering with Fluent Editor 2014
Introduction to Ontology Engineering with Fluent Editor 2014Introduction to Ontology Engineering with Fluent Editor 2014
Introduction to Ontology Engineering with Fluent Editor 2014Cognitum
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Lviv Data Science Summer School
 
What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...Isabelle Augenstein
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
PPT slides
PPT slidesPPT slides
PPT slidesbutest
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...Lifeng (Aaron) Han
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 

La actualidad más candente (20)

Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large Corpora
 
"Thinking in English" information structures task array
"Thinking in English" information structures task array"Thinking in English" information structures task array
"Thinking in English" information structures task array
 
AINL 2016: Yagunova
AINL 2016: YagunovaAINL 2016: Yagunova
AINL 2016: Yagunova
 
Publish perish as an instruction-end learning opportunity
Publish perish as an instruction-end learning opportunityPublish perish as an instruction-end learning opportunity
Publish perish as an instruction-end learning opportunity
 
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological InflectionSIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
 
Invisible structures of technical writing
Invisible structures of technical writingInvisible structures of technical writing
Invisible structures of technical writing
 
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
Introduction to Ontology Engineering with Fluent Editor 2014
Introduction to Ontology Engineering with Fluent Editor 2014Introduction to Ontology Engineering with Fluent Editor 2014
Introduction to Ontology Engineering with Fluent Editor 2014
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
 
What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
PPT slides
PPT slidesPPT slides
PPT slides
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 

Destacado

UNIT # 6 LANGUAGE AND CULTURAL IDENTITY __STANDARD LANGUAGE / TOTEM CULTURE
UNIT # 6 LANGUAGE AND CULTURAL IDENTITY __STANDARD LANGUAGE / TOTEM CULTUREUNIT # 6 LANGUAGE AND CULTURAL IDENTITY __STANDARD LANGUAGE / TOTEM CULTURE
UNIT # 6 LANGUAGE AND CULTURAL IDENTITY __STANDARD LANGUAGE / TOTEM CULTUREKarlaQuishpe
 
Language Change Part 1
Language Change Part 1Language Change Part 1
Language Change Part 1suasenglish
 
Language change timeline
Language change timelineLanguage change timeline
Language change timelineRobertagillum
 
Language Change Part 2: Labov Studies
Language Change Part 2: Labov StudiesLanguage Change Part 2: Labov Studies
Language Change Part 2: Labov Studiessuasenglish
 
Style Register and Dialect
Style Register and DialectStyle Register and Dialect
Style Register and DialectSidra Shahid
 
Types of language change
Types of language changeTypes of language change
Types of language changeMariam Bedraoui
 
Language varieties, dialect, register and style
Language varieties, dialect, register and styleLanguage varieties, dialect, register and style
Language varieties, dialect, register and styleM Ahlan Firdaus
 

Destacado (12)

Language variety #1
Language variety #1Language variety #1
Language variety #1
 
UNIT # 6 LANGUAGE AND CULTURAL IDENTITY __STANDARD LANGUAGE / TOTEM CULTURE
UNIT # 6 LANGUAGE AND CULTURAL IDENTITY __STANDARD LANGUAGE / TOTEM CULTUREUNIT # 6 LANGUAGE AND CULTURAL IDENTITY __STANDARD LANGUAGE / TOTEM CULTURE
UNIT # 6 LANGUAGE AND CULTURAL IDENTITY __STANDARD LANGUAGE / TOTEM CULTURE
 
Language Change Part 1
Language Change Part 1Language Change Part 1
Language Change Part 1
 
Language change timeline
Language change timelineLanguage change timeline
Language change timeline
 
language, dialect, varietes
language, dialect, varieteslanguage, dialect, varietes
language, dialect, varietes
 
Language Change Part 2: Labov Studies
Language Change Part 2: Labov StudiesLanguage Change Part 2: Labov Studies
Language Change Part 2: Labov Studies
 
Language Change
Language ChangeLanguage Change
Language Change
 
Style Register and Dialect
Style Register and DialectStyle Register and Dialect
Style Register and Dialect
 
secret of words
secret of wordssecret of words
secret of words
 
Types of language change
Types of language changeTypes of language change
Types of language change
 
Language varieties, dialect, register and style
Language varieties, dialect, register and styleLanguage varieties, dialect, register and style
Language varieties, dialect, register and style
 
Language change
Language changeLanguage change
Language change
 

Similar a A Low Dimensionality Representation for Language Variety Identification (CICLing 2016)

Modern Programming Languages classification Poster
Modern Programming Languages classification PosterModern Programming Languages classification Poster
Modern Programming Languages classification PosterSaulo Aguiar
 
Applying static code analysis for domain-specific languages
Applying static code analysis for domain-specific languagesApplying static code analysis for domain-specific languages
Applying static code analysis for domain-specific languagesIván Ruiz-Rube
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
PL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesPL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesSchwannden Kuo
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionLiad Magen
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
Analysing the concept of quality in model-driven engineering literature: a sy...
Analysing the concept of quality in model-driven engineering literature: a sy...Analysing the concept of quality in model-driven engineering literature: a sy...
Analysing the concept of quality in model-driven engineering literature: a sy...Fáber D. Giraldo
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsLifeng (Aaron) Han
 
Towards Language-Oriented Modeling (HDR Defense)
Towards Language-Oriented Modeling (HDR Defense)Towards Language-Oriented Modeling (HDR Defense)
Towards Language-Oriented Modeling (HDR Defense)Benoit Combemale
 
Learning Content and Usage Factors Simultaneously
Learning Content and Usage Factors SimultaneouslyLearning Content and Usage Factors Simultaneously
Learning Content and Usage Factors SimultaneouslyArnab Bhadury
 
Requirements-Collector: Automating Requirements Specification from Elicitatio...
Requirements-Collector: Automating Requirements Specification from Elicitatio...Requirements-Collector: Automating Requirements Specification from Elicitatio...
Requirements-Collector: Automating Requirements Specification from Elicitatio...Sebastiano Panichella
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inKumari Naveen
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.pptHaHa501620
 
A Dialogue System for Telugu, a Resource-Poor Language
A Dialogue System for Telugu, a Resource-Poor LanguageA Dialogue System for Telugu, a Resource-Poor Language
A Dialogue System for Telugu, a Resource-Poor LanguageSravanthi Mullapudi
 
Usability evaluation of Domain-Specific Languages
Usability evaluation of Domain-Specific LanguagesUsability evaluation of Domain-Specific Languages
Usability evaluation of Domain-Specific LanguagesAnkica Barisic
 
Topic Modeling for Learning Analytics Researchers LAK15 Tutorial
Topic Modeling for Learning Analytics Researchers LAK15 TutorialTopic Modeling for Learning Analytics Researchers LAK15 Tutorial
Topic Modeling for Learning Analytics Researchers LAK15 TutorialVitomir Kovanovic
 
Combining Textual and Graph-based Features for Entity Disambiguation
Combining Textual and Graph-based Features for Entity DisambiguationCombining Textual and Graph-based Features for Entity Disambiguation
Combining Textual and Graph-based Features for Entity Disambiguationshakimov
 
Vectorized Intent of Multilingual Large Language Models.pptx
Vectorized Intent of Multilingual Large Language Models.pptxVectorized Intent of Multilingual Large Language Models.pptx
Vectorized Intent of Multilingual Large Language Models.pptxSachinAngre3
 
Experimenting with eXtreme Design (EKAW2010)
Experimenting with eXtreme Design (EKAW2010)Experimenting with eXtreme Design (EKAW2010)
Experimenting with eXtreme Design (EKAW2010)evabl444
 

Similar a A Low Dimensionality Representation for Language Variety Identification (CICLing 2016) (20)

Modern Programming Languages classification Poster
Modern Programming Languages classification PosterModern Programming Languages classification Poster
Modern Programming Languages classification Poster
 
Applying static code analysis for domain-specific languages
Applying static code analysis for domain-specific languagesApplying static code analysis for domain-specific languages
Applying static code analysis for domain-specific languages
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
PL Lecture 01 - preliminaries
PL Lecture 01 - preliminariesPL Lecture 01 - preliminaries
PL Lecture 01 - preliminaries
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full version
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
Analysing the concept of quality in model-driven engineering literature: a sy...
Analysing the concept of quality in model-driven engineering literature: a sy...Analysing the concept of quality in model-driven engineering literature: a sy...
Analysing the concept of quality in model-driven engineering literature: a sy...
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methods
 
Towards Language-Oriented Modeling (HDR Defense)
Towards Language-Oriented Modeling (HDR Defense)Towards Language-Oriented Modeling (HDR Defense)
Towards Language-Oriented Modeling (HDR Defense)
 
Learning Content and Usage Factors Simultaneously
Learning Content and Usage Factors SimultaneouslyLearning Content and Usage Factors Simultaneously
Learning Content and Usage Factors Simultaneously
 
Requirements-Collector: Automating Requirements Specification from Elicitatio...
Requirements-Collector: Automating Requirements Specification from Elicitatio...Requirements-Collector: Automating Requirements Specification from Elicitatio...
Requirements-Collector: Automating Requirements Specification from Elicitatio...
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful in
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.ppt
 
A Dialogue System for Telugu, a Resource-Poor Language
A Dialogue System for Telugu, a Resource-Poor LanguageA Dialogue System for Telugu, a Resource-Poor Language
A Dialogue System for Telugu, a Resource-Poor Language
 
Usability evaluation of Domain-Specific Languages
Usability evaluation of Domain-Specific LanguagesUsability evaluation of Domain-Specific Languages
Usability evaluation of Domain-Specific Languages
 
Topic Modeling for Learning Analytics Researchers LAK15 Tutorial
Topic Modeling for Learning Analytics Researchers LAK15 TutorialTopic Modeling for Learning Analytics Researchers LAK15 Tutorial
Topic Modeling for Learning Analytics Researchers LAK15 Tutorial
 
Digital repertoires of poetry metrics: towards a Linked Open Data ecosystem
Digital repertoires of poetry metrics: towards a Linked Open Data ecosystemDigital repertoires of poetry metrics: towards a Linked Open Data ecosystem
Digital repertoires of poetry metrics: towards a Linked Open Data ecosystem
 
Combining Textual and Graph-based Features for Entity Disambiguation
Combining Textual and Graph-based Features for Entity DisambiguationCombining Textual and Graph-based Features for Entity Disambiguation
Combining Textual and Graph-based Features for Entity Disambiguation
 
Vectorized Intent of Multilingual Large Language Models.pptx
Vectorized Intent of Multilingual Large Language Models.pptxVectorized Intent of Multilingual Large Language Models.pptx
Vectorized Intent of Multilingual Large Language Models.pptx
 
Experimenting with eXtreme Design (EKAW2010)
Experimenting with eXtreme Design (EKAW2010)Experimenting with eXtreme Design (EKAW2010)
Experimenting with eXtreme Design (EKAW2010)
 

Más de Francisco Manuel Rangel Pardo

Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Francisco Manuel Rangel Pardo
 
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...Francisco Manuel Rangel Pardo
 
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...Francisco Manuel Rangel Pardo
 
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling ...
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling  ...Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling  ...
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling ...Francisco Manuel Rangel Pardo
 
AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019Francisco Manuel Rangel Pardo
 
Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Francisco Manuel Rangel Pardo
 
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Francisco Manuel Rangel Pardo
 
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...Francisco Manuel Rangel Pardo
 
RusProfiling Gender Identification in Russian Texts PAN@FIRE
RusProfiling Gender Identification in Russian Texts PAN@FIRERusProfiling Gender Identification in Russian Texts PAN@FIRE
RusProfiling Gender Identification in Russian Texts PAN@FIREFrancisco Manuel Rangel Pardo
 
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Francisco Manuel Rangel Pardo
 
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...Francisco Manuel Rangel Pardo
 
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Francisco Manuel Rangel Pardo
 
AL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building TrustAL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building TrustFrancisco Manuel Rangel Pardo
 
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)Francisco Manuel Rangel Pardo
 
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...Francisco Manuel Rangel Pardo
 
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...Francisco Manuel Rangel Pardo
 

Más de Francisco Manuel Rangel Pardo (20)

Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
 
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
 
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
 
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling ...
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling  ...Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling  ...
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling ...
 
AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019
 
Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.
 
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
 
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
 
RusProfiling Gender Identification in Russian Texts PAN@FIRE
RusProfiling Gender Identification in Russian Texts PAN@FIRERusProfiling Gender Identification in Russian Texts PAN@FIRE
RusProfiling Gender Identification in Russian Texts PAN@FIRE
 
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
 
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
 
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
 
Redes sociales y preadolescentes
Redes sociales y preadolescentesRedes sociales y preadolescentes
Redes sociales y preadolescentes
 
AL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building TrustAL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building Trust
 
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
 
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
 
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
 
Smart Listening - MUIinf
Smart Listening - MUIinfSmart Listening - MUIinf
Smart Listening - MUIinf
 
IA + Big Data = problema + oportunidad
IA + Big Data = problema + oportunidadIA + Big Data = problema + oportunidad
IA + Big Data = problema + oportunidad
 
Author Profiling task at PAN Lab at CLEF 2015
Author Profiling task at PAN Lab at CLEF 2015Author Profiling task at PAN Lab at CLEF 2015
Author Profiling task at PAN Lab at CLEF 2015
 

Último

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Último (20)

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

A Low Dimensionality Representation for Language Variety Identification (CICLing 2016)

  • 1. A Low Dimensionality Representation for Language Variety Identification 17th International Conference on Intelligent Text Processing and Computational Linguistics April 3–9, 2016 • Konya, Turkey Francisco Rangel1,2 , Marc Franco-Salvador1 , Paolo Rosso1 francisco.rangel@autoritas.es, mfranco@prhlt.upv.es, prosso@dsic.upv.es 1 Universitat Politècnica de València, Spain 2 Autoritas Consulting, Spain
  • 2. Language variety identification aims to detect linguistic variations in order to classify different varieties of the same language. Language variety identification may be considered an author profiling task, besides a classification one, because the cultural idiosyncrasies may influence the way users use the language (e.g. different expressions, vocabulary…). Introduction - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 3. An example The same sentence in different varieties of Spanish: - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 4. Related Work Tasks on language variety identification: - Workshop on Language Technology for Closely Related Languages and Language Variants at EMNLP 2014 - VarDial Workshop - Applying NLP Tools to Similar Languages, Varieties and Dialects at COLING 2014 - T4VarDial - Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (DSL) shared task (Zampieri et al., 2014 and 2015) at RANLP 2015 - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 5. Related Work Authors Varieties Media Features Algorithm Evaluation Accuracy Zampieri and Gebre (2012) Portuguese News word and character n- grams Probability distributions with log- likelihood 50-50 split ~90% Sadat et al. (2014) Arabic Blogs Fora character n- grams Support Vector Machines 10-fold cross- validation 70-80% Maier and Gómez- Rodríguez (2014) Spanish Twitter character n- grams; LZW; syllable-based language models Meta-learning cross-validation 60-70% - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 6. To discriminate between different varieties of the same language, but with the following differences: - We focus on different varieties of Spanish, although we tested our approach also with a different set of languages. - Instead of n-gram based representations, we propose a low dimensionality representation which is helpful when dealing with big data in social media. - We evaluate the proposed method with an independent test set generated from different authors in order to reduce possible overfitting. - We make available our dataset to the research community. (https: //github.com/autoritas/RD-Lab/tree/master/data/HispaBlogs) Objective
  • 7. Step 1. Term-frequency - inverse document frequency (tf-idf) matrix: Step 2. Class-dependent term weighting: Step 3. Class-dependent document representation: Low dimensionality representation (LDR) - Each column is a vocabulary term t - Each row is a document d - wij is the tf-idf weight of the term j in the document i - represents the assigned class c to the document i - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 8. Low dimensionality representation (LDR) Meaning of the measures - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 9. Alternative representations - We use the common state-of-the-art representations based on n-grams. We iterated n from 1 to 10, and selected the 1000, 5000 and 10000 most frequent n-grams. The best results were obtained with: - character 4-grams; the 10,000 most frequent - word 1-gram (bag-of-words); the 10,000 most frequent - word 2-grams; the 10,000 highest tf-idf - Two variations of the continuous Skip-gram model (Mikolov et al.): - Skip-grams - Sentence Vectors Maximizing the average of the log probability: Using the negative sampling estimator: - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 10. Hispablogs dataset Hispablogs dataset https://github.com/autoritas/RD-Lab/tree/master/data/HispaBlogs - - Completely independent authors between training and test sets - - Manually collected by social media experts of Autoritas - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 11. Machine learning algorithms comparison Accuracy results with different machine learning algorithms Significance of the results wrt. the two systems with the highest performance - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 12. Preprocessing impact Accuracy obtained after removing words with frequency equal or lower than n (a) Continuous scale (b) Non-continuous scale - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 13. Preprocessing impact Number of words after removing those with frequency equal or lower than n, and some examples of very infrequent words. - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 14. Classification results Accuracy results per representation * ** * ** - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 15. Error analysis Confusion matrix of the 5-class classification F1 values for identification as the corresponding language variety vs. others - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 16. Most discriminating features Features sorted by Information Gain - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 17. Most discriminating features Accuracy obtained with different combinations of features - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 18. Complexity of obtaining the features: Number of features: Cost analysis Representation # Features LDR 30 Skip-gram 300 SenVec 300 BOW 10,000 Char 4-grams 10,000 tf-idf 2-grams 10,000 l: number of varieties n: number of terms of the document m: number of terms in the document that coincides with some term in the vocabulary n m & l<<n - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 19. Results obtained with the development set of the DSLCC corpus from the Discriminating between Similar Languages task (2015) NOTE: Significant results in bold Robustness
  • 20. Conclusions and future work LDR outperforms common state-of-the-art representations by 35% increase in accuracy. LDR obtains competitive results compared with two distributed representation-based approaches that employed the popular continuous Skip-gram model. LDR remains competitive with different languages and media (DSLCC). The dimensionality reduction is from thousands to only 6 features per language variety. This allows to deal with big data in social media. We have applied LDR to age and gender identification and we plan to apply LDR to personality recognition. - Introduction - Related work - Low Dimensionality Representation - Evaluation framework - Results and discussion - Conclusions and future work
  • 21. Thank you very much! Francisco Rangel1,2 , Marc Franco-Salvador1 , Paolo Rosso1 francisco.rangel@autoritas.es, mfranco@prhlt.upv.es, prosso@dsic.upv.es Interested in digital text forensics (author profiling, authorship identification, author obfuscation)? Do not hesitate and participate in the PAN laboratory!! http://pan.webis.de/ 1 Universitat Politècnica de València, Spain 2 Autoritas Consulting, Spain