Opinion Mining for Social Media and News Items in Romanian
University Politehnica of Bucharest
Authors
Claudia Cârdei
Filip Manișor
Traian Rebedea (traian.rebedea@cs.pub.ro)
Overview
• Introduction
• Previous Work
– English
– Romanian
• Proposed Solutions
• Opinionated Corpus
• Results and Comparisons
• Conclusions
22.09.13 Bachelor's Thesis Session - July 2012 2
Introduction
• Sentiment analysis and opinion mining research
has mainly concentrated on English and other
important languages (Spanish, Chinese, etc.)
– Various commercial and open-source solutions exist
mainly for English
– Corpora of opinionated texts and databases of
affective words (general or domain specific) also exist
for these languages
• Objective: develop an opinion mining solution for
Romanian texts gathered from a wide range of
online sources (mostly social media and news
items)
22.09.13
ICSCS 2013 . K-TEAMS 2013 Workshop
Opinion Mining for Social Media and News Items in Romanian 3
Introduction
• Popular research domain in recent years
• Sentiment, subjectivity, opinion, publicity
– Related, but somewhat different
• Sentiment or subjectivity in a text:
– Positive, negative or neutral
– Subjective or objective
• Opinionated text
– Opinion author
– Opinion target (subject)
– Opinion (affective) words
– Opinion polarity
E.g. President Obama declared that the US immigration system is broken.
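The four components of an opinionated text listed above can be sketched as a small record type. This is only an illustration: the class name `Opinion`, its field names, and the way the example sentence is labeled (including the negative polarity assigned to "broken") are assumptions made here, not annotations taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Opinion:
    """Hypothetical container for the components of an opinionated text."""
    author: str                      # who expresses the opinion
    target: str                      # the subject the opinion is about
    affective_words: list[str] = field(default_factory=list)
    polarity: str = "neutral"        # "positive", "negative" or "neutral"


# One possible reading of the example sentence (an assumption, not gold data).
example = Opinion(
    author="President Obama",
    target="US immigration system",
    affective_words=["broken"],
    polarity="negative",
)
print(example)
```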
Previous Work - English
• Lots of studies and corpora in different domains
• The movie reviews dataset – very popular
• Initial results using BoW, punctuation, etc.
– Accuracy ≈ 80%
• Improvement by finding relations/dependencies
between opinion targets and affective words
– Accuracy ≈ 84%
• Mining frequent dependency subtrees for
positive and negative reviews and using an SVM
with these subtrees as features
– Accuracy ≈ 88%
Previous Work - Romanian
• Use machine translation to generate English
texts, then apply opinion mining
• Translate affective word databases into
Romanian (e.g. WordNet Affect)
• Develop new affective word lists
• Train and evaluate on specific corpora in
Romanian
• Problems with NER, dependency parsing,
affective word scores
Proposed Solutions
• Supervised solution trained for several
different opinion subjects (entities)
• Three approaches
– Bag of words
– Affective words and dependency parsing
– N-grams probabilities
Bag of Words
• Bag of words model:
– Tokenization, diacritics restoration, lemmatization
– Distinct lemmas selected as features
– Improvements: POS filter, word n-grams filter
– Used both binary features and TF-IDF
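The two feature encodings mentioned above can be sketched as follows. The sketch assumes that tokenization, diacritics restoration and lemmatization have already been done, so each document arrives as a list of lemmas. The function name `bow_features`, the toy Romanian lemmas, and the particular TF-IDF variant (relative term frequency times log(N/df)) are assumptions chosen for illustration; the slides do not say which weighting variant was used.

```python
import math
from collections import Counter


def bow_features(docs):
    """docs: list of documents, each a list of lemmas.

    Returns (vocabulary, binary rows, TF-IDF rows), one row per document.
    """
    vocab = sorted({lemma for doc in docs for lemma in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each lemma.
    df = Counter(lemma for doc in docs for lemma in set(doc))

    binary_rows, tfidf_rows = [], []
    for doc in docs:
        counts = Counter(doc)
        binary_rows.append([1 if counts[w] else 0 for w in vocab])
        row = []
        for w in vocab:
            tf = counts[w] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[w])
            row.append(tf * idf)
        tfidf_rows.append(row)
    return vocab, binary_rows, tfidf_rows


# Toy, already-lemmatized documents (hypothetical data).
docs = [["bere", "bun", "bun"], ["bere", "prost"]]
vocab, binary_rows, tfidf_rows = bow_features(docs)
print(vocab)
print(binary_rows)
print(tfidf_rows)
```

A lemma that occurs in every document, such as "bere" here, gets a TF-IDF weight of zero under this variant, while the binary encoding still marks it as present.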
Affective Scores & Dependency Parsing
• Compute affective word scores in Romanian:
– Translate all the adjectives and adverbs from the English WordNet
into Romanian using Google Translate
– Use the probability of each translation pair
• Several affective score databases have been translated:
SentiWordNet, SenticNet 2 and ANEW
• Used the UAIC Romanian FDG parser to identify dependencies
between the subject entity and adjectives or adverbs
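One way to carry English affective scores over to Romanian using translation probabilities is a probability-weighted average, sketched below. This is an assumption about how the probabilities might be combined, not the authors' documented procedure; the function name `romanian_scores`, the word pairs, the probabilities and the scores are all hypothetical.

```python
from collections import defaultdict


def romanian_scores(translations, english_scores):
    """Project English affective scores onto Romanian words.

    translations: iterable of (english_word, romanian_word, probability)
    english_scores: dict mapping english_word -> affective score
    Returns a dict mapping romanian_word -> probability-weighted mean score.
    """
    weighted = defaultdict(float)
    total = defaultdict(float)
    for en, ro, prob in translations:
        if en in english_scores:
            weighted[ro] += prob * english_scores[en]
            total[ro] += prob
    return {ro: weighted[ro] / total[ro] for ro in weighted if total[ro] > 0}


# Hypothetical translation pairs and scores, for illustration only.
translations = [
    ("good", "bun", 0.8),
    ("fine", "bun", 0.2),
    ("bad", "rău", 0.9),
]
english_scores = {"good": 0.75, "fine": 0.5, "bad": -0.625}
ro_scores = romanian_scores(translations, english_scores)
print(ro_scores)
```

Normalizing by the summed probability keeps each Romanian score on the same scale as the English source scores, even when only some candidate translations have a known score.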
N-grams Probabilities
• Compute the conditional probability for each
n-gram in the corpus given that the document
is either positive or negative
• Then use the following score for each n-gram
(feature f):
• The score of a new text is computed by
summing the scores for each of the n-grams
existing in that text
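The scoring formula itself is not reproduced in this text, so the sketch below substitutes a common stand-in: the log-ratio of add-one-smoothed n-gram probabilities in the positive and negative classes. This choice, along with the function names `ngrams`, `train_scores` and `score_text` and the toy documents, is an assumption and should not be read as the authors' formula. Only the final step, summing the scores of the n-grams found in a new text, follows the description above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_scores(pos_docs, neg_docs, n=2):
    """Assumed score: log P(f | positive) - log P(f | negative), add-one smoothed."""
    pos = Counter(g for d in pos_docs for g in ngrams(d, n))
    neg = Counter(g for d in neg_docs for g in ngrams(d, n))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        g: math.log((pos[g] + 1) / pos_total) - math.log((neg[g] + 1) / neg_total)
        for g in vocab
    }


def score_text(tokens, scores, n=2):
    """Sum the scores of the known n-grams in a new text; unseen n-grams add 0."""
    return sum(scores.get(g, 0.0) for g in ngrams(tokens, n))


# Toy training documents (hypothetical data).
pos_docs = [["serviciu", "foarte", "bun"]]
neg_docs = [["serviciu", "foarte", "prost"]]
scores = train_scores(pos_docs, neg_docs, n=2)
print(score_text(["foarte", "bun"], scores))    # positive under this scoring
print(score_text(["foarte", "prost"], scores))  # negative under this scoring
```

With this stand-in score, a text would be labeled positive when the sum is above zero and negative when it is below.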
Opinionated Corpus
• Corpus manually annotated by analysts for their
customers (created by Treeworks for their
product ZeList, www.zelist.ro)
• ZeList indexes most of the texts published in
Romanian on the most popular social networks, blogs,
online forums, news websites, etc.
• Used data for seven different entities (companies
or brands), ranging from banks and beer brands
to web publishers and media corporations
• The names of the entities have been anonymized
Opinionated Corpus
• Problems:
– These texts are very noisy, very heterogeneous,
from a wide range of sources and with different
writing styles (e.g. Twitter vs. news items)
– Some of them might also express positive and
negative publicity rather than opinions
Opinionated Corpus
• Data about the first version of the corpus
• Data collection ranged from a couple of months to a couple of
years, depending on the entity
• The second version contained a larger export of data for each
entity
Entity  Total items  Neutral  Opinionated  Positive  Negative
Ent1           6055     5853          202        29       173
Ent2           2240     1961          279       222        57
Ent3            343      260           83        64        19
Ent4           1168      876          292       120       172
Ent5            539      520           19        17         2
Ent6           1025      570          455       330       125
Ent7           3787     3016          771       593       178
Results - Outline
• Results obtained for the first version of the corpus, for all
entities
• The positive-negative accuracy should be the more relevant measure
• Good results for entities with more data, poor results for the
ones with a small number of opinionated texts
Entity  Total items  Neutral  Opinionated  Accuracy opinion-neutral  Accuracy positive-negative
Ent1           6055     5853          202                    97.01%                      92.07%
Ent2           2240     1961          279                    91.79%                      87.81%
Ent3            343      260           83                    84.84%                      89.15%
Ent4           1168      876          292                    86.22%                      82.19%
Ent5            539      520           19                    97.40%                      57.89%
Ent6           1025      570          455                    76.20%                      84.17%
Ent7           3787     3016          771                    81.75%                      83.65%
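Because the classes are unevenly distributed, it can help to read the accuracies next to a majority-class baseline, meaning the accuracy of a classifier that always predicts the most frequent class. The sketch below computes that baseline from the class counts reported for the first version of the corpus. It is an added sanity check, not part of the authors' evaluation, and the names `corpus` and `majority_baselines` are hypothetical.

```python
# (total, neutral, opinionated, positive, negative) per entity,
# copied from the counts reported for the first version of the corpus.
corpus = {
    "Ent1": (6055, 5853, 202, 29, 173),
    "Ent2": (2240, 1961, 279, 222, 57),
    "Ent3": (343, 260, 83, 64, 19),
    "Ent4": (1168, 876, 292, 120, 172),
    "Ent5": (539, 520, 19, 17, 2),
    "Ent6": (1025, 570, 455, 330, 125),
    "Ent7": (3787, 3016, 771, 593, 178),
}


def majority_baselines(total, neutral, opinionated, positive, negative):
    """Accuracy of always predicting the most frequent class, for both tasks."""
    opinion_neutral = max(neutral, opinionated) / total
    positive_negative = max(positive, negative) / opinionated
    return opinion_neutral, positive_negative


for name, counts in corpus.items():
    on, pn = majority_baselines(*counts)
    print(f"{name}: opinion-neutral {on:.2%}, positive-negative {pn:.2%}")
```

An accuracy close to or below its baseline suggests that the classifier adds little beyond the class prior for that entity.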
Results - Comparison
• Comparison of the solutions presented above using the
second (larger) version of the corpus
• Performed for a single entity, by extracting a balanced dataset
with 700 positive and 700 negative opinionated texts
Method                                         Accuracy
BoW + POS filter                               81.31%
BoW only adj.                                  70.89%
BoW only adj. & adv.                           76.60%
Frequent bigrams                               80.88%
Frequent trigrams                              76.60%
Affective scores + dependency parsing          52.18%
Affective scores (comparison with 0 decision)  55.35%
Trigrams probabilities                         88.44%
Bigrams probabilities                          72.54%
Conclusions
• Several alternatives for determining the opinion
polarity have been evaluated on a corpus manually
annotated for different Romanian entities
• Best results obtained at this moment: BoW plus a POS
filter or a frequent bigrams approach + SVM classifier
• The Romanian FDG parser does not provide good
accuracy for the dependency parsing task, especially
for texts from social media
– Texts are somewhat freely written, with little regard for
usual form or structure
– Improvements to this method and to the affective word
database are still possible
Thank you!
• Questions?
• Discussions
22.09.13 CSCS 2013 – Bucharest, Romania 18

Algorithm Design and Complexity - Course 5
 

Último

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 

Último (20)

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 

Opinion mining for social media and news items in Romanian

  • 1. Opinion Mining for Social Media and News Items in Romanian
      University Politehnica of Bucharest
      Authors: Claudia Cârdei, Filip Manișor, Traian Rebedea (traian.rebedea@cs.pub.ro)
  • 2. Overview
      • Introduction
      • Previous Work
          – English
          – Romanian
      • Proposed Solutions
      • Opinionated Corpus
      • Results and Comparisons
      • Conclusions
      22.09.13 Bachelor's Thesis Session - July 2012
  • 3. Introduction
      • Sentiment analysis and opinion mining research has mainly concentrated on English and other important languages (Spanish, Chinese, etc.)
          – Various commercial and open-source solutions exist, mainly for English
          – Corpora of opinionated texts and databases of affective words (general or domain specific) also exist for these languages
      • Objective: develop an opinion mining solution for Romanian texts gathered from a wide range of online sources (mostly social media and news items)
      22.09.13 ICSCS 2013 . K-TEAMS 2013 Workshop Opinion Mining for Social Media and News Items in Romanian
  • 4. Introduction
      • Popular research domain in recent years
      • Sentiment, subjectivity, opinion, publicity
          – Related, but somewhat different
      • Sentiment or subjectivity in a text:
          – Positive, negative or neutral
          – Subjective or objective
      • Opinionated text
          – Opinion author
          – Opinion target (subject)
          – Opinion (affective) words
          – Opinion polarity
      E.g. President Obama declared that the US immigration system is broken.
  • 5. Previous Work - English
  • 6. Previous Work - English
      • Lots of studies and corpora in different domains
      • The movie reviews dataset is very popular
      • Initial results using BoW, punctuation, etc.
          – Accuracy ≈ 80%
      • Improvement: finding relations/dependencies between opinion targets and affective words
          – Accuracy ≈ 84%
      • Mining frequent dependency subtrees for positive and negative reviews and using an SVM with these subtrees as features
          – Accuracy ≈ 88%
  • 7. Previous Work - Romanian
      • Use machine translation to generate English texts, then apply opinion mining
      • Translate affective word databases into Romanian (e.g. WordNet Affect)
      • Develop new affective word lists
      • Training and evaluation on specific corpora in Romanian
      • Problems with NER, dependency parsing, affective word scores
  • 8. Proposed Solutions
      • Supervised solution trained for several different opinion subjects (entities)
      • Three approaches
          – Bag of words
          – Affective words and dependency parsing
          – N-gram probabilities
  • 9. Bag of Words
      • Bag of words model:
          – Tokenization, diacritics restoration, lemmatization
          – Distinct lemmas selected as features
          – Improvements: POS filter, word n-grams filter
          – Used both binary features and TF-IDF
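The two feature types named on this slide can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `tokenize` is a hypothetical stand-in that only lowercases and splits on whitespace (no diacritics restoration, lemmatization or POS filter), and the TF-IDF variant used here (relative term frequency times log(N/df)) is one common definition chosen as an assumption.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (stand-in for tokenization + lemmatization)."""
    return text.lower().split()

def build_vocabulary(docs):
    """Distinct tokens across the corpus, sorted so feature positions are stable."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def binary_features(doc, vocab):
    """1 if the vocabulary term occurs in the document, else 0."""
    present = set(tokenize(doc))
    return [1 if term in present else 0 for term in vocab]

def tfidf_features(doc, vocab, docs):
    """Relative term frequency times log(N / document frequency)."""
    counts = Counter(tokenize(doc))
    total = sum(counts.values()) or 1
    doc_sets = [set(tokenize(d)) for d in docs]
    vector = []
    for term in vocab:
        tf = counts[term] / total
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log(len(docs) / df) if df else 0.0
        vector.append(tf * idf)
    return vector

# Toy Romanian-like documents, invented for illustration only.
docs = ["banca este buna", "banca este slaba"]
vocab = build_vocabulary(docs)
print(vocab)
print(binary_features(docs[0], vocab))
print(tfidf_features(docs[0], vocab, docs))
```

Terms that occur in every document get a TF-IDF weight of zero under this definition, which is why the binary and TF-IDF vectors can rank the same features quite differently.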
  • 10. Affective Scores & Dependency Parsing
      • Compute affective word scores in Romanian:
          – Translate all the adjectives and adverbs from the English WordNet into Romanian using Google Translate
          – Use the probability of each translation pair
      • Several affective score databases have been translated: SentiWordNet, SenticNet 2 and ANEW
      • Used the UAIC Romanian FDG parser to identify dependencies between the subject entity and adjectives or adverbs
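One way the translation-pair probabilities could be used is to give each Romanian word the probability-weighted average of the scores of the English words that translate to it. The slide does not say how the probabilities were combined, so the weighting below is an assumption, and the function name, words, scores and probabilities are all invented for illustration.

```python
from collections import defaultdict

def project_scores(english_scores, translations):
    """Project English affective scores onto Romanian words.

    english_scores: dict mapping an English word to its affective score.
    translations: iterable of (english_word, romanian_word, probability).
    Assumption: a Romanian word's score is the probability-weighted mean
    of the scores of its English sources.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for english, romanian, probability in translations:
        if english not in english_scores:
            continue  # no score available for this English word
        weighted_sum[romanian] += probability * english_scores[english]
        weight_total[romanian] += probability
    return {
        word: weighted_sum[word] / weight_total[word]
        for word in weighted_sum
        if weight_total[word] > 0
    }

# Hypothetical scores and translation probabilities.
english_scores = {"good": 0.8, "fine": 0.4}
translations = [
    ("good", "bun", 0.75),
    ("fine", "bun", 0.25),
    ("bad", "rău", 1.0),  # skipped: "bad" has no score in english_scores
]
print(project_scores(english_scores, translations))
```

Normalizing by the total probability mass keeps the projected score on the same scale as the source database, even when only some of a word's translations have scores.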
  • 11. N-gram Probabilities
      • Compute the conditional probability of each n-gram in the corpus, given that the document is either positive or negative
      • Then use the following score for each n-gram (feature f): [formula not preserved in this transcript]
      • The score of a new text is computed by summing the scores of the n-grams that occur in that text
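The per-n-gram score itself is missing from the transcript, so the sketch below substitutes a common choice: the log-ratio of smoothed class-conditional probabilities. This is an assumption, not the authors' formula. The add-alpha smoothing, whitespace tokenization, function names and toy sentences are likewise illustrative. Only the final step, summing the scores of the n-grams found in a new text, is taken from the slide.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train_scores(pos_docs, neg_docs, n=2, alpha=1.0):
    """Score each n-gram as log P(g | positive) - log P(g | negative).

    Assumption: probabilities are add-alpha smoothed relative frequencies
    over all n-gram occurrences of each class.
    """
    pos_counts = Counter(g for d in pos_docs for g in ngrams(d.lower().split(), n))
    neg_counts = Counter(g for d in neg_docs for g in ngrams(d.lower().split(), n))
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + alpha * len(vocab)
    neg_total = sum(neg_counts.values()) + alpha * len(vocab)
    return {
        g: math.log((pos_counts[g] + alpha) / pos_total)
        - math.log((neg_counts[g] + alpha) / neg_total)
        for g in vocab
    }

def score_text(text, scores, n=2):
    """Sum the scores of the known n-grams in the text; unseen n-grams add 0."""
    return sum(scores.get(g, 0.0) for g in ngrams(text.lower().split(), n))

# Toy training data, invented for illustration.
pos_docs = ["serviciu foarte bun", "foarte bun"]
neg_docs = ["serviciu foarte slab"]
scores = train_scores(pos_docs, neg_docs, n=2)
print(score_text("foarte bun", scores))   # positive total suggests positive polarity
print(score_text("foarte slab", scores))  # negative total suggests negative polarity
```

With this choice of score, a text is labeled positive when its total is above zero and negative otherwise. A different threshold could be tuned on held-out data.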
  • 12. Opinionated Corpus
      • Corpus manually annotated by analysts for their customers (created by Treeworks for their product ZeList, www.zelist.ro)
      • ZeList indexes most of the texts published in Romanian on the most popular social networks, blogs, online forums, news websites, etc.
      • Used data for seven different entities (companies or brands), ranging from banks and beer brands to web publishers and media corporations
      • The names of the entities have been anonymized
  • 13. Opinionated Corpus
      • Problems:
          – These texts are very noisy, very heterogeneous, from a wide range of sources and with different writing styles (e.g. Twitter vs. news items)
          – Some of them also might express positive and negative publicity rather than opinions
  • 14. Opinionated Corpus
      • Data about the first version of the corpus
      • Data collection ranged from a couple of months to a couple of years, depending on the entity
      • The second version contained a larger export of data for each entity

      Entity  Total items  Neutral  Opinionated  Positive  Negative
      Ent1    6055         5853     202          29        173
      Ent2    2240         1961     279          222       57
      Ent3    343          260      83           64        19
      Ent4    1168         876      292          120       172
      Ent5    539          520      19           17        2
      Ent6    1025         570      455          330       125
      Ent7    3787         3016     771          593       178
  • 15. Results - Outline
      • Results obtained for the first version of the corpus, for all entities
      • Accuracy positive-negative should be more relevant
      • Good results for entities with more data, poor results for the ones with a small number of opinionated texts

      Entity  Total items  Neutral  Opinionated  Accuracy opinion-neutral  Accuracy positive-negative
      Ent1    6055         5853     202          97.01%                    92.07%
      Ent2    2240         1961     279          91.79%                    87.81%
      Ent3    343          260      83           84.84%                    89.15%
      Ent4    1168         876      292          86.22%                    82.19%
      Ent5    539          520      19           97.40%                    57.89%
      Ent6    1025         570      455          76.20%                    84.17%
      Ent7    3787         3016     771          81.75%                    83.65%
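A majority-class baseline is one way to put these accuracies in context: it is the accuracy obtained by always predicting the larger class. The slides do not report such a baseline, so reading the results this way is an assumption. The counts below are copied from the first-version corpus table.

```python
# (total, neutral, opinionated, positive, negative) per entity,
# copied from the first-version corpus table.
CORPUS = {
    "Ent1": (6055, 5853, 202, 29, 173),
    "Ent2": (2240, 1961, 279, 222, 57),
    "Ent3": (343, 260, 83, 64, 19),
    "Ent4": (1168, 876, 292, 120, 172),
    "Ent5": (539, 520, 19, 17, 2),
    "Ent6": (1025, 570, 455, 330, 125),
    "Ent7": (3787, 3016, 771, 593, 178),
}

def majority_baselines(total, neutral, opinionated, positive, negative):
    """Accuracy of always predicting the larger class, for both tasks."""
    opinion_neutral = max(neutral, opinionated) / total
    positive_negative = max(positive, negative) / opinionated
    return opinion_neutral, positive_negative

for entity, row in CORPUS.items():
    on_baseline, pn_baseline = majority_baselines(*row)
    print(f"{entity}: opinion-neutral {on_baseline:.2%}, "
          f"positive-negative {pn_baseline:.2%}")
```

For Ent1, always answering "neutral" already gives about 96.66% on the opinion-neutral task, so the reported 97.01% adds little over the baseline. This may be part of why the positive-negative accuracy is the more informative column.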
  • 16. Results - Comparison
      • Comparison of the solutions presented above, using the second (larger) version of the corpus
      • Only for one entity, by extracting a balanced dataset with 700 positive and 700 negative opinionated texts

      Method                                          Accuracy
      BoW + POS filter                                81.31%
      BoW only adj.                                   70.89%
      BoW only adj. & adv.                            76.60%
      Frequent bigrams                                80.88%
      Frequent trigrams                               76.60%
      Affective scores + dependency parsing           52.18%
      Affective scores (comparison with 0 decision)   55.35%
      Trigrams probabilities                          88.44%
      Bigrams probabilities                           72.54%
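Extracting a balanced dataset like the one described on this slide can be sketched as below. The slides do not say how the 700 + 700 texts were selected, so uniform random sampling with a fixed seed is an assumption, and `balanced_sample` and the toy items are hypothetical.

```python
import random

def balanced_sample(items, per_class, seed=0):
    """Draw the same number of positive and negative items at random.

    items: list of (text, label) pairs with label "positive" or "negative".
    Raises ValueError if a class has fewer than per_class items.
    """
    rng = random.Random(seed)  # fixed seed so the extraction is reproducible
    sample = []
    for label in ("positive", "negative"):
        pool = [item for item in items if item[1] == label]
        if len(pool) < per_class:
            raise ValueError(f"only {len(pool)} {label} items, need {per_class}")
        sample.extend(rng.sample(pool, per_class))
    rng.shuffle(sample)  # avoid all positives preceding all negatives
    return sample

# Toy data; on the real corpus the call would use per_class=700.
items = [(f"text {i}", "positive") for i in range(10)]
items += [(f"text {i}", "negative") for i in range(10, 15)]
subset = balanced_sample(items, per_class=3)
print(len(subset), sum(1 for _, label in subset if label == "positive"))
```

On a balanced set the majority-class baseline is 50%, which makes the accuracies in the comparison table directly comparable across methods.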
  • 17. Conclusions
      • Several alternatives for determining the opinion polarity have been evaluated on a corpus manually annotated for different Romanian entities
      • Best results obtained at this moment: BoW plus a POS filter or a frequent bigrams approach + SVM classifier
      • The Romanian FDG parser does not provide good accuracy for the dependency parsing task, especially for texts from social media
          – Texts are somewhat freely written, with little regard for usual form or structure
          – Improvements to this method and to the affective words database are still possible
  • 18. Thank you!
      • Questions?
      • Discussions
      22.09.13 CSCS 2013 – Bucharest, Romania