We describe a language-independent approach to sentiment analysis (positive or negative emotions) in tweets. We also present our evaluation dataset of human-annotated sentiments in tweets, collected using Amazon Mechanical Turk.
This is the presentation I gave at KDML, LWA 2012, Dortmund, Germany.
Visit http://irml.dai-labor.de/ for more information.
Language-Independent Twitter Sentiment Analysis
1. Language-Independent Twitter Sentiment Analysis
Sascha Narr, Michael Hülfenhaus, Sahin Albayrak
Sascha Narr
Competence Center Information Retrieval & Machine Learning
KDML 2012, LWA, Dortmund, Germany
2. Overview
► 1. Sentiment analysis on social media
► 2. Creation of a multilingual evaluation dataset of tweets
► 3. A language-independent sentiment labeling heuristic for semi-supervised learning
► 4. Experiments on the multilingual dataset
18. September 2012 Language-Independent Twitter Sentiment Analysis 2
4. 1. Sentiment Analysis on Social Media
► Why sentiment analysis?
  ▪ People’s opinions and sentiments about products and events, gathered in large numbers, are invaluable:
  ▪ Market research, product feedback and more
  ▪ Sentiment analysis makes it possible to collect such data automatically
► Why Twitter?
  ▪ 400 million tweets are posted each day [1]
  ▪ Short text lengths encourage people to “just write” what they think
  ▪ Tweets are often informal and contain many opinions
[1] http://news.cnet.com/8301-1023_3-57448388-93/twitter-hits-400-million-tweets-per-day-mostly-mobile/
5. 1. Methods for Sentiment Classification
► Sentiment classification goals:
  ▪ Subjectivity: “Does the tweet contain an opinion?”
  ▪ Polarity: “Is the expressed opinion positive or negative?”
► Classifiers used:
  ▪ Naive Bayes, Maximum Entropy, Support Vector Machines
► Features used:
  ▪ n-grams, WordNet semantics, part-of-speech information
► Tweet texts have unique properties:
  ▪ Informal, contain slang, emoticons, misspellings
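The n-gram features named on the slide above can be sketched in a few lines. This is only an illustration: the function names `ngrams` and `extract_features` and the choice of lowercasing plus whitespace tokenization are assumptions, not taken from the talk. Whitespace splitting is used here because it needs no language-specific tokenizer and keeps emoticons intact.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=1):
    """Lowercase, split on whitespace, and collect all 1..max_n-grams.

    Assumption: plain whitespace tokenization, no language-specific steps.
    """
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features


print(extract_features("I love this", max_n=2))
```

With `max_n=1` the result contains only single tokens; with `max_n=2` it contains single tokens followed by adjacent token pairs.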
6. 1. Multilingual Sentiment Analysis
► Fewer than 40% of tweets are in English [1]
► Natural language processing methods are often designed specifically for one language
► Increase the coverage of sentiment analysis by using a language-independent approach:
  ▪ No extra effort for additional languages
  ▪ Is the approach really effective for all languages?
[1] http://semiocast.com/publications/2011_11_24_Arabic_highest_growth_on_Twitter
8. 2. Creation of a Multilingual Evaluation Dataset
► We created a hand-annotated sentiment evaluation dataset of over 12,000 tweets
  ▪ 4 languages: English, German, French, Portuguese
► We used the Amazon Mechanical Turk platform for annotation
► Each tweet was annotated by 3 different workers:
  ▪ Labels: “positive”, “neutral”, “negative”
  ▪ Validation tweets were added to help ensure the quality of the annotations
9. 2. Our Multilingual Evaluation Dataset
► We observed low inter-annotator agreement in our dataset
  ▪ Sentiment classification is a hard task, even for humans
  ▪ Tweets that humans disagree on are also harder to classify automatically
► The dataset is publicly available for research purposes
Table 1: Tweet counts for the complete annotated dataset
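The slide does not say how inter-annotator agreement was measured. As one possible illustration, the sketch below computes Fleiss' kappa, a common agreement statistic for a fixed number of raters per item. The function name `fleiss_kappa` and the choice of this statistic are assumptions, not the authors' stated method.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-item label lists.

    Assumes every item was rated by the same number of raters (>= 2)
    and that more than one label occurs overall (otherwise the
    denominator below is zero).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()      # how often each label was assigned overall
    agreement_sum = 0.0
    for labels in ratings:
        counts = Counter(labels)
        totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_items  # mean observed agreement
    # Agreement expected by chance from the overall label distribution.
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)
```

A value of 1 means perfect agreement, and values near or below 0 mean agreement no better than chance.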
11. 3. A Language-Independent Heuristic
► Training a sentiment classifier requires a large amount of labeled training data
  ▪ Such data can be obtained without human effort using a previously proposed heuristic
► The heuristic uses emoticons in tweets as noisy labels
► Heuristic: if a tweet contains only positive emoticons, label its whole text as positive (and vice versa for negative)
► Examples of emoticons we used:
  ▪ Positive: :) :-) =) ;) :] :D ^-^ ^_^
  ▪ Negative: :( :-( :(( -.- >:-( D: :/
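The emoticon heuristic from the slide above can be sketched as follows. The emoticon sets copy the examples listed on the slide, written with ASCII carets. The function name `emoticon_label` and the matching strategy are assumptions: emoticons are matched as whole whitespace-separated tokens, so that `:/` inside a URL such as `http://...` does not count as a negative emoticon. The price is that emoticons glued to a word are missed.

```python
POSITIVE = {":)", ":-)", "=)", ";)", ":]", ":D", "^-^", "^_^"}
NEGATIVE = {":(", ":-(", ":((", "-.-", ">:-(", "D:", ":/"}


def emoticon_label(text):
    """Return a noisy sentiment label for a tweet, or None.

    A tweet is labeled only if it contains emoticons of exactly one
    polarity. Assumption: emoticons appear as separate tokens.
    """
    tokens = set(text.split())
    has_positive = bool(tokens & POSITIVE)
    has_negative = bool(tokens & NEGATIVE)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    return None  # no emoticons, or mixed polarity: not usable as training data


for tweet in ["what a great day :)", "missed my train :(", "see http://example.com/"]:
    print(tweet, "->", emoticon_label(tweet))
```

Tweets for which the function returns `None` would simply be left out of the training data.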
12. 3. Heuristic for Semi-Supervised Learning
► The heuristic can be applied to almost any language, since emoticons are used extensively on Twitter
► The number of tweets with emoticons differs between languages
  ▪ Caused by many factors, such as language-specific ways of expressing sentiment or different proportions of “formal” tweets
Table 2: Number of tweets containing emoticons for each language
14. 4. Experiments – Sentiment Classification
► Data:
  ▪ Training: from ~800M random tweets of mixed languages:
    ▪ Filter for languages: English, German, French, Portuguese
    ▪ Use the emoticon heuristic to select and label training data
  ▪ Evaluation: 12,597 hand-annotated tweets (4 languages)
► Setup:
  ▪ Classification: sentiment polarity only
  ▪ Classifier: Naive Bayes
  ▪ Features: 1-grams only, or 1-grams and 2-grams combined
  ▪ Trained 4 classifiers (en, de, fr, pt) and 1 classifier on the combined en+de+fr+pt data
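A minimal Naive Bayes polarity classifier for such a setup might look like the sketch below. The slides only name "Naive Bayes"; the multinomial variant, add-one smoothing, the class name `NaiveBayes` and the toy training tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over lists of string features (add-one smoothing)."""

    def __init__(self):
        self.class_counts = Counter()               # training examples per class
        self.feature_counts = defaultdict(Counter)  # feature occurrences per class
        self.vocabulary = set()

    def train(self, examples):
        """examples: iterable of (features, label) pairs."""
        for features, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)

    def predict(self, features):
        """Return the label with the highest log posterior."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + vocab_size
            for feature in features:
                if feature in self.vocabulary:  # ignore features unseen in training
                    score += math.log((self.feature_counts[label][feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy data; real training data would come from the emoticon heuristic.
toy_training = [
    ("i love this phone".split(), "positive"),
    ("what a great day".split(), "positive"),
    ("i hate mondays".split(), "negative"),
    ("this is awful".split(), "negative"),
]
clf = NaiveBayes()
clf.train(toy_training)
print(clf.predict("love this day".split()))
```

Under this sketch, a per-language classifier and a combined classifier differ only in which tweets are passed to `train`.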
15. 4. Experiments: Evaluation Dataset
► 2 variations of our evaluation set for the experiments:
  ▪ agree-3: tweets for which all 3 annotators agreed on the sentiment
  ▪ agree-2: tweets for which at least 2 annotators agreed on the sentiment
► Baseline: always guess “positive” (there are more positive tweets than negative ones)
Table 3: Tweet counts for the evaluation datasets
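The two evaluation-set variants and the baseline can be sketched as below. The function names and the data layout (a list of `(text, [label, label, label])` pairs) are assumptions for illustration, not the format of the published dataset.

```python
from collections import Counter


def agreed_label(annotations, min_agree):
    """Return the label chosen by at least `min_agree` annotators, else None."""
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes >= min_agree else None


def build_eval_set(annotated, min_agree):
    """Keep tweets with sufficient agreement.

    annotated: list of (text, [label1, label2, label3]) pairs.
    min_agree=3 gives the agree-3 set, min_agree=2 the agree-2 set.
    """
    result = []
    for text, labels in annotated:
        label = agreed_label(labels, min_agree)
        if label is not None:
            result.append((text, label))
    return result


def baseline_accuracy(gold_labels, guess="positive"):
    """Accuracy of always predicting `guess`."""
    return sum(1 for label in gold_labels if label == guess) / len(gold_labels)
```

With three annotators, a tweet whose three labels all differ ends up in neither set, and every agree-3 tweet is also in agree-2.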
16. 4. Results – English Classifier
► Best results: English classifier using 1-grams, on the agree-3 set
  ▪ 81.3% accuracy (trained on 500k tweets)
► Performance on the agree-2 set is consistently lower than on the agree-3 set
[Figure: results for the English (en) classifier]
17. 4. Results – All Languages
[Figure: results for the en, de, fr and pt classifiers]
18. 4. Evaluation – All Languages Compared
► Strong differences between languages
► Differences do not correlate with the number of emoticons in each language
► The emoticon heuristic is a better fit for some languages; this may depend on the style of expressing sentiment in that language
► “muito engraçado kkkkkkkk” (Portuguese: “very funny hahaha”)
[Figure: results for the en, de, fr and pt classifiers]
Table 3: Tweet counts containing emoticons for each language
19. 4. Evaluation – Multi-language Classifier
► Tested on the combined 4-language evaluation set
► Highest performance: 71.5% accuracy
  ▪ Slightly lower than using 4 individual classifiers (73.9% accuracy)
► The usefulness of a combined classifier can outweigh the performance degradation
[Figure: results for the combined en+de+fr+pt classifier]
20. Conclusions
► We presented and evaluated a language-independent sentiment classification approach on 4 languages
  ▪ A language-independent classifier can be trained given only raw tweets, using a noisy label heuristic
  ▪ Performance is good across languages, but varies by language
  ▪ Classifiers need a very large number of tweets for training
  ▪ Mixed-language classifiers are viable
► Future work:
  ▪ Currently we only classify sentiment polarity
  ▪ Classifying subjectivity in tweets is important, but finding a good heuristic to label “neutral” tweets is a challenge
21. Language-Independent Twitter Sentiment Analysis
Thanks for your attention!
Questions?
22. Contact
Sascha Narr, Dipl.-Inform.
Competence Center Information Retrieval & Machine Learning
sascha.narr@dai-labor.de
Phone +49 (0) 30 / 314 – 74 138
Fax +49 (0) 30 / 314 – 74 003

DAI-Labor
Technische Universität Berlin
Fakultät IV – Elektrotechnik & Informatik
Sekretariat TEL 14
Ernst Reuter Platz 7
10587 Berlin
www.dai-labor.de