Language-Independent Twitter Sentiment Analysis
Sascha Narr, Michael Hülfenhaus, Sahin Albayrak


Sascha Narr
Competence Center Information Retrieval & Machine Learning


KDML 2012, LWA, Dortmund, Germany
Overview



► 1. Sentiment analysis on social media
► 2. Creation of a multilingual evaluation dataset of tweets
► 3. A language-independent sentiment labeling heuristic for semi-supervised learning
► 4. Experiments on the multilingual dataset




           18. September 2012   Language-Independent Twitter Sentiment Analysis   2
1. Sentiment Analysis on Social Media


► Why sentiment analysis?
    People’s opinions and sentiments about products and events, gathered in large numbers, are invaluable
    Uses include market research, product feedback and more
    Sentiment analysis makes it possible to collect such data automatically

► Why Twitter?
    400 million tweets are posted each day [1]
    The short text length encourages people to “just write” what they think
    Tweets are often informal and contain many opinions


    [1]: http://news.cnet.com/8301-1023_3-57448388-93/twitter-hits-400-million-tweets-per-day-mostly-mobile/

1. Methods for Sentiment Classification

► Sentiment classification goals:
    Subjectivity: “Does the tweet contain an opinion?”
    Polarity: “Is the expressed opinion positive or negative?”
► Classifiers used:
    Naive Bayes, Maximum Entropy, Support Vector Machines
► Features used:
    n-grams, WordNet semantics, part-of-speech information
► Tweet texts have unique properties:
    Informal, contain slang, emoticons, misspellings



1. Multilingual Sentiment Analysis

► Less than 40% of tweets are in English [1]
► Natural language processing methods are often designed specifically for one language
► A language-independent approach increases the coverage of sentiment analysis:
    No extra effort for additional languages
    Open question: is the approach really effective for all languages?



                                  [1] http://semiocast.com/publications/2011_11_24_Arabic_highest_growth_on_Twitter


2. Creation of a Multilingual Evaluation Dataset


► We created a hand-annotated sentiment evaluation dataset of over 12,000 tweets
    4 languages: English, German, French, Portuguese
► We used the Amazon Mechanical Turk platform for annotation
► Each tweet was annotated by 3 different workers:
    Labels: “positive”, “neutral”, “negative”
    Validation tweets were added to help ensure the quality of the annotations




2. Our Multilingual Evaluation Dataset

► We observed low inter-annotator agreement in our dataset
    Sentiment classification is a hard task, even for humans
    Tweets that humans disagree on are also harder to classify
► The dataset is publicly available for research purposes




              Table 1: Tweet counts for the complete annotated dataset




3. A Language-Independent Heuristic

► Training a sentiment classifier requires a large amount of labeled training data
    Such data can be obtained without human effort using a previously proposed heuristic
► The heuristic uses emoticons in tweets as noisy labels

► Heuristic: if a tweet contains only positive emoticons, label its whole text as positive;
  if it contains only negative emoticons, label its whole text as negative.

► Examples of emoticons we used:
    Positive:  :) :-) =) ;) :] :D ^-^ ^_^
    Negative:  :( :-( :(( -.- >:-( D: :/
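
The labeling rule above can be sketched in a few lines of Python. This is only an illustration: the emoticon sets are the examples shown on the slide (with the carets written in ASCII), not the complete lists used in the experiments, and the whitespace tokenization and the function name `label_tweet` are assumptions made for the sketch.

```python
# Emoticon sets copied from the slide's examples; the full lists are not given here.
POSITIVE = {":)", ":-)", "=)", ";)", ":]", ":D", "^-^", "^_^"}
NEGATIVE = {":(", ":-(", ":((", "-.-", ">:-(", "D:", ":/"}


def label_tweet(text):
    """Return a noisy label for a tweet, or None if the heuristic does not apply.

    Assumption: emoticons are separated from words by whitespace, so a plain
    split() is enough to find them (this also keeps ":/" inside "http://"
    from counting as an emoticon).
    """
    tokens = text.split()
    has_pos = any(tok in POSITIVE for tok in tokens)
    has_neg = any(tok in NEGATIVE for tok in tokens)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # no emoticons, or mixed emoticons: tweet is not used


examples = ["great concert tonight :)", "missed my train :(", "so so :) :(", "see http://example.com"]
labels = [label_tweet(t) for t in examples]
```

Tweets for which the function returns None are simply left out of the training data, which is why the heuristic needs a very large pool of raw tweets.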


3. Heuristic for Semi-Supervised Learning

► The heuristic can be applied to almost any language, since emoticons are used extensively on Twitter
► The number of tweets with emoticons differs among languages
    This is caused by many factors, such as language-specific ways of expressing sentiment or different proportions of “formal” tweets




            Table 2: Number of tweets containing emoticons for each language




4. Experiments – Sentiment Classification

► Data:
    Training: from ~800M random tweets of mixed languages:
        Filter for languages: English, German, French, Portuguese
        Use the emoticon heuristic to select and label training data
    Evaluation: 12,597 hand-annotated tweets (4 languages)

► Setup:
    Classification: sentiment polarity only
    Classifier: Naive Bayes
    Features: 1-grams only, or 1-grams and 2-grams combined
    Trained 4 single-language classifiers (en, de, fr, pt)
    and 1 classifier for the combined en+de+fr+pt data
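
To make the setup concrete, here is a minimal sketch of a multinomial Naive Bayes polarity classifier over 1-gram or 1-gram plus 2-gram features. It is not the implementation used in the experiments: lowercasing, whitespace tokenization, add-one smoothing, the class name `NaiveBayes` and the four toy training tweets are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, max_n=1):
    """Return the 1-grams of a tweet, plus its 2-grams if max_n is 2."""
    tokens = text.lower().split()
    feats = list(tokens)
    if max_n >= 2:
        feats += [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def __init__(self, max_n=1):
        self.max_n = max_n
        self.class_counts = Counter()               # tweets per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for feat in ngram_features(text, self.max_n):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in ngram_features(text, self.max_n):
                if feat in self.vocab:  # features never seen in training are skipped
                    score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data standing in for emoticon-labeled tweets.
train_texts = ["i love this phone", "what a great day", "i hate waiting", "this is awful"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes(max_n=2).fit(train_texts, train_labels)
prediction = model.predict("great phone")
```

Under this sketch, a per-language classifier and the combined classifier differ only in which tweets are passed to `fit`.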


4. Experiments: Evaluation Dataset

► 2 variations of our evaluation set were used for the experiments:
    agree-3: tweets for which all 3 annotators agreed on a sentiment
    agree-2: tweets for which at least 2 annotators agreed on a sentiment
► Baseline: always guess “positive” (there are more positive tweets than negative ones)




               Table 3: Tweet counts for the evaluation datasets
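
As an illustration of how the two evaluation sets and the baseline relate, the following sketch derives agree-3 and agree-2 subsets from three annotations per tweet and computes the accuracy of always guessing “positive”. The helper names and the toy annotations are hypothetical, and dropping tweets whose agreed label is “neutral” is an assumption based on the polarity-only setup.

```python
from collections import Counter

POLAR = {"positive", "negative"}


def majority_label(annotations, min_agree):
    """Return the most frequent label if at least min_agree annotators chose it."""
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes >= min_agree else None


def build_eval_set(annotated, min_agree):
    """Keep (tweet, label) pairs with enough agreement on a polar label."""
    result = []
    for tweet, annotations in annotated:
        label = majority_label(annotations, min_agree)
        if label in POLAR:  # assumption: neutral tweets are left out
            result.append((tweet, label))
    return result


def baseline_accuracy(eval_set):
    """Accuracy of the baseline that always guesses "positive"."""
    hits = sum(1 for _, label in eval_set if label == "positive")
    return hits / len(eval_set)


# Toy annotations: three worker labels per tweet.
annotated = [
    ("t1", ["positive", "positive", "positive"]),
    ("t2", ["positive", "positive", "neutral"]),
    ("t3", ["negative", "negative", "negative"]),
    ("t4", ["positive", "neutral", "negative"]),
    ("t5", ["neutral", "neutral", "neutral"]),
]
agree3 = build_eval_set(annotated, min_agree=3)
agree2 = build_eval_set(annotated, min_agree=2)
```

By construction every agree-3 tweet is also in agree-2, so agree-2 is the larger set and includes the tweets annotators disagreed on.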



4. Results – English Classifier

► Best results: English classifier using 1-grams, on the agree-3 set
    81.3% accuracy (trained on 500k tweets)
► Performance on the agree-2 set is consistently lower than on the agree-3 set



[Figure: results for the English (en) classifier]




4. Results – All Languages
[Figure: results per language, with panels en, de, fr, pt]




4. Evaluation – All Languages Compared
► Strong differences between languages
► The differences do not correlate with the number of emoticons in each language
► The emoticon heuristic is a better fit for some languages; this may depend on the style of expressing sentiment in each language
► Example: “muito engraçado kkkkkkkk” (Portuguese: “very funny hahahaha”)

[Figure: results per language, with panels en, de, fr, pt]

Table 3: Tweet counts containing emoticons for each language



4. Evaluation – Multi-language Classifier
► Tested on the combined 4-language evaluation set
► Highest performance: 71.5% accuracy
    Slightly lower than using 4 individual classifiers (73.9% accuracy)
► The usefulness of a combined classifier can outweigh the performance degradation

[Figure: results for the combined en+de+fr+pt classifier]




Conclusions

► We presented and evaluated a language-independent sentiment classification approach on 4 languages
    A language-independent classifier can be trained given only raw tweets, using a noisy-label heuristic
    Performance is good across languages, but varies for each
    Classifiers need a very large number of tweets for training
    Mixed-language classifiers are viable

► Future work:
    Currently we only classify sentiment polarity
    Classifying subjectivity in tweets is important, but finding a good heuristic to label “neutral” tweets is a challenge

Language-Independent Twitter Sentiment Analysis




         Thanks for your attention!

                            Questions?



Contact


Sascha Narr
Dipl.-Inform.
Competence Center Information Retrieval & Machine Learning

sascha.narr@dai-labor.de
Phone +49 (0) 30 / 314 – 74 138
Fax +49 (0) 30 / 314 – 74 003

DAI-Labor
Technische Universität Berlin
Fakultät IV – Elektrotechnik & Informatik
Sekretariat TEL 14
Ernst Reuter Platz 7
10587 Berlin

www.dai-labor.de

Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Language-Independent Twitter Sentiment Analysis

  • 1. Language-Independent Twitter Sentiment Analysis
    Sascha Narr, Michael Hülfenhaus, Sahin Albayrak
    Sascha Narr, Competence Center Information Retrieval & Machine Learning
    KDML 2012, LWA, Dortmund, Germany
  • 2. Overview
    ► 1. Sentiment analysis on social media
    ► 2. Creation of a multilingual evaluation dataset of tweets
    ► 3. A language-independent sentiment labeling heuristic for semi-supervised learning
    ► 4. Experiments on the multilingual dataset
    18. September 2012, Language-Independent Twitter Sentiment Analysis
  • 4. 1. Sentiment Analysis on Social Media
    ► Why sentiment analysis?
      - People's opinions and sentiments about products and events, collected in large numbers, are invaluable
      - Uses include market research, product feedback and more
      - Sentiment analysis makes it possible to collect such data automatically
    ► Why Twitter?
      - 400 million tweets are posted each day [1]
      - Short text lengths encourage people to “just write” what they think
      - Tweets are often informal and contain many opinions
    [1]: http://news.cnet.com/8301-1023_3-57448388-93/twitter-hits-400-million-tweets-per-day-mostly-mobile/
  • 5. 1. Methods for Sentiment Classification
    ► Sentiment classification goals:
      - Subjectivity: “Does the tweet contain an opinion?”
      - Polarity: “Is the expressed opinion positive or negative?”
    ► Classifiers used:
      - Naive Bayes, Maximum Entropy, Support Vector Machines
    ► Features used:
      - n-grams, WordNet semantics, part-of-speech information
    ► Tweet texts have unique properties:
      - Informal; contain slang, emoticons, misspellings
  • 6. 1. Multilingual Sentiment Analysis
    ► Fewer than 40% of tweets are in English [1]
    ► Natural language processing methods are often designed specifically for one language
    ► Increase the coverage of sentiment analysis by using a language-independent approach:
      - No extra effort for additional languages
      - Is the approach really effective for all languages?
    [1]: http://semiocast.com/publications/2011_11_24_Arabic_highest_growth_on_Twitter
  • 8. 2. Creation of a Multilingual Evaluation Dataset
    ► We created a hand-annotated sentiment evaluation dataset of over 12000 tweets
      - 4 languages: English, German, French, Portuguese
    ► Used the Amazon Mechanical Turk platform for annotation
    ► Each tweet was annotated by 3 different workers:
      - Labels: “positive”, “neutral”, “negative”
      - Added validation tweets to help ensure the quality of the annotations
  • 9. 2. Our Multilingual Evaluation Dataset
    ► We observed low inter-annotator agreement in our dataset
      - Sentiment classification is a hard task, even for humans
      - Tweets that humans disagree on are also harder to classify automatically
    ► The dataset is publicly available for research purposes
    Table 1: Tweet counts for the complete annotated dataset
  • 11. 3. A Language-Independent Heuristic
    ► To train a sentiment classifier, a large amount of labeled training data is needed
      - Can be obtained without human effort using a previously proposed heuristic
    ► The heuristic uses emoticons in tweets as noisy labels
    ► Heuristic: If a tweet contains only positive emoticons, label its whole text as positive (and vice versa for negative).
    ► Examples of emoticons we used:
      - Positive: :) :-) =) ;) :] :D ^-^ ^_^
      - Negative: :( :-( :(( -.- >:-( D: :/
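The labeling rule on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `emoticon_label` is hypothetical, the emoticon sets contain only the examples listed on the slide, and matching on whitespace-separated tokens is an assumption made here so that a URL such as `http://...` is not mistaken for the `:/` emoticon.

```python
# Example emoticons taken from the slide; the full lists used in the
# experiments are not given in the deck.
POSITIVE = {":)", ":-)", "=)", ";)", ":]", ":D", "^-^", "^_^"}
NEGATIVE = {":(", ":-(", ":((", "-.-", ">:-(", "D:", ":/"}


def emoticon_label(tweet):
    """Return 'positive', 'negative', or None for a tweet.

    Assumption: emoticons are matched as whole whitespace-separated
    tokens, so ':/' inside 'http://' does not count as an emoticon.
    """
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    # No emoticon, or mixed emoticons: the tweet is not used for training.
    return None


print(emoticon_label("what a great day :)"))
print(emoticon_label("missed the bus again :("))
print(emoticon_label("good news :) bad timing :("))
```

Tweets for which the function returns `None` would simply be skipped when building the training set. A stricter tokenizer would be needed to catch emoticons glued to words (for example `great:)`), which this sketch ignores.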
  • 12. 3. Heuristic for Semi-Supervised Learning
    ► The heuristic can be applied to almost any language, since emoticons are used extensively on Twitter
    ► The number of tweets with emoticons differs among languages
      - Caused by many factors, such as language-specific ways of expressing sentiment or different proportions of “formal” tweets
    Table 2: Number of tweets containing emoticons for each language
  • 14. 4. Experiments – Sentiment Classification
    ► Data:
      - Training: from ~800M random tweets of mixed languages:
        - Filter for languages: English, German, French, Portuguese
        - Use the emoticon heuristic to select and label training data
      - Evaluation: 12597 hand-annotated tweets (4 languages)
    ► Setup:
      - Classification: sentiment polarity only
      - Classifier: Naive Bayes
      - Features: 1-grams, and 1-grams combined with 2-grams
      - Trained 4 classifiers (en, de, fr, pt) and 1 classifier for the combined en+de+fr+pt data
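The setup on slide 14 (Naive Bayes over 1-gram or 1,2-gram features) can be illustrated with a small, self-contained sketch. Several details are assumptions, since the slides do not specify them: the multinomial variant of Naive Bayes, add-one (Laplace) smoothing, lowercasing, and whitespace tokenization. The class and function names are hypothetical, and the four training sentences are toy data, not tweets from the dataset.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=1):
    """Word n-grams of length 1..n_max (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed variant)."""

    def __init__(self, n_max=1):
        self.n_max = n_max                          # 1 -> 1-grams, 2 -> 1,2-grams
        self.class_counts = Counter()               # documents per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for feat in ngrams(text, self.n_max):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)   # log prior
            for feat in ngrams(text, self.n_max):
                if feat in self.vocab:              # ignore unseen features
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data standing in for emoticon-labeled tweets.
texts = ["i love this", "so happy today", "i hate this", "so sad today"]
labels = ["positive", "positive", "negative", "negative"]
clf = NaiveBayes(n_max=2).fit(texts, labels)
print(clf.predict("love it"))
print(clf.predict("hate it"))
```

Training one instance per language and one on the pooled data would mirror the "4 classifiers plus 1 combined classifier" arrangement; nothing in the model itself is language-specific, which is the point of the approach.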
  • 15. 4. Experiments: Evaluation Dataset
    ► 2 variations of our evaluation set for the experiments:
      - agree-3: tweets for which all 3 annotators agreed on a sentiment
      - agree-2: tweets for which at least 2 annotators agreed
    ► Baseline: always guess “positive” (there are more positive tweets than negative ones)
    Table 3: Tweet counts for the evaluation datasets
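How the agree-3 and agree-2 subsets and the "always positive" baseline from slide 15 might be derived can be sketched as follows. The helper names `agreed_label` and `baseline_accuracy` are hypothetical, and the annotation triples are made-up examples rather than entries from the dataset.

```python
from collections import Counter


def agreed_label(annotations, min_agree):
    """Return the most common label if at least `min_agree` annotators chose it, else None."""
    label, count = Counter(annotations).most_common(1)[0]
    return label if count >= min_agree else None


def baseline_accuracy(gold_labels, guess="positive"):
    """Accuracy of a classifier that always predicts `guess`."""
    return sum(1 for gold in gold_labels if gold == guess) / len(gold_labels)


# Made-up annotations: three worker labels per tweet.
annotated = [
    ["positive", "positive", "positive"],
    ["positive", "positive", "negative"],
    ["negative", "negative", "negative"],
    ["positive", "neutral", "negative"],
]

# min_agree=3 gives the agree-3 subset, min_agree=2 the agree-2 subset.
agree_3 = [lab for a in annotated if (lab := agreed_label(a, 3)) is not None]
agree_2 = [lab for a in annotated if (lab := agreed_label(a, 2)) is not None]
print(agree_3, baseline_accuracy(agree_3))
print(agree_2, baseline_accuracy(agree_2))
```

By construction agree-3 is a subset of agree-2, and a tweet on which all three annotators disagree falls into neither set.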
  • 16. 4. Results – English Classifier
    ► Best results: English classifier using 1-grams, on the agree-3 set
      - 81.3% accuracy (trained on 500k tweets)
    ► Performance on the agree-2 set is consistently lower than on the agree-3 set
    [Figure: results for the English (en) classifier]
  • 17. 4. Results – All Languages
    [Figure: result panels for en, de, fr, pt]
  • 18. 4. Evaluation – All Languages Compared
    ► Strong differences between languages
    ► The differences do not correlate with the number of emoticons in each language
    ► The emoticon heuristic fits some languages better than others, which may depend on how sentiment is typically expressed in each language
    ► Example: “muito engraçado kkkkkkkk” (Portuguese: “very funny”, followed by written laughter instead of an emoticon)
    [Figure: result panels for en, de, fr, pt]
    Table 3: Tweet counts containing emoticons for each language
  • 19. 4. Evaluation – Multi-language Classifier
    ► Tested on the combined 4-language evaluation set
    ► Highest performance: 71.5% accuracy
      - Slightly lower than using 4 individual classifiers (73.9% accuracy)
    ► The usefulness of a combined classifier can outweigh the performance degradation
    [Figure: results for the combined en+de+fr+pt classifier]
  • 20. Conclusions
    ► We presented and evaluated a language-independent sentiment classification approach on 4 languages
      - A language-independent classifier can be trained given only raw tweets, using a noisy label heuristic
      - Good performance across languages, though it varies by language
      - Classifiers need a very large number of tweets for training
      - Mixed-language classifiers are viable
    ► Future work:
      - Currently we only classify sentiment polarity
      - Classifying subjectivity in tweets is important, but finding a good heuristic to label “neutral” tweets is a challenge
  • 21. Language-Independent Twitter Sentiment Analysis
    Thanks for your attention! Questions?
  • 22. Contact
    Sascha Narr, Dipl.-Inform.
    Competence Center Information Retrieval & Machine Learning
    sascha.narr@dai-labor.de
    Fon +49 (0) 30 / 314 – 74 138
    Fax +49 (0) 30 / 314 – 74 003
    www.dai-labor.de
    DAI-Labor, Technische Universität Berlin
    Fakultät IV – Elektrotechnik & Informatik
    Sekretariat TEL 14, Ernst-Reuter-Platz 7, 10587 Berlin