This document provides an overview of sentiment analysis and describes a project to create a world sentiment indicator. It discusses how sentiment analysis works, including feature extraction and machine learning classifiers. It also describes building training corpora and testing accuracy. A key part is the Splunk sentiment analysis app, which performs analysis on tweets. The world sentiment indicator project aims to analyze news headlines using sentiment analysis tools and visualize the results. Accuracy depends heavily on the quality and size of the training corpus and on how closely it matches the data being analyzed.
5. Sentiment Analysis
Is the process of examining text or speech to find out
the opinions, views or feelings of the author or speaker
This definition applies to a computer system
When a human does this, it's called reading
The words in the title describe highly subjective and
ambiguous concepts for a human
Even more challenging for a computer program
Opinions, Views, Beliefs, Convictions
6. Words or expressions have different meanings
depending on the knowledge domain (domain of
expertise)
Example: "go around" (in aviation, an aborted landing; in everyday speech, a detour)
Sarcasm, jokes, etc.
Domains of expertise usually have slang
Conclusion:
Sentiment is contextual and domain dependent
Opinions, Views, Beliefs, Convictions
7. Analysis tends to be done by
Domain of expertise
Media channel
Newspaper articles follow grammar rules, use proper words,
and contain no spelling mistakes
Tweets lack sentence structure, likely use slang, include
emoticons ( :-) , :-( ), and sometimes lengthen words
("I looooooove chocolate")
Sentiment Analysis
8. Companies want to know what their
Customers
Competitors
General public
Think about their
Products
Services
Brands
Usually associated with marketing and public relations
Commercial Uses
9. When done correctly, sentiment analysis is powerful
"From Tweets to Polls: Linking Text Sentiment to Public
Opinion Time Series", O'Connor et al., 2010
Sentiment word frequencies in Twitter correlate with
surveys on consumer confidence and political opinion
by as much as 80%
"These results highlight the potential of text streams as a
substitute and supplement for traditional polling."
Commercial Uses
10. When not done well
"The Hathaway Effect: How Anne Gives Warren Buffett a
Rise", Dan Mirvish, Huffington Post, 2011
Suspicions that some robotic trading programs on Wall
Street include sentiment analysis
Every time Anne Hathaway makes the headlines, the
stock of Warren Buffett's company Berkshire Hathaway
goes up
Commercial Uses
12. Sentiment analysis is a form of text categorization
Its results fall into two categories
Polarity
Positive, negative, neutral
Range of polarity
Ratings or rankings
Example: 1 to 5 stars for movie reviews
The Technical Side
13. Extracting and categorizing sentiment is based on features
Frequency: the words that appear most often decide the polarity
Term presence: a word counts once no matter how often it appears;
the most distinctive words define polarity
N-grams: the position of a word within a sequence determines polarity
Parts of speech: adjectives define the polarity
Syntax: attempts to analyze syntactic relations haven't been very
successful
Negation: explicit negation terms reverse polarity
Text classifiers tend to use combinations of features (sketch below)
The Technical Side
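To make these features concrete, below is a minimal Python sketch of two of them, term presence and explicit negation. The function names, the negation word list, and the whitespace tokenization are illustrative assumptions, not taken from any particular tool.

# Term presence and negation feature extractors (illustrative sketch).
NEGATION_TERMS = {"not", "no", "never", "n't"}

def term_presence_features(tokens):
    # Term presence: each word contributes one binary feature,
    # regardless of how many times it appears.
    return {"has(" + word + ")": True for word in set(tokens)}

def negation_features(tokens):
    # Negation: words that follow an explicit negation term are marked,
    # so "not good" yields a different feature than "good".
    features = {}
    negated = False
    for word in tokens:
        if word in NEGATION_TERMS:
            negated = True
            continue
        prefix = "NOT_" if negated else ""
        features["has(" + prefix + word + ")"] = True
    return features

print(negation_features("the movie was not good".split()))
# {'has(the)': True, 'has(movie)': True, 'has(was)': True, 'has(NOT_good)': True}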
14. To assign contextual polarity, you need a base
polarity
Use a lexicon, which provides a polarity for each word
(scoring sketch below)
Word → Phrase → Sentence → Document
Use training documents
Preferred
The Technical Side
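A minimal sketch of the lexicon approach, aggregating word polarities up to a sentence-level score; the four-entry lexicon is made up for illustration, and real lexicons contain thousands of scored words.

# Lexicon-based polarity: average the scores of known words (sketch).
LEXICON = {"love": 1.0, "great": 0.8, "bad": -0.8, "hate": -1.0}

def sentence_polarity(sentence):
    scores = [LEXICON[w] for w in sentence.lower().split() if w in LEXICON]
    if not scores:
        return 0.0  # no lexicon words found: treat the sentence as neutral
    return sum(scores) / len(scores)

print(sentence_polarity("I love this great phone"))  # 0.9 -> positive
print(sentence_polarity("I hate waiting"))           # -1.0 -> negative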
15. Training documents
Contain a number of sentences
Are classified with a specific polarity
The polarity of each word is based on a combination of
feature extractors and its appearances under the different
classifications
The more sentences, the more accurate the model
The results are saved as a model (training sketch below)
The Technical Side
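A minimal sketch of this training step with NLTK's Naïve Bayes classifier, the same classifier family the Splunk app is built on. The four labeled sentences and the term-presence feature function are illustrative; a real corpus contains thousands of examples.

import nltk

# Tiny made-up training corpus: (sentence, polarity) pairs.
train = [
    ("I love this phone", "positive"),
    ("what a great movie", "positive"),
    ("this is terrible", "negative"),
    ("I hate mondays", "negative"),
]

def features(text):
    # Term-presence features over lowercased tokens.
    return {word: True for word in text.lower().split()}

# Training produces the model that later classifies new text.
model = nltk.NaiveBayesClassifier.train(
    [(features(text), label) for text, label in train]
)
print(model.classify(features("I hate this terrible phone")))  # negative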
16. Machine learning tools
Naïve Bayes classifier
Generally uses n-grams, frequency, and term presence; sometimes
parts of speech
Maximum entropy
Naïve Bayes assumes each feature is independent; maximum entropy does not
Allows features to overlap, such as a word and a bigram that contains it
Support vector machines
Each document is represented as a vector of features
Linear, polynomial, sigmoid, and other kernel functions are applied to the
vectors (comparison sketch below)
The Technical Side
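A minimal sketch comparing the three families with scikit-learn, using logistic regression as the maximum entropy model; the four training texts are made up, and with real corpora all three tend to reach similar accuracy (see the conclusions).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

texts = ["I love it", "great stuff", "utterly terrible", "I hate it"]
labels = ["positive", "positive", "negative", "negative"]

# Unigram and bigram term-presence features, as described above.
X = CountVectorizer(ngram_range=(1, 2), binary=True).fit_transform(texts)

for clf in (MultinomialNB(), LogisticRegression(), LinearSVC()):
    clf.fit(X, labels)  # train each classifier on the (tiny) corpus
    print(type(clf).__name__, clf.score(X, labels))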
19. Based on the Naïve Bayes classifier
Has three commands:
sentiment
language
token
Includes a training/testing program and two models
Twitter: 190,862 positive and 37,469 negative tweets
IMDb
Range of polarity from 1 to 10
Each ranking has 11 movie reviews, averaging 200 words
The Splunk Sentiment Analysis App
20. index=twitter lang=en
| where like(text, "%love%")
| sentiment twitter text
| stats avg(sentiment)
The Splunk Sentiment Analysis App
22. index=twitter lang=en
| rename entities.hashtags{}.text as hashtags
| fields text, hashtags
| mvexpand hashtags
| where like(hashtags, "Beliebers")
| sentiment twitter text
| stats avg(sentiment)
The Beliebers Search
23. index=twitter lang=en
| rename entities.hashtags{}.text as hashtags
| fields text, hashtags
| mvexpand hashtags
| where like(hashtags, "Beliebers")
| sentiment twitter text
| stats avg(sentiment)
The Beliebers Search
So that we don't have to type
entities.hashtags{}.text every time we
want to refer to a hashtag, rename this
multi-value field to hashtags
24. index=twitter lang=en
| rename entities.hashtags{}.text as hashtags
| fields text, hashtags
| mvexpand hashtags
| where like(hashtags, "Beliebers")
| sentiment twitter text
| stats avg(sentiment)
The Beliebers Search
We only want the fields that contain the
tweet and the hashtags
25. index=twitter lang=en
| rename entities.hashtags{}.text as hashtags
| fields text, hashtags
| mvexpand hashtags
| where like(hashtags, "Beliebers")
| sentiment twitter text
| stats avg(sentiment)
The Beliebers Search
Expand the values of this multi-value
field into separate Splunk events
26. index=twitter lang=en
| rename entities.hashtags{}.text as hashtags
| fields text, hashtags
| mvexpand hashtags
| where like(hashtags, "Beliebers")
| sentiment twitter text
| stats avg(sentiment)
The Beliebers Search
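Keep only the tweets tagged with the Beliebers
hashtag, score each one with the Twitter
sentiment model, and average the scores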
27. The training corpus is key to accuracy
Beware: Naïve Bayes is not an exact algorithm
The best accuracy obtained using Naïve Bayes is
approximately 83% (measuring accuracy is sketched below)
Key factors to increase accuracy
Similarity to the data being analyzed
Size of the corpus
Training and Testing Data
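A minimal sketch of how figures like those in the following table can be produced: classify held-out test data, compute accuracy, and report a normal-approximation 95% margin of error. The model and features function are assumptions carried over from the training sketch shown earlier.

import math

def accuracy_with_margin(model, test_set, features):
    # test_set is a list of (text, label) pairs held out from training.
    correct = sum(
        1 for text, label in test_set
        if model.classify(features(text)) == label
    )
    p = correct / len(test_set)
    # 95% margin of error for a proportion (normal approximation).
    margin = 1.96 * math.sqrt(p * (1 - p) / len(test_set))
    return p, margin

# Example use: p, m = accuracy_with_margin(model, test, features)
# print("{:.2%} +/- {:.2%}".format(p, m))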
28. Training and Testing Data
Test Data                Size                Accuracy   Margin of Error
University of Michigan   1.5 million tweets  72.49%     1.05%
Splunk                   228,000 tweets      68.79%     1.12%
Sanders                  5,500 tweets        60.61%     0.76%
31. Based on news headlines
From news websites all around the world
Collected from RSS feeds in English
The World Sentiment Indicator
32. Steps for this project
1. Collect the RSS feeds (sketch below)
2. Index the headlines into Splunk
3. Define the sentiment corpus
4. Create a visualization of the results
The World Sentiment Indicator
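A minimal sketch of step 1 using the Python feedparser library; the feed URLs are placeholders, and in practice each headline would be indexed into Splunk rather than printed.

import feedparser

FEEDS = [
    "https://example.com/world/rss",  # placeholder feed URLs
    "https://example.org/news/rss",
]

for url in FEEDS:
    feed = feedparser.parse(url)
    for entry in feed.entries:
        # Each headline becomes one event to index into Splunk.
        print(entry.get("published", ""), entry.title)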
34. Create your own
Crowd-source
University of Michigan ‒ Kaggle competition
Bootstrap (sketch below)
"Twitter Sentiment Classification Using Distant Supervision", Go et al.,
2009
Uses emoticons to classify tweets
Accuracy for unigrams and bigrams:
Naïve Bayes 82.7%
Maximum entropy 82.7%
Support vector machine 81.6%
Training Corpus Creation
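A minimal sketch of the bootstrapping idea from Go et al.: emoticons serve as noisy polarity labels and are then stripped from the text so the classifier cannot simply memorize them. The emoticon lists and the example tweets are illustrative.

POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")

def distant_label(tweet):
    # Use emoticons as noisy labels, then remove them from the text.
    if any(e in tweet for e in POSITIVE):
        label = "positive"
    elif any(e in tweet for e in NEGATIVE):
        label = "negative"
    else:
        return None  # no emoticon: this tweet cannot be auto-labeled
    for e in POSITIVE + NEGATIVE:
        tweet = tweet.replace(e, "")
    return tweet.strip(), label

print(distant_label("exam went great :)"))       # ('exam went great', 'positive')
print(distant_label("flight delayed again :("))  # ('flight delayed again', 'negative')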
35. Issues with subjectivity: the same event can be worded
neutrally or negatively
"Pope Benedict XVI announces resignation"
"Pope too frail to carry on"
"Pope steps down as head of Catholic church"
"Pope quits for health reasons"
Average RSS headline: 47.8 characters, 7.6 words
Average tweet: 78 characters, 14 words
Training Corpus Considerations
36. Create a special corpus based on news headlines
Version 1: 100 positive, 100 negative, 100 neutral
Version 2: 200 positive, 200 negative, 200 neutral
Use an existing Twitter corpus
The one included with the Splunk app
University of Michigan
Use a movie review corpus
Pang & Lee: 1,000 positive, 1,000 negative
Training Corpus Strategy
37. Training Corpus Accuracy
Training Corpus   Size                 Accuracy   Margin of Error
Headlines V1      300 headlines        38.89%     1.02%
Headlines V2      600 headlines        47.22%     1.05%
Splunk Twitter    228,000 tweets       40.80%     1.16%
U of Michigan     1.5 million tweets   43.81%     1.11%
Movie Reviews     2,000 reviews        36.79%     1.23%
39. The key to accuracy is the quality of the training data
Train with the same kind of data you will analyze
A larger training corpus improves accuracy
The subjectivity of crowd-sourced labels tends to even out as the
amount of training data increases
All machine learning tools tend to converge to similar
levels of accuracy
Use whichever one is easiest for you
Conclusions