Are Twitter Users Equal in Predicting Elections?
A Study of User Groups in Predicting 2012 U.S. Republican Presidential Primaries
Lu Chen, Wenbo Wang, Amit Sheth. Are Twitter Users Equal in Predicting Elections? A Study of User Groups in
Predicting 2012 U.S. Republican Presidential Primaries. The 4th International Conference on Social Informatics
(SocInfo2012), 2012.
Lu Chen
chen@knoesis.org
Wenbo Wang
wenbo@knoesis.org
Amit Sheth
amit@knoesis.org

There is a surge of interest in building systems that harness the
power of social data to predict election results.
• Number of Facebook users talking about each candidate, and who is talking about which candidate (age, gender, state)
• Twitter users’ positive/negative opinions about each candidate
• Tweets from @BarackObama and @MittRomney organized by engagement on Twitter
• Number of Facebook “likes” and Twitter “followers”
• Real-time semantic analysis of topic, opinion, emotion, and popularity about each candidate

One problem seems to be ignored:
Are social media users equal in predicting elections?
They may be from different countries and states.
They may have different political beliefs.
They may be of different ages.
They may engage in the elections in different ways and with different levels of involvement.
……
They may be … different in predicting elections…?
WHOSE opinion really matters?

We study different groups of social media users who engage in the discussions of the 2012 U.S. Republican Presidential Primaries, and compare the predictive power among these user groups.
Data: Using the Twitter Streaming API, we collected tweets that contain the words “gingrich”, “romney”, “ron paul”, or “santorum” from 01/10/2012 to 03/05/2012 (Super Tuesday was 03/06/2012). The dataset comprises 6,008,062 tweets from 933,343 users.
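The keyword filter described above can be illustrated offline. This is a minimal sketch that assumes tweets are already available as plain strings; `CANDIDATE_KEYWORDS`, `mentioned_candidates`, and `filter_tweets` are hypothetical names, and case-insensitive substring matching is an assumption about how the keywords were applied.

```python
# Keywords come from the data description; the candidate labels are illustrative.
CANDIDATE_KEYWORDS = {
    "gingrich": "Gingrich",
    "romney": "Romney",
    "ron paul": "Paul",
    "santorum": "Santorum",
}


def mentioned_candidates(tweet_text):
    """Return the set of candidates whose keyword appears in the tweet text."""
    text = tweet_text.lower()  # assumption: matching ignores case
    return {name for keyword, name in CANDIDATE_KEYWORDS.items() if keyword in text}


def filter_tweets(tweets):
    """Keep only the tweets that mention at least one candidate keyword."""
    return [t for t in tweets if mentioned_candidates(t)]
```

A tweet that names several candidates is attributed to each of them, which is why the helper returns a set rather than a single label.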

User Categorization
1. Engagement Degree
2. Tweet Mode
3. Content Type
4. Political Preference

• More than half of the users posted only one tweet. Only 8% of the users posted more than 10 tweets.
• A small group of users (0.23%) produced a large share of the tweets (23.73%). Is tweet volume a reliable predictor?
• The usage of hashtags and URLs reflects users' intent to attract people's attention to the topic they discuss. More engaged users show this intent more strongly and are more involved in the election event.
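The concentration of tweet volume among a few users can be checked with a short computation. This sketch assumes a hypothetical mapping from user ID to tweet count; `volume_share` and its `min_tweets` threshold are illustrative and are not the study's engagement group boundaries.

```python
def volume_share(tweet_counts, min_tweets):
    """Return (share of users, share of tweets) for users with >= min_tweets tweets.

    tweet_counts: dict mapping a user ID to that user's number of tweets.
    """
    if not tweet_counts:
        raise ValueError("tweet_counts must not be empty")
    total_users = len(tweet_counts)
    total_tweets = sum(tweet_counts.values())
    heavy = [n for n in tweet_counts.values() if n >= min_tweets]
    return len(heavy) / total_users, sum(heavy) / total_tweets
```

When the second number is much larger than the first, raw tweet volume is dominated by a few accounts, which is the concern raised in the bullet above.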

• The original tweet-dominant group accounts for the largest proportion of users in every user engagement group.
• A significant number of users (34.71% of all users) belong to the retweet-dominant group, whose voting intent might be more difficult to detect.
(Figure axis: Engagement Degree)
According to users' preference in generating their tweets, i.e., tweet mode, we classified the users as original tweet-dominant, original tweet-prone, balanced, retweet-prone, and retweet-dominant.
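A five-way split like the one above can be sketched from the share of original tweets in a user's output. The cut-off values `dominant=0.8` and `prone=0.6` and the function name `tweet_mode` are assumptions for illustration; the slides do not state the boundaries that were used.

```python
def tweet_mode(n_original, n_retweets, dominant=0.8, prone=0.6):
    """Assign a tweet-mode label from counts of original tweets and retweets.

    The thresholds are hypothetical; only the five labels come from the slides.
    """
    total = n_original + n_retweets
    if total == 0:
        raise ValueError("user has no tweets")
    original_share = n_original / total
    if original_share >= dominant:
        return "original tweet-dominant"
    if original_share >= prone:
        return "original tweet-prone"
    if original_share > 1 - prone:
        return "balanced"
    if original_share > 1 - dominant:
        return "retweet-prone"
    return "retweet-dominant"
```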

• More engaged users tend to post a mixture of content, with similar proportions of opinion and information, or a larger proportion of information.
(Figure axis: Engagement Degree)
We use target-specific sentiment analysis techniques to classify each tweet as positive or negative, i.e., whether the expressed opinion about a specific candidate is positive or negative. Users are categorized based on whether they post more information or more opinion.

• Right-leaning users were (as expected) more involved in the Republican primaries in several ways: more users, more tweets, more original tweets, and higher usage of hashtags and URLs.
We collected a set of Twitter users with known political preference from Twellow (http://www.twellow.com/categories/politics). Based on the assumption that a user tends to follow others who share his/her political preference, we identified left-leaning and right-leaning users from their following/follower relations. We tested this method on a dataset of 3,341 users, and it showed an accuracy of 0.9243.
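The follow-based heuristic can be sketched as a simple vote among the seed accounts a user follows. The seed sets, the function name `political_leaning`, and the majority rule with ties left undecided are assumptions for illustration; the slides say only that following/follower relations were used.

```python
def political_leaning(followed_accounts, left_seeds, right_seeds):
    """Guess a user's leaning from the seed accounts they follow.

    left_seeds / right_seeds: sets of accounts with known political preference.
    Returns "left-leaning", "right-leaning", or None when the evidence is tied.
    """
    followed = set(followed_accounts)
    n_left = len(followed & set(left_seeds))
    n_right = len(followed & set(right_seeds))
    if n_left > n_right:
        return "left-leaning"
    if n_right > n_left:
        return "right-leaning"
    return None  # no seed accounts followed, or an even split
```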

• Pearson's r between the number of users and the state population is 0.9459, and between the number of tweets and the state population is 0.9667 (p < .0001).
• We utilized background knowledge from LinkedGeoData to identify states from user location information.
• If a user's state could not be inferred from the location in his/her profile, we utilized the geographic locations of his/her tweets. A user was recognized as being from a state if his/her tweets were from that state.
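The two-step location rule above can be written as a fallback. This sketch assumes that location strings have already been resolved to state names by some gazetteer lookup; `infer_state` is a hypothetical helper, and requiring all geotagged tweets to come from a single state is one reading of the rule, not a confirmed detail.

```python
def infer_state(profile_state, tweet_states):
    """Return the user's state, preferring the profile over tweet geolocation.

    profile_state: state resolved from the profile location, or None.
    tweet_states: states resolved from the user's geotagged tweets.
    """
    if profile_state is not None:
        return profile_state
    distinct = set(tweet_states)
    if len(distinct) == 1:  # assumption: every geotagged tweet agrees
        return distinct.pop()
    return None  # no geotagged tweets, or tweets from several states
```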

Predicting a User's Vote
• Basic idea: predict the candidate for whom the user shows the most support
– Frequent mentions
– Positive sentiment
Nm(c): the number of tweets mentioning candidate c
Npos(c): the number of positive tweets about candidate c
Nneg(c): the number of negative tweets about candidate c
Smoothing parameter: a value strictly between 0 and 1
Discounting parameter: a value strictly between 0 and 1 that discounts the score when the user does not express any opinion towards c
The score distinguishes two cases:
– The user posted opinion about c
– The user mentioned c but did not post opinion about c
Scoring intuition:
– More mentions, higher score
– More positive/less negative opinions, higher score
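One scoring function consistent with the definitions above could look like the following sketch. The functional form, the parameter names `kappa` (smoothing) and `gamma` (discounting), and their default values are assumptions for illustration and are not taken from the slides.

```python
def support_score(n_mentions, n_pos, n_neg, kappa=0.5, gamma=0.5):
    """Illustrative support score of one user for one candidate.

    n_mentions, n_pos, n_neg correspond to Nm(c), Npos(c), Nneg(c).
    kappa and gamma must lie strictly between 0 and 1.
    """
    if n_mentions == 0:
        return 0.0
    if n_pos + n_neg > 0:
        # The user posted opinion about c: weight mentions by a smoothed positive ratio.
        return n_mentions * (n_pos + kappa) / (n_pos + n_neg + 2 * kappa)
    # The user mentioned c without opinion: neutral ratio of 0.5, discounted by gamma.
    return gamma * n_mentions * 0.5


def predict_vote(stats_by_candidate, kappa=0.5, gamma=0.5):
    """Return the candidate with the highest support score.

    stats_by_candidate: dict mapping a candidate to (n_mentions, n_pos, n_neg).
    """
    return max(
        stats_by_candidate,
        key=lambda c: support_score(*stats_by_candidate[c], kappa=kappa, gamma=gamma),
    )
```

In this form the score grows with the number of mentions and with the share of positive opinions, matching the two intuitions listed above.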

Prediction Results
We examine the predictive power of different user groups in predicting the
results of Super Tuesday races in 10 states.
To predict the election results in a state, we used only the collection of
users who are identified from that state.
The results were evaluated in two ways: (1) the accuracy of predicting
winners, and (2) the error rate between the predicted percentage of votes
and the actual percentage of votes for each candidate.
We examined four time windows -- 7 days, 14 days, 28 days and 56 days
prior to the election day. In a specific time window, a user's vote was
assessed using only the set of tweets he/she created during this time.
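The evaluation procedure above can be sketched in three small steps: restrict tweets to a time window, aggregate predicted votes into shares, and compare them with the actual result. The function names are hypothetical, and both the window boundaries (start inclusive, election day exclusive) and the use of mean absolute error as the error measure are assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def tweets_in_window(tweets, election_day, window_days):
    """Keep (created_on, text) pairs posted in the window_days days before election_day."""
    start = election_day - timedelta(days=window_days)
    return [t for t in tweets if start <= t[0] < election_day]


def predicted_shares(user_votes):
    """Turn one predicted candidate per user into vote shares per candidate."""
    counts = Counter(user_votes)
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}


def evaluate_state(predicted, actual):
    """Return (winner predicted correctly?, mean absolute error of vote shares)."""
    predicted_winner = max(predicted, key=predicted.get)
    actual_winner = max(actual, key=actual.get)
    errors = [abs(predicted.get(c, 0.0) - share) for c, share in actual.items()]
    return predicted_winner == actual_winner, sum(errors) / len(errors)
```

Running `evaluate_state` once per state and per time window would give the two measures described above: the count of correctly predicted winners and the average error.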

The prediction accuracy:
• Engagement Degree: High > Low or Very Low
• Tweet Mode: Original Tweet-Prone > Retweet-Prone
• Content Type: a draw
• Political Preference: Right-Leaning >> Left-Leaning

• The results reveal the challenge of identifying the vote intent of the “silent majority”.
• Retweets may not necessarily reflect users' attitudes.
• Predicting a user’s vote from more opinion tweets is not necessarily more accurate than predicting it from more information tweets.
• The right-leaning user group provides the most accurate prediction result. In the best case (56-day time window), it correctly predicts the winners in 8 out of 10 states with an average prediction error of 0.1.
• To some extent, this demonstrates the importance of identifying likely voters in electoral prediction.

Our findings
Twitter users are not “equal” in predicting elections!
The likely voters’ opinions matter more.
Some users’ opinions are more difficult to identify because of their lower levels of engagement or the implicit ways in which they express opinions.

More work needs to be done…
• Identifying likely/actual voters
• Improving sentiment analysis techniques
• Investigating possible data biases (e.g., spam tweets and political campaign tweets) and how they might affect the results
and more …

It is actually about tracking public opinion.
Polling or Social Media Analysis?
1. Sample size
2. Representative of the target population
3. Accurate measure of opinions
4. Timeliness

1 Sample Size
Polling: thousands of people
Social Media Analysis: millions of people

2 Representative of the Target Population
Polling:
• About 95% of US homes can be reached by landline telephone and cell phone.
• The target population is sampled randomly.
• The sample is weighted to census estimates for demographic characteristics (gender, race, age, educational attainment, and region).
Social Media Analysis:
• About 60% of American adults use social networking sites.
• Random sampling is difficult.
• Demographic data is limited (although with some work, it can be improved).
[1] Can Social Media Be Used for Political Polling? http://www.radian6.com/blog/2012/07/can-social-media-be-used-for-political-polling/

3 Accurate Measure of Opinions
Polling:
• Ask people what they think (e.g., “Who will you vote for?”)
Social Media Analysis:
• Look at what people talk about and extract their opinions
• Not as accurate as polling

4 Timeliness
Polling: not able to track people’s opinions in real time
Social Media Analysis: what is happening now

Social Media Analysis – Promising but Very Challenging
Promising:
• Increasing number of social media users
• Convenient and comfortable way to express opinions
• The analysis can be done in real time
• Lower cost
A great complement (if not substitute) for polling
Challenging:
• Extracting demographic information
• Identifying the target population whose opinions matter, e.g., the likely voters in electoral prediction
• Discriminating personal opinion from the voice of mainstream media and political campaigns
• More accurate sentiment analysis/opinion mining, especially the identification of opinions about a specific object

Our Twitris+ system tracked people’s opinions on the 2012 U.S. Presidential Election in real time, and this is what we saw on Election Day …

The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST

Twitris+: http://twitris.knoesis.org/
(Screenshot callouts: select event; select date; related tweets, reference news, and Wikipedia articles; N-gram summaries; multi-faceted analysis.)

(Screenshot callouts: sentiment change about Barack Obama; sentiment change about Mitt Romney; positive/negative topics that contribute to such change. Analysis can be performed at a location-based or issue-based level.)
• A key innovation in sentiment analysis, employed in Twitris+, is topic-specific sentiment analysis: associating sentiment with an entity. The same sentiment phrase may be assigned different polarities when associated with different entities.
• Twitris+ tracks the sentiment trend about different entities and identifies topics/events that contribute to sentiment changes. The result is updated every hour.

Twitris+ Insights in 2012 Presidential Debates
How was Obama doing in the first debate?

How was Obama doing in the second debate?
Red Color: Negative Topics
Green Color: Positive Topics

Obama vs. Romney in the third debate
(Panels: Obama, Romney)

Thank you !
More about this study:
http://wiki.knoesis.org/index.php/ElectionPrediction
Kno.e.sis Center:
http://knoesis.wright.edu/
Twitris+:
http://twitris.knoesis.org/
Semantics driven Analysis of Social Media:
http://knoesis.org/research/semweb/projects/socialmedia
