Using data from social media, researchers were able to accurately predict the results of Dancing with the Stars, including who would be eliminated each week. While Kate Gosselin consistently received the lowest scores from judges, sentiment analysis of Twitter and Facebook showed she had more fan support than other contestants, despite also receiving a high volume of negative comments, and was therefore safe from elimination. The one week Kate had the lowest positive sentiment and highest negative comments aligned with her finally being voted off by viewers. This case study demonstrates how social media data can be a credible way to predict future events when properly analyzed.
Big Data Predictive Analytics
1. Big Data Predictive Analytics
Using social media to predict the results of
Dancing with the Stars
Rick Kawamura
@r_kawamura
2. The Value of Big Data
The question: Is unstructured social media data credible,
and can it be used to accurately predict future events?
3. The Value of Big Data
The test: Semantic Analysis
Collect data from Twitter, Facebook, and various fan sites.
Cleanse data.
Apply sentiment analysis.
Organize, graph, and analyze.
Determine who will be eliminated from the show the following day.
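The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the study: the word lists, the function names, and the sample comments are all assumptions made up for the example, and a real project would use a proper sentiment model rather than a handful of keywords.

```python
import re
from collections import Counter

# Tiny illustrative lexicons (assumptions, not the study's word lists).
POSITIVE = {"love", "great", "amazing", "voted", "gorgeous"}
NEGATIVE = {"hate", "awful", "worst", "fixed", "terrible"}

def cleanse(text):
    """Lowercase, strip URLs and @mentions, and split into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)

def classify(tokens):
    """Label a comment by counting positive and negative lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def positive_share(comments):
    """Return each contestant's share of all positive comments."""
    positives = Counter()
    for contestant, text in comments:
        if classify(cleanse(text)) == "positive":
            positives[contestant] += 1
    total = sum(positives.values())
    return {c: n / total for c, n in positives.items()} if total else {}

def predict_elimination(comments, contestants):
    """Pick the contestant with the smallest share of positive comments."""
    shares = positive_share(comments)
    return min(contestants, key=lambda c: shares.get(c, 0.0))

# Made-up comments about hypothetical contestants A, B, and C.
SAMPLE = [
    ("A", "I love A, voted twice! http://example.com"),
    ("A", "A is the worst dancer"),
    ("B", "B was great tonight"),
    ("B", "@friend B is amazing"),
    ("C", "C was awful"),
]
```

With the made-up sample, `predict_elimination(SAMPLE, ["A", "B", "C"])` returns `"C"`: contestant A draws a negative comment but also a positive one, while C has no positive comments at all.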
4. Fascinating – Kate’s Story of Survival
Kate Gosselin (from Jon & Kate Plus 8) was the least talented of the
12 dancers, but survived 5 weeks before being eliminated.
For 5 weeks, Kate stole the headlines – not for her dancing, but for
her meltdowns, fights with her partner, and how she continued to
survive despite poor dancing performances.
While common sense would lead one to believe she was sure to be
eliminated each week, the data revealed a completely different (more
accurate) story.
Many comments throughout Twitter and Facebook showed viewers’
disdain for Kate and a serious credibility problem for ABC and DWTS.
How could the worst dancer continue to survive? “The show must
be fixed”. “ABC is keeping her on for the ratings”.
Yet week after week, the data showed she was safe – that America
was voting to keep her on.
5. How the data showed Kate was safe
1. The week before she was eliminated, and similar to most other weeks, Kate received the lowest score from the judges.
2. The graph below shows the percent of all negative comments. Kate received close to 80%. The negative sentiment was strong.
3. Positive sentiment, the best predictor of fan votes, showed Kate clearly had more support than four other contestants despite her large volume of negative comments.
4. Combining the judges’ scores with positive fan sentiment, it was clear Kate would be safe.
Chart: % of Total Comments
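One way to combine the two signals is sketched below. The slides do not say how the judges’ scores and positive sentiment were combined, so this is an assumption: each contestant’s judges’ score and positive-comment count is converted to a share of the total, and the two shares are averaged with a hypothetical `fan_weight` parameter. The contestants and numbers are invented for illustration.

```python
def shares(values):
    """Convert raw values into each contestant's share of the total."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}

def combined_ranking(judges_scores, positive_counts, fan_weight=0.5):
    """Rank contestants from most at risk to safest.

    fan_weight is a hypothetical knob: 0.5 weights judges and fans equally.
    """
    judge_share = shares(judges_scores)
    fan_share = shares(positive_counts)
    combined = {
        name: (1 - fan_weight) * judge_share[name] + fan_weight * fan_share[name]
        for name in judges_scores
    }
    return sorted(combined, key=combined.get)  # lowest combined share first

# Invented numbers: A has the lowest judges' score but strong positive sentiment.
judges = {"A": 15, "B": 22, "C": 24, "D": 27}
positives = {"A": 300, "B": 120, "C": 150, "D": 430}
```

With these invented numbers, `combined_ranking(judges, positives)` returns `["B", "C", "A", "D"]`. Contestant A has the lowest judges’ score yet is not the most at risk, which mirrors the pattern described for Kate.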
6. The week Kate was eliminated – Data never lies
1. Every week, Kate had the lowest score from the judges. This week was no different.
2. Kate alone received 40% of all comments in social media, but 90% of it was negative.
Charts: Judges’ Scores (axis 0 to 30); % of Total Comments (axis 0% to 50%)
3. In previous weeks, Kate had more positive comments than several of her competitors. However, this week, while her total volume remained high, her percent of positive comments dropped significantly.
4. Given Kate had the lowest judges’ score and the lowest number of positive comments, it was clear this week that she would be eliminated.
Chart: % of Positive Comments (axis 0% to 35%)
7. Key Takeaways
Social, Unstructured Big Data is Credible
Social data contains true sentiment that can be applied to
data models to provide insight and intelligence.
Clarity of Data
In some cases, the answer is obvious. Other times it is a
general sense or trend, but may not pinpoint the exact
target.
Sentiment Analysis
Sentiment Analysis is a valuable technology. But fine tuning the “degree
of sentiment” can be a challenge. Consider how you would rate the
following: “I love Nicole”. “I voted for Chad”. “Erin is gorgeous”.
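The “degree of sentiment” problem can be made concrete with a graded lexicon. The sketch below is only an illustration: the `INTENSITY` weights are assumptions chosen for this example (a stated vote is treated as the strongest signal of support, a compliment on appearance as the weakest), and a different analyst could reasonably rank the three comments differently.

```python
import re

# Hypothetical intensity weights between 0 and 1. These values are assumptions,
# not figures from the study.
INTENSITY = {"voted": 1.0, "love": 0.8, "gorgeous": 0.4}

def sentiment_degree(text):
    """Return the strongest intensity weight found in the comment, or 0.0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return max((INTENSITY.get(t, 0.0) for t in tokens), default=0.0)
```

Under these assumed weights, “I voted for Chad” scores 1.0, “I love Nicole” scores 0.8, and “Erin is gorgeous” scores 0.4. Changing a single weight reorders the comments, which is exactly why tuning is hard.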
Predicting Future Events
As evidenced with Kate, the results clearly demonstrated
the value social media data possesses to help predict future
results.
Data Veracity
Who better represents America’s sentiment? Those who cast
their votes by calling in or texting? Or those who express their
views via social media?
8. Extracting Value from Social Media – 5 Tips
Data Trumps Conventional Wisdom
Think of Kate. Despite the overwhelming volume of
negative sentiment, her percent of positive sentiment still
dwarfed many of the contestants who lacked any drama.
Timing is Critical
Working with data as close to an event as possible is most
valuable. Utilizing data in real-time can provide a
competitive advantage.
Don’t Be Blind to the Noise Factor
There is a significant amount of non-essential noise in social
media data that needs to be cleansed. It’s not all fluff, but
may not pertain to the question you are trying to answer.
Not all Social Media Sentiment is Created Equal
Not all data is needed or equal in weight. Is one tweet
equal to one blog post? Is negative sentiment equally as
important as positive sentiment?
Don’t Look at Data in a Vacuum
Context around the question you are trying to answer plays an
important role. Knowing to disregard negative sentiment because
votes are only cast for keeping contestants on the show is critical.
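The last two tips can be illustrated together with a weighted support score. This is a sketch under stated assumptions: the `SOURCE_WEIGHT` values (for example, one blog post counting as three tweets) are hypothetical, and `negative_weight` defaults to 0.0 to reflect the point that votes are only cast to keep contestants on the show.

```python
# Hypothetical per-source weights; the right values depend on the question asked.
SOURCE_WEIGHT = {"tweet": 1.0, "facebook": 1.0, "blog_post": 3.0}

def support_score(mentions, negative_weight=0.0):
    """Sum weighted sentiment for one contestant.

    mentions: iterable of (source, label) pairs, where label is
    "positive", "negative", or "neutral".
    negative_weight: how much a negative mention subtracts; 0.0 ignores
    negative sentiment entirely.
    """
    score = 0.0
    for source, label in mentions:
        weight = SOURCE_WEIGHT.get(source, 1.0)  # unknown sources count once
        if label == "positive":
            score += weight
        elif label == "negative":
            score -= negative_weight * weight
    return score

# Made-up mentions for one hypothetical contestant.
mentions = [
    ("tweet", "positive"),
    ("tweet", "negative"),
    ("tweet", "negative"),
    ("blog_post", "positive"),
]
```

Here `support_score(mentions)` returns 4.0, while `support_score(mentions, negative_weight=1.0)` returns 2.0. The same data gives a different picture depending on whether negative sentiment is counted, so the choice should follow from how the outcome is actually decided.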