Big Data Predictive Analytics

Using social media to predict the results of
Dancing with the Stars

Rick Kawamura
@r_kawamura

The Value of Big Data

The question: Is unstructured social media data credible, and can it be used to accurately predict future events?

The Value of Big Data

The test:
1. Collect data from Twitter, Facebook, and various fan sites.
2. Cleanse the data.
3. Apply sentiment analysis.
4. Organize, graph, and analyze.
5. Determine who will be eliminated from the show the following day.
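The steps of the test can be sketched in a few lines of Python. This is only an illustration of the collect, cleanse, score, and tally flow, not the tooling used for the original analysis. The word lists, the function names (`cleanse`, `sentiment`, `tally`), and the sample comments are all assumptions; a real system would use a trained sentiment model rather than a tiny lexicon.

```python
import re
from collections import Counter, defaultdict

# Hypothetical word lists (assumption); stand-ins for a real sentiment model.
POSITIVE = {"love", "great", "amazing", "voted", "gorgeous"}
NEGATIVE = {"hate", "awful", "terrible", "worst", "fixed"}


def cleanse(text):
    """Lowercase the text, drop URLs and @mentions, and return word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def sentiment(tokens):
    """Label tokens 'positive', 'negative', or 'neutral' by lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(comments, contestants):
    """Count sentiment labels for each contestant named in a comment."""
    counts = defaultdict(Counter)
    for raw in comments:
        tokens = cleanse(raw)
        label = sentiment(tokens)
        for name in contestants:
            if name.lower() in tokens:
                counts[name][label] += 1
    return counts


# Made-up comments for illustration only.
sample = [
    "I love Nicole! http://t.co/x",
    "Kate is the worst @abc",
    "I voted for Chad",
]
counts = tally(sample, ["Nicole", "Kate", "Chad"])
```

The per-contestant counts produced by `tally` are the kind of input the later graphing and analysis steps would work from.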

Fascinating – Kate’s Story of Survival

Kate Gosselin (from Jon & Kate Plus 8) was the least talented of the
12 dancers, but she survived 5 weeks before being eliminated.

For 5 weeks, Kate stole the headlines – not for her dancing, but for
her meltdowns, fights with her partner, and how she continued to
survive despite poor dancing performances.

While common sense would lead one to believe she was sure to be
eliminated each week, the data revealed a completely different (more
accurate) story.

Many comments on Twitter and Facebook showed viewers’ disdain for
Kate and pointed to a serious credibility problem for ABC and DWTS.
How could the worst dancer continue to survive? “The show must
be fixed.” “ABC is keeping her on for the ratings.”

Yet week after week, the data showed she was safe – that America
was voting to keep her on.

How the data showed Kate was safe
1. The week before she was eliminated, as in most other weeks, Kate received the lowest score from the judges.
2. The graph of the percent of all negative comments shows Kate received close to 80%. The negative sentiment was strong.
3. Positive sentiment, the best predictor of fan votes, showed Kate clearly had more support than four other contestants despite her large volume of negative comments.
4. Combining the judges’ scores with positive fan sentiment, it was clear Kate would be safe.

[Chart: % of Total Comments]
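One way to picture "combining the judges' scores with positive fan sentiment" is to add each contestant's share of the judges' points to their share of positive comments and rank the totals. The slides do not state the exact formula used, so the equal weighting, the function name `rank_by_risk`, and every number below are assumptions chosen only to show how a contestant with the lowest judges' score can still rank as safe.

```python
def rank_by_risk(judge_scores, positive_counts):
    """Return contestant names ordered from most to least at risk.

    Each contestant's combined score is their share of all judges' points
    plus their share of all positive comments (equal weighting is assumed).
    """
    total_judges = sum(judge_scores.values())
    total_positive = sum(positive_counts.values())
    if total_judges == 0 or total_positive == 0:
        raise ValueError("need at least one judges' point and one positive comment")
    combined = {}
    for name, score in judge_scores.items():
        judge_share = score / total_judges
        fan_share = positive_counts.get(name, 0) / total_positive
        combined[name] = judge_share + fan_share
    return sorted(combined, key=combined.get)


# Made-up numbers for illustration only; not the data behind the slides.
judge_scores = {"Kate": 15, "Contestant A": 21, "Contestant B": 22, "Contestant C": 27}
positive_counts = {"Kate": 300, "Contestant A": 60, "Contestant B": 250, "Contestant C": 500}

ranking = rank_by_risk(judge_scores, positive_counts)
```

With these invented inputs, Kate has the lowest judges' score but is not first in `ranking`, because her share of positive comments outweighs that of a contestant who drew little attention.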

The week Kate was eliminated – Data never lies
1. Every week, Kate had the lowest score from the judges. This week was no different.
2. Kate alone received 40% of all comments in social media, but 90% of them were negative.
3. In previous weeks, Kate had more positive comments than several of her competitors. This week, however, while her total volume remained high, her percent of positive comments dropped significantly.
4. Given that Kate had the lowest judges’ score and the lowest number of positive comments, it was clear this week that she would be eliminated.

[Charts: Judges’ Scores (axis 0 to 30); % of Total Comments (axis 0% to 50%); % of Positive Comments (axis 0% to 35%)]
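The two percentages quoted for the elimination week can be combined with simple arithmetic. The short sketch below uses only the 40% and 90% figures from the slide; the variable names are illustrative. It shows that at most 4% of all social media comments that week could have been positive comments about Kate (some of the remaining 10% of her comments may have been neutral).

```python
from fractions import Fraction

share_of_all_comments = Fraction(40, 100)  # Kate's share of all comments
negative_rate = Fraction(90, 100)          # share of her comments that were negative

# Kate's negative comments as a share of all comments: 40% x 90% = 36%
negative_of_all = share_of_all_comments * negative_rate

# Upper bound on her positive comments as a share of all comments: 40% x 10% = 4%
non_negative_of_all = share_of_all_comments * (1 - negative_rate)
```

This does not reproduce the "% of Positive Comments" chart, which would need every contestant's positive counts, but it shows why high volume alone did not translate into support.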

Key Takeaways
Social, Unstructured Big Data is Credible
Social data contains true sentiment that can be applied to
data models to provide insight and intelligence.

Clarity of Data
In some cases, the answer is obvious. Other times the data gives
only a general sense or trend and may not pinpoint the exact
target.

Sentiment Analysis
Sentiment analysis is a valuable technology, but fine-tuning the “degree
of sentiment” can be a challenge. Consider how you would rate the
following: “I love Nicole.” “I voted for Chad.” “Erin is gorgeous.”
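To make the "degree of sentiment" problem concrete, here is a minimal sketch that gives each of the three example phrases a graded score instead of a plain positive label. The cue words and their weights are assumptions picked for illustration (a stated vote is treated as the strongest signal, a compliment on appearance as the weakest); they are not values from the original analysis.

```python
import re

# Hypothetical cue weights (assumption): how strongly each word signals support.
CUE_WEIGHTS = {"voted": 1.0, "love": 0.7, "gorgeous": 0.3}


def degree_of_sentiment(text):
    """Return the strongest cue weight found in the text, or 0.0 if none."""
    words = re.findall(r"[a-z']+", text.lower())
    return max((CUE_WEIGHTS.get(w, 0.0) for w in words), default=0.0)


scores = {
    phrase: degree_of_sentiment(phrase)
    for phrase in ("I love Nicole", "I voted for Chad", "Erin is gorgeous")
}
```

All three phrases are positive, yet under these assumed weights they rank differently, which is exactly the tuning decision the slide asks the reader to consider.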

Predicting Future Events
As Kate’s case showed, social media data can help predict
future results: the data indicated she was safe in the weeks she
survived and at risk the week she was eliminated.

Data Veracity
Who better represents America’s sentiment? Those who cast
their votes by calling in or texting? Or those who express their
views via social media?

Extracting Value from Social Media – 5 Tips
Data Trumps Conventional Wisdom
Think of Kate. Despite the overwhelming volume of
negative sentiment, her share of positive sentiment still
dwarfed that of many contestants who lacked any drama.

Timing is Critical
Working with data as close to an event as possible is most
valuable. Utilizing data in real-time can provide a
competitive advantage.

Don’t Be Blind to the Noise Factor
There is a significant amount of non-essential noise in social
media data that needs to be cleansed. It’s not all fluff, but much
of it may not pertain to the question you are trying to answer.

Not all Social Media Sentiment is Created Equal
Not all data is needed, and not all of it carries equal weight.
Is one tweet equal to one blog post? Is negative sentiment as
important as positive sentiment?

Don’t Look at Data in a Vacuum
Context around the question you are trying to answer plays an
important role. Knowing to disregard negative sentiment because
votes are only cast for keeping contestants on the show is critical.
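The last two tips can be expressed as weights in a scoring function: one weight per source type, and one per sentiment label. In the sketch below the source weights are pure assumptions, since the slides pose the tweet-versus-blog question without answering it. The zero weight on negative sentiment reflects the point above that votes are only cast to keep contestants on the show.

```python
# Hypothetical source weights (assumption): a blog post counts more than a tweet.
SOURCE_WEIGHTS = {"tweet": 1.0, "facebook": 1.0, "blog": 3.0}

# Negative and neutral mentions get zero weight because viewers can only
# vote to keep a contestant, never to remove one.
SENTIMENT_WEIGHTS = {"positive": 1.0, "neutral": 0.0, "negative": 0.0}


def support_score(mentions):
    """Sum weighted support over (source, sentiment) pairs for one contestant."""
    return sum(
        SOURCE_WEIGHTS.get(source, 1.0) * SENTIMENT_WEIGHTS.get(label, 0.0)
        for source, label in mentions
    )


# Made-up mentions for illustration only.
mentions = [("tweet", "positive"), ("blog", "positive"), ("tweet", "negative")]
score = support_score(mentions)
```

Changing either table changes the prediction, so these weights are where knowledge of the voting rules enters the model.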

Thanks for Viewing
@r_kawamura
