Predicting the Future With Social Media

Sitaram Asur and Bernardo A. Huberman
Social Computing Lab

The Social Computing Lab focuses on methods for harvesting the collective
intelligence of groups of people in order to realize greater value from the
interaction between users and information.


     Published on arXiv Cornell University – March 2010
                         http://arxiv.org/abs/1003.5699




      Maurizio Napolitano, SoNet group, http://sonet.fbk.eu - April 2010
SoNet Research Meetings
These slides were used for an internal presentation of
  the SoNet group.
Every week, one member of the SoNet group presents a
  research paper to the other members. The papers
  mentioned are hence written by other
  researchers.
Being internal presentations, these slides might be a bit
  rough and unpolished.

You can find more information (including this
  presentation) about the SoNet group at
  http://sonet.fbk.eu
The question
 How can social media content be used to predict
  real-world outcomes?
The case study:
predicting box-office revenues for movies using
the chatter from Twitter
Why Twitter?
   several tens of millions of users who actively participate in the
   creation and propagation of content

Why movies?
  The topic of movies is of considerable interest among the social media
  user community
  The real-world outcomes can be easily observed from box-office
  revenue for movies
Topics
Viral marketing
• How buzz and attention are created for different movies
• How buzz and attention change over time

  Will movies that are well talked about be well watched?
Sentiments
• How they are created
• How positive and negative opinions propagate
• How they influence people
What was discovered
• Social media feeds can be effective indicators of
   real-world performance


• The rate at which movie tweets are generated can be
   used to build a powerful model for predicting movie
   box-office revenue.


• The predictions are better than those produced by the
   Hollywood Stock Exchange, the gold standard in the
   industry
The dataset
  Twitter Search API, queried using the movies' keywords
  • tweets
  • @userid
  • retweet

  Result:
  • 2.89 million tweets
  • referring to 24 different movies
  • over a period of 3 months (Nov-Feb)
  • from 1.2 million users

Movie                              Release date
Armored                            2009-12-04
Avatar                             2009-12-18
The Blind Side                     2009-11-15
The Book of Eli                    2010-01-15
Daybreakers                        2010-01-08
Dear John                          2010-02-05
Did You Hear About The Morgans     2009-12-08
Edge of Darkness                   2010-01-29
Extraordinary Measures             2010-02-22
From Paris With Love               2010-02-05
The Imaginarium of Dr Parnassus    2010-01-08
Invictus                           2009-12-11
Leap Year                          2010-01-08
Legion                             2010-01-22
Twilight: New Moon                 2009-11-20
Pirate Radio                       2009-11-13
Princess And The Frog              2009-11-13
Sherlock Holmes                    2009-12-15
Spy Next Door                      2010-01-15
The Crazies                        2010-02-26
Tooth Fairy                        2010-02-26
Transylmania                       2009-12-04
When in Rome                       2010-01-29
Youth in Revolt                    2010-01-08



critical period = the time from the week before a movie's release to two weeks after it
Dataset characteristics
  Number of tweets per unique author for different movies

  [Figure: x → days, y → tweets, lines → movies]

  The curves look like the box-office trends.
Dataset characteristics
  Number of tweets per unique author for different movies

  [Figure: x → days, y → tweets per author, lines → movies]

  The ratio remains fairly consistent, between 1 and 1.5.
Dataset characteristics
  Log distribution of authors and tweets over the critical period

  [Figure: x → log(number of tweets), y → log(frequency of authors)]

  POWER LAW (Zipfian distribution)
  A few authors generate a large number of tweets.
Dataset characteristics
  Distribution of total authors and the movies they comment on

  [Figure: x → number of movies, y → authors]

  POWER LAW
  The majority of authors talk about only a few movies.
Attention and popularity
                            Twitter and real world



“Prior to the release of a movie, media companies and
producers generate promotional information in the form of
trailer videos, news, blogs and photos.
We expect the tweets for movies before the time of their
release to consist primarily of such promotional campaigns,
geared to promote word-of-mouth cascades”


In Twitter:

  tweets and retweets referring to a particular URL
  (photos, trailers and other promotional material)
Attention and popularity
  Percentage of URLs in tweets for different movies

  [Figure]

There is a greater percentage of tweets containing URLs
in the week prior to release than afterwards.
Attention and popularity
               Tweets with URLs vs. retweets

    URL and retweet percentages for the critical weeks

        Feature    Week 0    Week 1    Week 2
        url        39.5      25.5      22.5
        retweet    12.1      12.1      11.66


    Correlation and coefficient of determination (R²)
    values for URLs and retweets before release

        Feature    Correlation    R²
        url        0.64           0.39
        retweet    0.5            0.20


“This result is quite surprising since we would expect
promotional material to contribute significantly to a movie’s
box-office income”
Prediction
                   first weekend Box-office revenues

    “Using the tweets referring to movies prior to their release,
    can we accurately predict the box-office revenue generated
    by the movie in its opening weekend?”

      How to define a quantifiable measure on the tweets?

TWEET-RATE
  number of tweets referring to a particular movie per hour

  $$\text{Tweet-rate}(mov) = \frac{|\text{tweets}(mov)|}{|\text{Time (in hours)}|}$$

“the correlation of the average tweetrate with the box-office gross
for the 24 movies considered showed a strong positive correlation,
with a correlation coefficient value of 0.90”
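
As a minimal sketch of the two quantities on this slide, the snippet below computes a tweet-rate (tweets divided by hours) and a Pearson correlation coefficient between tweet-rates and revenues. The function names and all the numbers are made up for illustration; they are not taken from the paper's data.

```python
from math import sqrt

def tweet_rate(num_tweets, hours):
    """Average number of tweets per hour referring to one movie."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return num_tweets / hours

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical example: tweet counts over one week (168 hours) for four
# movies, and invented opening-weekend revenues in millions of dollars.
tweet_counts = [1680, 50400, 8400, 3360]
rates = [tweet_rate(count, 168) for count in tweet_counts]
revenues = [9.0, 77.0, 21.0, 12.0]
print("tweet-rates:", rates)
print("correlation:", pearson(rates, revenues))
```

A value close to 1 would indicate the kind of strong positive relationship the slide reports for the 24 movies.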
Prediction
                         Use regression analysis!

Predictions were compared with the real box-office revenue information extracted from
the Box Office Mojo website => POSITIVE RESULTS


     Regression analysis with:

     • Time series values of the tweet rate for the 7 days
       before the release

     • thent → number of theaters in which the movie was
       released

     • HSX index → the index of the Hollywood Stock
       Exchange
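
The regression step can be sketched as ordinary least squares over per-movie feature rows. The code below is an illustration only: it solves the normal equations in plain Python, the helper names (`solve_linear_system`, `fit_linear`, `predict`) are hypothetical, and the toy rows use two invented features (an average tweet-rate and thent in thousands) instead of the seven daily tweet-rate values described above. A row with seven daily values plus thent would be handled the same way.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(rows, targets):
    """Least-squares fit; the returned list starts with the intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, row):
    """Predicted target for one feature row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# Invented data: [average tweet-rate, thent in thousands] -> revenue ($M).
rows = [[10.0, 1.2], [55.0, 2.5], [120.0, 3.1], [300.0, 3.4], [20.0, 0.8]]
revenues = [6.0, 18.0, 33.0, 75.0, 7.5]
coeffs = fit_linear(rows, revenues)
print("coefficients:", coeffs)
print("prediction for a new movie:", predict(coeffs, [80.0, 2.8]))
```

Solving the normal equations directly is fine for a handful of features; with many correlated features a numerically more robust solver would be the safer choice.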
Prediction
                 Linear regression: the results

Features                         Adjusted R²     p-value***
Avg Tweet-rate                   0.80            3.65e-09
Tweet-rate timeseries            0.93            5.279e-09
Tweet-rate timeseries + thent    0.973           9.14e-12
HSX timeseries + thent           0.963           1.030e-10
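
For reference, the adjusted R² reported in this table is presumably the standard one, which penalizes R² for the number of predictors. Assuming n observations (here the 24 movies) and p predictors, the usual definition is:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

The symbols n and p are introduced here for illustration; the slides do not state which variant was used.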
Prediction
Predicted vs Actual box office scores using tweet-rate and HSX predictors
Prediction
                          Predicting prices


Prediction of HSX end of opening weekend price

Predictor                        Adjusted R²   p-value***
HSX timeseries + thent           0.95          4.495e-10
Tweet-rate timeseries + thent    0.97          2.379e-11


“The Hollywood Stock Exchange de-lists movie stocks after 4
weeks of release, which means that there is no timeseries
available for movies after 4 weeks. In the case of tweets, people
continue to discuss movies long after they are released”

Coefficient of determination (R²) values using tweet-rate
timeseries for different weekends

Weekend          Adjusted R²
Jan 15-17        0.92
Jan 22-24        0.97
Jan 29-31        0.92
Feb 05-07        0.95
Sentiment Analysis
Investigate the importance of sentiments in predicting future outcomes

    • Assign each tweet the label Positive, Negative or Neutral
        • Clean the data (remove stop-words, URLs and user IDs;
          replace titles, question marks and exclamation marks)
        • Amazon Mechanical Turk (1000 workers)

    • Use LingPipe: DynamicLMClassifier
           • Obtained an accuracy of 98%

    1) Define two variables

    $$\text{Subjectivity} = \frac{|\text{Positive and Negative Tweets}|}{|\text{Neutral Tweets}|}$$

    $$\text{PNratio} = \frac{|\text{Tweets with Positive Sentiment}|}{|\text{Tweets with Negative Sentiment}|}$$
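
The two ratios can be computed directly from a list of per-tweet labels. This is a small sketch: the function name `sentiment_ratios` is hypothetical, and returning `None` when a denominator is zero is an assumption made here, since the slides do not say how that case is handled.

```python
from collections import Counter

def sentiment_ratios(labels):
    """Return (subjectivity, pn_ratio) for labels in {"Positive", "Negative", "Neutral"}.

    A ratio is None when its denominator is zero (an assumption of this sketch).
    """
    counts = Counter(labels)
    pos, neg, neu = counts["Positive"], counts["Negative"], counts["Neutral"]
    subjectivity = (pos + neg) / neu if neu else None
    pn_ratio = pos / neg if neg else None
    return subjectivity, pn_ratio

# Invented example: 6 positive, 2 negative and 4 neutral tweets.
labels = ["Positive"] * 6 + ["Negative"] * 2 + ["Neutral"] * 4
subjectivity, pn_ratio = sentiment_ratios(labels)
print("subjectivity:", subjectivity, "PNratio:", pn_ratio)
```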
Sentiment Analysis

  [Figure: x → movies, y → subjectivity]

  The subjectivity increases after release.
Sentiment Analysis

  [Figure: x → movies, y → polarity]

  Polarity (positive vs. negative tweets) moves in the same direction
  as the movies' success.
Sentiment Analysis
       Regression analysis and polarity (PNratio)

    Predictor                          Adjusted R²   p-value
    Avg Tweet-rate                     0.79          8.39e-09
    Avg Tweet-rate + thent             0.83          7.93e-09
    Avg Tweet-rate + PNratio           0.92          4.31e-12
    Tweet-rate timeseries              0.84          4.18e-06
    Tweet-rate timeseries + thent      0.863         3.64e-06
    Tweet-rate timeseries + PNratio    0.94          1.84e-08


Sentiments do provide improvements, although they are not as
           important as the rate of tweets themselves
GENERAL PREDICTION MODEL FOR
        SOCIAL MEDIA

    $$y = \beta_a A + \beta_p P + \beta_d D + \epsilon$$

    A : rate of attention seeking
    P : polarity of sentiments and reviews
    D : distribution parameter

    y denotes the revenue to be predicted
    ε the error
    β values correspond to the regression
    coefficients
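
Reading the model as a linear regression, y = β_a·A + β_p·P + β_d·D + ε, a prediction is just a weighted sum of the three inputs. The sketch below assumes that reading; the function name and the coefficient values are hypothetical, and in practice the β values would come from fitting the regression to data.

```python
def predict_outcome(attention, polarity, distribution, betas, error=0.0):
    """Evaluate y = beta_a*A + beta_p*P + beta_d*D + error for one item."""
    beta_a, beta_p, beta_d = betas
    return beta_a * attention + beta_p * polarity + beta_d * distribution + error

# Invented coefficients and inputs, for illustration only.
betas = (1.0, 10.0, 100.0)
print(predict_outcome(attention=2.0, polarity=3.0, distribution=4.0, betas=betas))
```

For the movie case study in these slides, A corresponds to the tweet-rate, P to the PNratio and D to thent.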
Bibliography

    D. M. Pennock, S. Lawrence, C. L. Giles, and F. A. Nielsen.
    The real power of artificial markets. Science, 291(5506):987–988,
    Jan 2001.

    W. Zhang and S. Skiena. Improving movie gross prediction
    through news analysis. In Web Intelligence, pages 301–304,
    2009.
These slides are released under
Creative Commons
Attribution-ShareAlike 2.5
You are free:
● to copy, distribute, display, and perform the work
● to make derivative works
● to make commercial use of the work

Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor.
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work
only under a license identical to this one.
● For any reuse or distribution, you must make clear to others the license terms of this work.
● Any of these conditions can be waived if you get permission from the copyright holder.


Your fair use and other rights are in no way affected by the above.
More info at http://creativecommons.org/licenses/by-sa/2.5/

Predicting The Future With Social Media

  • 1. Predicting the Future With Social Media Social Computing Lab The Social Computing Lab focuses on methods Bernardo A. Huberman Sitaram Asur for harvesting the collective intelligence of groups of people in order to realize greater value from the interaction between users and information. Published on arXiv Cornell University – March 2010 http://arxiv.org/abs/1003.5699 Maurizio Napolitano, SoNet group,http://sonet.fbk.eu - April 2010
  • 2. SoNet Research Meetings These slides were used for an internal presentation of the SoNet group. Every week, one member of the SoNet group presents a research papers to the other members. The mentioned paper(s) are hence written by other researchers. Being internal presentations, these slides might be a bit rough and unpolished. You can find more information (including this presentation) about the SoNet group at http://sonet.fbk.eu
  • 3. The question How social media content can be used to predict real-world outcomes? The case study: predicting box-office revenues for movies using the chatter from Twitter Why Twitter? several tens of millions of users who actively participate in the creation and propagation of content Why movies? The topic of movies is of considerable interest among the social media user community The real-world outcomes can be easily observed from box-office revenue for movies
  • 4. Topics Viral marketing • How buzz and attention is created for different movies • How buzz and attention changes over time movies that are well talked about will be well-watched? Sentiments •How are created •How positive and negative opinions propagate •How they influence people
  • 5. What discovery • Social media feeds can be effective indicators of real-world performance • The rate at which movie tweets are generated can be used to build a powerful model for predicting movie box-office revenue. • The predictions are better than those produced by the Hollywood Stock Exchange, the gold standard in the Exchange industry
  • 6. The dataset. Collected through the Twitter Search API by using the movies' keywords: 2.89 million tweets referring to 24 different movies, over a period of 3 months (Nov-Feb), from 1.2 million users. Elements: tweets, @userid, retweet. Critical period = the time from the week before a movie's release to two weeks after it. Movies and release dates as listed on the slide:
      Armored (2009-12-04); Daybreakers (2010-01-08); Extraordinary Measures (2010-02-22); Leap Year (2010-01-08); Princess And The Frog (2009-11-13); Tooth Fairy (2010-02-26)
      Avatar (2009-12-18); Dear John (2010-02-05); From Paris With Love (2010-02-05); Legion (2010-01-22); Sherlock Holmes (2009-12-15); Transylmania (2009-12-04)
      The Blind Side (2009-11-15); Did You Hear About The Morgans (2009-12-08); The Imaginarium of Dr Parnassus (2010-01-08); Twilight: New Moon (2009-11-20); Spy Next Door (2010-01-15); When in Rome (2010-01-29)
      The Book of Eli (2010-01-15); Edge of Darkness (2010-01-29); Invictus (2009-12-11); Pirate Radio (2009-11-13); The Crazies (2010-02-26); Youth in Revolt (2010-01-08)
  • 7. Dataset characteristics. Figure: time series of tweets for different movies (x → days, y → tweets, lines → movies). The curves look like the box-office trends!
  • 8. Dataset characteristics. Figure: number of tweets per unique author for different movies (x → days, y → tweets per author, lines → movies). The ratio remains fairly consistent, between 1 and 1.5.
  • 9. Dataset characteristics. Figure: log distribution of authors and tweets over the critical period (x → log(number of tweets), y → log(frequency of authors)). POWER LAW, Zipfian distribution: a few authors generate a large number of tweets.
  • 10. Dataset characteristics. Figure: distribution of total authors and the movies they comment on (x → number of movies, y → authors). POWER LAW: a majority of the authors talk about only a few movies.
  • 11. Attention and popularity: Twitter and the real world. “Prior to the release of a movie, media companies and producers generate promotional information in the form of trailer videos, news, blogs and photos. We expect the tweets for movies before the time of their release to consist primarily of such promotional campaigns, geared to promote word-of-mouth cascades” In Twitter: tweets and retweets referring to a particular URL (photos, trailers and other promotional material)
  • 12. Attention and popularity. Figure: percentages of URLs in tweets for different movies. There is a greater percentage of tweets containing URLs in the week prior to release than afterwards.
  • 13. Attention and popularity: tweets with URLs vs. retweets
      URL and retweet percentages for the critical week:
      Features   Week 0   Week 1   Week 2
      url        39.5     25.5     22.5
      retweet    12.1     12.1     11.66
      Correlation and coefficient of determination (R2) values for URLs and retweets before release:
      Features   Correlation   R2
      url        0.64          0.39
      retweet    0.5           0.20
      “This result is quite surprising since we would expect promotional material to contribute significantly to a movie’s box-office income”
  • 14. Prediction of first-weekend box-office revenues. “Using the tweets referring to movies prior to their release, can we accurately predict the box-office revenue generated by the movie in its opening weekend?” How can we get a quantifiable measure from the tweets? TWEET-RATE: the number of tweets referring to a particular movie per hour, $\mathrm{Tweetrate}(mov) = \frac{|tweets(mov)|}{|Time\ (in\ hours)|}$. “the correlation of the average tweetrate with the box-office gross for the 24 movies considered showed a strong positive correlation, with a correlation coefficient value of 0.90”
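The tweet-rate definition on slide 14 and the correlation check it mentions can be sketched in a few lines of Python. This is only an illustration: the function names `tweet_rate` and `pearson` are hypothetical, the sample numbers are made up and are not taken from the paper, and the slide does not say which correlation coefficient was used, so Pearson's is assumed here.

```python
import math


def tweet_rate(num_tweets: int, hours: float) -> float:
    """Tweets referring to one movie divided by the observation window in hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return num_tweets / hours


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up example values (NOT data from the paper): tweet counts collected
# over the week before release and opening-weekend gross in million USD.
window_hours = 7 * 24
tweet_counts = [1680, 8400, 25200, 50400]
opening_gross = [4.0, 12.0, 30.0, 70.0]

rates = [tweet_rate(count, window_hours) for count in tweet_counts]
print("tweet-rates:", rates)
print("correlation with gross:", pearson(rates, opening_gross))
```

With real data, `tweet_counts` would hold one entry per movie and the correlation would be computed across all 24 movies.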
  • 15. Prediction: use regression analysis! Predictions were compared with the real box-office revenue information extracted from the Box Office Mojo website => POSITIVE RESULTS. Regression analysis with: • Time series values of the tweet-rate for the 7 days before the release • thent → number of theaters in which the movie was released • HSX index → the index of the Hollywood Stock Exchange
  • 16. Prediction: linear regression, the results
      Features                         Adjusted R2   p-value***
      Avg Tweet-rate                   0.80          3.65e-09
      Tweet-rate timeseries            0.93          5.279e-09
      Tweet-rate timeseries + thent    0.973         9.14e-12
      HSX timeseries + thent           0.963         1.030e-10
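The results on slide 16 are reported as adjusted R2, which penalizes R2 for the number of predictors. A minimal sketch of the standard textbook formulas follows; the helper names are hypothetical, and the example call uses a made-up R2 value together with the 24 movies and 8 predictors (7 daily tweet-rates plus thent) mentioned in the slides.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Illustration only: 0.95 is a made-up R2, not a value from the paper.
print(adjusted_r_squared(0.95, n_samples=24, n_predictors=8))
```

The adjustment matters here because a 7-day time series adds seven predictors for only 24 movies, so plain R2 would overstate the fit.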
  • 17. Prediction. Figure: predicted vs. actual box-office scores using tweet-rate and HSX predictors
  • 18. Prediction: predicting prices
      Prediction of HSX end of opening weekend price:
      Predictor                        Adjusted R2   p-value***
      HSX timeseries + thent           0.95          4.495e-10
      Tweet-rate timeseries + thent    0.97          2.379e-11
      “The Hollywood Stock Exchange de-lists movie stocks after 4 weeks of release, which means that there is no timeseries available for movies after 4 weeks. In the case of tweets, people continue to discuss movies long after they are released”
      Coefficient of determination (R2) values using tweet-rate timeseries for different weekends:
      Weekend      Adjusted R2
      Jan 15-17    0.92
      Jan 22-24    0.97
      Jan 29-31    0.92
      Feb 05-07    0.95
  • 19. Sentiment Analysis: investigate the importance of sentiments in predicting future outcomes • For each tweet, assign the label Positive, Negative or Neutral • Clean the data (no stop-words, remove URLs and userids, replace titles, question marks and exclamation marks) • Amazon Mechanical Turk (1000 workers) • Use LingPipe DynamicLMClassifier • Obtained an accuracy of 98%. Define two variables: $\mathrm{Subjectivity} = \frac{|\text{Positive and Negative Tweets}|}{|\text{Neutral Tweets}|}$ and $\mathrm{PNratio} = \frac{|\text{Tweets with Positive Sentiment}|}{|\text{Tweets with Negative Sentiment}|}$
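Once every tweet carries one of the three labels, the two variables defined on slide 19 reduce to simple counts. The sketch below assumes the labels are available as plain strings; the function name `sentiment_ratios` and the sample labels are hypothetical, and the classification step itself (LingPipe in the slides) is not reproduced.

```python
from collections import Counter


def sentiment_ratios(labels):
    """Return (subjectivity, pn_ratio) for a list of tweet labels.

    subjectivity = (positive + negative) / neutral
    pn_ratio     = positive / negative
    """
    counts = Counter(labels)
    positive = counts["Positive"]
    negative = counts["Negative"]
    neutral = counts["Neutral"]
    if neutral == 0 or negative == 0:
        raise ValueError("ratios are undefined without neutral and negative tweets")
    return (positive + negative) / neutral, positive / negative


# Made-up labels for one movie (illustration only).
labels = ["Positive"] * 6 + ["Negative"] * 2 + ["Neutral"] * 4
subjectivity, pn_ratio = sentiment_ratios(labels)
print(subjectivity, pn_ratio)
```

Raising an error for empty denominators is a design choice of this sketch; the slides do not say how such cases were handled.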
  • 20. Sentiment Analysis. Figure: subjectivity for different movies (x → movies, y → subjectivity). The subjectivity increases after release.
  • 21. Sentiment Analysis. Figure: polarity for different movies (x → movies, y → polarity). The positive and negative sentiments go in the same direction as the movies' success.
  • 22. Sentiment Analysis: regression analysis and polarity (PNRatio)
      Predictor                          Adjusted R2   p-value
      Avg Tweet-rate                     0.79          8.39e-09
      Avg Tweet-rate + thent             0.83          7.93e-09
      Avg Tweet-rate + PNRatio           0.92          4.31e-12
      Tweet-rate timeseries              0.84          4.18e-06
      Tweet-rate timeseries + thent      0.863         3.64e-06
      Tweet-rate timeseries + PNRatio    0.94          1.84e-08
      The sentiments do provide improvements, although they are not as important as the rate of tweets themselves.
  • 23. GENERAL PREDICTION MODEL FOR SOCIAL MEDIA: $y = \beta_a A + \beta_p P + \beta_d D + \epsilon$, where A is the rate of attention seeking, P the polarity of sentiments and reviews, D the distribution parameter, y the revenue to be predicted, $\epsilon$ the error, and the $\beta$ values correspond to the regression coefficients.
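The general model on slide 23 is a linear regression of revenue on attention (A), polarity (P) and distribution (D). A minimal least-squares sketch in plain Python follows, assuming no intercept term (none appears on the slide) and solving the normal equations by Gaussian elimination. The function names and the synthetic data are hypothetical; the slides do not say which fitting software was used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_general_model(attention, polarity, distribution, revenue):
    """Least-squares estimate of (beta_a, beta_p, beta_d) via the normal equations."""
    columns = [attention, polarity, distribution]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * y for u, y in zip(ci, revenue)) for ci in columns]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from y = 3*A + 2*P + 0.5*D
# (illustration only, not values from the paper).
A = [1.0, 2.0, 3.0, 4.0, 5.0]
P = [2.0, 1.0, 4.0, 3.0, 6.0]
D = [1.0, 0.0, 1.0, 0.0, 2.0]
y = [3 * a + 2 * p + 0.5 * d for a, p, d in zip(A, P, D)]

beta_a, beta_p, beta_d = fit_general_model(A, P, D, y)
print(beta_a, beta_p, beta_d)
```

In the movie case study, A would be the tweet-rate, P the PNratio and D the number of theaters (thent).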
  • 24. Bibliography. D. M. Pennock, S. Lawrence, C. L. Giles, and F. A. Nielsen. The real power of artificial markets. Science, 291(5506):987-988, Jan 2001. W. Zhang and S. Skiena. Improving movie gross prediction through news analysis. In Web Intelligence, pages 301-304, 2009.
  • 25. These slides are released under Creative Commons Attribution-ShareAlike 2.5. You are free: to copy, distribute, display, and perform the work; to make derivative works; to make commercial use of the work. Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. More info at http://creativecommons.org/licenses/by-sa/2.5/