WILL TWITTER MAKE YOU A BETTER INVESTOR?
A LOOK AT SENTIMENT, USER REPUTATION AND THEIR EFFECT ON THE STOCK MARKET


Eric D. Brown
Background
• Sentiment has long been an underlying factor in the
 investing world
  • Consumer Confidence Index
  • Investors Intelligence Sentiment Index
  • “Market Sentiment”



• Rather than waiting days, weeks or months, can the
 'sentiment of now' be used to improve trading
 performance and investing decisions?

• Can Twitter be used to determine the 'sentiment of now'?
Background
The thoughts driving this research are:

• Can analysis of publicly available Twitter messages
 provide insight for investment decision-making?

• Do Twitter messages (and their sentiment) have any
 effect on movements in the stock market?

• Can Twitter messages be mined and analyzed to predict
 movements in the stock market?

• Does a Twitter user's reputation affect how people
 perceive and use the investing ideas that user shares?
Research Method
• Inputs: Twitter and Stock Market Data, gathered through Data Collection
• Analysis: Sentiment Analysis, Price & Volume Analysis and Social Analysis
• Expected findings: positive correlation of sentiment and message volume
  with price/volume; reputation of the Twitter user
• Outcome: an understanding of the predictive capabilities of Twitter
  sentiment and the effect of user reputation on investing decision support
Research Method
• Data Collection
  • Using the Twitter API to collect tweets (tweet, sender, date, time)
     • Tweets referencing companies and sectors are collected and stored in a
       MySQL database for future study
     • Tweets are identified using the nomenclature made popular by StockTwits
       (www.stocktwits.com): users add a “$” to the stock symbol, so Apple
       (AAPL) is written “$AAPL”.
     • Stocktwits.com describes its purpose as a place to:
       • …share ideas, market insights and trades on stocks, futures and the market in
         general *.


  • Using the Yahoo Finance data feed to gather stock market data (price
    and volume)
     • Provides historical data
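A minimal Python sketch of pulling StockTwits-style cashtags out of a tweet, assuming a cashtag is a "$" followed by a letter and then letters, digits or underscores (so "$ES_F" matches, while dollar amounts such as "$1000s" do not). The pattern and the `extract_cashtags` name are illustrative assumptions, not the collector used in this study.

```python
import re

# Assumed cashtag shape: "$" + a letter + letters/digits/underscores.
# Dollar amounts such as "$1" or "$1000s" start with a digit and are skipped.
CASHTAG = re.compile(r"\$([A-Za-z][A-Za-z0-9_]*)")


def extract_cashtags(text: str) -> list:
    """Return upper-cased ticker symbols in order of first appearance, without duplicates."""
    symbols = []
    for match in CASHTAG.findall(text):
        symbol = match.upper()
        if symbol not in symbols:
            symbols.append(symbol)
    return symbols


if __name__ == "__main__":
    print(extract_cashtags("buy the dips WORKING :-) $ES_F $SPY"))
```

Note that spam tweets also use cashtags (e.g. "$pennystocks", "$stocks"), so a pattern like this will happily tag them; that is one reason the training data needs a separate Spam category.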
Research Method
• Sentiment Analysis
• Using a Naïve Bayesian text classification algorithm to
 determine the sentiment of collected tweets
     • Naïve Bayesian classification is used for simplicity, and because many
       researchers have reported only minor differences between it and other
       sentiment analysis methods
     • A subset of the collected data has been manually labeled with
       'sentiment' to build the necessary training dataset

• The Bayesian classification is performed with the Python
 Natural Language Toolkit (NLTK)

• For each tweet, an overall score is calculated and assigned.
   • Ideally, tweets will fall into +1 (Bullish), 0 (Neutral) or -1 (Bearish)
     buckets.
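NLTK's `NaiveBayesClassifier.train()` takes a list of (feature dict, label) pairs. To show the idea without that dependency, here is a minimal pure-Python stand-in: a multinomial Naïve Bayes over whitespace tokens with add-one (Laplace) smoothing, plus a hypothetical mapping from labels to the +1/0/-1 score. It differs in detail from NLTK's implementation; the tokenizer, smoothing and names are assumptions, not the study's configuration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical label-to-score mapping; Spam has no score.
SCORE = {"Bullish": 1, "Neutral": 0, "Bearish": -1}


def tokenize(text):
    """Lower-case whitespace tokenizer (deliberately simple)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def train(self, labeled):
        """labeled: iterable of (tweet text, label) pairs."""
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in labeled:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        self.n_docs = sum(self.class_counts.values())
        return self

    def classify(self, text):
        """Return the label with the highest log posterior; unseen words are ignored."""
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            logp = math.log(n / self.n_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:
                    logp += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


if __name__ == "__main__":
    training = [
        ("rally higher buy", "Bullish"),
        ("bullish rising buy", "Bullish"),
        ("sell weakness bearish", "Bearish"),
        ("roll over sell", "Bearish"),
        ("new post analysis", "Neutral"),
    ]
    clf = NaiveBayes().train(training)
    label = clf.classify("buy the rally")
    print(label, SCORE.get(label))
```

Working in log space avoids underflow when many small word probabilities are multiplied together.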
Sentiment Classification Process

 Training Dataset → Bayes Classifier → Trained Classifier
 Twitter Data → Trained Classifier → Classified Twitter Messages
Current Dataset
• Twitter Dataset:
   • May 1 2011 to Dec 31 2011
   • 473,901 Tweets
   • No deduplication performed


• Training Dataset:
   • 5000 messages randomly selected from collected Tweets
   • Messages have been manually coded as Bullish, Bearish, Neutral
     or Spam
    • 544 Bullish (10.88%)
    • 638 Bearish (12.76%)
    • 3454 Neutral (69.08%)
    • 364 Spam (7.28%)
Bullish Examples
• Markets seems like consolidating before another rally
 higher $SPX $SPY $QQQ

• $KFT - Kraft Foods Stock Analysis - CCI is bullish and
 rising

• If this daily candle ends like this do not go short! 3 white
 soldiers (bullish) $SPX $SPY http://t.co/oWWUqnu

• buy the dips WORKING :-) $ES_F $SPY
Bearish Examples
• $K - Kellogg Stock Analysis - bearish stochastic
 crossdown

• $SPY - Might be trying to roll a little.


• warning sign as $IWM didn't make a higher high, unlike
 $QQQ and $SPY

• RT @grassosteve: $SPX Jobs although important only 1
 aspect of the weakness in the mkts, i would sell pops still
 levels up 1216 1228 123 ...
Neutral Examples
• Stocks and Inflation: What the Market Is Really Telling Us
 http://seekingalpha.com/a/5tap $TLT $TLH $SPY

• NEW POST: UPDATED- MEAN REVERSION TRADE ON
 THE RUSSELL 2000 http://bit.ly/l2Q1Rf $IWM $TNA
 $SPY $QQQ

• Sold my $SPY Jun 03 2011 133.0 Puts for 78c made 7c


• Durable Goods as a Leading Market Indicator
 http://seekingalpha.com/a/5u8j $DIA $SPY $QQQ
Spam Examples
• Hop up out the bed turn my $wag on. ;*


• MILLIONAIRE SECRETS CLUB - MAKE $1 MILLION A YEAR
 http://goo.gl/2Yv8s $OXY $PBR $PDCO $pennystocks $POT $PRU
 $QCOM $QSII axsc

• HOME TYPERS NEEDED - MAKE $1000s WEEKLY - PAYS DAILY
 http://goo.gl/hoNUn $SNP $SOHU $SPLS $SPRD $SPX $SPY
 $STO $stocks sg2

• UNLIMITED FREE TV SHOWS on YOUR PC - 12,000 FREE
 CHANNELS http://goo.gl/v55Nw $CBG $CBS $CF $CLR $CMCSA
 $COF $COP $CROX $CTIC qika
Sentiment Analysis using Full Training Set
• From May 1 to Dec 31 2011:

   Category      Tweets      Share
   All Tweets    473,901
   Bullish       103,770     21.90%
   Bearish        84,454     17.82%
   Neutral       224,300     47.33%
   Spam           61,371     12.95%
   None                6      0.001%
A look at the Market & Sentiment
• I ran a short analysis on the S&P 500 ETF (SPY) between
 July 11 and August 11 2011

 • This date range was chosen mainly because of a very volatile
   downward move

 • 32 days of data


 • 26,307 tweets


 • On Jul 11 SPY is 131.40
 • On Aug 11 SPY is 117.33
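To compare tweets with daily prices, per-tweet labels have to be rolled up by day. The slides do not define how TransformedBBN is computed, so this sketch uses a plain daily mean of the +1/0/-1 scores (Spam excluded) as a hypothetical stand-in, alongside a daily tweet count. The function name and the exclusion of Spam are assumptions.

```python
from collections import defaultdict
from datetime import date

SCORE = {"Bullish": 1, "Neutral": 0, "Bearish": -1}  # Spam is excluded


def daily_series(classified):
    """classified: iterable of (date, label). Returns {day: (tweet_count, mean_score)}, sorted by day."""
    buckets = defaultdict(list)
    for day, label in classified:
        if label in SCORE:
            buckets[day].append(SCORE[label])
    return {day: (len(s), sum(s) / len(s)) for day, s in sorted(buckets.items())}


if __name__ == "__main__":
    tweets = [
        (date(2011, 8, 11), "Bearish"),
        (date(2011, 8, 11), "Bullish"),
        (date(2011, 8, 11), "Bearish"),
        (date(2011, 8, 11), "Spam"),
        (date(2011, 8, 12), "Bullish"),
    ]
    for day, (count, mean) in daily_series(tweets).items():
        print(day, count, round(mean, 3))
```

Tweets posted on weekends or holidays still need a rule for which trading day they belong to; this sketch leaves them on their calendar date.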
[Charts: S&P 500 ETF (SPY) - 7/11 to 8/11]
But…is the Classifier accurate?
• We can determine the sentiment of a tweet… but is it
 accurate?

• The Python Natural Language Toolkit provides a method
 to measure the accuracy of a trained classifier against labeled data

  • Build the training dataset as normal
  • Use the training dataset as the "input data"
  • Run all messages through the classifier and determine accuracy
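NLTK exposes this measurement as `nltk.classify.accuracy(classifier, gold)`. The sketch below computes the same quantity for any classify function and adds a hypothetical held-out split: scoring a classifier on the messages it was trained on measures how well it fits them, so holding some labeled tweets back gives a fairer estimate for unseen tweets. The function names, the 20% test fraction and the toy rule-based classifier are assumptions.

```python
import random


def accuracy(classify, gold):
    """Fraction of (text, label) pairs in gold whose predicted label matches the hand-coded one."""
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(1 for text, label in gold if classify(text) == label)
    return correct / len(gold)


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them for scoring."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


if __name__ == "__main__":
    # Toy rule-based classifier standing in for the trained Bayes classifier.
    def classify(text):
        return "Bullish" if "bullish" in text.lower() else "Neutral"

    gold = [
        ("$KFT - CCI is bullish and rising", "Bullish"),
        ("$K - bearish stochastic crossdown", "Bearish"),
        ("NEW POST: MEAN REVERSION TRADE", "Neutral"),
        ("3 white soldiers (bullish) $SPX", "Bullish"),
    ]
    print(accuracy(classify, gold))
```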
Classification Accuracy

 Training Dataset → Bayes Classifier → Trained Classifier
 Training Dataset → Trained Classifier → Accuracy
Accuracy of Training Dataset
• If you recall, our training dataset is:
   • 5000 messages randomly selected from collected Tweets
   • Messages have been manually coded as Bullish, Bearish, Neutral
     or Spam
    • 544 Bullish (10.88%)
    • 638 Bearish (12.76%)
    • 3454 Neutral (69.08%)
    • 364 Spam (7.28%)


  • Running the NLTK accuracy method on this dataset yields
   54.18% accuracy.
How can we improve accuracy?
• The theory behind the Bayesian classifier may shed some light on
 the inaccuracies.

• The Bayesian classifier is probability-based, so it is only as
 good as the data used to train it.

• Research suggests that non-symmetric (imbalanced) training datasets
 or features can throw a Bayesian classifier off.

• The training dataset used here is non-symmetric: 69.08% of it is Neutral.


• What if we create a symmetric dataset with the same number of
 Bullish, Bearish, Neutral and Spam messages?
Equivalent Sized Training Dataset
• Rebuild the training dataset


• Randomly select 500 tweets from each category (Bullish, Bearish,
 Neutral, Spam) of the training dataset


• Re-run the accuracy method.
  • Accuracy = 91.94%



• An improvement from 54.18% to 91.94%


• What will this improved accuracy do for the overall
 dataset?
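A minimal sketch of rebuilding a symmetric training set by random undersampling, assuming the labeled data is a list of (text, label) pairs. Capping `per_class` at the size of the smallest category is a design choice of this sketch, since undersampling cannot draw more examples of a category than were hand-coded; the function name is hypothetical.

```python
import random
from collections import defaultdict


def balance_by_undersampling(labeled, per_class=None, seed=0):
    """Return (text, label) pairs with the same number of examples for every label.

    per_class defaults to, and is capped at, the size of the smallest category.
    """
    by_label = defaultdict(list)
    for text, label in labeled:
        by_label[label].append((text, label))
    if not by_label:
        return []
    smallest = min(len(rows) for rows in by_label.values())
    n = smallest if per_class is None else min(per_class, smallest)
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    data = [("b", "Bullish")] * 5 + [("r", "Bearish")] * 3 + [("n", "Neutral")] * 10
    out = balance_by_undersampling(data)
    print(sorted(label for _, label in out))
```

Undersampling throws data away; oversampling the small categories or adjusting class priors are common alternatives.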
Sentiment Analysis using Symmetric Training Set
• From May 1 to Dec 31 2011:

   Category      Tweets      Share
   All Tweets    473,901
   Bullish       110,141     23.24%
   Bearish       103,233     21.78%
   Neutral       212,509     44.84%
   Spam           48,012     10.13%
   None                6      0.001%
[Charts: S&P 500 ETF (SPY) - 7/11 to 8/11]
Statistically Speaking
• There appears to be some correlation between Twitter and
 stock data

• Correlation of TransformedBBN and Closing Price
  • Correlation = 0.495
  • P-Value = 0.004



• Correlation of Num Tweets and Daily Volume
  • Correlation = 0.648
  • P-Value < 0.001
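A sketch of the two quantities behind these figures: the Pearson correlation r, and the test statistic t = r·sqrt((n − 2)/(1 − r²)), which is compared against Student's t distribution with n − 2 degrees of freedom to get the p-value. The standard library has no t CDF, so this stops at the statistic; `scipy.stats.pearsonr(x, y)` returns both r and the two-sided p-value if SciPy is available.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```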
Correlations?
• Basic analysis shows some correlation between price and
 sentiment, and between trading volume and tweet volume.

• Using time series analysis, a cross-correlation analysis
 can be completed to determine how these variables are
 related at different 'lag' periods.

• Cross-correlation analysis gives the Cross-Correlation
 Coefficient (CCF), which indicates how strongly two
 variables are correlated at lag time r.

• If a correlation is found at a negative lag time r, that
 variable is a candidate for use in predicting the output variable.
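A minimal sketch of a lagged correlation, assuming the sign convention used by R's `ccf(x, y)`: the value at lag r is the correlation of x[t + r] with y[t], so a negative r pairs an earlier x with a later y (x leads y). Unlike R's `ccf`, which normalizes with the full-series means and length, this simplified version recomputes a Pearson correlation on just the overlapping days.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, lag):
    """Correlation of x[t + lag] with y[t] over the days where both exist.

    A negative lag pairs an earlier x with a later y, i.e. x leads y.
    """
    if lag < 0:
        xs, ys = x[:lag], y[-lag:]
    elif lag > 0:
        xs, ys = x[lag:], y[:-lag]
    else:
        xs, ys = list(x), list(y)
    if len(xs) < 3:
        raise ValueError("lag too large for series length")
    return pearson(xs, ys)


if __name__ == "__main__":
    sentiment = [1, 2, 3, 5, 8, 13, 21, 34]
    # Toy price series that repeats sentiment two days later.
    price = [0, 0, 1, 2, 3, 5, 8, 13]
    for r in range(-3, 1):
        print(r, round(cross_correlation(sentiment, price, r), 3))
```

In the toy series, the coefficient reaches 1.0 at lag -2, the "sentiment leads price by two days" pattern the slide describes.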
Correlations
• Closing Price Cross-Correlation Coefficient

                                  Lag Time r
  Variable         -6      -5      -4      -3      -2      -1       0
  Sentiment     0.299   0.399   0.599   0.620   0.663   0.599   0.495
  Num Tweets   -0.497  -0.461  -0.406  -0.533  -0.577  -0.627  -0.714


• Volume Cross-Correlation Coefficient

                                  Lag Time r
  Variable         -6      -5      -4      -3      -2      -1       0
  Sentiment    -0.307  -0.394  -0.527  -0.574  -0.617  -0.605  -0.557
  Num Tweets    0.373   0.443   0.485   0.552   0.613   0.642   0.648
What now?

• Correlations exist between price/volume and
 sentiment/message volume.

• The main reason I'm looking at sentiment is to determine whether it
 can be used to predict price movement.

• So… let's build a model using linear regression (note: linear
 regression isn't a likely fit, but it is a good place to start).

• I want the model to predict the closing price, so let's start with a
 simple model using sentiment only.
Building Models
• Using Sentiment to predict Price:
  • The regression equation is
    • Closing Price = 133 + 47.3 TransformedBBN




    • The p-value is less than 0.05, so the relationship is statistically
     significant, but R-squared is only 24.5%, which tells me this isn't a very good model.
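For a single predictor, least squares has a closed form. A minimal sketch, assuming x holds the daily sentiment values and y the closing prices (the function name is hypothetical). With one predictor, R-squared equals the square of the Pearson correlation, which is consistent with the slides: 0.495² ≈ 0.245, the 24.5% reported here.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared = 1 - residual sum of squares / total sum of squares
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot
```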
Building Models (cont)
• What other variables can be used?
  • Volume?
  • Volume change?
  • Number of Tweets?
  • Volatility measurements?




 • There are a lot of combinations… but let's keep it simple: add
   the Number of Tweets and re-run the analysis.
Building Models (cont)
• Using Sentiment + Number of Tweets:
• The regression equation is
  • Closing Price = 139 + 41.4 TransformedBBN - 0.00856 Num
    Tweets




  • The p-values still look good, and R-squared rises to 69.6%.
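With several predictors, the coefficients solve the normal equations X'X b = X'y. A minimal pure-Python sketch, assuming each row holds one day's predictor values (for example TransformedBBN and Num Tweets); statistical packages use more numerically robust solvers and also report coefficient p-values, which this omits. Because R-squared never decreases when a predictor is added, adjusted R-squared is a fairer way to compare the one-, two- and four-variable models. Function names are hypothetical.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] from the normal equations X'X b = X'y."""
    X = [[1.0, *map(float, r)] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (M[r][k] - sum(M[r][c] * beta[c] for c in range(r + 1, k))) / M[r][r]
    return beta


def r_squared(rows, y, beta):
    """Share of the variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """R-squared penalized for p predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```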
Building Models (cont)
• What else can we add?


• How about a measure of volatility?


• One often-quoted measure is the VIX, a measure of the
 implied volatility of S&P 500 index options.

• It is often referred to as the 'fear index'.


• On 7/11, the VIX was 18.39
• On 8/11, the VIX was 39.00
Building Models (cont)
• Using Sentiment, Number of Tweets, the VIX Closing Price and the
 change in price for the VIX

• The regression equation is
   • Closing Price = 148 - 0.00197 Num Tweets + 12.1 TransformedBBN - 0.713
     VIX Closing Price + 0.128 VIX Price Change




  • The p-values are good and R-squared reaches 97.3%, which suggests this
    might be a good model.
Checking the model
• Our equation is:
  • Closing Price = 148 + 12.1 TransformedBBN - 0.00197 Num Tweets -
    0.713 VIX Closing Price + 0.128 VIX Price Change
• So… for August 11, our variables are:
  • TransformedBBN = -0.075
  • Num Tweets = 1357
  • VIX Closing Price = 39
  • VIX Price Change = -3.99


• Our predicted closing price for August 12 is then 116.65
     • This predicts a move down from the August 11 close of 117.33
• The actual closing price on August 12 was 118.20
  • Both our directional prediction and our price prediction were
    incorrect.
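Plugging the August 11 values into the equation can be checked in a few lines of Python. With the rounded coefficients shown on the slide, the result is about 116.10 rather than 116.65; the gap presumably comes from rounding of the published coefficients, and both values point down from 117.33. The function names are hypothetical.

```python
def predict_close(bbn, num_tweets, vix_close, vix_change):
    """Next-day SPY close from the four-variable regression, using the rounded slide coefficients."""
    return (
        148
        + 12.1 * bbn
        - 0.00197 * num_tweets
        - 0.713 * vix_close
        + 0.128 * vix_change
    )


def direction(predicted, last_close):
    """'up' if the predicted close is above the last actual close, else 'down'."""
    return "up" if predicted > last_close else "down"


if __name__ == "__main__":
    pred = predict_close(bbn=-0.075, num_tweets=1357, vix_close=39, vix_change=-3.99)
    print(round(pred, 2), direction(pred, 117.33))
```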
Checking the Model
• A few more predictions (50% directional accuracy):


• For August 15, the prediction is up; predicted price is 118.44
  • Actual is 120.62. Price moved up.


• For August 16, the prediction is up; predicted price is 121.69
  • Actual is 119.59. Price moved down.


• For August 17, the prediction is up; predicted price is 121.07
  • Actual is 119.67. Price moved up.


• For August 18, the prediction is up; predicted price is 121.73
  • Actual is 114.51. Price moved down.
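The directional scoring above can be written as a small function, assuming "up" means the predicted (or actual) close is above the previous day's actual close, which matches the August 15 to 18 figures. Run on those four days it reproduces the 50% figure; the name `directional_accuracy` is hypothetical, and a predicted close equal to the prior close counts as "down" in this sketch.

```python
def directional_accuracy(prev_closes, predicted, actual):
    """Share of days where the predicted move (vs. the prior close) matches the actual move."""
    if not (len(prev_closes) == len(predicted) == len(actual)) or not actual:
        raise ValueError("need three non-empty sequences of equal length")
    hits = sum(
        (pred > prev) == (act > prev)
        for prev, pred, act in zip(prev_closes, predicted, actual)
    )
    return hits / len(actual)


if __name__ == "__main__":
    # August 15-18, 2011, from the slide; each prior close is the previous day's actual close.
    prev_closes = [118.20, 120.62, 119.59, 119.67]
    predicted = [118.44, 121.69, 121.07, 121.73]
    actual = [120.62, 119.59, 119.67, 114.51]
    print(directional_accuracy(prev_closes, predicted, actual))
```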
Checking the model
• The model predicts direction correctly on some of the dates
 immediately following the analysis window… but what about the rest of the time?

• Looking at the rest of the year, there are 99 trading days.
   • For 53 days, the prediction was correct.
   • For 46 days, the prediction was incorrect.


• The model gives 53.54% directional accuracy (53 of 99 days).
   • Not great… but better than 50%.
   • With proper risk management of investments, a 3.54 percentage point
     "edge" on the market might be perfectly acceptable.


• FYI: this model applied to a full 6 month data set gives an
 accuracy of 57.06%
Checking the model
• As a test, I removed Tweet Sentiment and Number of Tweets
 from consideration and re-ran the analysis to create a new
 model.

• The model uses only the VIX and VIX Price Change and
 gives an R-squared of 95.5%.

• The regression equation is
  • Closing Price = 149 - 0.847 VIX Closing Price + 0.120 VIX Price
    Change


• This model gives an accuracy of 48.48%, so there
 seems to be some value in sentiment/tweet volume.
Next Steps
• There seems to be some correlation between Twitter and
 stock data

• Begin building more complex predictive models using
 time series modeling and prediction methods (ARIMA,
 etc.).

• Continue analysis of sentiment and price movements.


• Begin Social Network Analysis of Twitter users for
 reputation, etc.
Thank you

    Eric D. Brown
eric@ericbrown.com
http://ericbrown.com

The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdf
 
20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf
 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx
 
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
 
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsHigh Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
Vip Call US 📞 7738631006 ✅Call Girls In Sakinaka ( Mumbai )
Vip Call US 📞 7738631006 ✅Call Girls In Sakinaka ( Mumbai )Vip Call US 📞 7738631006 ✅Call Girls In Sakinaka ( Mumbai )
Vip Call US 📞 7738631006 ✅Call Girls In Sakinaka ( Mumbai )
 
The Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdfThe Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdf
 
The Economic History of the U.S. Lecture 20.pdf
The Economic History of the U.S. Lecture 20.pdfThe Economic History of the U.S. Lecture 20.pdf
The Economic History of the U.S. Lecture 20.pdf
 
Q3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast SlidesQ3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast Slides
 
Andheri Call Girls In 9825968104 Mumbai Hot Models
Andheri Call Girls In 9825968104 Mumbai Hot ModelsAndheri Call Girls In 9825968104 Mumbai Hot Models
Andheri Call Girls In 9825968104 Mumbai Hot Models
 
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escorts
 
Commercial Bank Economic Capsule - April 2024
Commercial Bank Economic Capsule - April 2024Commercial Bank Economic Capsule - April 2024
Commercial Bank Economic Capsule - April 2024
 
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
 
Instant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School DesignsInstant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School Designs
 
Dividend Policy and Dividend Decision Theories.pptx
Dividend Policy and Dividend Decision Theories.pptxDividend Policy and Dividend Decision Theories.pptx
Dividend Policy and Dividend Decision Theories.pptx
 

WILL TWITTER SENTIMENT PREDICT STOCK MARKET MOVEMENT

  • 1. WILL TWITTER MAKE YOU A BETTER INVESTOR? A LOOK AT SENTIMENT, USER REPUTATION AND THEIR EFFECT ON THE STOCK MARKET Eric D. Brown
  • 2. Background • Sentiment has long been an underlying factor in the investing world • Consumer Confidence Index • Investors Intelligence Sentiment Index • “Market Sentiment” • Rather than waiting days, weeks or months, can the ‘sentiment of now’ be used to improve trading performance and investing decisions? • Can Twitter be used to determine the ‘sentiment of now’?
  • 3. Background The thoughts driving this research are: • Can analysis of publicly available Twitter messages provide insight for investment decision making? • Do Twitter messages (and their sentiment) have any effect on movement in the stock market? • Can Twitter messages be mined and analyzed to predict movements in the stock market? • Does a Twitter user’s reputation affect how people perceive and use the investing ideas that user shares?
  • 4. Research Method • Diagram: Twitter and Stock Market Data → Data Collection → Sentiment Analysis, Price & Volume Analysis, Social Analysis → Positive correlation of sentiment and message volume with price/volume; Reputation of Twitter user → Understanding of the predictive capabilities of Twitter sentiment and the effect of user reputation on investing decision support
  • 5. Research Method • Data Collection • Using the Twitter API to collect tweets (tweet, sender, date, time) • Tweets referencing companies and sectors are collected and stored in a MySQL database for future study • Using the nomenclature made popular by StockTwits (www.stocktwits.com). Example: the stock symbol for Apple is AAPL; users following the StockTwits nomenclature add a “$” to the symbol: “$AAPL”. • Stocktwits.com describes its purpose as a place to: • …share ideas, market insights and trades on stocks, futures and the market in general *. • Using the Yahoo Finance data feed to gather stock market data (price and volume) • Provides historical data
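Because StockTwits-style tweets mark symbols with a leading “$”, picking out the referenced tickers is a pattern match. A minimal sketch, assuming a regular expression of our own choosing (the original collector's exact rule is not given in the slides) and a hypothetical `extract_cashtags` helper:

```python
import re

# Hypothetical cashtag rule (an assumption, not the original collector's):
# "$" + a letter + up to 9 more letters/underscores, not preceded by a word
# character or another "$". Requiring a leading letter skips amounts like "$1".
CASHTAG = re.compile(r"(?<![\w$])\$([A-Za-z][A-Za-z_]{0,9})\b")


def extract_cashtags(text):
    """Return the upper-cased symbols referenced with StockTwits-style cashtags."""
    return [symbol.upper() for symbol in CASHTAG.findall(text)]


if __name__ == "__main__":
    print(extract_cashtags("buy the dips WORKING :-) $ES_F $SPY"))  # ['ES_F', 'SPY']
```

Requiring a letter after the “$” keeps spam such as “MAKE $1 MILLION” from producing a bogus symbol, though lower-case noise like “$wag” would still match.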
  • 6. Research Method • Sentiment Analysis • Using a Naïve Bayes text classification algorithm to determine the sentiment of collected tweets • Naïve Bayes is used for its simplicity, and also because many researchers have found only minor differences between it and other sentiment analysis methods • A subset of the collected data has been manually assigned a ‘sentiment’ to build the necessary training dataset • The Bayesian classification is performed using the Python Natural Language Toolkit (NLTK) • For each tweet, the overall score is calculated and assigned. • Ideally, tweets will fall into +1 (Bullish), 0 (Neutral) and -1 (Bearish) buckets.
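The slides rely on NLTK's Naive Bayes classifier. To show what such a classifier does without depending on NLTK, here is a small from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing. It is an illustrative stand-in, not NLTK's implementation: the tokenizer, the smoothing and the `NaiveBayes` class are our own assumptions, and the toy training messages are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case words and cashtags; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"\$?[a-z_]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def train(self, examples):
        # examples: iterable of (text, label) pairs, e.g. hand-coded tweets.
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = sum(self.label_counts.values())
        return self

    def log_score(self, tokens, label):
        # log P(label) + sum of log P(word | label), smoothed over the vocabulary.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.label_counts[label] / self.total)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / denom)
        return score

    def classify(self, text):
        """Return the label with the highest posterior log score."""
        tokens = tokenize(text)
        return max(self.label_counts, key=lambda lab: self.log_score(tokens, lab))


if __name__ == "__main__":
    training = [  # made-up toy examples, not the deck's hand-coded data
        ("bullish breakout buy the dips $SPY", "Bullish"),
        ("rally higher buy $QQQ", "Bullish"),
        ("bearish crossdown sell pops $SPY", "Bearish"),
        ("weakness sell short $IWM", "Bearish"),
        ("new post on market indicators $DIA", "Neutral"),
    ]
    clf = NaiveBayes().train(training)
    print(clf.classify("buy this rally $SPY"))  # Bullish
```

The class prior `log P(label)` is the term that lets an imbalanced training set tilt every prediction toward the majority class.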
  • 7. Sentiment Classification Process • Training Dataset → Bayes Classifier → Trained Classifier • Twitter Data → Trained Classifier → Classified Twitter Messages
  • 8. Current Dataset • Twitter Dataset: • May 1 2011 to Dec 31 2011 • 473,901 Tweets • No deduplication performed • Training Dataset: • 5000 messages randomly selected from collected Tweets • Messages have been manually coded as Bullish, Bearish, Neutral or Spam • 544 Bullish (10.88%) • 638 Bearish (12.76%) • 3454 Neutral (69.08%) • 364 Spam (7.28%)
  • 9. Bullish Examples • Markets seems like consolidating before another rally higher $SPX $SPY $QQQ • $KFT - Kraft Foods Stock Analysis - CCI is bullish and rising • If this daily candle ends like this do not go short! 3 white soldiers (bullish) $SPX $SPY http://t.co/oWWUqnu • buy the dips WORKING :-) $ES_F $SPY
  • 10. Bearish Examples • $K - Kellogg Stock Analysis - bearish stochastic crossdown • $SPY - Might be trying to roll a little. • warning sign as $IWM didn’t make a higher high, unlike $QQQ and $SPY • RT @grassosteve: $SPX Jobs although important only 1 aspect of the weakness in the mkts, i would sell pops still levels up 1216 1228 123 ...
  • 11. Neutral Examples • Stocks and Inflation: What the Market Is Really Telling Us http://seekingalpha.com/a/5tap $TLT $TLH $SPY • NEW POST: UPDATED- MEAN REVERSION TRADE ON THE RUSSELL 2000 http://bit.ly/l2Q1Rf $IWM $TNA $SPY $QQQ • Sold my $SPY Jun 03 2011 133.0 Puts for 78c made 7c • Durable Goods as a Leading Market Indicator http://seekingalpha.com/a/5u8j $DIA $SPY $QQQ
  • 12. Spam Examples • Hop up out the bed turn my $wag on. ;* • MILLIONAIRE SECRETS CLUB - MAKE $1 MILLION A YEAR http://goo.gl/2Yv8s $OXY $PBR $PDCO $pennystocks $POT $PRU $QCOM $QSII axsc • HOME TYPERS NEEDED - MAKE $1000s WEEKLY - PAYS DAILY http://goo.gl/hoNUn $SNP $SOHU $SPLS $SPRD $SPX $SPY $STO $stocks sg2 • UNLIMITED FREE TV SHOWS on YOUR PC - 12,000 FREE CHANNELS http://goo.gl/v55Nw $CBG $CBS $CF $CLR $CMCSA $COF $COP $CROX $CTIC qika
  • 13. Sentiment Analysis using Full Training Set • From May 1 to Dec 31 2011:
      Category     Tweets   Share
      All Tweets  473,901
      Bullish     103,770   21.90%
      Bearish      84,454   17.82%
      Neutral     224,300   47.33%
      Spam         61,371   12.95%
      None              6   0.001%
  • 14. A look at the Market & Sentiment • I ran a short analysis on the S&P 500 ETF (SPY) between July 11 and August 11 2011 • This date range was chosen mainly because of a very volatile move down • 32 days of data • 26,307 tweets • On Jul 11, SPY was at 131.40 • On Aug 11, SPY was at 117.33
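Comparing tweets with daily prices first requires collapsing per-tweet labels into one value per day. The slides do not define how the later “TransformedBBN” variable is computed, so this is only a hypothetical stand-in: the mean of the +1/0/-1 scores per day with Spam dropped, alongside the count of remaining tweets. The `daily_series` helper and the label-to-score mapping are our own assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mapping from class label to numeric score; Spam is dropped.
SCORES = {"Bullish": 1, "Neutral": 0, "Bearish": -1}


def daily_series(labeled_tweets):
    """Aggregate (day, label) pairs into {day: (mean_score, tweet_count)}.

    mean_score is an assumed stand-in for the slides' sentiment variable, not
    the author's TransformedBBN; tweet_count here excludes Spam (also an assumption).
    """
    by_day = defaultdict(list)
    for day, label in labeled_tweets:
        if label in SCORES:  # skip Spam and anything unlabeled
            by_day[day].append(SCORES[label])
    return {day: (sum(s) / len(s), len(s)) for day, s in sorted(by_day.items())}


if __name__ == "__main__":
    sample = [
        (date(2011, 7, 11), "Bullish"),
        (date(2011, 7, 11), "Bearish"),
        (date(2011, 7, 11), "Spam"),
        (date(2011, 7, 12), "Neutral"),
    ]
    print(daily_series(sample))
```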
  • 15. S&P 500 ETF (SPY) - 7/11 to 8/11
  • 16. S&P 500 ETF (SPY) - 7/11 to 8/11
  • 17. S&P 500 ETF (SPY) - 7/11 to 8/11
  • 18. S&P 500 ETF (SPY) - 7/11 to 8/11
  • 19. But…is the Classifier accurate? • We can determine the sentiment of a tweet…but is it really accurate? • The Python Natural Language Toolkit provides a method for measuring a classifier’s accuracy against a labeled dataset • Build the training dataset as normal • Use the training dataset as the “input data” • Run all of its messages through the classifier and determine the accuracy
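Accuracy here is just the share of hand-coded messages whose predicted label matches the manual one. The steps above score the classifier on the same messages it was trained on, which measures how well it fits that data; holding some hand-coded messages back is a common way to estimate accuracy on unseen tweets. A minimal sketch that works with any `classify` function; `accuracy` and `train_test_split` are our own hypothetical helpers, not NLTK calls.

```python
import random


def accuracy(classify, labeled):
    """Fraction of (text, label) pairs whose predicted label matches the hand-coded one."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    hits = sum(1 for text, label in labeled if classify(text) == label)
    return hits / len(labeled)


def train_test_split(labeled, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold back test_fraction of it for evaluation."""
    data = list(labeled)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]
```

Train on the first returned list, then call `accuracy` on the second to get a held-out estimate.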
  • 20. Sentiment Classification Process • Training Dataset → Bayes Classifier → Trained Classifier • Twitter Data → Trained Classifier → Classified Twitter Messages
  • 21. Classification Accuracy • Training Dataset → Bayes Classifier → Trained Classifier • Training Dataset → Trained Classifier → Accuracy
  • 22. Accuracy of Training Dataset • If you recall, our training dataset is: • 5000 messages randomly selected from collected Tweets • Messages have been manually coded as Bullish, Bearish, Neutral or Spam • 544 Bullish (10.88%) • 638 Bearish (12.76%) • 3454 Neutral (69.08%) • 364 Spam (7.28%) • Running the accuracy method of the Python NLTK yields an accuracy of 54.18%.
  • 23. How can we improve accuracy? • Thinking about the theory behind the Bayesian classifier may shed some light on the inaccuracies. • The Bayesian classifier is probability-based and is only as good as the data used to train it. • Research suggests that non-symmetric (imbalanced) training datasets or features can throw a Bayesian filter off. • The training dataset used here is non-symmetric: Neutral messages make up 69.08% of it. • What if we create a symmetric dataset with the same number of Bullish, Bearish, Neutral and Spam messages?
  • 24. Equivalent Sized Training Dataset • Rebuild the training dataset • Randomly select 500 tweets from each category (Bullish, Bearish, Neutral, Spam) • Re-run the accuracy method • Accuracy = 91.94% • An improvement from 54.18% to 91.94% • What will this improved accuracy do for the overall dataset?
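Building an equal-sized training set amounts to drawing the same number of hand-coded messages from every category. A minimal sketch, assuming the labeled messages are (text, label) pairs; `balanced_sample` and the fixed random seed are our own choices. It refuses to draw more messages than a category holds rather than silently returning an unbalanced set.

```python
import random
from collections import defaultdict


def balanced_sample(labeled, per_class, seed=0):
    """Randomly draw per_class (text, label) pairs from every label, then shuffle.

    Raises ValueError if any label has fewer than per_class examples.
    """
    by_label = defaultdict(list)
    for text, label in labeled:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    sample = []
    for label, items in sorted(by_label.items()):
        if len(items) < per_class:
            raise ValueError(f"only {len(items)} examples for {label!r}")
        sample.extend(rng.sample(items, per_class))  # without replacement
    rng.shuffle(sample)
    return sample
```

With `per_class=500` and four categories this yields a 2,000-message training set in which every class prior is 25%.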
  • 25. Sentiment Analysis using Symmetric Training Set • From May 1 to Dec 31 2011:
      Category     Tweets   Share
      All Tweets  473,901
      Bullish     110,141   23.24%
      Bearish     103,233   21.78%
      Neutral     212,509   44.84%
      Spam         48,012   10.13%
      None              6   0.001%
  • 26. S&P 500 ETF (SPY) - 7/11 to 8/11
  • 27. S&P 500 ETF (SPY) - 7/11 to 8/11
  • 28. S&P 500 ETF (SPY) - 7/11 to 8/11
  • 29. S&P 500 ETF (SPY) - 7/11 to 8/11
  • 30. Statistically Speaking • There seems to be some correlation between Twitter & stock data • Correlation of TransformedBBN and Closing Price • Correlation = 0.495 • P-Value = 0.004 • Correlation of Num Tweets and Daily Volume • Correlation = 0.648 • P-Value = 0.000
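The correlations above are Pearson coefficients. A dependency-free sketch of the coefficient and of the t statistic behind its p-value (the p-value comes from Student's t distribution with n - 2 degrees of freedom; if SciPy is available, `scipy.stats.pearsonr` returns both the coefficient and the p-value). The helper names are our own.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0 'no correlation'; compare with Student's t on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```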
  • 31. Correlations? • Basic analysis shows some correlation between price/sentiment and volume/tweet volume. • Using time series analysis, a cross-correlation analysis can be completed to determine how these variables are related at different ‘lag’ periods. • Cross-correlation analysis gives the Cross-Correlation Coefficient (CCF), which indicates how well two variables are correlated at lag time r. • If a correlation is found at a negative lag time r, that variable is a candidate for use in predicting the output variable.
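A lagged correlation shifts one series against the other before correlating them. This sketch computes a plain Pearson correlation over the overlapping days; standard time-series CCF implementations usually normalize with full-series means and standard deviations instead, so their values can differ slightly. The sign convention (negative lag means x leads y) is our assumption, chosen to match the reading on this slide; packages differ.

```python
import math


def _pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, lag):
    """Correlate x[t + lag] with y[t] over the overlapping days.

    Assumed convention: a negative lag pairs earlier values of x with later
    values of y, so a strong value there suggests x leads y.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if lag < 0:
        xs, ys = x[: len(x) + lag], y[-lag:]
    else:
        xs, ys = x[lag:], y[: len(y) - lag]
    if len(xs) < 3:
        raise ValueError("lag too large for the series length")
    return _pearson(xs, ys)
```

For example, with daily sentiment as `x` and closing price as `y`, evaluating lags -6 through 0 produces one row of a table like the one on the next slide.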
  • 32. Correlations
    • Closing Price: Cross-Correlation Coefficient by Lag Time r
        Lag r             -6      -5      -4      -3      -2      -1       0
        Sentiment      0.299   0.399   0.599   0.620   0.663   0.599   0.495
        Num Tweets    -0.497  -0.461  -0.406  -0.533  -0.577  -0.627  -0.714
    • Volume: Cross-Correlation Coefficient by Lag Time r
        Lag r             -6      -5      -4      -3      -2      -1       0
        Sentiment     -0.307  -0.394  -0.527  -0.574  -0.617  -0.605  -0.557
        Num Tweets     0.373   0.443   0.485   0.552   0.613   0.642   0.648
  • 33. What now? • Correlations exist between price/volume and sentiment/message volume. • The main reason I’m looking at sentiment is to determine if it can somehow be used to predict price movement. • So…let’s build a model using linear regression (note…linear regression isn’t a likely fit, but it’s a good place to start). • I want the model to predict the closing price…so let’s start with a simple model using sentiment only.
  • 34. Building Models • Using Sentiment to predict Price: • The regression equation is • Closing Price = 133 + 47.3 TransformedBBN • The p-value is less than 0.05, so we should be good, but R-Squared is only 24.5%…which tells me this isn’t a very good model.
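The single-predictor model is an ordinary least-squares fit. With the daily sentiment value as `x` and the closing price as `y`, a computation like this sketch returns the intercept, the slope and R-squared; `simple_ols` is a hypothetical helper, not the tool that produced the slide's output.

```python
def simple_ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx                # slope
    a = my - b * mx              # intercept
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    return a, b, 1 - ss_res / ss_tot  # R-squared = share of variance explained
```

An R-squared of 24.5% means the fitted line explains roughly a quarter of the day-to-day variation in the closing price.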
  • 35. Building Models (cont) • What other variables can be used? • Volume? • Volume change? • Number of Tweets? • Volatility measurements? • There are a lot of combinations…but let’s keep it simple. Let’s select the Number of Tweets and re-run the analysis.
  • 36. Building Models (cont) • Using Sentiment + Number of Tweets: • The regression equation is • Closing Price = 139 + 41.4 TransformedBBN - 0.00856 Num Tweets • P-values still look good and R-Squared is up to 69.6%.
  • 37. Building Models (cont) • What else can we add? • How about a measure of volatility? • One often-quoted measure is the VIX, a measure of the implied volatility of S&P 500 index options. • It is often referred to as the ‘fear index’. • On 7/11, the VIX was 18.39 • On 8/11, the VIX was 39.00
  • 38. Building Models (cont) • Using Sentiment, Number of Tweets, the VIX Closing Price and the VIX Price Change • The regression equation is • Closing Price = 148 - 0.00197 Num Tweets + 12.1 TransformedBBN - 0.713 VIX Closing Price + 0.128 VIX Price Change • P-values are good & R-Squared hits 97.3%...which tells us that this might be a good model.
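Adding predictors turns the fit into multiple linear regression. This dependency-free sketch solves the normal equations (XᵀX)b = Xᵀy by Gauss-Jordan elimination. Given the four predictor columns named on this slide and the closing price as `y`, it returns an intercept, one coefficient per predictor, and R-squared. `ols` is a hypothetical helper; for real data a library routine such as NumPy's `numpy.linalg.lstsq` is the more robust choice.

```python
def ols(columns, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bk*xk via the normal equations.

    columns: list of predictor series, each the same length as y.
    Returns (coefficients [b0, b1, ..., bk], r_squared).
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]  # design matrix X
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * v for r, v in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(aug[i][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("predictors are (nearly) collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for i in range(k):
            if i != c:
                factor = aug[i][c] / aug[c][c]
                aug[i] = [u - factor * v for u, v in zip(aug[i], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((v - fv) ** 2 for v, fv in zip(y, fitted))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return beta, 1 - ss_res / ss_tot
```

A high R-squared here describes how well the model fits the days it was built on; the out-of-sample checks that follow are what test its predictions.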
  • 39. Checking the model • Our equation is: • Closing Price = 148 + 12.1 TransformedBBN - 0.00197 Num Tweets - 0.713 VIX Closing Price + 0.128 VIX Price Change • So…for August 11, our variables are: • TransformedBBN = -0.075 • Num Tweets = 1357 • VIX Closing Price = 39 • VIX Price Change = -3.99 • Our predicted closing price for August 12 is then 116.65 • The prediction is for a move down from the August 11 price of 117.33 • The actual closing price on August 12 was 118.20 • Both our directional prediction and our price prediction were incorrect.
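Plugging the August 11 values into the printed equation is simple arithmetic. This short sketch (`predict_close` is a hypothetical helper name) uses the rounded coefficients exactly as shown. With them the result is about 116.10 rather than the 116.65 on the slide, most likely because the regression software used unrounded coefficients. Either way, the prediction sits below the August 11 price of 117.33, i.e. a move down.

```python
def predict_close(bbn, num_tweets, vix_close, vix_change):
    """Next-day SPY close from the slide's rounded regression coefficients."""
    return (148 + 12.1 * bbn - 0.00197 * num_tweets
            - 0.713 * vix_close + 0.128 * vix_change)


prediction = predict_close(-0.075, 1357, 39.0, -3.99)
print(round(prediction, 2))  # 116.1
```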
  • 40. Checking the Model • A few more predictions (50% directional accuracy): • For August 15, the prediction is up: predicted price 118.44 • Actual is 120.62. Price moved up. • For August 16, the prediction is up: predicted price 121.69 • Actual is 119.59. Price moved down. • For August 17, the prediction is up: predicted price 121.07 • Actual is 119.67. Price moved up. • For August 18, the prediction is up: predicted price 121.73 • Actual is 114.51. Price moved down.
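Directional accuracy compares the predicted and actual moves relative to the previous day's close. This sketch reproduces the 50% figure from the four August predictions, assuming each day's reference price is the prior day's actual close (August 12's 118.20 for August 15). `directional_accuracy` is our own helper name.

```python
def directional_accuracy(prev_closes, predicted, actual):
    """Share of days where predicted and actual moves (vs. the prior close) agree.

    A close equal to the prior close is treated as 'not up'.
    """
    hits = 0
    for prev, pred, act in zip(prev_closes, predicted, actual):
        hits += (pred > prev) == (act > prev)
    return hits / len(prev_closes)


# August 15-18, 2011 values from the slide.
prev = [118.20, 120.62, 119.59, 119.67]
pred = [118.44, 121.69, 121.07, 121.73]
act = [120.62, 119.59, 119.67, 114.51]
print(directional_accuracy(prev, pred, act))  # 0.5
```

The same function applied to the 99 later trading days would give the 53/99 ≈ 53.54% reported next.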
  • 41. Checking the model • The model predicts direction correctly on a few of the dates immediately after August 11…but what about the rest of the time? • Looking at the rest of the year, there are 99 trading days. • For 53 days, the prediction was correct. • For 46 days, the prediction was incorrect. • The model gives a 53.54% accuracy (53/99). • Not great…but better than 50%. • With proper risk management of investments, a 3.54-percentage-point “edge” on the market might be perfectly acceptable. • FYI: this model applied to a full 6-month data set gives an accuracy of 57.06%
  • 42. Checking the model • As a test, I removed Tweet Sentiment and Number of Tweets from consideration and re-ran the analysis to create a model. • The model uses only the VIX Closing Price and VIX Price Change and gives an R-Squared of 95.5%. • The regression equation is • Closing Price = 149 - 0.847 VIX Closing Price + 0.120 VIX Price Change • This model gives an accuracy of 48.48%...so there seems to be some value in sentiment/tweet volume.
  • 43. Next Steps • There seems to be some correlation between Twitter & stock data • Begin building more complex predictive models using time series modeling and prediction methods (ARIMA, etc.) • Continue analysis of sentiment and price movements • Begin Social Network Analysis of Twitter users for reputation, etc.
  • 44. Thank you Eric D. Brown eric@ericbrown.com http://ericbrown.com