Peter Zadrozny
The contents of this presentation are part of the book Big Data Analytics Using Splunk by Peter Zadrozny and Raghu Kodali
Introduction
The technical side
The Splunk sentiment analysis app
The world sentiment indicator project
Conclusions
Agenda
Introduction
Sentiment Analysis
Is the process of examining text or speech to find out the opinions, views or feelings of the author or speaker
This definition applies to a computer system
When a human does this, it's called reading
The words in the title describe concepts that are highly subjective and ambiguous for a human
Even more challenging for a computer program
Opinions, Views, Beliefs, Convictions
Words or expressions have different meanings depending on the knowledge domain (domain of expertise)
Example: "Go Around"
Sarcasm, jokes, etc.
Domains of expertise usually have slang
Conclusion:
Sentiment is contextual and domain dependent
Opinions, Views, Beliefs, Convictions
Analysis tends to be done by
Domain of expertise
Media channel
Newspaper articles follow grammar rules, use proper words, and have no spelling mistakes
Tweets lack sentence structure, are likely to use slang, include emoticons, and sometimes lengthen words ("I looooooove chocolate")
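Lengthened words are one reason tweets usually get a normalization pass before classification. A minimal sketch, assuming a common heuristic that is not taken from the slides: collapse any character repeated three or more times down to two. The function name `normalize_tweet` is illustrative.

```python
import re

# Match any character that is followed by two or more copies of itself.
_LENGTHENED = re.compile(r"(.)\1{2,}")


def normalize_tweet(text: str) -> str:
    """Lowercase a tweet and shorten artificially lengthened words."""
    # Keep two copies of the repeated character so that legitimate
    # double letters ("good", "chocolate") are left untouched.
    return _LENGTHENED.sub(r"\1\1", text.lower())


print(normalize_tweet("I looooooove chocolate"))
```

Keeping two characters rather than one means "looooooove" and "loooove" map to the same token, which can then be treated as an emphasized form of "love" if desired.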
Sentiment Analysis
Companies want to know what their
Customers
Competitors
General public
Think about their
Products
Services
Brands
Usually associated with marketing and public relations
Commercial Uses
When done correctly, sentiment analysis is powerful
"From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series", O'Connor et al., 2010
Surveys on consumer confidence and political opinion correlate with sentiment word frequencies in Twitter by as much as 80%
"These results highlight the potential of text streams as a substitute and supplement for traditional polling."
Commercial Uses
When not well done
"The Hathaway Effect: How Anne Gives Warren Buffett a Rise", Dan Mirvish, Huffington Post, 2011
Suspicions that some robotic trading programs on Wall Street include sentiment analysis
Every time Anne Hathaway makes the headlines, the stock of Warren Buffett's company, Berkshire Hathaway, goes up
Commercial Uses
The Technical Side
Sentiment Analysis is text categorization
The results fall into two categories
Polarity
Positive, negative, neutral
Range of polarity
Ratings or rankings
Example: 1 to 5 stars for movie reviews
The Technical Side
Extracting and categorizing sentiment is based on features
Frequency: Words that appear most often decide the polarity
Term Presence: Most unique words define polarity
N-Grams: The position of a word determines polarity
Parts of Speech: Adjectives define the polarity
Syntax: Attempts to analyze syntactic relations haven't been very successful
Negation: Explicit negation terms reverse polarity
Text classifiers tend to use combinations of features
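Several of the features listed above can be sketched in a few lines. This is a minimal illustration, not the feature extractor of any particular tool: the negation word list, the `NOT_` prefix convention, and all function names are assumptions made for the example.

```python
from collections import Counter

# Illustrative negation terms; a real system would use a longer list.
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Split on whitespace after lowercasing."""
    return text.lower().split()


def mark_negation(tokens):
    """Drop each negation word and prefix the token after it with NOT_."""
    marked, negate = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        marked.append("NOT_" + tok if negate else tok)
        negate = False
    return marked


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a single string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    tokens = mark_negation(tokenize(text))
    return {
        "frequency": Counter(tokens),  # how often each term occurs
        "presence": set(tokens),       # whether a term occurs at all
        "bigrams": ngrams(tokens, 2),  # word order captured as pairs
    }


features = extract_features("the movie was not good but the music was good")
print(features["frequency"].most_common(3))
print(sorted(features["presence"]))
```

Here "not good" becomes the single token `NOT_good`, so a classifier can learn a different polarity for it than for "good".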
The Technical Side
To assign contextual polarity, you need a base polarity
Use a lexicon, which provides a polarity for each word
Word → Phrase → Sentence → Document
Use training documents (preferred)
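The lexicon approach can be sketched as rolling word polarities up to the sentence and then the document. This is a minimal sketch under stated assumptions: the four-word `LEXICON`, its scores, and the simple averaging rule are all hypothetical and chosen only to keep the example short.

```python
# Hypothetical mini-lexicon: word -> base polarity in [-1, 1].
LEXICON = {"love": 1.0, "great": 0.8, "hate": -1.0, "boring": -0.6}


def sentence_polarity(sentence):
    """Average the base polarity of the lexicon words found in a sentence."""
    scores = [LEXICON[w] for w in sentence.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def document_polarity(document):
    """Average the sentence polarities to get a document-level score."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    if not sentences:
        return 0.0
    return sum(sentence_polarity(s) for s in sentences) / len(sentences)


doc = "I love this great film. The ending was boring."
print(document_polarity(doc))
```

A fixed lexicon cannot adapt to a domain's slang, which is one reason training documents are preferred.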
The Technical Side
Training documents
Contain a number of sentences
Are classified with a specific polarity
Polarity for each word is based on a combination of feature extractors and its appearance in the different classifications
The more sentences, the more accurate
Results are placed in a model
The Technical Side
Machine learning tools
Naïve Bayes Classifier
Generally uses n-grams, frequency, and term presence; sometimes parts of speech
Maximum Entropy
Naïve Bayes assumes each feature is independent; Maximum Entropy does not
Allows for overlap of words
Support Vector Machines
One vector per feature
Linear, polynomial, sigmoid, and other functions are applied to the vectors
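Of the three tools, Naïve Bayes is the easiest to sketch from scratch. The following is a minimal multinomial Naïve Bayes over unigram counts with add-one smoothing. It is an illustration of the technique, not the implementation used by any specific product, and the four-sentence corpus is made up for the example.

```python
import math
from collections import Counter, defaultdict


def train(labeled_sentences):
    """Build a model of per-class word counts from (sentence, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for sentence, label in labeled_sentences:
        class_counts[label] += 1
        word_counts[label].update(sentence.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "classes": class_counts, "vocab": vocabulary}


def classify(model, sentence):
    """Return the label with the highest log-probability for the sentence."""
    total_docs = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["classes"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)  # class prior
        for word in sentence.lower().split():
            # Add-one smoothing: an unseen word lowers the score
            # instead of forcing the probability to zero.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


corpus = [
    ("i love this song", "positive"),
    ("what a great day", "positive"),
    ("i hate waiting", "negative"),
    ("this is a terrible movie", "negative"),
]
model = train(corpus)
print(classify(model, "i love this great song"))
print(classify(model, "terrible movie"))
```

Summing log-probabilities word by word is where the independence assumption shows up: each word contributes to the score without regard to its neighbors.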
The Technical Side
The Technical Side
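[Diagram: a Trainer builds the Model from a training corpus labeled positive, negative, and neutral; a Tester runs the Model against a similarly labeled testing corpus and reports accuracy and margin of error; the Processor applies the Model to a document and returns its sentiment.]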
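The testing step can be sketched as comparing predicted labels against known labels. The slides do not say how the margin of error is calculated, so the formula below is an assumption: the normal approximation for a proportion at roughly 95% confidence (z = 1.96). It will not necessarily reproduce the figures reported in this deck. The label lists are made up.

```python
import math


def accuracy(predicted, actual):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def margin_of_error(acc, n, z=1.96):
    """Assumed formula: normal-approximation margin for a proportion."""
    return z * math.sqrt(acc * (1 - acc) / n)


# Made-up test corpus labels and model predictions.
actual = ["positive", "negative", "neutral", "positive", "negative"]
predicted = ["positive", "negative", "positive", "positive", "neutral"]

acc = accuracy(predicted, actual)
moe = margin_of_error(acc, len(actual))
print(f"accuracy={acc:.2%} margin of error={moe:.2%}")
```

With only five test items the margin is very wide; under this formula it shrinks with the square root of the test corpus size.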
The Splunk Sentiment Analysis App
Based on the Naïve Bayes Classifier
Has three commands
Sentiment
Language
Token
Includes a training/testing program and two models
Twitter: 190,862 positive and 37,469 negative tweets
IMDb
Range of polarity from 1 to 10
Each ranking has 11 movie reviews, averaging 200 words
The Splunk Sentiment Analysis App
index=twitter lang=en
| where like(text, "%love%")
| sentiment twitter text
| stats avg(sentiment)
The Splunk Sentiment Analysis App
Love, Hate and Justin Bieber
index=twitter lang=en
| rename entities.hashtags{}.text as hashtags
| fields text, hashtags
| mvexpand hashtags
| where like(hashtags, "Beliebers")
| sentiment twitter text
| stats avg(sentiment)
The Beliebers Search
rename: So that we don't have to type entities.hashtags{x}.text every time we want to refer to a hashtag, rename this multi-value field to hashtags
fields: We only want the fields that contain the tweet and the hashtags
mvexpand: Expand the values of this multi-value field into separate Splunk events
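For readers who do not use Splunk, the steps of the Beliebers search can be mimicked in plain Python. This is a rough analogy only. The tweet dictionaries follow the `entities.hashtags[].text` layout implied by the search, and `toy_score` is a hypothetical stand-in for the app's `sentiment` command, not its actual scoring.

```python
def beliebers_average(tweets, score):
    """Mimic rename -> fields -> mvexpand -> where -> sentiment -> stats avg."""
    expanded = []
    for tweet in tweets:
        # rename + fields: keep only the tweet text and its hashtag texts.
        hashtags = [h["text"] for h in tweet["entities"]["hashtags"]]
        # mvexpand: emit one row per hashtag value.
        for tag in hashtags:
            expanded.append({"text": tweet["text"], "hashtags": tag})
    # where: keep rows whose hashtag is "Beliebers"
    # (a LIKE pattern without % wildcards matches the whole value).
    matching = [row for row in expanded if row["hashtags"] == "Beliebers"]
    # sentiment + stats avg: score each remaining tweet and average.
    scores = [score(row["text"]) for row in matching]
    return sum(scores) / len(scores) if scores else None


def toy_score(text):
    """Hypothetical scorer: 1.0 if the tweet mentions love, else 0.0."""
    return 1.0 if "love" in text.lower() else 0.0


tweets = [
    {"text": "love the new single",
     "entities": {"hashtags": [{"text": "Beliebers"}, {"text": "music"}]}},
    {"text": "so tired of this",
     "entities": {"hashtags": [{"text": "Beliebers"}]}},
    {"text": "love my dog",
     "entities": {"hashtags": [{"text": "pets"}]}},
]
print(beliebers_average(tweets, toy_score))
```

The expansion step matters because a tweet with several hashtags would otherwise be a single row with a multi-value field, which is harder to filter on one value.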
The training corpus is key to accuracy
Beware: Naïve Bayes is not an exact algorithm
The best accuracy obtained using Naïve Bayes is approximately 83%
Key factors to increase accuracy
Similarity to the data being analyzed
Size of the corpus
Training and Testing Data
Training and Testing Data
Test Data               Size         Accuracy  Margin of Error
University of Michigan  1.5 million  72.49%    1.05%
Splunk                  228,000      68.79%    1.12%
Sanders                 5,500        60.61%    0.76%
Love, Hate & Justin Bieber: Sanders Model
The World Sentiment Indicator Project
Based on news headlines
From news web sites all around the world
Collecting RSS feeds in English
The World Sentiment Indicator
Steps for this project
1. Collect the RSS feeds
2. Index the headlines into Splunk
3. Define the sentiment corpus
4. Create a visualization of the results
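The first step, pulling headlines out of an RSS feed, can be sketched with the Python standard library. This is a minimal sketch assuming a plain RSS 2.0 layout. `SAMPLE_RSS` is a made-up feed that reuses two headlines quoted elsewhere in this deck, and fetching real feeds over the network is left out.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a downloaded feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>Pope steps down as head of Catholic church</title></item>
    <item><title>Pope quits for health reasons</title></item>
  </channel>
</rss>"""


def extract_headlines(rss_text):
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="").strip()
            for item in root.iter("item")]


headlines = extract_headlines(SAMPLE_RSS)
for headline in headlines:
    print(headline)
```

Only item titles are collected; the channel-level title ("Example News") names the feed and is skipped because it is not inside an `<item>`.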
The World Sentiment Indicator
Collecting the RSS Feeds
Create your own
Crowd-source
University of Michigan ‒ Kaggle competition
Bootstrap
"Twitter Sentiment Classification Using Distant Supervision", Go et al., 2010
Uses emoticons to classify tweets
Accuracy for unigrams and bigrams
Naïve Bayes 82.7%
Maximum Entropy 82.7%
Support Vector Machine 81.6%
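The bootstrap idea of using emoticons to classify tweets can be sketched as follows. This is a minimal sketch, not the procedure from the cited paper: the emoticon lists, the rule for skipping mixed tweets, and the function name are illustrative assumptions.

```python
# Illustrative emoticon lists; a real labeler would likely use more.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def label_by_emoticon(tweet):
    """Return (label, text without emoticons), or None if there is no clear signal."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # neither kind, or both kinds: no usable label
        return None
    cleaned = tweet
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        cleaned = cleaned.replace(emoticon, "")
    label = "positive" if has_pos else "negative"
    # Collapse the whitespace left behind by the removed emoticons.
    return label, " ".join(cleaned.split())


print(label_by_emoticon("Great show tonight :)"))
print(label_by_emoticon("Missed my flight :("))
print(label_by_emoticon("Just landed"))
```

Removing the emoticon from the training text is a design choice in this sketch: it keeps the classifier from learning only the emoticon and forces it to rely on the surrounding words.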
Training Corpus Creation
Issues with subjectivity
Pope Benedict XVI announces resignation
Pope too frail to carry on
Pope steps down as head of Catholic church
Pope quits for health reasons
Average RSS headline: 47.8 characters, 7.6 words
Average tweet: 78 characters, 14 words
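Averages like these are simple to compute for any corpus. The sketch below applies the calculation to the four example headlines on this slide, so its result describes only that tiny sample and not the full RSS collection. The function name is illustrative.

```python
def average_length(headlines):
    """Return (average characters, average whitespace-separated words)."""
    chars = sum(len(h) for h in headlines) / len(headlines)
    words = sum(len(h.split()) for h in headlines) / len(headlines)
    return chars, words


pope_headlines = [
    "Pope Benedict XVI announces resignation",
    "Pope too frail to carry on",
    "Pope steps down as head of Catholic church",
    "Pope quits for health reasons",
]
avg_chars, avg_words = average_length(pope_headlines)
print(avg_chars, avg_words)
```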
Training Corpus Considerations
Create a special corpus based on news headlines
Version 1: 100 positive, 100 negative, 100 neutral
Version 2: 200 positive, 200 negative, 200 neutral
Use an existing Twitter corpus
The one included with the Splunk app
University of Michigan
Use a movie review corpus
Pang & Lee: 1,000 positive, 1,000 negative
Training Corpus Strategy
Training Corpus Accuracy
Training Corpus  Size                Accuracy  Margin of Error
Headlines V1     300 headlines       38.89%    1.02%
Headlines V2     600 headlines       47.22%    1.05%
Splunk Twitter   228,000 tweets      40.80%    1.16%
U of Michigan    1.5 million tweets  43.81%    1.11%
Movie Reviews    2,000 reviews       36.79%    1.23%
The World Sentiment Indicator
The key to accuracy is the quality of the training data
Train with the same data you will analyze
Size of the training data improves accuracy
Subjectivity of crowd-sourcing tends to even out as the amount of training data increases
All machine learning tools tend to converge to similar
levels of accuracy
Use the easiest one for you
Conclusions
Questions?

 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Experiences with Sentiment Analysis with Peter Zadrozny

  • 2. The contents of this presentation are part of the book Big Data Analytics Using Splunk by Peter Zadrozny and Raghu Kodali
  • 3. Agenda: Introduction; The technical side; The Splunk sentiment analysis app; The world sentiment indicator project; Conclusions
  • 5. Opinions, Views, Beliefs, Convictions: Sentiment analysis is the process of examining text or speech to find out the opinions, views or feelings of the author or speaker. This definition applies to a computer system. When a human does this, it's called reading. The words in the title describe highly subjective and ambiguous concepts for a human. Even more challenging for a computer program.
  • 6. Opinions, Views, Beliefs, Convictions: Words or expressions have different meanings depending on the knowledge domain (domain of expertise). Example: "Go Around". Sarcasm, jokes, etc. Domains of expertise usually have slang. Conclusion: sentiment is contextual and domain dependent.
  • 7. Sentiment Analysis: Analysis tends to be done by domain of expertise and by media channel. Newspaper articles follow grammar rules, use proper words and have no orthographical mistakes. Tweets lack sentence structure, likely use slang, include emoticons, and sometimes lengthen words ("I looooooove chocolate").
  • 8. Commercial Uses: Companies want to know what their customers, competitors and the general public think about their products, services and brands. Usually associated with marketing and public relations.
  • 9. Commercial Uses: When done correctly, sentiment analysis is powerful. "From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series", O'Connor et al., 2010: analysis of surveys on consumer confidence and political opinion correlates to sentiment word frequencies in Twitter by as much as 80%. These results highlight the potential of text streams as a substitute and supplement for traditional polling.
  • 10. Commercial Uses: When not well done. "The Hathaway Effect: How Anne Gives Warren Buffett a Rise", Dan Mirvish, Huffington Post, 2011. Suspicions that some robotic trading programs on Wall Street include sentiment analysis: every time Anne Hathaway makes the headlines, the stock of Warren Buffett's company Berkshire Hathaway goes up.
  • 12. The Technical Side: Sentiment analysis is text categorization. The results fall into two categories. Polarity: positive, negative, neutral. Range of polarity: ratings or rankings, for example 1 to 5 stars for movie reviews.
  • 13. The Technical Side: Extracting and categorizing sentiment is based on features. Frequency: words that appear most often decide the polarity. Term presence: the most unique words define polarity. N-grams: the position of a word determines polarity. Parts of speech: adjectives define the polarity. Syntax: attempts to analyze syntactic relations haven't been very successful. Negation: explicit negation terms reverse polarity. Text classifiers tend to use combinations of features.
  • 14. The Technical Side: To assign contextual polarity, you need a base polarity. Use a lexicon, which provides a polarity for each word. Word, phrase, sentence, document. Use training documents: preferred.
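
The lexicon and negation features from slides 13 and 14 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny LEXICON, the NEGATIONS set and the function name sentence_polarity are hypothetical and are not taken from the presentation or from the Splunk app.

```python
# Hypothetical toy lexicon: a base polarity for each known word.
LEXICON = {"good": 1, "great": 1, "love": 1,
           "bad": -1, "terrible": -1, "hate": -1}

# Hypothetical list of explicit negation terms.
NEGATIONS = {"not", "no", "never"}


def sentence_polarity(sentence):
    """Sum the base polarities of the words in a sentence.

    A negation term reverses the polarity of the next word found in the
    lexicon, then the negation is cleared.
    """
    score = 0
    negate = False
    for word in sentence.lower().split():
        if word in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(word, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


print(sentence_polarity("the food was not good"))
```

A positive result suggests positive polarity, a negative result negative polarity, and zero neutral. A real lexicon would be far larger and domain specific.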
  • 15. The Technical Side: Training documents contain a number of sentences and are classified with a specific polarity. Polarity for each word is based on a combination of feature extractors and its appearance in the different classifications. The more sentences, the more accurate. Results are placed in a model.
  • 16. The Technical Side: Machine learning tools. Naïve Bayes classifier: generally uses N-grams, frequency, and term presence; sometimes part of speech. Maximum Entropy: Bayes assumes each feature is independent, ME does not; allows for overlap of words. Support Vector Machines: one vector per feature; linear, polynomial, sigmoid and other functions are applied to the vectors.
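
As a rough illustration of slides 15 and 16, the following Python sketch trains a Naïve Bayes classifier on a handful of labeled sentences using word frequency (unigrams) with add-one smoothing. It is a minimal sketch, not the code of the Splunk app; the training sentences and the names tokenize, train and classify are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a sentence into lowercase unigrams."""
    return text.lower().split()


def train(labeled_sentences):
    """Build the model: word counts per polarity class, plus class counts."""
    word_counts = {}
    class_counts = Counter()
    for text, label in labeled_sentences:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def classify(model, text):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Prior probability of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing; each word is treated as independent.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training documents.
training = [
    ("i love this movie", "positive"),
    ("what a great day", "positive"),
    ("i hate this traffic", "negative"),
    ("what a terrible day", "negative"),
]
model = train(training)
print(classify(model, "i love this day"))
```

With so few sentences the model is only a toy; as slide 15 notes, more training sentences tend to give more accurate results.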
  • 18. The Splunk Sentiment Analysis App
  • 19. The Splunk Sentiment Analysis App: Based on the Naïve Bayes classifier. Has three commands: sentiment, language, token. Includes a training/testing program and two models. Twitter: 190,862 positive and 37,469 negative tweets. IMDb: range of polarity from 1 to 10; each ranking has 11 movie reviews, averaging 200 words.
  • 20. The Splunk Sentiment Analysis App: index=twitter lang=en | where like(text, "%love%") | sentiment twitter text | stats avg(sentiment)
  • 21. Love, Hate and Justin Bieber
  • 22. The Beliebers Search: index=twitter lang=en | rename entities.hashtags{}.text as hashtags | fields text, hashtags | mvexpand hashtags | where like(hashtags, "Beliebers") | sentiment twitter text | stats avg(sentiment)
  • 23. The Beliebers Search, rename entities.hashtags{}.text as hashtags: So that we don't have to type entities.hashtags{x}.text every time we want to refer to a hashtag, rename this multi-value field to hashtags.
  • 24. The Beliebers Search, fields text, hashtags: We only want the fields that contain the tweet and the hashtags.
  • 25. The Beliebers Search, mvexpand hashtags: Expand the values of this multi-value field into separate Splunk events.
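
For readers less familiar with SPL, the steps of the Beliebers search can be mirrored in plain Python. This is a sketch under assumptions, not how Splunk executes the search: the tweet dictionaries, the helper names mvexpand and average_sentiment, and the toy_score function (a stand-in for the app's sentiment command) are all hypothetical.

```python
from statistics import mean


def mvexpand(events, field):
    """Yield one copy of each event per value of a multi-value field."""
    for event in events:
        for value in event.get(field, []):
            expanded = dict(event)
            expanded[field] = value
            yield expanded


def average_sentiment(tweets, hashtag, score):
    """Expand hashtags, keep exact matches, and average the scores."""
    rows = mvexpand(tweets, "hashtags")
    scores = [score(row["text"]) for row in rows if row["hashtags"] == hashtag]
    return mean(scores) if scores else None


def toy_score(text):
    """Hypothetical stand-in for a real sentiment model."""
    return 1.0 if "love" in text else -1.0


# Hypothetical events, already reduced to the text and hashtags fields.
tweets = [
    {"text": "love the show", "hashtags": ["Beliebers", "music"]},
    {"text": "so boring", "hashtags": ["Beliebers"]},
    {"text": "love pizza", "hashtags": ["food"]},
]
print(average_sentiment(tweets, "Beliebers", toy_score))
```

The expansion step matters because a tweet can carry several hashtags; without it, the filter would have to inspect a list rather than a single value.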
  • 27. Training and Testing Data: The training corpus is key to accuracy. Beware: Naïve Bayes is not an exact algorithm. The best accuracy obtained using Naïve Bayes is approximately 83%. Key factors to increase accuracy: similarity to the data being analyzed, and size of the corpus.
  • 28. Training and Testing Data
      Test Data              | Size        | Accuracy | Margin of Error
      University of Michigan | 1.5 million | 72.49%   | 1.05%
      Splunk                 | 228,000     | 68.79%   | 1.12%
      Sanders                | 5,500       | 60.61%   | 0.76%
  • 29. Love, Hate & Justin Bieber: Sanders Model
  • 30. The World Sentiment Indicator Project
  • 31. The World Sentiment Indicator: Based on news headlines from news web sites all around the world. Collecting RSS feeds in English.
  • 32. The World Sentiment Indicator: Steps for this project: 1. Collect the RSS feeds. 2. Index the headlines into Splunk. 3. Define the sentiment corpus. 4. Create a visualization of the results.
  • 34. Training Corpus Creation: Create your own. Crowd-source: University of Michigan, Kaggle competition. Bootstrap: "Twitter Sentiment Classification Using Distant Supervision", Go et al., 2010, uses emoticons to classify tweets. Accuracy for unigrams and bigrams: Naïve Bayes 82.7%, Maximum Entropy 82.7%, Support Vector Machine 81.6%.
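
The bootstrapping idea on slide 34, using emoticons to classify tweets, could look roughly like the Python sketch below. The emoticon lists and the function name label_by_emoticon are illustrative assumptions, not the exact lists or code used by Go et al.

```python
# Hypothetical emoticon lists used as noisy polarity labels.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")


def label_by_emoticon(tweet):
    """Return (text_without_emoticons, label), or None if the tweet
    has no emoticon or mixes positive and negative ones."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None  # no signal, or conflicting signals
    label = "positive" if has_pos else "negative"
    text = tweet
    for emoticon in POSITIVE + NEGATIVE:
        # Strip the emoticon so a classifier learns from the words only.
        text = text.replace(emoticon, "")
    return " ".join(text.split()), label


print(label_by_emoticon("great concert :)"))
```

Labels produced this way are noisy, but they allow a large training corpus to be assembled without manual classification.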
  • 35. Training Corpus Considerations: Issues with subjectivity: "Pope Benedict XVI announces resignation"; "Pope too frail to carry on"; "Pope steps down as head of Catholic church"; "Pope quits for health reasons". Average size of RSS headline: 47.8 characters, 7.6 words. Twitter average: 78 characters, 14 words.
  • 36. Training Corpus Strategy: Create a special corpus based on news headlines. Version 1: 100 positive, 100 negative, 100 neutral. Version 2: 200 positive, 200 negative, 200 neutral. Use an existing Twitter corpus: the one included with the Splunk app, or University of Michigan. Use a movie review corpus: Pang & Lee, 1,000 positive, 1,000 negative.
  • 37. Training Corpus Accuracy
      Training Corpus | Size               | Accuracy | Margin of Error
      Headlines V1    | 300 headlines      | 38.89%   | 1.02%
      Headlines V2    | 600 headlines      | 47.22%   | 1.05%
      Splunk Twitter  | 228,000 tweets     | 40.80%   | 1.16%
      U of Michigan   | 1.5 million tweets | 43.81%   | 1.11%
      Movie Reviews   | 2,000 reviews      | 36.79%   | 1.23%
  • 38. The World Sentiment Indicator
  • 39. Conclusions: The key to accuracy is the quality of the training data. Train with the same data you will analyze. Size of the training data improves accuracy. Subjectivity of crowd-sourcing tends to even out as the amount of training data increases. All machine learning tools tend to converge to similar levels of accuracy. Use the easiest one for you.