SlideShare una empresa de Scribd logo
1 de 39
Descargar para leer sin conexión
Dr. Diana Maynard
University of Sheffield, UK
Do we really know what people
mean when they tweet?
We are all connected to each other...
● Information,
thoughts and
opinions are
shared
prolifically on
the social web
these days
● 72% of online
adults use social
networking sites
Your grandmother is three times as likely to
use a social networking site now as in 2009
●
In Britain and the
US, approx 1 hour
a day on social media
●
30 % of Americans get
all their news from
Facebook
●
Facebook has more
users than the whole of
the Internet in 2005
●
40% of Twitter users
don't actually tweet
anything
What are people reading about?
●
Of the top 10 Twitter accounts
with the highest number of
followers:
●
7 pop stars
●
2 social media sites
●
and Barack Obama
●
As you can imagine, there's a lot
of mindless drivel on social
media sites
There can be surprising value in mindless drivel
Opinion mining is about finding out what
people think...
Voice of the Customer
●
Someone who wants to buy a camera
●
Looks for comments and reviews
●
Someone who just bought a camera
●
Comments on it
●
Writes about their experience
●
Camera Manufacturer
●
Gets feedback from customer
●
Improves their products
●
Adjusts marketing strategies
More than just analysing product reviews
●
Understanding customer reviews and so on is a huge
business
●
But also:
●
Tracking political opinions: what events make people
change their minds?
●
How does public mood influence the stock market,
consumer choices etc?
●
How are opinions distributed in relation to
demographics?
●
NLP tools are crucial in order to make sense of all the
information
“Climate change? Oh great!” says Justin Bieber
What else do we want to investigate?
●
Decarbonet project: investigating public perception of
climate change
●
How do opinions change over time and what causes these
changes?
●
How can we influence opinion?
●
Sarcasm detection
●
What social media content should be preserved/forgotten?
Interestingness of opinions (ARCOMEM and ForgetIt
projects)
But there are lots of tools that “analyse”
social media already....
●
Streamcrab http://www.streamcrab.com/
●
Semantria http://semantria.com
●
Social Mention http://socialmention.com/
●
Sentiment140: http://www.sentiment140.com/
●
TipTop: http://feeltiptop.com/
Why are these sites unsuccessful?
●
They don't work well at more than a very basic level
●
They mainly use dictionary lookup for positive and
negative words
●
Or they use ML, which only works for text that's similar in
style to the training data
●
They classify the tweets as positive or negative, but not
with respect to the keyword you're searching for
●
keyword search just retrieves any tweet mentioning it,
but not necessarily about it as a topic
●
no correlation between the keyword and the sentiment
“Positive” tweets about fracking
●
Help me stop fracking. Sign the petition to David
Cameron for a #frack-free UK now!
●
I'll take it as a sign that the gods applaud my new anti-
fracking country love song.
●
#Cameron wants to change the law to allow #fracking
under homes without permission. Tell him NO!!!!!
It's not just about looking at sentiment words
“It's a great movie if you have the taste and sensibilities of a 5-year-
old boy.”
“I'm not happy that John did so well in the debate last night.”
“I'd have liked the film if it had been shorter.”
“You're an idiot.”
●
Conditional statements, content clauses, situational context can
all play a role
Whitney Houston wasn't very popular...
Death confuses opinion mining tools

Opinion mining
tools are good for a
general overview,
but not for some
situations
Opinion spamming
Spam opinion detection (fake reviews)
●
Sometimes people get paid to post “spam” opinions supporting
a product, organisation or even government
●
An article in the New York Times discussed one such company
who gave big discounts to post a 5-star review about the
product on Amazon
●
http://www.nytimes.com/2012/01/27/technology/for-2-a-star-
a-retailer-gets-5-star-reviews.html?_r=3&ref=business
●
Could be either positive or negative opinions
●
Generally, negative opinions are more damaging than positive
ones
How to detect fake opinions?
●
Review content: lexical features, content and style
inconsistencies from the same user, or similarities
between different users
●
Complex relationships between reviews, reviewers
and products
●
Publicly available information about posters (time
posted, posting frequency etc)
Irony and sarcasm
• I had never seen snow in Holland before but thanks to twitter and
facebook I now know what it looks like. Thanks guys, awesome!
• Life's too short, so be sure to read as many articles about
celebrity breakups as possible.
• I feel like there aren't enough singing competitions on TV .
#sarcasmexplosion
• I wish I was cool enough to stalk my ex-boyfriend ! #sarcasm
#bitchtweet
• On a bright note if downing gets injured we have Henderson to
come in
How do you know when someone is being
sarcastic?
●
Use of hashtags in tweets such as #sarcasm, emoticons etc.
●
Large collections of tweets based on hashtags can be used to
make a training set for machine learning
●
But you still have to know which bit of the tweet is the sarcastic
bit
Man , I hate when I get those chain letters & I don't resend them , then
I die the next day .. #Sarcasm
I am not happy that I woke up at 5:15 this morning. #greatstart
#sarcasm
You are really mature. #lying #sarcasm
Case study:
Rule-based Opinion Mining on Tweets
Why Rule-based?
●
Although ML applications are typically used for
Opinion Mining, there are advantages to using a rule-
based approach when training data isn't easily
available
●
For example, working with multiple languages and/or
domains
●
Rule-based system is more easily adaptable
●
Novel use of language and grammar makes ML hard
GATE Components
●
TwitIE
●
structural and linguistic pre-processing, specific
to Twitter
●
includes language detection, hashtag
retokenisation, POS tagging, NER
●
(Optional) term recognition using TermRaider
●
Sentiment gazetteer lookup
●
JAPE opinion detection grammars
●
(Optional) aggregation of opinions
●
includes opinion interestingness component
Basic approach for opinion finding
●
Find sentiment-containing words in a linguistic
relation with terms/entities (opinion-target
matching)
●
Dictionaries give a starting score for sentiment words
●
Use a number of linguistic sub-components to deal
with issues such as negatives, adverbial modification,
swear words, conditionals, sarcasm etc.
●
Starting from basic sentiment lookup, we then adjust
the scores and polarity of the opinions via these
components
A positive sentiment list
●
awesome category=adjective score=0.5
●
beaming category=adjective score=0.5
●
becharm category=verb score=0.5
●
belonging category=noun score=0.5
●
benefic category=adjective score=0.5
●
benevolentlycategory=adverb score=0.5
●
caring category=noun score=0.5
●
charitable category=adjective score=0.5
●
charm category=verb score=0.5
A negative sentiment list
Examples of phrases following the word “go”:
●
down the pan
●
down the drain
●
to the dogs
●
downhill
●
pear-shaped
Opinion scoring
●
Sentiment gazetteers (developed from sentiment words in
WordNet) have a starting “strength” score
●
These get modified by context words, e.g. adverbs, swear
words, negatives and so on
The film was awesome --> The film was really amazing.
The film was awful --> The film was absolutely awful..
The film was good --> The film was not so good.
●
Swear words modifying adjectives count as intensifiers
The film was good --> The film was damned good.
●
Swear words on their own are classified as negative
Damned politicians.
Example: Opinions on Greek Crisis
Using Machine Learning for the task
●
If we can collect enough manually annotated training data, we
can also use an ML approach for this task
●
Product reviews: use star-based rating (but these have flaws)
●
Other domains, e.g. politics: classify sentences or tweets (the
ML instances), many of which do not contain opinions.
●
So the ML classes will be positive, neutral and negative
●
(Some people classify neutral and no opinion as distinct classes,
but we find the distinction too difficult to make reliably)
Training on tweets
●
You can use hashtags as a source of classes
●
Example: collect a set of tweets with the #angry
tag, and a set without it, and delete from the
second set any tweets that look angry
●
Remove the #angry tag from the text in the first
set (so you're not just training the ML to spot the
tag)
●
You now have a corpus of manually annotated
angry/non-angry data
●
This approach can work well, but if you have huge
datasets, you may not be able to do the manual deletions
●
You can also train on things like #sarcasm and #irony
Summary
●
Opinion mining is hard and therefore error-prone (despite
what vendors will tell you about how great their product is)
●
There are many types of sentiment analysis, and many different
uses, each requiring a different solution
●
It's very unlikely that an off-the-shelf tool will do exactly what
you want, and even if it does, performance may be low
●
Opinion mining tools need to be customised to the task and
domain
●
Anything that involves processing social media (especially
messy stuff like Facebook posts and Twitter) is even harder, and
likely to have lower performance
●
For tasks that mainly look at aggregated data, this isn't such an
issue, but for getting specific sentiment on individual
posts/reviews etc, this is more problematic
So where does this leave us?
●
Opinion mining is ubiquitous, but it's still far from perfect,
especially on social media
●
There are lots of linguistic and social quirks that fool
sentiment analysis tools.
●
The good news is that this means there are lots of
interesting problems for us to research
●
And it doesn’t mean we shouldn’t use existing opinion
mining tools
●
The benefits of a modular approach mean that we can pick
the bits that are most useful
●
Take-away message: use the right tool for the right job
More information
●
GATE http://gate.ac.uk (general info, download, tutorials, demos,
references etc)
●
The EU-funded Decarbonet and TrendMiner projects are dealing with lots
of issues about opinion and trend mining from social media
●
http://www.decarbonet.eu
●
http://www.trendminer-project.eu/
●
Tutorials
●
Module 12 of the annual GATE training course: “Opinion
Mining” (2013 version)
http://gate.ac.uk/wiki/TrainingCourseJune2013/
●
Module 14 of the annual GATE training course: “GATE for
social media mining”
Some GATE-related opinion mining papers
(available from http://gate.ac.uk/gate/doc/papers.html)
●
Diana Maynard, Gerhard Gossen, Marco Fisichella, Adam Funk. Should I care about
your opinion? Detection of opinion interestingness and dynamics in social media. To
appear in Journal of Future Internet, Special Issue on Archiving Community Memories,
2014.
●
Diana Maynard and Mark A. Greenwood. Who cares about sarcastic tweets?
Investigating the impact of sarcasm on sentiment analysis. Proc. of LREC 2014,
Reykjavik, Iceland, May 2014.
●
Diana Maynard, David Dupplaw, Jonathon Hare. Multimodal Sentiment Analysis of
Social Media. Proc. of BCS SGAI Workshop on Social Media Analysis, Dec 2013
●
D. Maynard and K. Bontcheva and D. Rout. Challenges in developing opinion mining
tools for social media. In Proceedings of @NLP can u tag #usergeneratedcontent?!
Workshop at LREC 2012, May 2012, Istanbul, Turkey.
●
D. Maynard and A. Funk. Automatic detection of political opinions in tweets. In Raúl
García-Castro, Dieter Fensel and Grigoris Antoniou (eds.) The Semantic Web: ESWC
2011 Selected Workshop Papers, Lecture Notes in Computer Science, Springer, 2011.
●
H.Saggion, A.Funk: Extracting Opinions and Facts for Business Intelligence. Revue
des Nouvelles Technologies de l’Information (RNTI), no. E-17 pp119-146; 2009.
Some demos to try
●
http://sentiment.christopherpotts.net/lexicon/
Get sentiment scores for single words from a variety of
sentiment lexicons
●
http://sentiment.christopherpotts.net/textscores/
Show how a variety of lexicons score novel texts
●
http://sentiment.christopherpotts.net/classify/
Classify tweets according to various probabilistic classifier
models
●
http://demos.gate.ac.uk/arcomem/opinions/
Find and classify opinionated text, using GATE-based
ARCOMEM system
Questions?

Más contenido relacionado

Similar a Understanding Sentiment in Social Media: Challenges and Opportunities

Multimodal opinion mining from social media
Multimodal opinion mining from social mediaMultimodal opinion mining from social media
Multimodal opinion mining from social mediaDiana Maynard
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Diana Maynard
 
Social Media: Friend or Foe (May 2014)
Social Media: Friend or Foe (May 2014)Social Media: Friend or Foe (May 2014)
Social Media: Friend or Foe (May 2014)nilamapatel
 
Social media basics for college
Social media basics for collegeSocial media basics for college
Social media basics for collegeGenevieve Howard
 
Twitter Presentation
Twitter PresentationTwitter Presentation
Twitter PresentationChris Hunter
 
Doing customer development (and stop wasting your time) - StartupBus edition
Doing customer development (and stop wasting your time) -  StartupBus editionDoing customer development (and stop wasting your time) -  StartupBus edition
Doing customer development (and stop wasting your time) - StartupBus editionHans van Gent
 
Website 102 google slides
Website 102   google slidesWebsite 102   google slides
Website 102 google slidesMary Heston
 
Aardman Social Media Training Slides
Aardman Social Media Training SlidesAardman Social Media Training Slides
Aardman Social Media Training SlidesNoisy Little Monkey
 
Social Media for Professional Learning
Social Media for Professional LearningSocial Media for Professional Learning
Social Media for Professional LearningSandy Kendell
 
Effectively Using Technology for Leading and Learning
Effectively Using Technology for Leading and LearningEffectively Using Technology for Leading and Learning
Effectively Using Technology for Leading and LearningBobby Dodd
 
Best Practices for Creating Digital Content to Share in Social Media
Best Practices for Creating Digital Content to Share in Social MediaBest Practices for Creating Digital Content to Share in Social Media
Best Practices for Creating Digital Content to Share in Social MediaAlex Garrido
 
#sbIMPACT: Getting Started with Social Media
#sbIMPACT: Getting Started with Social Media#sbIMPACT: Getting Started with Social Media
#sbIMPACT: Getting Started with Social MediaMel DePaoli, MBA
 
Using Social Media as a Project Management Professional
Using Social Media as a Project Management ProfessionalUsing Social Media as a Project Management Professional
Using Social Media as a Project Management ProfessionalBurriss Consulting, Inc.
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real worldDiana Maynard
 
Social media for the voice actor
Social media for the voice actorSocial media for the voice actor
Social media for the voice actorvoicesforall
 

Similar a Understanding Sentiment in Social Media: Challenges and Opportunities (20)

Multimodal opinion mining from social media
Multimodal opinion mining from social mediaMultimodal opinion mining from social media
Multimodal opinion mining from social media
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Social Media: Friend or Foe (May 2014)
Social Media: Friend or Foe (May 2014)Social Media: Friend or Foe (May 2014)
Social Media: Friend or Foe (May 2014)
 
Cls8 decarbonet
Cls8 decarbonetCls8 decarbonet
Cls8 decarbonet
 
Slide Presentation Week 9
Slide Presentation Week 9Slide Presentation Week 9
Slide Presentation Week 9
 
Social media basics for college
Social media basics for collegeSocial media basics for college
Social media basics for college
 
Social Media in Short- January 2014
Social Media in Short- January 2014Social Media in Short- January 2014
Social Media in Short- January 2014
 
Twitter Presentation
Twitter PresentationTwitter Presentation
Twitter Presentation
 
Doing customer development (and stop wasting your time) - StartupBus edition
Doing customer development (and stop wasting your time) -  StartupBus editionDoing customer development (and stop wasting your time) -  StartupBus edition
Doing customer development (and stop wasting your time) - StartupBus edition
 
Website 102 google slides
Website 102   google slidesWebsite 102   google slides
Website 102 google slides
 
Aardman Social Media Training Slides
Aardman Social Media Training SlidesAardman Social Media Training Slides
Aardman Social Media Training Slides
 
Social Media for Professional Learning
Social Media for Professional LearningSocial Media for Professional Learning
Social Media for Professional Learning
 
Social Degital Media Basics
Social Degital Media BasicsSocial Degital Media Basics
Social Degital Media Basics
 
Effectively Using Technology for Leading and Learning
Effectively Using Technology for Leading and LearningEffectively Using Technology for Leading and Learning
Effectively Using Technology for Leading and Learning
 
Twitter Smarter in 2016
Twitter Smarter in 2016 Twitter Smarter in 2016
Twitter Smarter in 2016
 
Best Practices for Creating Digital Content to Share in Social Media
Best Practices for Creating Digital Content to Share in Social MediaBest Practices for Creating Digital Content to Share in Social Media
Best Practices for Creating Digital Content to Share in Social Media
 
#sbIMPACT: Getting Started with Social Media
#sbIMPACT: Getting Started with Social Media#sbIMPACT: Getting Started with Social Media
#sbIMPACT: Getting Started with Social Media
 
Using Social Media as a Project Management Professional
Using Social Media as a Project Management ProfessionalUsing Social Media as a Project Management Professional
Using Social Media as a Project Management Professional
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real world
 
Social media for the voice actor
Social media for the voice actorSocial media for the voice actor
Social media for the voice actor
 

Más de Diana Maynard

Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social mediaDiana Maynard
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayDiana Maynard
 
Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Diana Maynard
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferencesDiana Maynard
 
Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Diana Maynard
 
The language of social media
The language of social mediaThe language of social media
The language of social mediaDiana Maynard
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-searchDiana Maynard
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Diana Maynard
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsDiana Maynard
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Diana Maynard
 
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...Diana Maynard
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEDiana Maynard
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaDiana Maynard
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisDiana Maynard
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATEDiana Maynard
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDiana Maynard
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Diana Maynard
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaDiana Maynard
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisDiana Maynard
 

Más de Diana Maynard (20)

Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social media
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long way
 
Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferences
 
Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...
 
The language of social media
The language of social mediaThe language of social media
The language of social media
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutions
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
 
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media Analysis
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATE
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged Sword
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social Media
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysis
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Understanding Sentiment in Social Media: Challenges and Opportunities

  • 1. Dr. Diana Maynard University of Sheffield, UK Do we really know what people mean when they tweet?
  • 2. We are all connected to each other... ● Information, thoughts and opinions are shared prolifically on the social web these days ● 72% of online adults use social networking sites
  • 3. Your grandmother is three times as likely to use a social networking site now as in 2009
  • 4. ● In Britain and the US, approx 1 hour a day on social media ● 30 % of Americans get all their news from Facebook ● Facebook has more users than the whole of the Internet in 2005 ● 40% of Twitter users don't actually tweet anything
  • 5. What are people reading about? ● Of the top 10 Twitter accounts with the highest number of followers: ● 7 pop stars ● 2 social media sites ● and Barack Obama ● As you can imagine, there's a lot of mindless drivel on social media sites
  • 6. There can be surprising value in mindless drivel
  • 7.
  • 8. Opinion mining is about finding out what people think...
  • 9. Voice of the Customer ● Someone who wants to buy a camera ● Looks for comments and reviews ● Someone who just bought a camera ● Comments on it ● Writes about their experience ● Camera Manufacturer ● Gets feedback from customer ● Improves their products ● Adjusts marketing strategies
  • 10. More than just analysing product reviews ● Understanding customer reviews and so on is a huge business ● But also: ● Tracking political opinions: what events make people change their minds? ● How does public mood influence the stock market, consumer choices etc? ● How are opinions distributed in relation to demographics? ● NLP tools are crucial in order to make sense of all the information
  • 11. “Climate change? Oh great!” says Justin Bieber
  • 12. What else do we want to investigate? ● Decarbonet project: investigating public perception of climate change ● How do opinions change over time and what causes these changes? ● How can we influence opinion? ● Sarcasm detection ● What social media content should be preserved/forgotten? Interestingness of opinions (ARCOMEM and ForgetIt projects)
  • 13. But there are lots of tools that “analyse” social media already.... ● Streamcrab http://www.streamcrab.com/ ● Semantria http://semantria.com ● Social Mention http://socialmention.com/ ● Sentiment140: http://www.sentiment140.com/ ● TipTop: http://feeltiptop.com/
  • 14. Why are these sites unsuccessful? ● They don't work well at more than a very basic level ● They mainly use dictionary lookup for positive and negative words ● Or they use ML, which only works for text that's similar in style to the training data ● They classify the tweets as positive or negative, but not with respect to the keyword you're searching for ● keyword search just retrieves any tweet mentioning it, but not necessarily about it as a topic ● no correlation between the keyword and the sentiment
  • 15. “Positive” tweets about fracking ● Help me stop fracking. Sign the petition to David Cameron for a #frack-free UK now! ● I'll take it as a sign that the gods applaud my new anti- fracking country love song. ● #Cameron wants to change the law to allow #fracking under homes without permission. Tell him NO!!!!!
  • 16. It's not just about looking at sentiment words “It's a great movie if you have the taste and sensibilities of a 5-year- old boy.” “I'm not happy that John did so well in the debate last night.” “I'd have liked the film if it had been shorter.” “You're an idiot.” ● Conditional statements, content clauses, situational context can all play a role
  • 17. Whitney Houston wasn't very popular...
  • 18. Death confuses opinion mining tools  Opinion mining tools are good for a general overview, but not for some situations
  • 20. Spam opinion detection (fake reviews) ● Sometimes people get paid to post “spam” opinions supporting a product, organisation or even government ● An article in the New York Times discussed one such company who gave big discounts to post a 5-star review about the product on Amazon ● http://www.nytimes.com/2012/01/27/technology/for-2-a-star- a-retailer-gets-5-star-reviews.html?_r=3&ref=business ● Could be either positive or negative opinions ● Generally, negative opinions are more damaging than positive ones
  • 21. How to detect fake opinions? ● Review content: lexical features, content and style inconsistencies from the same user, or similarities between different users ● Complex relationships between reviews, reviewers and products ● Publicly available information about posters (time posted, posting frequency etc)
  • 22. Irony and sarcasm • I had never seen snow in Holland before but thanks to twitter and facebook I now know what it looks like. Thanks guys, awesome! • Life's too short, so be sure to read as many articles about celebrity breakups as possible. • I feel like there aren't enough singing competitions on TV . #sarcasmexplosion • I wish I was cool enough to stalk my ex-boyfriend ! #sarcasm #bitchtweet • On a bright note if downing gets injured we have Henderson to come in
  • 23. How do you know when someone is being sarcastic? ● Use of hashtags in tweets such as #sarcasm, emoticons etc. ● Large collections of tweets based on hashtags can be used to make a training set for machine learning ● But you still have to know which bit of the tweet is the sarcastic bit Man , I hate when I get those chain letters & I don't resend them , then I die the next day .. #Sarcasm I am not happy that I woke up at 5:15 this morning. #greatstart #sarcasm You are really mature. #lying #sarcasm
  • 24. Case study: Rule-based Opinion Mining on Tweets
  • 25. Why Rule-based? ● Although ML applications are typically used for Opinion Mining, there are advantages to using a rule- based approach when training data isn't easily available ● For example, working with multiple languages and/or domains ● Rule-based system is more easily adaptable ● Novel use of language and grammar makes ML hard
  • 26. GATE Components ● TwitIE ● structural and linguistic pre-processing, specific to Twitter ● includes language detection, hashtag retokenisation, POS tagging, NER ● (Optional) term recognition using TermRaider ● Sentiment gazetteer lookup ● JAPE opinion detection grammars ● (Optional) aggregation of opinions ● includes opinion interestingness component
  • 27. Basic approach for opinion finding ● Find sentiment-containing words in a linguistic relation with terms/entities (opinion-target matching) ● Dictionaries give a starting score for sentiment words ● Use a number of linguistic sub-components to deal with issues such as negatives, adverbial modification, swear words, conditionals, sarcasm etc. ● Starting from basic sentiment lookup, we then adjust the scores and polarity of the opinions via these components
  • 28. A positive sentiment list ● awesome category=adjective score=0.5 ● beaming category=adjective score=0.5 ● becharm category=verb score=0.5 ● belonging category=noun score=0.5 ● benefic category=adjective score=0.5 ● benevolentlycategory=adverb score=0.5 ● caring category=noun score=0.5 ● charitable category=adjective score=0.5 ● charm category=verb score=0.5
  • 29. A negative sentiment list Examples of phrases following the word “go”: ● down the pan ● down the drain ● to the dogs ● downhill ● pear-shaped
  • 30. Opinion scoring ● Sentiment gazetteers (developed from sentiment words in WordNet) have a starting “strength” score ● These get modified by context words, e.g. adverbs, swear words, negatives and so on The film was awesome --> The film was really amazing. The film was awful --> The film was absolutely awful.. The film was good --> The film was not so good. ● Swear words modifying adjectives count as intensifiers The film was good --> The film was damned good. ● Swear words on their own are classified as negative Damned politicians.
  • 31. Example: Opinions on Greek Crisis
  • 32. Using Machine Learning for the task ● If we can collect enough manually annotated training data, we can also use an ML approach for this task ● Product reviews: use star-based rating (but these have flaws) ● Other domains, e.g. politics: classify sentences or tweets (the ML instances), many of which do not contain opinions. ● So the ML classes will be positive, neutral and negative ● (Some people classify neutral and no opinion as distinct classes, but we find the distinction too difficult to make reliably)
  • 33. Training on tweets ● You can use hashtags as a source of classes ● Example: collect a set of tweets with the #angry tag, and a set without it, and delete from the second set any tweets that look angry ● Remove the #angry tag from the text in the first set (so you're not just training the ML to spot the tag) ● You now have a corpus of manually annotated angry/non-angry data ● This approach can work well, but if you have huge datasets, you may not be able to do the manual deletions ● You can also train on things like #sarcasm and #irony
  • 34. Summary ● Opinion mining is hard and therefore error-prone (despite what vendors will tell you about how great their product is) ● There are many types of sentiment analysis, and many different uses, each requiring a different solution ● It's very unlikely that an off-the-shelf tool will do exactly what you want, and even if it does, performance may be low ● Opinion mining tools need to be customised to the task and domain ● Anything that involves processing social media (especially messy stuff like Facebook posts and Twitter) is even harder, and likely to have lower performance ● For tasks that mainly look at aggregated data, this isn't such an issue, but for getting specific sentiment on individual posts/reviews etc, this is more problematic
  • 35. So where does this leave us? ● Opinion mining is ubiquitous, but it's still far from perfect, especially on social media ● There are lots of linguistic and social quirks that fool sentiment analysis tools. ● The good news is that this means there are lots of interesting problems for us to research ● And it doesn’t mean we shouldn’t use existing opinion mining tools ● The benefits of a modular approach mean that we can pick the bits that are most useful ● Take-away message: use the right tool for the right job
  • 36. More information ● GATE http://gate.ac.uk (general info, download, tutorials, demos, references etc) ● The EU-funded Decarbonet and TrendMiner projects are dealing with lots of issues about opinion and trend mining from social media ● http://www.decarbonet.eu ● http://www.trendminer-project.eu/ ● Tutorials ● Module 12 of the annual GATE training course: “Opinion Mining” (2013 version) http://gate.ac.uk/wiki/TrainingCourseJune2013/ ● Module 14 of the annual GATE training course: “GATE for social media mining”
  • 37. Some GATE-related opinion mining papers (available from http://gate.ac.uk/gate/doc/papers.html) ● Diana Maynard, Gerhard Gossen, Marco Fisichella, Adam Funk. Should I care about your opinion? Detection of opinion interestingness and dynamics in social media. To appear in Journal of Future Internet, Special Issue on Archiving Community Memories, 2014. ● Diana Maynard and Mark A. Greenwood. Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. Proc. of LREC 2014, Reykjavik, Iceland, May 2014. ● Diana Maynard, David Dupplaw, Jonathon Hare. Multimodal Sentiment Analysis of Social Media. Proc. of BCS SGAI Workshop on Social Media Analysis, Dec 2013 ● D. Maynard and K. Bontcheva and D. Rout. Challenges in developing opinion mining tools for social media. In Proceedings of @NLP can u tag #usergeneratedcontent?! Workshop at LREC 2012, May 2012, Istanbul, Turkey. ● D. Maynard and A. Funk. Automatic detection of political opinions in tweets. In Raúl García-Castro, Dieter Fensel and Grigoris Antoniou (eds.) The Semantic Web: ESWC 2011 Selected Workshop Papers, Lecture Notes in Computer Science, Springer, 2011. ● H.Saggion, A.Funk: Extracting Opinions and Facts for Business Intelligence. Revue des Nouvelles Technologies de l’Information (RNTI), no. E-17 pp119-146; 2009.
  • 38. Some demos to try ● http://sentiment.christopherpotts.net/lexicon/ Get sentiment scores for single words from a variety of sentiment lexicons ● http://sentiment.christopherpotts.net/textscores/ Show how a variety of lexicons score novel texts ● http://sentiment.christopherpotts.net/classify/ Classify tweets according to various probabilistic classifier models ● http://demos.gate.ac.uk/arcomem/opinions/ Find and classify opinionated text, using GATE-based ARCOMEM system