The document discusses research being conducted on topic modeling, computational social science, and analyzing sentiment and emotions. It provides an overview of recent works on topic modeling and modeling topic hierarchies. It also discusses research on analyzing aspects and sentiments of online reviews and analyzing social aspects of emotions and self-disclosure behaviors in Twitter conversations.
1. Topic and Text Analysis for Sentiment, Emotion, and Computational Social Science
November 2012
Alice Oh
alice.oh@kaist.edu
Users & Information Lab
http://uilab.kaist.ac.kr
Thursday, December 6, 2012
2. Overview
• Topic modeling research
• CIKM 2011: Distance-dependent Chinese restaurant franchise (ddCRF)
• ICML 2012: Dirichlet process with mixed random measures (DP-MRM)
• CIKM 2012: Recursive Chinese restaurant process for modeling topic hierarchies (rCRP)
• NIPS Big Learning Workshop 2012: Distributed Online Learning for Latent Dirichlet Allocation (DoLDA)
• Computational social science research
• WSDM 2011: Aspect sentiment unification model for online review analysis
• ICWSM 2012: Social aspects of emotions in Twitter conversations
• ACL 2012: Self-disclosure and relationship strength in Twitter conversations
3. Do you feel what I feel?
Social Aspects of Emotions in Twitter Conversations
Suin Kim, JinYeong Bak, Alice Oh
ICWSM 2012
6. Asking Research Questions
Human emotion is typically studied as a within-person, one-direction,
non-repetitive phenomenon; focus has traditionally been on how one
individual feels in reaction to various stimuli at a certain point of
time. But people recognize and inevitably react emotionally and
otherwise to expressions of emotion of other people. We propose
that organizational dyads and groups inhabit emotion cycles:
Emotions of an individual influence the emotions, thoughts and
behaviors of others; others’ reactions can then influence their
future interactions with the individual expressing the original
emotion, as well as that individual’s future emotions and
behaviors. People can mimic the emotions of others, thereby
extending the social presence of a specific emotion, but can also
respond to others’ emotions, extending the range of emotions
present.
7. Social Aspects of Emotions: Motivating Question
How are our emotions affected by others we talk to?
8. Social Aspects of Emotions: Research Questions
• How do we communicate our emotions?
• Use a topic model on Twitter conversations to discover the “topics” that
represent the eight emotions
• Analyze the proportions of the total tweets for the emotions
• How do we influence other people’s emotions?
• Analyze the emotion transitions of the tweets
• Look for topics that change the emotions of the conversation partners
• Find interesting patterns of emotion pairs
9. Social Aspects of Emotions: Data
• Twitter conversation data: approx 220k dyads who “reply” to each other,
1,670k conversational chains
10. Seed Words (We Feel Fine by Harris & Kamvar)
• anticipation: hope, wait, await, inspir, excit, bore, readi, expect, nervou, calm, motiv, prepar, certain, anxiou, optimist, forese
• joy: awesom, amaz, wonder, excit, glad, fine, beauti, high, lucki, super, perfect, complet, special, bless, safe, proud
• anger: shit, bitch, ass, mean, damn, mad, jealou, piss, annoi, angri, upset, moron, rage, screw, stuck, irrit
• surprise: amaz, wow, wonder, weird, lucki, differ, awkward, confus, holi, strang, shock, odd, embarrass, overwhelm, astound, astonish
• fear: scare, stress, horror, nervou, terror, alarm, behind, panic, fear, afraid, desper, threaten, tens, terrifi, fright, anxiou
• sadness: sorri, bad, aw, sad, wrong, hurt, blue, dead, lost, crush, weak, depress, wors, low, terribl, lone
• disgust: sick, wrong, evil, fat, ugli, horribl, gross, terribl, selfish, miser, pathet, disgust, worthless, aw, asham, fuck
• acceptance: okai, ok, same, alright, safe, lazi, relax, peac, content, normal, secur, complet, numb, fulfil, comfort, defeat
11. Dirichlet Forest Prior
• Dirichlet Forest Prior (Andrzejewski et al.)
• Mixture of Dirichlet tree distributions
• Dirichlet tree: generalization of the Dirichlet distribution
• Knowledge is expressed using Must-link and Cannot-link primitives
• Must-link (love, sweetheart)
• Cannot-link (exciting, bored)
[Figure: DF-LDA graphical model, with nodes labeled q, β, η]
13. Domain Knowledge in Dirichlet Forest Prior
Seed words: the eight emotion classes (anticipation, joy, anger, surprise, fear, sadness, disgust, acceptance) with the seed word lists from the Seed Words slide.
• Must-link within a class
• Cannot-link between classes
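The rule "Must-link within a class, Cannot-link between classes" can be sketched as a small pair generator. This is only an illustration under stated assumptions: the function name `build_links`, the three-class excerpt of the seed lists, and the policy of skipping words that appear in both classes of a pair (for example `amaz` in joy and surprise) are hypothetical and are not taken from the slides or from the DF-LDA software.

```python
from itertools import combinations

# Hypothetical excerpt of the stemmed seed lists; the slide lists 16 words per class.
SEEDS = {
    "joy": ["awesom", "amaz", "glad"],
    "surprise": ["amaz", "wow", "weird"],
    "fear": ["scare", "panic"],
}


def build_links(seeds):
    """Return (must_links, cannot_links) as sets of alphabetically sorted word pairs."""
    must, cannot = set(), set()
    # Must-link: every pair of distinct words inside one emotion class.
    for words in seeds.values():
        must.update(combinations(sorted(set(words)), 2))
    # Cannot-link: pairs drawn from two different classes.
    # Assumption: a word listed in both classes is skipped for that class pair.
    for (_, w1), (_, w2) in combinations(seeds.items(), 2):
        shared = set(w1) & set(w2)
        for a in set(w1) - shared:
            for b in set(w2) - shared:
                cannot.add(tuple(sorted((a, b))))
    # Never emit a pair as both must-link and cannot-link.
    return must, cannot - must


must_links, cannot_links = build_links(SEEDS)
```

Words shared between classes still need an explicit policy before the pairs are handed to a constrained topic model, because must-links that chain through a shared word can conflict with a cannot-link; this sketch only avoids pairs that contradict each other directly.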
14. Dirichlet Forest vs. Dirichlet
Emotion    Model    Top words
Fear       DF-LDA   don’t think but know why even wanna care worry understand
Fear       LDA      good exam lol luck just school haha i’m xx worry tomorrow
Surprise   DF-LDA   that very really cool wow wonder just some differ amazing
Surprise   LDA      just rt holy got thank did shit new love lol awesome buy oh
Sadness    DF-LDA   bad my real feel life aw sad kill lost dead hurt wrong sick
Sadness    LDA      lol just know sorry isn’t oh tweet did haha don’t thought think
15. Emotion Topics: How do we express emotions?
Joy
• Topic 114: omg, love, haha, thank, really
• Topic 107: love, thank, follow, wow
• Topic 159: good, day, hope, morning, thank
• Topic 158: love, thank, miss, hug
Anticipation
• Topic 125: hope, better, feel, thank, soon
• Topic 26: good, thank, hope, miss
• Topic 146: come, wait, week, day, june
• Topic 146: good, day, time, work
Anger
• Topic 131: lmao, fuck, ass, bitch, shit
• Topic 4: ass, yo, lmao, nigga
• Topic 19: lmao, shit, damn, fuck, oh
• Topic 13: shit, nigga, smh, yea
Fear
• Topic 48: omg, oh, lmao, shit, scare
• Topic 78: happen, heart, attack, hospital
• Topic 27: don’t, come, night, sleep, outside
• Topic 140: time, got, work, day
Surprise
• Topic 172: yeag, know, think, true, funny
• Topic 89: know, don’t, think, look
• Topic 15: think, don’t, know, make, really
• Topic 94: haha, dont, think, really
29 70 21 14 5
Sadness
• Topic 6: oh, sorry, haha, know, didnt
• Topic 59: hurt, got, good, bad, pain
• Topic 106: tweet, reply, didn’t, read, sorry
• Topic 155: oh, really, make, feel
Disgust
• Topic 116: oh, fuck, don’t, ye, ew
• Topic 116: look, haha, oh, know
• Topic 22: don’t, oh, think, yeah, lmao
• Topic 174: don’t, think, say, people
Acceptance
• Topic 43: ok, oh, thank, cool, okay
• Topic 102: know, try, let, ok
• Topic 199: xx, thank, good, okay, follow
• Topic 8: night, love, good, sleep
17 7 18
Neutral
• Topic 180: com, www, http, check, youtube
• Topic 156: twitter, facebook, people, account
• Topic 184: account, google, app, work, email
• Topic 67: food, chicken, cook, rt
19
16. Emotion Topics: How do we express emotions?
Joy
• Topic 114: omg, love, haha, thank, really
• Topic 107: love, thank, follow, wow
Anticipation
• Topic 125: hope, better, feel, thank, soon
• Topic 26: good, thank, hope, miss
Sadness
• Topic 6: oh, sorry, know, didnt
• Topic 59: hurt, got, good, bad, pain
Neutral
• Topic 180: com, www, http, check, youtube
• Topic 156: twitter, facebook, people, account
Labels: Greeting, Caring, Sympathy, IT/Tech
18. Defining “Influence”
Conversation between User A and User B, in time order:
A: Having a tough day today. RIP Harrison. I’ll miss you a ton :/
B: Just pray about it. God will help you.
A: Not really religious, but thanks man. :)
B: If you need talk you know I’m here.
Emotion labels along the time axis: (Sadness), (Acceptance), (Anticipation)
Annotation on the conversation: emotion influencing tweet
20. Emotion Influences: What can you say to make your partner feel better?
Joy → Sadness
• Topic 117: tweet, people, don’t, read, post
• Topic 59: hurt, got, bad, pain, feel
Sadness → Joy
• Topic 18: wear, look, think, love, black
• Topic 24: love, thank, great, new, look
Acceptance → Anger
• Topic 31: i’m, got, lmax, shit, da
• Topic 13: lmao, shit, nigga, smh, yea
Labels: Greeting, Sympathizing, Swearing, Complaining
21. Emotion Influence
[Figure: bar chart “Emotion Influence: Joy to Anger”; x-axis: Anticipation, Joy, Surprise, Fear, Anger, Sadness, Disgust, Acceptance, Neutral; y-axis from 0 to 0.3]
• Expressing Anger has a 26.5% chance of changing the partner’s emotion from Joy to Anger.
[Figure: bar chart “Emotion Influence: Sadness to Joy”; same x-axis; y-axis from 0 to 0.4]
• Expressing Joy has a 35.8% chance of changing the partner’s emotion from Sadness to Joy.
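One plausible reading of these percentages is a conditional frequency: among cases where a user's tweet carried one emotion and the partner then replied with emotion e, the fraction in which the user's next tweet carried the target emotion. The sketch below computes that quantity on toy data. The function name `influence_rates`, the triple layout, and the toy labels are assumptions for illustration; the slides do not spell out the exact estimator.

```python
from collections import Counter


def influence_rates(triples, before, after):
    """Estimate P(next == after | prev == before, partner == e) for each e.

    triples: iterable of (prev_emotion, partner_emotion, next_emotion),
    where prev and next are consecutive tweets by the same user and
    partner is the reply in between.
    """
    total, changed = Counter(), Counter()
    for prev, partner, nxt in triples:
        if prev != before:
            continue
        total[partner] += 1
        if nxt == after:
            changed[partner] += 1
    return {e: changed[e] / total[e] for e in total}


# Toy data, not taken from the Twitter corpus.
toy = [
    ("Sadness", "Joy", "Joy"),
    ("Sadness", "Joy", "Sadness"),
    ("Sadness", "Anger", "Sadness"),
    ("Joy", "Anger", "Anger"),
]
rates = influence_rates(toy, "Sadness", "Joy")
```

Because each rate is conditioned on a different partner emotion, the values for the nine categories need not sum to one.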
22. Outliers
A: Sorry to hear about your bags. If you would like us to get someone to contact you DM us your reference and contact number.
B: it's on it's way to manch. If the woman on the check in desk in Miami hadn't been trying to be all smart! Been no problem.
A: Sorry about that. Pleased to hear they located it quickly for you though.
B: mistakes happen.
23. Analyzing Self-Disclosure Behaviors in
Twitter Conversations Using Text Mining
Techniques (Presented at ACL 2012)
JinYeong Bak, Suin Kim, Alice Oh
{jy.bak, suin.kim}@kaist.ac.kr, alice.oh@kaist.edu
Department of Computer Science, KAIST
24. Introduction
2012-07-11
In social psychology
• Degree of self-disclosure in a relationship depends on the strength of the relationship
• Strategic self-disclosure can strengthen the relationship
[Figure: speech bubbles “I like you too!” and “You’re my best friend!”]
25. Hypothesis
Twitter conversations also show a similar pattern
• Dyads with high relationship strength show more self-disclosure behavior
• Dyads with low relationship strength show less self-disclosure behavior
[Figure: speech bubbles “I like you too!”, “You’re my best friend!”, “Hello~”, and “Hi”]
26. Methodology
• Twitter Data
  • 131K users
  • 2M conversations
• Relationship Strength
  • Chain frequency (CF)
  • Chain length (CL)
• Self-Disclosure
  • Personal information
  • Open communication
  • Profanity
• Analysis with Topic Models
  • Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])
  • Aspect and sentiment unification model (ASUM, [Jo, WSDM 2011])
27. Twitter Conversation
• A Twitter conversation chain
  • 3 or more tweets
  • At least one reply by each user
• Our Twitter conversation data
  • Oct 2011 to Dec 2011
  • 131K users
  • 2M chains
  • 11M tweets
[Figure: example of a conversation chain, from https://twitter.com/#!/britneyspears]
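The two conditions for a conversation chain (3 or more tweets, at least one reply by each user) can be written as a small filter. This is a minimal sketch: the function name `is_conversation_chain` and the `(author, is_reply)` tuple layout are hypothetical, and the check that exactly two users take part reflects the dyad setting rather than anything stated on this slide.

```python
def is_conversation_chain(tweets, min_tweets=3):
    """Check whether a time-ordered thread qualifies as a conversation chain.

    tweets: list of (author, is_reply) tuples; is_reply is True when the
    tweet replies to the other user in the thread.
    """
    if len(tweets) < min_tweets:
        return False
    authors = {author for author, _ in tweets}
    if len(authors) != 2:  # assumption: a chain is between exactly two users
        return False
    repliers = {author for author, is_reply in tweets if is_reply}
    return repliers == authors  # each user replied at least once


chain = [("A", False), ("B", True), ("A", True)]
too_short = [("A", False), ("B", True)]
one_sided = [("A", False), ("B", True), ("B", True)]
```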
28. Relationship Strength
• Social psychology literature states that relationship strength can be measured by communication frequency and length [Granovetter, 1973; Levin and Cross, 2004]
• CF: chain frequency
  • The number of conversational chains between the dyad, averaged per month
• CL: chain length
  • The length of conversational chains between the dyad, averaged per month
• Relationship strength
  • A high CF or CL for a dyad means the relationship is strong
  • A low CF or CL for a dyad means the relationship is weak
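A rough sketch of how CF and CL could be computed per dyad follows. The slide does not give exact formulas, so two assumptions are made here and should be read as such: CF is the number of chains divided by the number of months in the observation window, and CL is the mean number of tweets per chain over that window. The function name `relationship_strength` and the input layout are hypothetical.

```python
from collections import defaultdict


def relationship_strength(chains, n_months):
    """Return {dyad: (CF, CL)} from chains given as (user_a, user_b, n_tweets)."""
    n_chains = defaultdict(int)
    n_tweets = defaultdict(int)
    for user_a, user_b, length in chains:
        dyad = tuple(sorted((user_a, user_b)))  # (a, b) and (b, a) are one dyad
        n_chains[dyad] += 1
        n_tweets[dyad] += length
    return {
        dyad: (n_chains[dyad] / n_months, n_tweets[dyad] / n_chains[dyad])
        for dyad in n_chains
    }


# Toy chains over a two-month window.
toy_chains = [("u1", "u2", 3), ("u2", "u1", 5), ("u1", "u3", 4)]
strength = relationship_strength(toy_chains, n_months=2)
```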
29. Self-Disclosure
• Open communication (openness)
  • Negative openness
  • Nonverbal openness
  • Emotional openness
  • Receptive openness: difficult to find in tweets
  • General-style openness: not clearly defined in the literature
• Personal Information
  • Personally Identifiable Information (PII)
  • Personally Embarrassing Information (PEI)
• Profanity
  • nigga, ass, wtf, lmao
30. Self-Disclosure - Openness
Negative openness
• Method
  • We use ASUM with emoticons as seed words [“Aspect and sentiment unification model for online review analysis”, Jo, WSDM’11]
  • ASUM is an LDA-based joint model of topic and sentiment
  • ASUM takes unannotated data and classifies each sentence (tweet) as positive/negative/neutral
31. Self-Disclosure - Openness
Nonverbal openness
• Method
  • We look for emoticons, ‘lol’, and ‘xxx’
  • Emoticons are like facial expressions, e.g. :) :( :P
  • ‘lol’ (laughing out loud) and ‘xxx’ (kisses) are very frequently used in a similar manner to nonverbal openness
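Looking for emoticons, ‘lol’, and ‘xxx’ can be approximated with two regular expressions. The patterns below are illustrative assumptions, not the patterns used in the study: the emoticon pattern covers only a few common Western-style emoticons and requires whitespace (or the start of the tweet) before them so that the ":/" inside a URL is not counted.

```python
import re

# Emoticons such as :) :( :P :D ;) :/ that start a whitespace-delimited token.
EMOTICON = re.compile(r"(?<!\S)[:;=][-']?[)(DPpSsL/\\|](?!\w)")
# 'lol' (also 'lool', 'loool') and runs of two or more x's, as whole words.
TOKEN = re.compile(r"\b(?:lo+l|x{2,})\b", re.IGNORECASE)


def has_nonverbal_openness(tweet):
    """Return True if the tweet contains an emoticon, 'lol', or 'xxx'."""
    return bool(EMOTICON.search(tweet) or TOKEN.search(tweet))


examples = [
    "Have a great sleep :)",
    "Class was just boring lol",
    "night xxx",
    "For details visit http://uilab.kaist.ac.kr",
]
flags = [has_nonverbal_openness(t) for t in examples]
```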
32. Self-Disclosure - Openness
Emotional openness
• Method
  • Look for tweets that contain common expressions of feeling words [We Feel Fine (Harris, J, 2009)]
33. Self-Disclosure – Personal Information
• Personally Identifiable Information (PII)
  • Ex) name, location, email address, job, social security number
• Personally Embarrassing Information (PEI)
  • Ex) clinical history, sexual life, job loss, family problem
35. Self-Disclosure – Personal Information
Example of PII, PEI and Profanity topics
• Shown by high probability words in each topic
PII 1    PII 2      PEI 1     PEI 2     PEI 3     Profanity
san      tonight    pants     teeth     family    nigga
live     time       wear      doctor    brother   lmao
state    tomorrow   boobs     dr        sister    shit
texas    good       naked     dentist   uncle     ass
south    ill        wearing   tooth     cousin    bitch
39. Results: Interpretation
• Emotional openness
  • When a dyad is not very close, they express frequent encouragement or polite reactions to babies or pets
42. Distributed Online Learning for
Latent Dirichlet Allocation
JinYeong Bak, Dongwoo Kim, and Alice Oh
NIPS 2012
Workshop on Big Learning
43. Motivation
• Problem 1: Inference for LDA takes a long time
• Problem 2: Continuously expanding corpus necessitates continuous updates of model parameters
  • But updating of model parameters is not possible with plain LDA
  • Must re-train with the entire updated corpus
• Solution to 1: Distributed inference shortens inference time (Newman JMLR 2009, Wang WWW 2012)
• Solution to 2: Online (batch) learning enables updates to model parameters (Hoffman NIPS 2010)
• Our Approach: Combine distributed inference and online learning
44. Distributed Online LDA
• Based on variational inference
• Mini-batch updates via stochastic learning (variational EM)
• Distribute variational EM using MapReduce
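The combination of a MapReduce-style reduce step with a stochastic mini-batch update can be sketched as follows. This is a hedged illustration based on the online variational Bayes update of Hoffman et al. (NIPS 2010), not the DoLDA implementation: the per-document E-step that each mapper would run is omitted, and the function names, the nested-list layout for the K x V topic-word parameters, and the default values of eta, tau0, and kappa are all assumptions.

```python
def reduce_stats(shard_stats):
    """Reduce step: sum the K x V sufficient statistics emitted by each mapper."""
    total = [[0.0] * len(row) for row in shard_stats[0]]
    for stats in shard_stats:
        for k, row in enumerate(stats):
            for v, value in enumerate(row):
                total[k][v] += value
    return total


def online_update(lam, stats, t, batch_size, corpus_size,
                  eta=0.01, tau0=1.0, kappa=0.7):
    """Blend the old topic-word parameters with the mini-batch estimate.

    rho_t = (tau0 + t) ** (-kappa) is the step size; the mini-batch
    statistics are rescaled as if the whole corpus looked like this batch.
    """
    rho = (tau0 + t) ** (-kappa)
    scale = corpus_size / batch_size
    return [
        [(1.0 - rho) * lam[k][v] + rho * (eta + scale * stats[k][v])
         for v in range(len(lam[k]))]
        for k in range(len(lam))
    ]


# Toy run: one topic, two vocabulary words, two mappers.
shards = [[[1.0, 0.0]], [[0.0, 2.0]]]
merged = reduce_stats(shards)
new_lam = online_update([[1.0, 1.0]], merged, t=0, batch_size=2, corpus_size=4)
```

Only the small K x V statistics cross the network in the reduce step, while the update itself is cheap and runs once per mini-batch.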
45. Experimental Setup
• Data: 5.1M Twitter conversations
• 4.8M English Wikipedia articles
• 60 node Hadoop system
• Each node with 8 x 2.30GHz cores
46. Wikipedia Results
• Topic 0: relativity, physics, einstein, quantum, gravity
• Topic 22: channel, television, tv, cable, news
• Topic 42: milk, chocolate, sugar, food, cream
• Topic 65: god, bible, moses, chapter, genesis
• Topic 94: party, election, president, member, elected
• Topic 170: season, team, league, game, football
• Topic 232: album, song, band, music, released

Minibatch   oLDA        DoLDA      Speedup
16,384      238666.25   47994.03   4.97
32,768      188508.71   33470.03   5.63
65,536      206290.27   26788.53   7.70
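The Speedup column is consistent with the ratio of the oLDA figure to the DoLDA figure for each mini-batch size. The short check below recomputes it from the numbers in the table; the variable and function names are illustrative, and the unit of the timing figures is not stated on the slide.

```python
# (mini-batch size, oLDA, DoLDA) as reported in the table.
RESULTS = [
    (16384, 238666.25, 47994.03),
    (32768, 188508.71, 33470.03),
    (65536, 206290.27, 26788.53),
]


def speedup(olda, dolda):
    """Ratio of single-machine online LDA cost to distributed online LDA cost."""
    return olda / dolda


speedups = [round(speedup(olda, dolda), 2) for _, olda, dolda in RESULTS]
```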
47. Twitter Temporal Patterns of Topics
Topic words: party vote people politics obama
[Figure: document proportion of the topic by day; x-axis ticks from 10-10 to 12-04; y-axis from 0.005 to 0.015]
Conversation b1 on November 2, 2010
A: I wish I could vote today, but I have to work for 14 hours
B: is it legal for them not to give you time off to vote?
A: probably
Conversation b2 on March 31, 2012
A: Mitt Romney: "Obama should release the notes and transcripts of all his meetings with world leaders"
B: Why is he being held to higher standard than any other president.
A: did you see my Santorum 'slip' tweet? Is the media afraid to comment on it?
B: oh yes I did. I saw it mentioned yesterday also. disgusting and he should be raked over hot coals for it.

Topic words: school mate class teacher grade
[Figure: document proportion of the topic by day; x-axis ticks from 11-07 to 12-01; y-axis from 0.004 to 0.012]
Conversation c1 on September 5, 2011
A: Oh god, miss Waite ran over to me up the school just now! :L on the plus subjects are now picked! :D
B: what did you pick??
A: english, RE, art and psychology! :) was unsure between history and psych but found out bubbles was teaching it so nooo! :L
Conversation c2 on October 12, 2011
A: :) My day's been okay! It feels long! But school' was okayish. I hope you have an awesome day! :D
B: that's good then! Ahh hope it's not cause anything bad happened? Thanks! Have a great sleep :)
A: no! Class was just boring lol and thanks! :) i will! Even though i have to wake up early tomorrow for a midterm! :S
48. CAVEAT
Big data and social media data do not always give the right answers! They contain much noise and much bias.
Sentiment analysis is also full of problems at the big-data level, because every small assumption can cause wide swings in the final interpretation of the data.
These data are valuable because they have opened up possibilities for analyses of naturally occurring data in huge amounts.
We need better methods and tools that are tailored for social media.
We need to ask the right questions that can be answered well despite the biases of the social media data.
49. For details, visit our webpage:
http://uilab.kaist.ac.kr
Or email me:
alice.oh@kaist.edu