Topic and Opinion Classification based Information Credibility Analysis on Twitter
1. Topic and Opinion Classification based Information Credibility Analysis on Twitter
Yukino Ikegami
Kenta Kawai
Yoshimi Namihira
Setsuo Tsuruta
At SMC 2013
2013/10/16 1
2. Background and Motivation
• False rumors often confuse people
• Confirming the reliability of a rumor often requires
domain knowledge about the problem
Automatic information credibility analysis
3. Related Work (1)
Using Web-page-dependent features
• [Wassmer et al. 2005]
– Use the site's credentials, advertisements, and
Web design
• [Castillo et al. 2011]
– Twitter-dependent features
• E.g. number of followers
– Twitter-independent features
• E.g. number of exclamation marks (!) and question marks (?)
4. Related Work (2)
Using textual features
• Rumor information cloud system [Miyabe et al. 2011]
– Confirms whether a rumor is true by using information that
alerts readers to the rumor
– Finds correcting information with an SVM applied to a word
n-gram model
– The word n-gram model consists of the words immediately before
and after the word “デマ” (“dema”, a Japanese abbreviation of
“demagogy”, used to mean a false rumor)
• Dematter [Toriumi et al. 2012]
– Assess credibility by the percentage of alerting
tweets about a rumor
– Detect alerting tweets by keyword matching
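The keyword-matching idea can be illustrated with a minimal sketch. It is not the Dematter implementation: the keyword list and the function name `alert_ratio` are hypothetical, and only the general idea (the share of tweets that contain an alerting keyword) comes from the slide.

```python
# Hypothetical alerting keywords; the actual list used by Dematter is not given here.
ALERT_KEYWORDS = ("デマ", "false rumor")


def alert_ratio(tweets, keywords=ALERT_KEYWORDS):
    """Return the fraction of tweets that contain at least one alerting keyword."""
    if not tweets:
        return 0.0
    # A tweet counts as "alerting" if any keyword occurs as a substring.
    alerts = sum(any(keyword in tweet for keyword in keywords) for tweet in tweets)
    return alerts / len(tweets)


sample_tweets = [
    "これはデマです",
    "本当らしい",
    "that is a false rumor",
    "ok",
]
ratio = alert_ratio(sample_tweets)
```

A higher ratio would suggest that more users are warning about the rumor; how the ratio maps to a credibility score is left open in this sketch.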
5. Topic and Opinion Classification based Information Credibility Analysis
[Figure: system components: Twitter, Tweet crawler, Topic & opinion classifier, Tweet opinion DB, Tweet credibility calculator]
6. Topic classification
• Classify tweets with a topic model
– Topic model:
Latent Dirichlet Allocation (LDA)
with Gibbs sampling [Griffiths, 2002]
– Features:
content words (i.e., nouns, verbs, adjectives, adverbs)
Topic1        Topic2                Topic3
Vegetable     Measure               Radioactive material
Eat           Amount of radiation   In prefecture
No problem    Result                Governor
Leaf of tea   Pool                  Fukushima
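As an illustration of topic classification with LDA and Gibbs sampling, here is a small pure-Python sketch of collapsed Gibbs sampling. It is not the authors' code: the hyperparameters `alpha` and `beta`, the iteration count, the toy English tokens, and the rule of assigning each tweet to its most probable topic are all assumptions made for the example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # count of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # count of word v in topic k
    topic_total = [0] * num_topics                               # tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic distribution for each document.
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def classify_topic(topic_distribution):
    """Assumed rule: pick the topic with the highest probability."""
    return max(range(len(topic_distribution)), key=topic_distribution.__getitem__)


# Toy tweets, already reduced to content words (hypothetical data).
tweets = [
    ["vegetable", "eat", "tea", "leaf"],
    ["measure", "radiation", "result", "pool"],
    ["radioactive", "material", "governor", "fukushima"],
    ["vegetable", "eat", "leaf"],
]
theta = lda_gibbs(tweets, num_topics=3)
topics = [classify_topic(row) for row in theta]
```

With real data, the content words would first be extracted by a part-of-speech tagger, and a corpus far larger than four tweets is needed before the topics become meaningful.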
7. Opinion Classification
• Classify each tweet as a positive or negative opinion
using a dictionary
• Takamura’s semantic orientation dictionary
[Takamura et al. 2006]
– Contains (word, polarity) pairs, with polarity in [-1, 1]
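A dictionary-based opinion classifier could look like the sketch below. The slide only says that a dictionary of word polarities in [-1, 1] is used; averaging the polarities of the matched words, the `threshold` parameter, the "neutral" label, and the toy English entries are assumptions for illustration (the actual dictionary covers Japanese words).

```python
def classify_opinion(words, polarity, threshold=0.0):
    """Label a tokenized tweet by the mean polarity of its dictionary words."""
    scores = [polarity[word] for word in words if word in polarity]
    if not scores:
        # No dictionary word found: no sentiment can be assigned.
        return "neutral"
    mean_score = sum(scores) / len(scores)
    if mean_score > threshold:
        return "positive"
    if mean_score < -threshold:
        return "negative"
    return "neutral"


# Hypothetical entries standing in for a semantic orientation dictionary.
toy_polarity = {"safe": 0.8, "good": 0.9, "danger": -0.9, "contaminated": -0.7}

label_a = classify_opinion(["vegetable", "safe"], toy_polarity)
label_b = classify_opinion(["contaminated", "danger"], toy_polarity)
label_c = classify_opinion(["pool"], toy_polarity)
```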
9. Evaluation
• Dataset: 2960 tweets
– Each tweet was confirmed as true or false by human judgment
• Criterion: weighted kappa
– The weight w is designed as follows:
judging certainly false information as certainly
true, or vice versa, is a critical error
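The weighted kappa criterion can be sketched as follows. The exact weight w is not given on the slide, so this example assumes linear disagreement weights over an ordinal scale, under which confusing the two extreme labels receives the maximum penalty. The four-level scale (0 = certainly false, 3 = certainly true) is also hypothetical.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, num_categories, weight=None):
    """Weighted Cohen's kappa for two label sequences over categories 0..num_categories-1."""
    if weight is None:
        # Assumed linear disagreement weight: 0 on the diagonal, 1 for opposite extremes.
        def weight(i, j):
            return abs(i - j) / (num_categories - 1)

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))
    marginal_a = Counter(rater_a)
    marginal_b = Counter(rater_b)

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(num_categories):
        for j in range(num_categories):
            w = weight(i, j)
            observed_disagreement += w * observed[(i, j)] / n
            expected_disagreement += w * marginal_a[i] * marginal_b[j] / (n * n)

    if expected_disagreement == 0.0:
        # Both raters used a single identical category, so no disagreement is possible.
        return 1.0
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical labels on a 4-level scale.
human = [0, 1, 2, 3]
perfect = weighted_kappa(human, [0, 1, 2, 3], num_categories=4)
reversed_extremes = weighted_kappa([0, 3], [3, 0], num_categories=4)
```

Here `perfect` is 1.0 because the two sequences agree everywhere, and `reversed_extremes` is -1.0 because every judgment swaps the two extreme labels.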
10. Result
TABLE 1: Kappa for each condition
Method                                       Kappa
Fully random method (all tweets)             0.003
Our method (all tweets)                      0.604
Our method (only topic & opinion correct)    0.616
• Landis and Koch’s kappa guideline: κ > 0.61 indicates substantial agreement
• Our method is substantially effective
for assessing tweet credibility
11. Conclusion
• Topic and opinion classification based
information credibility analysis on Twitter
– Majority decision based on a topic model and
sentiment analysis
• Evaluation shows that the method is substantially effective
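The slides do not spell out how the majority decision is computed. One plausible reading, sketched here as an assumption, scores each tweet by the share of tweets on the same topic that express the same opinion. The function name `credibility_by_majority` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def credibility_by_majority(labeled_tweets):
    """Score each (topic, opinion) pair by the share of same-topic tweets with that opinion."""
    opinions_by_topic = defaultdict(Counter)
    for topic, opinion in labeled_tweets:
        opinions_by_topic[topic][opinion] += 1

    scores = []
    for topic, opinion in labeled_tweets:
        counts = opinions_by_topic[topic]
        # An opinion shared by most tweets on the topic gets a score close to 1.
        scores.append(counts[opinion] / sum(counts.values()))
    return scores


# Hypothetical classifier output: (topic id, opinion label) per tweet.
labeled = [(0, "positive"), (0, "positive"), (0, "negative"), (1, "negative")]
scores = credibility_by_majority(labeled)
```

In this toy input, the two positive tweets on topic 0 each score 2/3, the dissenting negative tweet scores 1/3, and the only tweet on topic 1 scores 1.0.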
12. Future work
• Weighting tweets by the author’s expertise
– People often judge whether information is
trustworthy by the author’s expertise
• Applying an online topic model
– New topics and new usages of existing words
emerge continually
• Excluding neutral tweets
– Tweets without sentiment do not contribute to our method
13. References
• [Wassmer et al. 2005] M. Wassmer and C. Eastman,
“Automatic evaluation of credibility on the Web,” ASIS&T
2005, 42(1), 2005.
• [Castillo et al. 2011] C. Castillo, M. Mendoza, and B. Poblete,
“Information credibility on twitter,” WWW 2011, pp. 675-684,
2011.
• [Miyabe et al. 2011] M. Miyabe, A. Umejima, A. Nadamoto,
and E. Aramaki, “Proposal of Rumor Information Cloud
based on Rumor-Correction Information” (In Japanese),
RRDS4-019, 2011.
• [Toriumi et al. 2012] F. Toriumi, K. Shinoda, and G. Kaneyama,
“Accuracy Evaluation of Demagogue Detection System
using Social Media” (In Japanese), IPSJ Digital Practice, 3.3,
pp. 201-208, 2012.