A poster presented at ICWSM 2009 (International AAAI Conference on Weblogs and Social Media).
Authors: Dan Knights, Michael C. Mozer (University of Colorado at Boulder), and Nicolas Nicolov (J.D. Power and Associates, McGraw-Hill).
The full paper is available at: http://dan.knights.googlepages.com/knights-icwsm09.pdf
Detecting Topic Drift with Compound Topic Models
1. Detecting Topic Drift with Compound Topic Models
Dan Knights, Mike Mozer, Nicolas Nicolov
J.D. Power and Associates
McGraw-Hill, U.S.A.
Boulder, CO 80303
Goals:
Track topics over time
Detect topic drift
Identify emerging topics
Visualize topic trends
Dan Knights (JDPA) Detecting Topic Drift May 19, 2009 1/9
2. Topic tracking challenge: emerging topics
Dataset 1, trained with LDA:
0: energy hybrid gas prius fuel
1: million billion economy stock
...
Dataset 2, trained with LDA:
0: money stock dow economy
1: hybrid gas prius alternative
2: obama mccain election race
...
[Figure: topic probability by topic index for each dataset. Topic correspondence between the two separately trained models is not guaranteed.]
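To make the correspondence problem concrete, the sketch below tries to pair topics from the two separately trained models by the Jaccard overlap of the top words listed on the slide. This greedy matching heuristic is an illustration assumed here, not the authors' method; it is the kind of post-hoc topic/vocabulary matching that compound topic models avoid, and it has nothing sensible to offer for a topic that exists in only one dataset (Dataset 2's "obama mccain election race").

```python
# Top words per topic, copied from the slide. Indices come from two
# independently trained LDA models, so index 0 need not mean the same thing.
dataset1_topics = {
    0: ["energy", "hybrid", "gas", "prius", "fuel"],
    1: ["million", "billion", "economy", "stock"],
}
dataset2_topics = {
    0: ["money", "stock", "dow", "economy"],
    1: ["hybrid", "gas", "prius", "alternative"],
    2: ["obama", "mccain", "election", "race"],
}


def jaccard(a, b):
    """Jaccard similarity of two word lists, treated as sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def best_match(topics_a, topics_b):
    """Map each topic in topics_a to the topics_b index with the highest
    word overlap (greedy; no one-to-one constraint is enforced)."""
    return {
        i: max(topics_b, key=lambda j: jaccard(words, topics_b[j]))
        for i, words in topics_a.items()
    }


print(best_match(dataset1_topics, dataset2_topics))  # {0: 1, 1: 0}
```

The result shows that topic 0 of Dataset 1 lines up with topic 1 of Dataset 2 and vice versa, so comparing topics by index alone would pair unrelated topics.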
3. Compound topic models guarantee correspondence
CTM: a single LDA model trained on Dataset 1 + Dataset 2
Topics, identical for Dataset 1 and Dataset 2:
0: money stock dow economy
1: hybrid gas prius alternative
2: obama mccain election race
...
[Figure: topic probability by topic index for each dataset. Topic correspondence is guaranteed because both datasets share one model.]
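A minimal sketch of the compound idea, assuming a CTM is plain LDA fitted once on the concatenated datasets (as the slide's "Dataset 1 + Dataset 2, LDA" suggests) and that each dataset's topic distribution is read off the topic assignments of its own tokens. Collapsed Gibbs sampling is used only as one standard way to fit LDA; the slides do not state the inference method, and the names `fit_lda_gibbs` and `compound_topic_model`, the hyperparameters `alpha`, `beta`, `n_iter`, and the toy corpus are all hypothetical.

```python
import random


def fit_lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                  n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of integer word ids.
    Returns z, where z[d][i] is the topic of token i in document d.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []
    for d, doc in enumerate(docs):                     # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove current assignment
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k                            # add the new assignment
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z


def compound_topic_model(datasets, n_topics, vocab_size, **lda_kwargs):
    """Fit ONE LDA model on the union of all datasets, then return each
    dataset's topic distribution. Every dataset shares the same topics,
    so index k refers to the same topic in every returned distribution."""
    all_docs = [doc for ds in datasets for doc in ds]
    z = fit_lda_gibbs(all_docs, n_topics, vocab_size, **lda_kwargs)
    dists, start = [], 0
    for ds in datasets:
        counts = [0] * n_topics
        for doc_z in z[start:start + len(ds)]:
            for k in doc_z:
                counts[k] += 1
        total = sum(counts)
        dists.append([c / total for c in counts])
        start += len(ds)
    return dists


# Hypothetical toy corpus of word ids: 0-3 stand for car words,
# 4-7 for election words; Dataset 2 adds election posts.
dataset1 = [[0, 1, 2, 3, 0, 1] for _ in range(10)]
dataset2 = ([[0, 1, 2, 3, 0, 1] for _ in range(5)]
            + [[4, 5, 6, 7, 4, 5] for _ in range(5)])
p1, p2 = compound_topic_model([dataset1, dataset2], n_topics=2,
                              vocab_size=8, n_iter=100)
print(p1, p2)
```

Because `p1` and `p2` index the same topics, they can be compared directly with a drift indicator such as KL divergence, with no topic matching step.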
4. Potential indicators of drift
3 kinds of indicator:
Kullback-Leibler divergence (KLD)
Relative Perplexity (RP)
Chi-square test (not shown)
2 kinds of model:
Topic model
Unigram model
3 x 2 = 6 potential indicators
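The unigram versions of the first two indicators can be sketched as follows. The slides name the indicators but not their formulas, so these choices are assumptions: add-one smoothing over a shared vocabulary, KL divergence taken as KL(new || old) in nats, and relative perplexity defined as the new window's perplexity under the old window's model divided by its perplexity under its own model. For the topic-model variants, the same KL formula would apply to per-period topic distributions from a compound topic model. The chi-square test is omitted here, as on the slide.

```python
import math
from collections import Counter


def unigram(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; smoothing keeps q nonzero wherever p is."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def perplexity(tokens, model):
    """Perplexity of a token sequence under a unigram model."""
    log_lik = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_lik / len(tokens))


def relative_perplexity(old_tokens, new_tokens, vocab):
    """ASSUMED definition: perplexity of the new window under the old
    window's model, divided by its perplexity under its own model.
    Values well above 1 suggest the past no longer predicts the text."""
    old_model = unigram(old_tokens, vocab)
    new_model = unigram(new_tokens, vocab)
    return perplexity(new_tokens, old_model) / perplexity(new_tokens, new_model)


old = "prius hybrid gas fuel prius hybrid".split()
same = "hybrid prius fuel gas hybrid prius".split()      # same counts as old
new = "election obama mccain race election gas".split()  # different subject
vocab = set(old) | set(new)

print(kl_divergence(unigram(new, vocab), unigram(old, vocab)))
print(relative_perplexity(old, same, vocab), relative_perplexity(old, new, vocab))
```

A window with the same word counts as the reference scores 0 on KL divergence and 1 on relative perplexity; the off-topic window scores higher on both.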
5. Case study: synthetic topic drift
Gradual topic drift, days 150-179:
Periods: days 1-149 (before drift), days 150-179 (drift), days 180-300 (after drift)
[Figure: drift indicators over time; all indicators detect the drift.]
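A synthetic setup of this kind can be sketched as below. Everything specific is assumed, since the slides do not describe the generator: a hypothetical 20-word vocabulary, two disjoint "before" and "after" topics, 200 tokens per day, and a linear ramp between them over days 150-179. Each day's smoothed unigram distribution is compared with the pooled days 1-149 using KL divergence, one of the indicators listed above.

```python
import math
import random
from collections import Counter

VOCAB = [f"w{i}" for i in range(20)]   # hypothetical 20-word vocabulary
TOPIC_A = [0.1] * 10 + [0.0] * 10      # "before" topic: first 10 words
TOPIC_B = [0.0] * 10 + [0.1] * 10      # "after" topic: last 10 words


def drift_weight(day):
    """Mixing weight of TOPIC_B: 0 on days 1-149, a linear ramp over
    days 150-179 (assumed shape), and 1 on days 180-300."""
    if day < 150:
        return 0.0
    if day >= 180:
        return 1.0
    return (day - 149) / 30


def generate_corpus(n_days=300, tokens_per_day=200, seed=0):
    """Sample each day's tokens from that day's mixture of the two topics."""
    rng = random.Random(seed)
    corpus = {}
    for day in range(1, n_days + 1):
        w = drift_weight(day)
        probs = [(1 - w) * a + w * b for a, b in zip(TOPIC_A, TOPIC_B)]
        corpus[day] = rng.choices(VOCAB, weights=probs, k=tokens_per_day)
    return corpus


def smoothed_unigram(tokens, alpha=1.0):
    """Add-alpha smoothed unigram distribution over VOCAB."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(VOCAB)
    return {w: (counts[w] + alpha) / total for w in VOCAB}


def kld(p, q):
    """KL(p || q) in nats over VOCAB."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in VOCAB)


corpus = generate_corpus()
# Reference distribution: all text from the stable period, days 1-149.
baseline = smoothed_unigram([t for day in range(1, 150) for t in corpus[day]])
daily_kld = {day: kld(smoothed_unigram(toks), baseline)
             for day, toks in corpus.items()}
for day in (100, 150, 165, 179, 250):
    print(day, round(daily_kld[day], 3))
```

The indicator stays low through day 149, climbs through the ramp, and remains high once the new topic has taken over.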
6. Case study: Toyota
All blogs mentioning “Toyota”
6 months (January – June 2008)
[Figure: drift indicators over the six months, annotated “Highest drift?”]
7. Emerging topics, Toyota (Mar-Jun 2008)
[Figure: Toyota topic trends, March to June 2008. Annotations: emerging “energy” topic; Chapman auto accident topic; the “energy” topic tracks the gas price.]
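Two simple ways to quantify what this slide shows visually are sketched below: an emergence score (mean topic share after a split month minus the mean before it) and the Pearson correlation between a topic's monthly share and an external series such as a gas-price index. Both criteria are assumptions for illustration, not the authors' stated method, and the monthly values are made up rather than taken from the Toyota data.

```python
import math


def emergence_score(shares, split):
    """Mean topic share from index `split` on, minus the mean before it.
    A large positive value marks the topic as emerging (assumed criterion)."""
    before, after = shares[:split], shares[split:]
    return sum(after) / len(after) - sum(before) / len(before)


def pearson(xs, ys):
    """Pearson correlation between a topic's share and an external series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up monthly values for March to June, for illustration only.
energy_share = [0.02, 0.03, 0.05, 0.08]  # share of the "energy" topic
price_index = [100, 106, 115, 125]       # hypothetical gas-price index

print(emergence_score(energy_share, split=2))
print(pearson(energy_share, price_index))
```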
8. Case study: iPhone
Public blogs mentioning “iPhone” and “platform”
12 months (April 2007 – March 2008)
Most variable topics for Aug-Nov 2007
[Figure: topic trends, annotated “Apple opens window: iPhone platform” and “Google launches Android platform”]
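The slide does not define "most variable"; one plausible reading, assumed here, is to rank topics by the variance of their monthly share within the chosen window. The sketch below does that for a hypothetical 3-topic model over 12 months (month 0 = April 2007, so Aug-Nov 2007 are months 4-7); the shares are invented and the function name is illustrative.

```python
def most_variable_topics(monthly_shares, months, top_n=2):
    """Return the top_n topic indices ranked by the variance of their
    share over the selected months (an iterable of month indices)."""
    rows = [monthly_shares[m] for m in months]

    def variance(k):
        vals = [row[k] for row in rows]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    return sorted(range(len(rows[0])), key=variance, reverse=True)[:top_n]


# Hypothetical shares of 3 topics over 12 months (rows sum to 1).
shares = [[0.6, 0.2, 0.2] for _ in range(12)]
shares[4] = [0.5, 0.3, 0.2]
shares[5] = [0.55, 0.25, 0.2]
shares[6] = [0.6, 0.2, 0.2]
shares[7] = [0.4, 0.2, 0.4]
print(most_variable_topics(shares, range(4, 8)))  # [2, 0]
```

Variance rewards both spikes and dips, so a topic that suddenly appears (like a platform announcement) and a background topic it displaces can both rank highly.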
9. Summary
Compound topic models help with:
tracking topics between distinct data sets
detecting drift related to news events
avoiding the topic/vocabulary matching problem
visualizing topic trends
Open questions:
How to interpret drift indicators
Are unigram models sufficient for detecting topic drift?
(Unigram models are fast and frugal compared to topic models.)