I gave this presentation at the Workshop on Interactive Language Learning, Visualization, and Interfaces, held at ACL 2014 in Baltimore, MD, on June 27, 2014.
http://nlp.stanford.edu/events/illvi2014/index.html
ABSTRACT
Every day on Twitter, millions of thoughts are captured and shared with the world in the form of 140-character messages, or Tweets. We could learn a great deal from these thoughts if we could find a way to digest this gigantic dataset. Visualization is one of many ways to extract information from Tweets. In this presentation, I will talk about several visualizations based on Tweets and share experiences and challenges from working with Tweet data.
Making Sense of Millions of Thoughts: Finding Patterns in the Tweets
1. Making Sense of Millions of Thoughts: Finding patterns in the Tweets
“Knowing comes from learning, from seeking.”
“What we call chaos is just patterns we haven't recognized.”
“I am looking for a needle in a haystack.”
“140-character text messages, called Tweets”
Krist Wongsuphasawat
(50 characters)
(58 characters)
(42 characters)
(42 characters)
28. Filter data (2)
• #hashtags, e.g. #worldcup
  • easy to filter
  • only catches Tweets that include the hashtag
  • misses misspelled hashtags (typos)
• keywords, e.g. goal
  • broader
  • can be ambiguous
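To make the hashtag-versus-keyword trade-off concrete, here is a minimal Python sketch. The helper names (`hashtags`, `has_hashtag`, `has_keyword`) and the sample Tweets are made up for illustration, and the `#(\w+)` pattern only approximates how Twitter itself parses hashtags.

```python
import re

# Hypothetical helpers for illustration; not part of any Twitter API.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text: str) -> set[str]:
    """Return the lowercase hashtags in a Tweet's text, without the '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def has_hashtag(text: str, tag: str) -> bool:
    # Exact, case-insensitive hashtag match: easy and precise, but it misses
    # Tweets that omit the hashtag or misspell it.
    return tag.lower() in hashtags(text)


def has_keyword(text: str, keyword: str) -> bool:
    # Whole-word keyword match: broader, but ambiguous
    # ("goal" in a Tweet about personal goals matches too).
    return re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE) is not None


tweets = [
    "GOAL!!! What a strike #WorldCup",
    "Watching the final with friends #worldcup",
    "My goal this year: run a marathon",
    "What a goal by Germany",
    "Can't believe that header #wordlcup",  # typo in the hashtag
]

by_hashtag = [t for t in tweets if has_hashtag(t, "worldcup")]
by_keyword = [t for t in tweets if has_keyword(t, "goal")]
print(len(by_hashtag), len(by_keyword))  # 2 3
```

The hashtag filter is precise but drops the misspelled `#wordlcup` Tweet; the keyword filter picks up a match-related Tweet without any hashtag, but also the unrelated marathon "goal".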
30. Filter data (3)
• Combine with other attributes
  • Time
    • during the first half of the World Cup final
  • Location
    • Tweets from Brazil
    • Not every Tweet is geotagged.
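Combining attributes amounts to chaining predicates. The sketch below assumes a hypothetical minimal `Tweet` record with a `created_at` timestamp and an optional `country_code` that is missing when the Tweet is not geotagged; the time window and all timestamps are made-up placeholders, not actual match times.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Tweet:
    # Hypothetical minimal record; real Tweet payloads carry many more fields.
    text: str
    created_at: datetime
    country_code: Optional[str] = None  # None when the Tweet is not geotagged


def in_window(tweet: Tweet, start: datetime, end: datetime) -> bool:
    # Half-open interval [start, end) so adjacent windows never overlap.
    return start <= tweet.created_at < end


def from_country(tweet: Tweet, code: str) -> bool:
    # Non-geotagged Tweets never match, so a location filter only
    # ever sees the geotagged subset of the conversation.
    return tweet.country_code == code


UTC = timezone.utc
# Made-up window standing in for "the first half of the final".
start = datetime(2014, 7, 13, 19, 0, tzinfo=UTC)
end = datetime(2014, 7, 13, 19, 47, tzinfo=UTC)

tweets = [
    Tweet("Vamos!", datetime(2014, 7, 13, 19, 10, tzinfo=UTC), "BR"),
    Tweet("Great save", datetime(2014, 7, 13, 19, 20, tzinfo=UTC)),
    Tweet("Half-time already?", datetime(2014, 7, 13, 19, 50, tzinfo=UTC), "BR"),
    Tweet("Kickoff soon", datetime(2014, 7, 13, 18, 55, tzinfo=UTC), "US"),
]

during = [t for t in tweets if in_window(t, start, end)]
from_brazil = [t for t in during if from_country(t, "BR")]
print(len(during), [t.text for t in from_brazil])  # 2 ['Vamos!']
```

Note that "Great save" falls inside the window but is lost to the location filter only because it carries no geotag.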
31. Filter data (4)
• Languages
  • Sometimes only English Tweets are used
• Future
  • Translation?
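A language filter is a one-line predicate when each Tweet already carries a detected language code. This sketch assumes Tweets are dicts with a `lang` field, as in Twitter's v1.1 Tweet objects (which use `und` when no language could be detected); the helper name and sample data are made up. Tallying what gets dropped shows how much an English-only analysis leaves out.

```python
from collections import Counter


def only_language(tweets, lang="en"):
    """Keep Tweets whose detected language code equals `lang`."""
    return [t for t in tweets if t.get("lang") == lang]


tweets = [
    {"text": "What a goal!", "lang": "en"},
    {"text": "Que golaço!", "lang": "pt"},
    {"text": "Tor!", "lang": "de"},
    {"text": "GOOOOL", "lang": "und"},  # language could not be determined
]

english = only_language(tweets)
# Tally the languages that an English-only filter throws away.
dropped = Counter(t.get("lang") for t in tweets if t.get("lang") != "en")
print(len(english), dict(dropped))  # 1 {'pt': 1, 'de': 1, 'und': 1}
```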
57. TEXT
• Now
  • Derived information: sentiment, topic
  • Combined with other information (geo & time) + context
• Future
  • Better techniques that involve more NLP, e.g. key phrases
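As a rough illustration of deriving information from text and combining it with time, the sketch below scores each Tweet with a toy word-list sentiment and averages the scores per hour. The word lists are a deliberately simple stand-in, not the method behind the visualizations in this talk; the helper names and sample Tweets are made up.

```python
from collections import defaultdict
from datetime import datetime

# Toy word lists, made up for illustration; real sentiment analysis needs
# a trained model or a curated lexicon.
POSITIVE = {"great", "love", "win", "amazing"}
NEGATIVE = {"sad", "lose", "awful", "hate"}


def sentiment(text: str) -> int:
    """Score = (# positive words) - (# negative words)."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mean_sentiment_by_hour(tweets):
    """Average sentiment per hour, given (timestamp, text) pairs."""
    buckets = defaultdict(list)
    for ts, text in tweets:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(sentiment(text))
    return {hour: sum(s) / len(s) for hour, s in sorted(buckets.items())}


tweets = [
    (datetime(2014, 6, 26, 16, 5), "What an amazing win!"),
    (datetime(2014, 6, 26, 16, 40), "I love this team"),
    (datetime(2014, 6, 26, 17, 10), "So sad, we lose again"),
]
print(mean_sentiment_by_hour(tweets))
```

The same bucketing works for any derived attribute; swapping the hour key for a region key would combine sentiment with geo instead of time.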
79. Competition Tree
[Figure: a knockout bracket in which A vs B and C vs D lead to A vs C, won by C]
Competition Tree + [figure] = uclfinal.twitter.com
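A competition tree like the one in the figure is naturally a binary tree: each match has two sides, and each side is either a team or an earlier match whose winner advances. A minimal sketch, with the `Match` class and the `side_name` and `rounds` helpers named only for this example:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Match:
    # A node in the competition tree: each side is a team name or the
    # earlier Match whose winner advances to this one.
    home: Union[str, "Match"]
    away: Union[str, "Match"]
    winner: Optional[str] = None


def side_name(side) -> Optional[str]:
    # A team plays under its own name; a sub-match contributes its winner.
    return side if isinstance(side, str) else side.winner


def rounds(final: Match) -> list[list[tuple]]:
    """List (home, away, winner) per round, from the first round to the final."""
    levels, frontier = [], [final]
    while frontier:
        levels.append([(side_name(m.home), side_name(m.away), m.winner) for m in frontier])
        # Step down one level: collect the sub-matches feeding this round.
        frontier = [s for m in frontier for s in (m.home, m.away) if isinstance(s, Match)]
    return levels[::-1]


m1 = Match("A", "B", winner="A")
m2 = Match("C", "D", winner="C")
final = Match(m1, m2, winner="C")
for r in rounds(final):
    print(r)
# [('A', 'B', 'A'), ('C', 'D', 'C')]
# [('A', 'C', 'C')]
```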
80. TIME + TEXT UEFA Champions League
• Challenges
  • Filtering relevant Tweets
  • Multiple matches at the same time
  • Ambiguous words: “goal”, “red”, “yellow”
  • Tweets mentioning both teams, e.g. “#GER 2-2 #GHA”
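One possible way to sidestep ambiguous words and overlapping matches is to route Tweets by team hashtags. In this sketch the schedule of simultaneous matches and the `assign_match` helper are hypothetical. A Tweet that tags both teams of one match (like “#GER 2-2 #GHA”) counts once for that match, while a Tweet whose tags point at several matches, or at none, is set aside as ambiguous. This trades recall for precision: Tweets that name a team only in plain words are not counted.

```python
import re

# Hypothetical schedule of two matches played at the same time,
# keyed by the team codes used in hashtags.
MATCHES = {
    ("GER", "GHA"): "Germany vs Ghana",
    ("USA", "POR"): "USA vs Portugal",
}

TAG_RE = re.compile(r"#(\w+)")


def assign_match(text):
    """Return the single match a Tweet refers to, or None if unclear."""
    tags = {tag.upper() for tag in TAG_RE.findall(text)}
    # A match is a candidate if the Tweet tags either of its teams;
    # tagging both teams still yields that match only once.
    candidates = [name for teams, name in MATCHES.items() if tags & set(teams)]
    return candidates[0] if len(candidates) == 1 else None


for text in ["#GER 2-2 #GHA", "What a goal! #POR", "goal!!!", "#GER and #USA look strong"]:
    print(repr(text), "->", assign_match(text))
```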
82. TIME + GEO + TEXT State of the Union
twitter.github.io/interactive/sotu2014
83. TIME + GEO + TEXT State of the Union
1) Timeline + topic from Tweets
2) Context (speech)
3) Volume of Tweets by topics during selected part of the SOTU
4) Density map of Tweets about selected topic
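The “Volume of Tweets by topics” panel boils down to counting Tweets per topic inside the selected time range. This sketch assumes each Tweet has already been assigned a topic upstream (for example by keyword rules or a classifier); the `volume_by_topic` helper, topics, and timestamps are made up.

```python
from collections import Counter
from datetime import datetime


def volume_by_topic(tweets, start, end):
    """Count Tweets per topic within the half-open window [start, end).

    `tweets` is an iterable of (timestamp, topic) pairs.
    """
    return Counter(topic for ts, topic in tweets if start <= ts < end)


tweets = [
    (datetime(2014, 1, 28, 21, 15), "economy"),
    (datetime(2014, 1, 28, 21, 18), "economy"),
    (datetime(2014, 1, 28, 21, 20), "healthcare"),
    (datetime(2014, 1, 28, 21, 45), "education"),
]
# A made-up "selected part of the speech".
segment = (datetime(2014, 1, 28, 21, 10), datetime(2014, 1, 28, 21, 30))
print(volume_by_topic(tweets, *segment).most_common())
# [('economy', 2), ('healthcare', 1)]
```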
101. Working together
[Diagram: Raw data, a Computer (one machine, cloud, MapReduce, etc.), Processed information (aggregated information and ignored information), and a Human]
• NLP: Make computers think more like humans.
• VIS: Help people consume information.
• HCI: User interactions or feedback. Bridge the gap: connect human & computer.
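The “Computer” box covers anything from one machine to MapReduce on a cluster. To show the shape of that step, here is a single-machine sketch of the classic map, shuffle, and reduce phases applied to hashtag counting; a framework such as Hadoop would distribute the same phases across machines. The function names and sample Tweets are illustrative only.

```python
import re
from collections import defaultdict
from itertools import chain

TAG_RE = re.compile(r"#(\w+)")


def map_tweet(text):
    # Map: emit a (hashtag, 1) pair for every hashtag in one Tweet.
    return [(tag.lower(), 1) for tag in TAG_RE.findall(text)]


def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_tag(key, values):
    # Reduce: combine the values collected for one key.
    return key, sum(values)


def count_hashtags(tweets):
    pairs = chain.from_iterable(map_tweet(t) for t in tweets)
    return dict(reduce_tag(k, vs) for k, vs in shuffle(pairs).items())


print(count_hashtags(["#WorldCup goal!", "#worldcup #GER", "no tags here"]))
# {'worldcup': 2, 'ger': 1}
```

Raw Tweets go in and aggregated information comes out; the untagged Tweet is the "ignored information" of this particular aggregation.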
104. Summary
• Thoughts are captured in Tweets: what, where, when
• Finding patterns in text + geo + time
• Opportunities for NLP + HCI + VIS collaboration
• Trade-off: better techniques vs. scalability + real time
@kristw / interactive.twitter.com