7. Tools
Word Filter Web Application
300,000 Tweets
Filter
User
Tuesday, February 26, 13
8. Algorithm
read query and download corpus of 1500 tweets
filter out common words
count
select potentially meaningful words:
  rank remaining words by number of occurrences and select top 10
  rank remaining words by rate of capitalization and select top 10
cluster candidates into groups:
  link two candidate words if their relative proportion of co-occurrence is greater than 0.25
  rank connected components by total occurrences and take top 3
assign tweets to clusters
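A minimal Python sketch of the steps on this slide, under stated assumptions rather than the original implementation. The slide does not define several details, so the following are guesses: the stop list COMMON_WORDS, the regex tokenizer, taking the union of the two top-10 lists as the candidates, defining "relative proportion of co-occurrence" as tweets containing both words divided by tweets containing the rarer word, and assigning each tweet to the cluster it shares the most words with. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stop list; the slides do not say which common words are filtered.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "at", "rt"}


def tokenize(tweet):
    """Split a tweet into alphabetic tokens, keeping the original case."""
    return re.findall(r"[A-Za-z]+", tweet)


def select_candidates(tweets, top_n=10):
    """Return (candidate words, occurrence counts) after dropping common words."""
    counts = Counter()
    capitalized = Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            word = token.lower()
            if word in COMMON_WORDS:
                continue
            counts[word] += 1
            if token[0].isupper():
                capitalized[word] += 1
    # Top words by number of occurrences.
    by_count = [w for w, _ in counts.most_common(top_n)]
    # Top words by rate of capitalization (capitalized uses / all uses).
    by_caps = sorted(counts, key=lambda w: capitalized[w] / counts[w], reverse=True)[:top_n]
    # Assumption: the candidate set is the union of both lists.
    return set(by_count) | set(by_caps), counts


def cluster_candidates(tweets, candidates, counts, threshold=0.25, top_k=3):
    """Link co-occurring candidates and return the top connected components."""
    present = Counter()   # number of tweets containing each candidate
    together = Counter()  # number of tweets containing each candidate pair
    for tweet in tweets:
        words = {t.lower() for t in tokenize(tweet)} & candidates
        present.update(words)
        together.update(combinations(sorted(words), 2))
    neighbours = defaultdict(set)
    for (a, b), n in together.items():
        # Assumed definition of "relative proportion of co-occurrence".
        if n / min(present[a], present[b]) > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Connected components by depth-first search.
    seen, components = set(), []
    for start in sorted(candidates):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            word = stack.pop()
            if word in seen:
                continue
            seen.add(word)
            component.add(word)
            stack.extend(neighbours[word] - seen)
        components.append(component)
    # Rank components by total occurrences of their words and keep the top few.
    components.sort(key=lambda c: sum(counts[w] for w in c), reverse=True)
    return components[:top_k]


def assign_tweets(tweets, clusters):
    """Map cluster index -> tweets sharing the most words with that cluster."""
    assignment = defaultdict(list)
    for tweet in tweets:
        words = {t.lower() for t in tokenize(tweet)}
        overlaps = [len(words & cluster) for cluster in clusters]
        if overlaps and max(overlaps) > 0:
            assignment[overlaps.index(max(overlaps))].append(tweet)
    return dict(assignment)
```

Usage: call select_candidates on the downloaded tweets, pass its two results to cluster_candidates, then pass the resulting clusters to assign_tweets. Tweets that share no word with any cluster are left unassigned in this sketch.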
9. Kevin Teh
kkwteh@gmail.com
Math PhD -- May ’13
Topic: Noncommutative Geometry (Whatever that is)
B.A.Sc. -- April ’07
Engineering Science (Whatever that is)