14. Dynamic Topics
Manually updating keywords to get topic-relevant tweets is not feasible
“indianelection”, “modi”, “bjp”, “congress”
“jan25”, “egypt”, “tunisia”, “arabspring”
“sandy”, “newyork”, “redcross”, “fema”
“swineflu”, “ebola”
15. Problem
How can we automatically update the filters to track a dynamically evolving topic on Twitter?
16. Hashtags as Filters
• Hashtags identify a topic on Twitter
• Tweets with hashtags are more informative
• Users have a lot of freedom to create them
• Some become popular; most die out
28. Clustering Coefficient of the Hashtag Co-occurrence Network (1%)
[Figure: clustering coefficient of hashtags in the co-occurrence network]
The top hashtags co-occur most strongly with each other
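A minimal sketch of how the clustering coefficient of a hashtag co-occurrence network could be computed, assuming each tweet is reduced to its set of hashtags and two hashtags are linked if they appear together in at least one tweet. The function names and the toy tweets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(tweets):
    """Undirected hashtag graph: an edge joins two hashtags that appear
    together in at least one tweet. `tweets` is an iterable of hashtag sets."""
    graph = defaultdict(set)
    for tags in tweets:
        for a, b in combinations(sorted(set(tags)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_clustering(graph, node):
    """Fraction of pairs of the node's neighbours that are linked to each other."""
    neighbours = graph.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


# Toy sample (hypothetical tweets)
tweets = [
    {"#indianelection2015", "#modikisarkar", "#bjp"},
    {"#indianelection2015", "#bjp"},
    {"#modikisarkar", "#india"},
]
g = cooccurrence_graph(tweets)
for tag in sorted(g):
    print(tag, round(local_clustering(g, tag), 2))
```

In this sample #modikisarkar scores 1/3 rather than 1, because its neighbour #india does not co-occur with its other neighbours.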
29. Determining Relevancy of Co-occurring Hashtags
Co-occurring hashtags: #indianelection2015, #modikisarkar
Threshold: δ
Preferably a prominent hashtag
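One way to read the threshold δ: count how often each hashtag appears alongside the topic hashtag and keep those at or above δ. The slide does not say whether δ is an absolute count or a fraction; this sketch assumes a fraction of the topic's tweets, and all names are hypothetical. It only yields candidates; as the next slide notes, co-occurrence alone still lets noisy hashtags through.

```python
from collections import Counter


def cooccurring_hashtags(tweets, topic, delta):
    """Hashtags that co-occur with `topic` in at least a fraction `delta` of
    the tweets containing `topic`, most frequent first.

    Treating `delta` as a fraction is an assumption; the slide only names a threshold.
    """
    topic_tweets = [tags for tags in tweets if topic in tags]
    if not topic_tweets:
        return []
    counts = Counter(tag for tags in topic_tweets for tag in tags if tag != topic)
    rates = {tag: n / len(topic_tweets) for tag, n in counts.items()}
    # Keep hashtags at or above the threshold; break ties alphabetically
    return sorted((tag for tag, rate in rates.items() if rate >= delta),
                  key=lambda tag: (-rates[tag], tag))


# Toy sample (hypothetical tweets)
tweets = [
    {"#indianelection2015", "#modikisarkar"},
    {"#indianelection2015", "#modikisarkar", "#bjp"},
    {"#indianelection2015", "#cricket"},
    {"#indianelection2015", "#modikisarkar"},
]
print(cooccurring_hashtags(tweets, "#indianelection2015", 0.5))
```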
30. Does Hashtag Co-occurrence Work?
o No. Co-occurrence alone does not work
o Many noisy or unrelated hashtags co-occur
o Determine the “dynamic” relevance of the top co-occurring hashtags to the dynamic topic
31. Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
Co-occurring hashtags: #indianelection2015, #modikisarkar (threshold δ)
Latest K tweets (K = 200, 500) → Entity Extraction and Scoring (Normalized Frequency Scoring)
Narendra Modi: 0.9
BJP: 0.7
NDA: 0.6
India: 0.4
Elections: 0.2
Rahul Gandhi: 0.2
Congress: 0.2
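The entity scores above come from the latest K tweets of the co-occurring hashtag. The slide names normalized frequency scoring without giving the normalizer, so this sketch assumes the score is the fraction of the K tweets that mention each entity. Entity extraction is stubbed out as precomputed entity lists (a real system would run an entity linker); the function name and sample data are hypothetical.

```python
from collections import Counter


def normalized_entity_scores(tweet_entities, k):
    """Score each entity by the fraction of the latest k tweets mentioning it.

    tweet_entities: one list of (already extracted) entities per tweet,
    oldest first. Entity extraction itself is out of scope here.
    """
    if k <= 0:
        return {}
    latest = tweet_entities[-k:]
    if not latest:
        return {}
    # set() so an entity repeated within one tweet is counted once
    counts = Counter(e for entities in latest for e in set(entities))
    return {e: n / len(latest) for e, n in counts.most_common()}


# Hypothetical entities already extracted from four tweets
tweets = [
    ["Narendra Modi", "BJP"],
    ["Narendra Modi"],
    ["Rahul Gandhi", "Congress"],
    ["Narendra Modi", "BJP", "India"],
]
print(normalized_entity_scores(tweets, k=4))
```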
32. Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
Co-occurring hashtags: #indianelection2015, #modikisarkar (threshold δ)
Latest K tweets (K = 200, 500) → Entity Extraction and Scoring
Narendra Modi: 0.9
BJP: 0.7
NDA: 0.6
India: 0.4
Elections: 0.2
Rahul Gandhi: 0.2
Congress: 0.2
Wikipedia event page: Indian General Election, 2014 → Dynamically Updated Background Knowledge
35. Event Relevant Background Knowledge
o Entities mentioned on the event's Wikipedia page are relevant to the event
36. Event Relevant Background Knowledge – Graph Structure
o Wikipedia’s hyperlink structure is very rich
o Page-to-page (Wikipedia) links
[Figure: links around Indian General Election, 2014: Narendra Modi, Rahul Gandhi, NDA (India), UPA (India), BJP, Indian National Congress]
37. Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
Co-occurring hashtags: #indianelection2015, #modikisarkar (threshold δ)
Latest K tweets (K = 200, 500) → Entity Extraction and Scoring
Narendra Modi: 0.9
BJP: 0.7
NDA: 0.6
India: 0.4
Elections: 0.2
Rahul Gandhi: 0.2
Congress: 0.2
Wikipedia event page: Indian General Election, 2014 → Extract and periodically update the hyperlink structure (one hop from the event page)
38. Event Relevant Background Knowledge
o The hyperlink structure is dynamically updated
[Figure: links around Indian General Election, 2014 (Narendra Modi, Rahul Gandhi, NDA (India), UPA (India), BJP, Indian National Congress); edge date shown: 10 May 2010]
39. Event Relevant Background Knowledge
o The hyperlink structure is dynamically updated
[Figure: links around Indian General Election, 2014 (Narendra Modi, Rahul Gandhi, NDA (India), UPA (India), BJP, Indian National Congress); edge dates shown: 10 May 2010; 29 March 2013 (×4)]
40. Event Relevant Background Knowledge
o The hyperlink structure is dynamically updated
[Figure: links around Indian General Election, 2014 (Narendra Modi, Rahul Gandhi, NDA (India), UPA (India), BJP, Indian National Congress); edge dates shown: 10 May 2010; 29 March 2013 (×4); 20 May 2013 (×2)]
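The extraction step keeps the pages one hop from the event page and refreshes them as links appear. A sketch assuming a hypothetical store of dated links as (source, target, date added) tuples: the background knowledge on a given day is the set of pages linked to or from the event page by then. The dates reuse those shown on the slides, but which page each date belongs to is made up.

```python
from datetime import date


def one_hop_entities(links, event, as_of):
    """Pages linked to or from `event` by the snapshot date `as_of`.

    links: iterable of (source_page, target_page, date_added) tuples.
    """
    neighbours = set()
    for source, target, added in links:
        if added > as_of:
            continue  # the link did not exist yet at this snapshot
        if source == event:
            neighbours.add(target)
        elif target == event:
            neighbours.add(source)
    return neighbours


EVENT = "Indian General Election, 2014"
# Hypothetical link history; only the dates are taken from the slides
links = [
    (EVENT, "Indian National Congress", date(2010, 5, 10)),
    (EVENT, "BJP", date(2013, 3, 29)),
    (EVENT, "Rahul Gandhi", date(2013, 3, 29)),
    ("Narendra Modi", EVENT, date(2013, 5, 20)),
    ("Rahul Gandhi", "Indian National Congress", date(2013, 3, 29)),  # two hops away: ignored
]
for day in (date(2012, 1, 1), date(2013, 4, 1), date(2013, 6, 1)):
    print(day, sorted(one_hop_entities(links, EVENT, day)))
```

Re-running the function on a schedule with the current date gives the periodic update; the link store itself would have to be refreshed from Wikipedia separately.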
41. Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
Co-occurring hashtags: #indianelection2015, #modikisarkar (threshold δ)
Latest K tweets (K = 200, 500) → Entity Extraction and Scoring
Narendra Modi: 0.9
BJP: 0.7
NDA: 0.6
India: 0.4
Elections: 0.2
Rahul Gandhi: 0.2
Congress: 0.2
Wikipedia event page: Indian General Election, 2014 → Extract and periodically update the hyperlink structure (one hop from the event page) → Entity scoring based on relevance to the event
42. Hyperlink Entity Scoring
o Edge-based measure: ed(c,E)
o Link overlap measure (Jaccard similarity): oco(c,E) = |Out(c) ∩ Out(E)| / |Out(c) ∪ Out(E)|
o Out(c) are the links in Wikipedia page “c”
o Final score: r(c,E) = ed(c,E) + oco(c,E)
[Figure: Indian General Election, 2014 → Narendra Modi, ed(c,E) = 1; Indian General Election, 2014 ↔ Indian General Election, 2009 (mutually important), ed(c,E) = 2]
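A sketch of r(c, E) = ed(c, E) + oco(c, E), assuming ed(c, E) counts the directions in which c and E link to each other (1 for a one-way link, 2 for the “mutually important” case in the figure) and oco(c, E) is the Jaccard similarity of Out(c) and Out(E). The Out() sets are a toy, hypothetical dictionary, not the real Wikipedia links.

```python
def edge_score(out_links, c, e):
    """ed(c, E): 1 per direction in which c and E link to each other (0 to 2)."""
    return int(e in out_links.get(c, set())) + int(c in out_links.get(e, set()))


def overlap_score(out_links, c, e):
    """oco(c, E): Jaccard similarity of Out(c) and Out(E)."""
    a, b = out_links.get(c, set()), out_links.get(e, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevance(out_links, c, e):
    """r(c, E) = ed(c, E) + oco(c, E)."""
    return edge_score(out_links, c, e) + overlap_score(out_links, c, e)


IGE14, IGE09 = "Indian General Election, 2014", "Indian General Election, 2009"
# Hypothetical Out() sets
out_links = {
    IGE14: {"Narendra Modi", IGE09, "BJP", "India"},
    IGE09: {IGE14, "BJP", "India"},
    "Narendra Modi": {"BJP", "Gujarat"},
}
print(relevance(out_links, "Narendra Modi", IGE14))
print(relevance(out_links, IGE09, IGE14))
```

Here the one-way link gives Narendra Modi ed = 1 plus an overlap of 1/5, while the mutual link gives the 2009 election page ed = 2 plus an overlap of 2/5.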
43. Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
Co-occurring hashtags: #indianelection2015, #modikisarkar (threshold δ)
Latest K tweets (K = 200, 500) → Entity Extraction and Scoring
Narendra Modi: 0.9
BJP: 0.7
NDA: 0.6
India: 0.4
Elections: 0.2
Rahul Gandhi: 0.2
Congress: 0.2
Wikipedia event page: Indian General Election, 2014 → Extract and periodically update the hyperlink structure (one hop from the event page) → Entity scoring based on relevance to the event
Indian General Election: 1.0
India: 0.9
Elections: 0.7
UPA: 0.6
BJP: 0.3
NDA: 0.3
Narendra Modi: 0.3
44. Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
Co-occurring hashtags: #indianelection2015, #modikisarkar (threshold δ)
Latest K tweets (K = 200, 500) → Entity Extraction and Scoring
Narendra Modi: 0.9
BJP: 0.7
NDA: 0.6
India: 0.4
Elections: 0.2
Rahul Gandhi: 0.2
Congress: 0.2
Wikipedia event page: Indian General Election, 2014 → Extract and periodically update the hyperlink structure (one hop from the event page) → Entity scoring based on relevance to the event
Indian General Election: 1.0
India: 0.9
Elections: 0.7
UPA: 0.6
BJP: 0.3
NDA: 0.3
Narendra Modi: 0.3
Similarity Check → Relevance Score: 0.6
45. Similarity Check
o Set based
  o Jaccard similarity
  o Considers the entities without their scores
o Vector based
  o Symmetric: cosine similarity
  o Asymmetric: subsumption similarity
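A sketch of the three similarity checks applied to the two entity vectors from the diagram (hashtag entities versus event entities). Jaccard and cosine similarity follow their standard definitions; the slide does not define subsumption similarity, so the version here (the share of one vector's total score that falls on entities also present in the other) is an assumption. These sketches are not claimed to reproduce the 0.6 relevance score shown on the slide.

```python
import math


def jaccard(a, b):
    """Set based: ignores the scores, compares only which entities occur."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def cosine(a, b):
    """Symmetric vector similarity over entity scores."""
    dot = sum(w * b[e] for e, w in a.items() if e in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def subsumption(a, b):
    """Asymmetric (assumed definition): share of a's total score that falls
    on entities also present in b."""
    total = sum(a.values())
    return sum(w for e, w in a.items() if e in b) / total if total else 0.0


hashtag = {"Narendra Modi": 0.9, "BJP": 0.7, "NDA": 0.6, "India": 0.4,
           "Elections": 0.2, "Rahul Gandhi": 0.2, "Congress": 0.2}
event = {"Indian General Election": 1.0, "India": 0.9, "Elections": 0.7,
         "UPA": 0.6, "BJP": 0.3, "NDA": 0.3, "Narendra Modi": 0.3}

print(jaccard(hashtag, event), cosine(hashtag, event))
print(subsumption(hashtag, event), subsumption(event, hashtag))
```

The two subsumption calls differ, which is the asymmetry the slide points to: most of the hashtag's weight sits on event entities, but less of the event's weight sits on hashtag entities.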
48. Evaluation – Dataset
o 2 events
  o US Presidential Elections (#election2012)
  o Hurricane Sandy (#sandy)
o Top 25 co-occurring hashtags
49. Evaluation – Strategy
o Ranking problem
  o Rank the top 25 hashtags based on the relevancy of their tweets to the event
o Experimented with all the similarity metrics
o Manually annotated the tweets of these hashtags as relevant/irrelevant (gold standard)
o Ranking evaluation metrics
  o Mean Average Precision (MAP)
  o NDCG
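Both ranking metrics can be computed from a ranked list of hashtags and the binary relevant/irrelevant gold labels. A minimal sketch using the standard definitions (average precision normalized by the number of relevant items; NDCG with binary gain and a log2 discount); the function names and the example rankings are hypothetical.

```python
import math


def average_precision(ranked, relevant):
    """Mean of precision@i over the positions i where a relevant item appears."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, e.g. one per event."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg(ranked, relevant, k=None):
    """Binary-gain NDCG@k with a log2(i + 1) discount."""
    k = k or len(ranked)
    dcg = sum(1 / math.log2(i + 1)
              for i, item in enumerate(ranked[:k], start=1) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical system ranking and gold labels
ranked = ["#tag1", "#tag2", "#tag3", "#tag4"]
gold = {"#tag1", "#tag3"}
print(average_precision(ranked, gold), ndcg(ranked, gold))
```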
56. User Modeling
o User interest identification on Twitter
  o Content-based (only tweets)
    o Term-based (semantic, web, #semanticweb)
    o Entity-based (semantic web <same as> #semanticweb)
    o Interest graphs derived from a knowledge base (Hierarchical Interest Graphs)
  o Collaborative (users’ friends)
  o Hybrid
60. What is on your mind? (Next concept/term)
Fruit
61. What is on your mind? (Next concept/term)
Fruit
Other fruit names
62. Cognitive Science
o Human memory has been argued to be structured as a hierarchy of concepts (semantic network)
o Spreading activation theory has been used to simulate search on a semantic network
o This theory has not been well explored for user interest modeling
63. Hierarchical Interest Graphs
o Extend user profiles from Twitter to comprise a hierarchy of concepts
o The hierarchy of concepts is derived from the Wikipedia category structure
o Each concept in the hierarchy is scored based on the user's extent of interest
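76. Example
[Figure: spreading activation over a category hierarchy. Entities M S Dhoni, Virat Kohli, Sachin Tendulkar (0.8, 0.2, 0.6) activate the categories Indian Cricketers, Indian Cricket, Cricket, Sports; category activations shown: 0.5, 0.4, 0.25, 0.1]
Activation function: determines the extent of spreading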
87. Boosting Common Ancestors
[Figure: category hierarchy in which Indian Cricketers and Indian Cricket (M S Dhoni, Virat Kohli, Sachin Tendulkar) and Australian Cricketers and Australian Cricket (Michael Clarke, Shane Watson) share the ancestors Cricket and Sports; values shown: 3, 3, 5, 5, 2, 2]
89. Activation Functions
o Bell
  $A_j = \sum_{i=0}^{n} A_i \times F_j$
o Bell Log
  $A_j = \sum_{i=0}^{n} A_i \times FL_j$
o Priority Intersect
  $A_j = \sum_{i=0}^{n} A_i \times FL_j \times P_{ji} \times B_j$
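A sketch of bottom-up spreading activation over a category hierarchy in the Bell form $A_j = \sum_i A_i \times F_j$, where the children i of category j pass their activation upward. The slides do not define F_j, FL_j, P_ji or B_j here, so the sketch takes a hypothetical `factor(j, i)` supplied by the caller: Bell ignores i and returns F_j, Bell Log would return FL_j, and Priority Intersect would return FL_j × P_ji × B_j. The hierarchy and weights are hypothetical and are not meant to reproduce the numbers in the earlier example.

```python
from functools import lru_cache


def spread_activation(children, initial, factor):
    """Bottom-up spreading activation over an acyclic category hierarchy.

    children: dict mapping each category to the nodes directly below it.
    initial:  dict of activation for leaf entities (e.g. from a user's tweets).
    factor:   function (j, i) -> weight applied when child i activates category j.
    Returns a dict of activation for every node. Cycles are not handled.
    """
    @lru_cache(maxsize=None)
    def activation(node):
        if node not in children:  # leaf entity: use its initial score
            return initial.get(node, 0.0)
        # A_j = sum over children i of A_i * factor(j, i)
        return sum(activation(i) * factor(node, i) for i in children[node])

    nodes = set(children) | set(initial) | {i for cs in children.values() for i in cs}
    return {n: activation(n) for n in nodes}


children = {
    "Indian Cricketers": ["M S Dhoni", "Virat Kohli", "Sachin Tendulkar"],
    "Indian Cricket": ["Indian Cricketers"],
    "Cricket": ["Indian Cricket"],
    "Sports": ["Cricket"],
}
initial = {"M S Dhoni": 0.8, "Virat Kohli": 0.2, "Sachin Tendulkar": 0.6}
F = {category: 0.5 for category in children}  # hypothetical F_j values


def bell(j, i):
    """Bell form: the weight depends only on the receiving category j."""
    return F[j]


scores = spread_activation(children, initial, bell)
for node in ("Indian Cricketers", "Indian Cricket", "Cricket", "Sports"):
    print(node, round(scores[node], 3))
```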
90. Evaluation
User study
• 37 users
• 30K tweets
Evaluated the top-10 interest categories derived from the hierarchy
• 76% Mean Average Precision
• 98% Mean Reciprocal Rank
• 70% are not mentioned in the tweets
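For reference, a sketch of mean reciprocal rank (MRR) under its standard definition: for each user, take 1/rank of the first category judged relevant in the top-10 list, then average over users. The judgments below are hypothetical.

```python
def mean_reciprocal_rank(judgments):
    """judgments: one list of booleans per user, True where the category at that
    rank was judged relevant. Users with no relevant category contribute 0."""
    if not judgments:
        return 0.0
    total = 0.0
    for ranks in judgments:
        for position, is_relevant in enumerate(ranks, start=1):
            if is_relevant:
                total += 1 / position
                break  # only the first relevant category counts
    return total / len(judgments)


# Hypothetical judgments for three users
print(mean_reciprocal_rank([[True, False], [False, True], [False, False]]))
```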
91. Tweet Recommendation using Hierarchical Interest Graph
o Working on a tweet recommendation system that utilizes the Hierarchical Interest Graph
o Preliminary results are “interesting”
92. Conclusion
o Focus on “information” overload instead of “data” overload
o Personalized information filtering
o Knowledge-base-enabled solutions for challenges in tweet filtering
o Wikipedia hyperlink structure and category graph leveraged for Twitter data filtering
o More research on user-specific attribute extraction (personalization) from Twitter data
  o Activity estimation
  o Location prediction
95. kHealth: knowledge-enabled healthcare
Empowering Individuals (who are not Larry Smarr!) for their own health
Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions and provide actionable information: a “canary in a coal mine”
97. Motivational Scenario
A diabetic patient is interested in keeping up to date with new information about diabetes
Manually going through news articles, diabetes forums, blogs, etc.
- Time consuming
- Relevant? Interesting? Informative? Useful?
How about aggregating all the relevant and important health information on one platform?
98.
Search and Explore
o X Controls Cancer (X = diet, treatment, exercise)
o Pattern-based approach leveraging domain semantics
Top Health News
o Informative news about the selected disease
o Faceted search (by health topics)
Learn about Disease
o Source: Wikipedia
Navigation: Search & Explore | Top Health News | Tweet Traffic | Learn about Disease | Home
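The “X Controls Cancer” search can be approximated with a lexical pattern. A minimal sketch assuming plain regular expressions over sentences, with X taken as the one to three words right before “control(s)”; the domain-semantics part (for example recognizing that X is a diet, a treatment or an exercise) is not modelled, and the function name and sample sentences are hypothetical.

```python
import re


def find_x_controls(sentences, disease):
    """Return the phrase X from sentences of the form '<X> control(s) <disease>'."""
    # X is assumed to be one to three words immediately before the verb
    pattern = re.compile(
        r"((?:\w+\s+){0,2}\w+)\s+controls?\s+" + re.escape(disease) + r"\b",
        re.IGNORECASE,
    )
    found = []
    for sentence in sentences:
        found.extend(m.group(1).lower() for m in pattern.finditer(sentence))
    return found


sentences = [
    "A healthy diet controls cancer risk.",
    "Regular exercise controls cancer, doctors say.",
    "Nothing relevant here.",
]
print(find_x_controls(sentences, "cancer"))
```

A fixed word window is crude (it keeps the article in “a healthy diet”); mapping the captured phrase onto known diet, treatment and exercise concepts is where domain semantics would come in.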