Knoesis-Semantic filtering-Tutorials

Semantic Filtering
An example of semantic technologies for real-time analysis
Pavan Kapanipathi
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State University, USA
Tutorial @ Kno.e.sis Centre: Semantics Approach to Big Data and Event Processing, Oct 7-9, 2015

Streams are everywhere
Social Data
Text
Images
Videos
Sensor Data
Streams

Information Overload
500M users generate 500M tweets per day
It’s not information overload.
It’s filter failure
-- Clay Shirky

Each of our projects faces information overload
• Disaster Management
– Hazards SEES
• Healthcare Issues
– Depression
• Societal Issues
– eDrugTrends
– Harassment

• Filtering is necessary
• Understanding the requirements and utilizing semantics for filtering is important
Semantic Filtering

Two Main Topics
• Twarql
– Streaming annotation and flexible querying on Twitter
• Continuous Semantics
– Tracking dynamic topics on Twitter

Twarql
Tracking the health care debate in the United States on social media
[Figure: concept labels "Health Care Reform", "Healthcare reform legislation in the United States", and "Patient Protection and Affordable Care Act (Obamacare)"]

Extraction Pipeline - Tweet
Tweet: I think it’s good deal Apple Ipad Tablet (3G, wifi, WiFi + 3G) Hard Nylon Cube Carrying Case for ipad ( iPad.. http://bit.ly/cry6LF)
Extracted:
• Dbpedia:Ipad
• Dbpedia:Tablet
• URLs
http://penguinkang.com/tweetprobe/
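The annotation step on this slide can be sketched in a few lines. This is a minimal sketch, assuming a tiny hand-written surface-form dictionary (`SURFACE_FORMS`, hypothetical) in place of a real entity spotter; the function name `annotate` is also an assumption made for illustration and is not part of Twarql.

```python
import re

# Hypothetical surface-form dictionary; a real pipeline would use a full
# DBpedia spotter instead of two hand-picked entries.
SURFACE_FORMS = {
    "ipad": "Dbpedia:Ipad",
    "tablet": "Dbpedia:Tablet",
}

URL_PATTERN = re.compile(r"https?://[^\s)]+")


def annotate(tweet: str) -> dict:
    """Return the dictionary entities and the URLs found in a tweet."""
    urls = URL_PATTERN.findall(tweet)
    # Drop URLs before tokenizing so URL fragments are not matched as entities.
    text = URL_PATTERN.sub(" ", tweet).lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    entities = sorted({SURFACE_FORMS[t] for t in tokens if t in SURFACE_FORMS})
    return {"entities": entities, "urls": urls}


tweet = ("I think it’s good deal Apple Ipad Tablet (3G, wifi, WiFi + 3G) Hard "
         "Nylon Cube Carrying Case for ipad ( iPad.. http://bit.ly/cry6LF)")
print(annotate(tweet))
```

Matching is case-insensitive and de-duplicated, so the three spellings of "ipad" in the tweet yield a single annotation.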

RDF
• RDF Annotation
• Common RDF/OWL Data formats.
• FOAF, SIOC, OPO, MOAT
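To make the RDF annotation step concrete, the sketch below serializes one annotated tweet as N-Triples using plain string formatting. It is only an illustration: `sioc:content` and `sioc:links_to` are SIOC properties, but the `mentions` property, the `example.org` namespace, the tweet URI, and the function names are hypothetical and are not taken from Twarql. The DBpedia resource name is copied from the slide and may not be the canonical URI.

```python
SIOC = "http://rdfs.org/sioc/ns#"
EX = "http://example.org/vocab#"  # hypothetical namespace for this sketch
DBPEDIA = "http://dbpedia.org/resource/"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


def tweet_to_ntriples(tweet_uri, content, entities, urls):
    """Return one N-Triples line per statement about the tweet."""
    lines = [f"<{tweet_uri}> <{SIOC}content> {literal(content)} ."]
    for name in entities:
        # Hypothetical property linking a tweet to an extracted entity.
        lines.append(f"<{tweet_uri}> <{EX}mentions> <{DBPEDIA}{name}> .")
    for url in urls:
        lines.append(f"<{tweet_uri}> <{SIOC}links_to> <{url}> .")
    return lines


for line in tweet_to_ntriples(
        "http://example.org/tweet/1",          # hypothetical tweet URI
        "Hard Nylon Cube Carrying Case for ipad",
        ["Ipad", "Tablet"],
        ["http://bit.ly/cry6LF"]):
    print(line)
```

Once tweets are expressed as triples like these, they can be matched against structured queries instead of plain keywords.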

Twarql – Use Case
:Health_care_reform

Demo
http://knoesis.wright.edu/library/tools/twarql/demo.swf

Dynamic Topics
• Continuously evolving on Twitter
• Entity – Event relevance changes
• Many entities are involved

Dynamic Topics
Manually crawl using keywords:
“indianelection”, “jan25”, “sandy”, “swineflu”, “ebola”

Dynamic Topics
Manually updating keywords to get topic-relevant tweets is not feasible
“indianelection”: “modi”, “bjp”, “congress”
“jan25”: “egypt”, “tunisia”, “arabspring”
“sandy”: “newyork”, “redcross”, “fema”
“swineflu”, “ebola”

Problem
How can we automatically update the filters to track a dynamically evolving topic on Twitter?

Hashtags as Filters
• Identify a topic on Twitter
• Tweets with hashtags are more informative
• Users have a lot of freedom to create them
• Some get popular, most die

Exploring Hashtags as Evolving Filters for Dynamic Topics
Hashtag filters:
                  Colorado Shooting (CS)   Occupy Wall Street (OWS)
Tweets            122,062                  6,077,378
Tags              192,512                  15,963,209
Distinct          12,350                   191,602
100% Retrieval    7,763                    21,314

Hashtag distributions
The top 1% of hashtags retrieves around 85% of the tweets
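A coverage figure of this kind can be computed as sketched below. This is an illustrative calculation on toy data, not the analysis behind the slide: it assumes each tweet is represented as a set of hashtags, that "top" means most frequent, and that coverage is measured over tweets carrying at least one hashtag. The function name is hypothetical.

```python
import math
from collections import Counter


def coverage_of_top_hashtags(tweets, fraction):
    """Share of hashtagged tweets containing at least one of the most
    frequent `fraction` of distinct hashtags.

    tweets: list of sets of hashtags, one set per tweet.
    """
    counts = Counter(tag for tags in tweets for tag in tags)
    if not counts:
        return 0.0
    # Keep at least one hashtag, even for very small fractions.
    k = max(1, math.ceil(fraction * len(counts)))
    top = {tag for tag, _ in counts.most_common(k)}
    tagged = [tags for tags in tweets if tags]
    covered = sum(1 for tags in tagged if tags & top)
    return covered / len(tagged)


toy_tweets = [{"#ows"}, {"#ows", "#nyc"}, {"#ows"}, {"#nyc"}, {"#p2"}]
print(coverage_of_top_hashtags(toy_tweets, 0.3))
```

On real data the same function would be called with `fraction=0.01` to check how much of the stream the top 1% of hashtags retrieves.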

Hashtag Filters Co-occurrence Graph
[Figure: hashtag co-occurrence graphs for Colorado Shooting and Occupy Wall Street]
Event-related hashtags co-occur with each other

Summarizing Hashtag Analysis
Starting with one of the event-relevant hashtags, we can reach other relevant hashtags by co-occurrence
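The idea of reaching further hashtags from a seed can be illustrated with a small graph traversal. This is a sketch on toy data, assuming each tweet is a set of hashtags and that two hashtags are connected when they appear in the same tweet; the function names and the `max_hops` parameter are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(tweets):
    """Connect every pair of hashtags that appear together in a tweet."""
    graph = defaultdict(set)
    for tags in tweets:
        for a, b in combinations(sorted(tags), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def reachable_hashtags(graph, seed, max_hops):
    """Breadth-first search from the seed, limited to `max_hops` edges."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        tag, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor in graph.get(tag, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    seen.discard(seed)
    return seen


toy_tweets = [{"#sandy", "#nyc"}, {"#nyc", "#fema"}, {"#oscars"}]
graph = build_cooccurrence_graph(toy_tweets)
print(reachable_hashtags(graph, "#sandy", 2))
```

Hashtags that never co-occur with the seed's neighborhood, such as `#oscars` in the toy data, are never reached.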

Determining Relevancy of Co-occurring Hashtags
#indianelection2015
#modikisarkar
Too many co-occurring hashtags

Hashtags
#indianelection2015
#modikisarkar
Co-occurring:
Threshold δ
Preferably a prominent hashtag

Does Hashtag Co-occurrence Work?
o No. Co-occurrence alone does not work
o Many noisy or unrelated hashtags co-occur
o Determine the “dynamic” relevance of the top co-occurring hashtag to the dynamic topic

Hashtags (Vector Space Model)
Hashtags: #indianelection2015, #modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring (Normalized Frequency Scoring):
Narendra Modi: 0.9
BJP: 0.7
NDA: 0.6
India: 0.4
Elections: 0.2
Rahul Gandhi: 0.2
Congress: 0.2
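A normalized-frequency entity vector of this kind can be sketched as follows. The sketch assumes that "Latest K" refers to the most recent K tweets containing the hashtag, that entity extraction has already been done, and that each score is the fraction of those tweets mentioning the entity; the slide does not state the exact normalization, so this choice and the function name are assumptions.

```python
from collections import Counter


def hashtag_entity_vector(tweet_entities, k):
    """Score entities by the fraction of the latest k tweets mentioning them.

    tweet_entities: list of entity lists, one per tweet, oldest first.
    """
    latest = tweet_entities[-k:] if k > 0 else []
    if not latest:
        return {}
    # Count each entity at most once per tweet.
    counts = Counter(e for entities in latest for e in set(entities))
    return {entity: n / len(latest) for entity, n in counts.items()}


toy_stream = [
    ["Narendra Modi", "BJP"],
    ["Narendra Modi"],
    ["India"],
    ["Narendra Modi", "BJP"],
]
print(hashtag_entity_vector(toy_stream, 4))
```

Because only the latest K tweets are used, the vector changes as the conversation around the hashtag shifts.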

Hashtags (Vector Space Model)
Hashtags: #indianelection2015, #modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring: Narendra Modi: 0.9; BJP: 0.7; NDA: 0.6; India: 0.4; Elections: 0.2; Rahul Gandhi: 0.2; Congress: 0.2
Dynamically Updated Background Knowledge: Indian General Election, 2014

Event Relevant Background Knowledge
o Wikipedia Event Pages
o Entities mentioned on the Event page of Wikipedia are relevant to the Event

Knowledge – Graph Structure
o Wikipedia’s hyperlink structure is very rich
o Page-Page (Wikipedia) links
[Figure: hyperlink graph around "Indian General Election, 2014" with the pages Narendra Modi, Rahul Gandhi, NDA (India), UPA (India), BJP, and Indian National Congress]

Hashtags: #indianelection2015, #modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring: Narendra Modi: 0.9; BJP: 0.7; NDA: 0.6; India: 0.4; Elections: 0.2; Rahul Gandhi: 0.2; Congress: 0.2
Event Page: Indian General Election, 2014
Extract, periodically update hyperlink structure (one hop from Event Page)
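Extracting the one-hop hyperlink structure can be sketched with a regular expression over a page's wikitext, where internal links use the standard `[[Target|label]]` syntax. Fetching the page and re-running the extraction periodically are left out. The function name is hypothetical, and skipping every title that contains a colon (to drop `File:` and `Category:` links) is a simplification that also drops some legitimate article titles.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def outgoing_links(wikitext):
    """Return the set of page titles linked from a page's wikitext."""
    links = set()
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip()
        # Simplification: titles with a colon are treated as namespace links.
        if title and ":" not in title:
            links.add(title)
    return links


sample = ("The [[Bharatiya Janata Party|BJP]], led by [[Narendra Modi]], "
          "contested against the [[Indian National Congress#History|INC]]. "
          "[[File:Map.png|thumb]]")
print(outgoing_links(sample))
```

Applying `outgoing_links` to the event page gives the entities one hop away from the event.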

Knowledge
o Hyperlink structure is dynamically updated
[Figure, built up over three slides: "Indian General Election, 2014" linked with Narendra Modi, Rahul Gandhi, BJP, and Indian National Congress. The links carry date labels: 10 May 2010 (one label), 29 March 2013 (four labels), 20 May 2013 (two labels)]
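One simple way to represent a hyperlink structure that changes over time is to record when each link was first observed and then ask for the graph as of a given date. The sketch below illustrates that idea only. The dates are taken from the figure, but which link carries which date is not recoverable from the slides, so the assignment in `link_first_seen` is hypothetical, as is the function name.

```python
from datetime import date

EVENT = "Indian General Election, 2014"

# Hypothetical assignment of the figure's dates to links.
link_first_seen = {
    (EVENT, "Indian National Congress"): date(2010, 5, 10),
    (EVENT, "Narendra Modi"): date(2013, 3, 29),
    (EVENT, "BJP"): date(2013, 3, 29),
    (EVENT, "Rahul Gandhi"): date(2013, 5, 20),
}


def links_as_of(first_seen, when):
    """Return the links that had been observed on or before `when`."""
    return {edge for edge, seen in first_seen.items() if seen <= when}


print(len(links_as_of(link_first_seen, date(2013, 4, 1))))
```

Re-extracting the event page's links periodically and adding new entries to the dictionary keeps the background knowledge in step with the page.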

Hashtags: #indianelection2015, #modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring: Narendra Modi: 0.9; BJP: 0.7; NDA: 0.6; India: 0.4; Elections: 0.2; Rahul Gandhi: 0.2; Congress: 0.2
Event Page: Indian General Election, 2014
Entity scoring based on relevance to the Event (one hop from Event Page)

Hyperlink Entity Scoring
o Edge Based Measure
o Link Overlap Measure: Jaccard similarity
– Out(c) are the links in Wikipedia page “c”
o Final Score: r(c,E) = ed(c,E) + oco(c,E)
[Figure: example page pairs (Indian General Election, 2014 with Narendra Modi; Indian General Election, 2014 with Indian General Election, 2009), annotated "Mutually Important", ed(c,E) = 1, and ed(c,E) = 2]
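The scoring on this slide can be sketched as below. The slide gives the final score r(c,E) = ed(c,E) + oco(c,E) and says the overlap is a Jaccard similarity over outgoing links. The reading of the edge measure used here (1 for a link in one direction, 2 when the pages link to each other, 0 when there is no link) is an assumption based on the figure labels, and the toy link sets and function names are hypothetical.

```python
def edge_score(c, event, out_links):
    """ed(c, E): 1 for a one-way link, 2 for mutual links (assumed reading)."""
    forward = c in out_links.get(event, set())
    backward = event in out_links.get(c, set())
    return int(forward) + int(backward)


def link_overlap(c, event, out_links):
    """oco(c, E): Jaccard similarity of Out(c) and Out(E)."""
    a = out_links.get(c, set())
    b = out_links.get(event, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevance(c, event, out_links):
    """r(c, E) = ed(c, E) + oco(c, E)."""
    return edge_score(c, event, out_links) + link_overlap(c, event, out_links)


E2014 = "Indian General Election, 2014"
E2009 = "Indian General Election, 2009"
toy_links = {  # hypothetical outgoing links, not real Wikipedia data
    E2014: {"Narendra Modi", E2009, "BJP"},
    "Narendra Modi": {"BJP", "Gujarat"},
    E2009: {E2014, "BJP"},
}
print(relevance("Narendra Modi", E2014, toy_links))
print(relevance(E2009, E2014, toy_links))
```

Under this reading, a page that both links to and is linked from the event page scores higher than a page linked in one direction only, all else being equal.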

Hashtags: #indianelection2015, #modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring: Narendra Modi: 0.9; BJP: 0.7; NDA: 0.6; India: 0.4; Elections: 0.2; Rahul Gandhi: 0.2; Congress: 0.2
Event Page: Indian General Election, 2014
One hop from Event Page: Indian General Elec: 1.0; India: 0.9; Elections: 0.7; UPA: 0.6; BJP: 0.3; NDA: 0.3; Narendra Modi: 0.3

Hashtags: #indianelection2015, #modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring: Narendra Modi: 0.9; BJP: 0.7; NDA: 0.6; India: 0.4; Elections: 0.2; Rahul Gandhi: 0.2; Congress: 0.2
Event Page: Indian General Election, 2014
One hop from Event Page: India: 0.9; Elections: 0.7; UPA: 0.6; BJP: 0.3; NDA: 0.3; Narendra Modi: 0.3
Similarity Check: Relevance Score: 0.6

Similarity Check
o Set Based
– Jaccard Similarity
– Considers the entities without the scores
o Vector Based
– Symmetric: Cosine Similarity
– Asymmetric: Subsumption Similarity
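The three kinds of measure can be compared on the entity vectors shown in the earlier slides. Jaccard and cosine similarity are standard. The slides do not give the subsumption formula, so `asymmetric_similarity` below is an illustrative stand-in, not the measure from the talk: it is a cosine computed only over the entities present in the hashtag vector, so event entities missing from the hashtag are not counted against it.

```python
import math


def jaccard(h, e):
    """Set-based: compares entity sets and ignores the scores."""
    a, b = set(h), set(e)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(h, e):
    """Vector-based and symmetric."""
    dot = sum(w * e.get(k, 0.0) for k, w in h.items())
    norm = (math.sqrt(sum(w * w for w in h.values()))
            * math.sqrt(sum(w * w for w in e.values())))
    return dot / norm if norm else 0.0


def asymmetric_similarity(h, e):
    """Stand-in asymmetric measure: cosine restricted to the entities of h."""
    restricted = {k: e[k] for k in h if k in e}
    return cosine(h, restricted)


hashtag_vec = {"Narendra Modi": 0.9, "BJP": 0.7, "NDA": 0.6, "India": 0.4,
               "Elections": 0.2, "Rahul Gandhi": 0.2, "Congress": 0.2}
event_vec = {"India": 0.9, "Elections": 0.7, "UPA": 0.6, "BJP": 0.3,
             "NDA": 0.3, "Narendra Modi": 0.3}

print(jaccard(hashtag_vec, event_vec))
print(cosine(hashtag_vec, event_vec))
print(asymmetric_similarity(hashtag_vec, event_vec))
```

With this stand-in, the asymmetric score is never lower than the cosine score, because dropping the event-only entities can only shrink the denominator.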

Intuition behind Asymmetric Similarity
[Figure: "Indian General Election 2014" compared with "Narendra Modi" under both measures. Labels: Symmetric, Asymmetric, Penalized, Ignored]

Evaluation – Dataset
o 2 events
– US Presidential Elections (#election2012)
– Hurricane Sandy (#sandy)
o Top 25 co-occurring hashtags

Evaluation – Strategy
o Ranking Problem
o Rank the Top 25 hashtags based on the relevancy of tweets to the event
o Experiment with all the similarity metrics
o Manually annotated the tweets of these hashtags as relevant/irrelevant (Gold Standard)
o Ranking Evaluation Metrics
– Mean Average Precision
– NDCG
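For reference, the two ranking metrics can be computed as in the sketch below. It uses the common textbook definitions with binary relevance labels; the talk does not specify its exact variants, so details such as normalizing average precision by the number of relevant items in the ranked list are assumptions.

```python
import math


def average_precision(ranked_relevance):
    """Mean of precision@i over the positions i that hold a relevant item."""
    hits = 0
    total = 0.0
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average of average_precision over several ranked lists."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0


ranking = [1, 0, 1]  # toy gold-standard labels in ranked order
print(average_precision(ranking))
print(ndcg_at_k(ranking, 3))
```

Both metrics reward rankings that place relevant items near the top, which matches the goal of ranking the most event-relevant hashtags first.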

Evaluation
Evaluated tweets containing the top-relevant hashtags detected for dynamic topics
• NDCG - 92% at top-5 Mean Average Precision

Conclusions
• Semantic Technologies for Real-time filtering of Social Data
– Wikipedia as a Dynamic Knowledge base for events
– Determining relevant hashtags using an Asymmetric similarity measure
– More hashtags in turn increase the coverage of Tweets for events
• Hashtag Analysis
– Co-occurrence technique can be used to detect event-relevant hashtags
– More popular hashtags are easier to detect via co-occurrence

Thanks
Contact: @pavankaps
pavan@knoesis.org
