Event-based systems are loosely coupled in space, time and
synchronization, providing a scalable infrastructure for
information exchange and distributed workflows. However,
event-based systems are tightly coupled, via event subscriptions
and patterns, to the semantics of the underlying event schema and
values. The high degree of semantic heterogeneity of events in
large and open deployments such as smart cities and the sensor
web makes it difficult to develop and maintain event-based
systems. To address semantic coupling within event-based
systems, we propose vocabulary-free subscriptions together with
approximate semantic matching of events. This paper
examines the requirement for semantic decoupling of events and
discusses approximate semantic event matching and its
consequences for event processing systems. We
introduce a semantic event matcher and evaluate the suitability of
an approximate hybrid matcher based on both thesaurus-based and
distributional-semantics-based similarity and relatedness
measures. The matcher is evaluated over a structured
representation of Wikipedia and Freebase events. Initial
evaluations show that the approach matches events with a
maximal combined precision-recall F1 score of 75.89% on
average across all experiments, using a set of 7
subscriptions. The evaluation shows that a hybrid approach to
semantic event matching outperforms approaches based on a single
similarity measure.
Approximate Semantic Matching of Heterogeneous Events
1. Digital Enterprise Research Institute www.deri.ie
Souleiman Hasan, Sean O’Riain, Edward Curry
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway (NUIG)
In Proceedings of the 6th ACM International Conference on Distributed Event-Based
Systems (DEBS 2012), Berlin, Germany, 2012.
Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
2. Outline
Introduction
  Smart Environments
  Motivational Scenario
  Related Work
Proposal
  Approximate Semantic Matching
Experiments
  Wikipedia
  Freebase
Conclusions
Q&A
3. Smart Environments
Smart Homes, Grids, Cities…
Internet-of-Things, Sensor Web…
By 2020, 50 billion devices connected to mobile networks (OECD, 2012)
Non-technical users
High heterogeneity
Trend towards dynamic, data-driven decision making
Events/situations of interest, e.g.:
  Soccer match played in Berlin
  New free parking space near me
  …
4. Motivational Scenario- Enterprise
Stakeholders: CIO, CSO, Helpdesk, Maintenance Personnel
Situations of interest:
  Company CO2 emissions performance
  Energy usage by global IT department
  PUE of the Data Center in Dublin
  kWhs used by server 172.16.0.8
Various terms used: energy consumption, energy usage…; room, space, zone…
Dynamic environments: new events from equipment joining and leaving
Event sources: Building, Data Center
5. Requirements
Handling of semantically heterogeneous events
Handling of dynamic environments, where event types change as
sources join and leave
Low cost of rules management
Usability
Precision
6. Event Processing
Situation of interest (natural language):
  "When a floor is empty and its energy usage for an hour is above a
  threshold w.r.t. its budget, then it is an excessive usage."
User: non-technical users with natural-language needs
Translation: a developer, separated from the engine, turns the need into an event processing rule
Event processing rule (EPL):
  INSERT INTO ExcessiveEnergyUsageByFloor
  SELECT a.floor AS floor
  FROM PATTERN [a=FloorEmptySensor -> every b=DeviceEnergyUsageSensor(floor = a.floor)].win:time(1 hour)
  WHERE b.usage > GetAcceptableThreshold(a.budgetValue)
  GROUP BY a.floor
CEP engine: UI, EPL interface and parser, rules repository, execution (pattern matcher, single-event templates matcher, repository)
Rules are tied to the vocabulary: high cost in case of heterogeneity or change
Event sources: ERP, BMS; example events:
  PC NO XDG26359, Floor: 1st, usage: 3 kWh
  VM: vmdgsit01.deri.ie, Floor: 1st, usage: 15 kWh
7. Exact Event Processing Paradigm
Requirement: how the exact paradigm addresses it
  Semantic heterogeneity: does not scale out to highly heterogeneous environments
  Dynamic environment: does not scale out to highly dynamic environments
  Rule management: high cost under large heterogeneity and dynamicity
  Usability: low
  Precision: 100% (typically)
8. Decoupling in Event Systems
Space: producers and consumers don’t know each other
Time: participants don’t need to be actively involved in the interaction at the
same time
Synchronization: event producers and consumers are not blocked while
sending/receiving events
[Figure: event producer and event consumer decoupled in space, time and synchronization]
9. Decoupling in Event Systems
Principle
“Removal of explicit dependencies between participants”
(Eugster et al., 2003)
Outcome
Scalability
[Figure: event producer and event consumer decoupled in space, time and synchronization]
10. Semantic Coupling
Current event-based systems keep an explicit semantic
dependency between participants
Limited scalability in highly heterogeneous and dynamic
environments
[Figure: event producer and event consumer decoupled in space, time and synchronization, but still semantically coupled (event types, properties, values)]
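To make the semantic coupling concrete, the following minimal Python sketch models an exact matcher as a subset test over (property, value) statements. The set-of-pairs representation and the function name `exact_match` are illustrative assumptions, not part of any system described here; the event and subscription are taken from the running example used later in these slides.

```python
# Minimal sketch of exact (vocabulary-tied) matching. Events and
# subscriptions are modeled as sets of (property, value) statements;
# this representation is an assumption for illustration only.

def exact_match(subscription: set[tuple[str, str]], event: set[tuple[str, str]]) -> bool:
    """An event matches only if every subscription statement appears verbatim."""
    return subscription <= event


event = {
    ("type", "Football Match"),
    ("name", "2010 FIFA World Cup Final"),
    ("referee", "Howard Webb"),
    ("team", "Spain National Football Team"),
    ("team", "Netherlands National Football Team"),
    ("location", "Johannesburg"),
    ("location", "FNB stadium"),
}

# Same intent, different vocabulary: no statement matches verbatim.
subscription = {("type", "Soccer Match"), ("team", "Spain"), ("place", "South Africa")}
# Written in the event's own vocabulary: matches.
exact_subscription = {("type", "Football Match"), ("team", "Spain National Football Team")}

print(exact_match(subscription, event))        # False
print(exact_match(exact_subscription, event))  # True
```

The first subscription fails purely on vocabulary (Soccer Match vs. Football Match, place vs. location, Spain vs. Spain National Football Team, South Africa vs. Johannesburg), which is exactly the dependency the exact paradigm cannot remove.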
11. Current Approaches
Ontology-based
  (Petrovic et al., 2003), (Zhang & Ye, 2008)…
  Does not “remove explicit dependency”
  Hard to achieve ontology agreement a priori at large scales of
  heterogeneity and dynamism
  Medium usability, 100% precision typically
Fuzzy sets
  (Liu & Jacobsen, 2002)
  Address only numerical event values, not string values, in
  subscriptions
  Medium usability, high precision
12. Proposed Approach
Approximate semantic matching of events
  Input: an event and a subscription, each with type(s), properties and values
  1. Types & properties: possible mappings
  2. Values: possible mappings
  3. Pick best overall mapping
  4. Post-matching event processing
13. Background
Semantic Similarity
  $f: \mathit{Terms} \times \mathit{Terms} \to [0,1]$, where term1 and term2 are terms
  f(term1, term2) = 0: absolute semantic mismatch
  f(term1, term2) = 1: exact match
  E.g. Football Match and Soccer Match are similar
Relatedness: a generalization of similarity
  E.g. Football Match and Referee are related but not similar
Thesaurus-based measures: e.g. WordNet-based
Distributional-semantics-based measures: e.g. Wikipedia ESA
  The more Wikipedia articles two terms occur in together, the more
  related they are
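The distributional idea behind Wikipedia ESA can be illustrated with a toy sketch: each word is mapped to a TF-IDF-weighted vector over articles, a multi-word term is the sum of its word vectors, and relatedness is the cosine of two such vectors. The four-article corpus below is a hypothetical stand-in for Wikipedia, and the helper names are invented for illustration; real ESA works over the full Wikipedia with additional normalization and pruning.

```python
import math
from collections import Counter

# Hypothetical toy "Wikipedia": article title -> article text.
ARTICLES = {
    "Association football": "football soccer match team goal referee stadium",
    "FIFA World Cup": "football soccer tournament team final match",
    "Referee": "referee match official rules football",
    "Energy consumption": "energy usage consumption power kwh",
}


def tfidf_index(articles):
    """Inverted index: word -> {article title: TF-IDF weight}."""
    docs = {title: Counter(text.lower().split()) for title, text in articles.items()}
    n_docs = len(docs)
    df = Counter(word for counts in docs.values() for word in counts)
    index = {}
    for title, counts in docs.items():
        for word, tf in counts.items():
            index.setdefault(word, {})[title] = tf * math.log(n_docs / df[word])
    return index


def concept_vector(term, index):
    """Interpret a (possibly multi-word) term as a weighted vector over articles."""
    vec = Counter()
    for word in term.lower().split():
        for title, weight in index.get(word, {}).items():
            vec[title] += weight
    return vec


def esa_relatedness(term1, term2, index):
    """Cosine of the two concept vectors; weights are non-negative, so in [0, 1]."""
    v1, v2 = concept_vector(term1, index), concept_vector(term2, index)
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm = math.sqrt(sum(w * w for w in v1.values())) * math.sqrt(sum(w * w for w in v2.values()))
    return dot / norm if norm else 0.0


index = tfidf_index(ARTICLES)
print(esa_relatedness("Football Match", "Soccer Match", index))
print(esa_relatedness("Football Match", "Energy Usage", index))  # 0.0: no shared articles
```

With this corpus, "Football Match" scores high against "Soccer Match" because both terms occur in the same football articles, while "Energy Usage" shares no article with it and scores 0.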
14. Proposed Approach Instantiation
Event:
  type: Football Match
  name: 2010 FIFA World Cup Final
  referee: Howard Webb
  team: Spain National Football Team
  team: Netherlands National Football Team
  location: Johannesburg
  location: FNB stadium
Subscription:
  event type "Soccer Match"
  event team "Spain"
  event place "South Africa"
15. Proposed Approach Instantiation
Types & properties: possible mappings
  Event properties: type, name, referee, team, location
  Subscription properties: type, place, team
[Figure: precision-recall curves of similarity measures: Lin, Jiang&Conrath, Leacock&Chodorow, Lesk, Path, Resnik, Gloss Vector, WuPalmer]
16. Proposed Approach Instantiation
Types & properties: possible mappings
  Event properties: type, name, referee, team, location
  Subscription properties: type, place, team
Determine the top m correspondence candidates by Jiang&Conrath rank:
  $\mathrm{Rank}_{\mathrm{Sim}_{J\&C}}(p_s, p_e)$
Measure property relatedness:
  $f_P(p_s, p_e) = \min\left(1,\; m - \mathrm{Rank}_{\mathrm{Sim}_{J\&C}}(p_s, p_e) + 1\right) \cdot \mathrm{ESA}_{\mathrm{Wikipedia}}(p_s, p_e)$
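The property-relatedness step can be sketched as follows, assuming the Jiang&Conrath similarity and the Wikipedia ESA relatedness are available as plain Python functions; here they are replaced by small hypothetical lookup tables, and the function names are invented. The extra `max(0, ...)` clamp is an assumption of this sketch: taken literally, the formula becomes negative for candidates ranked below m + 1, whereas its apparent intent is to keep the ESA score only for the top m candidates.

```python
def rank_by_similarity(p_s, event_props, sim):
    """Rank event properties by similarity to p_s (rank 1 = most similar)."""
    ordered = sorted(event_props, key=lambda p_e: sim(p_s, p_e), reverse=True)
    return {p_e: rank for rank, p_e in enumerate(ordered, start=1)}


def property_relatedness(p_s, event_props, sim, esa, m):
    """f_P(p_s, p_e) for each candidate event property p_e.

    Candidates within the top m keep their ESA score; every candidate
    ranked below the top m gets 0 (the max(0, ...) clamp is an assumption).
    """
    ranks = rank_by_similarity(p_s, event_props, sim)
    return {p_e: min(1, max(0, m - ranks[p_e] + 1)) * esa(p_s, p_e) for p_e in event_props}


# Hypothetical scores for illustration only; not values measured in the paper.
JC_SIM = {("place", "location"): 0.8, ("place", "name"): 0.3, ("place", "team"): 0.2,
          ("place", "type"): 0.15, ("place", "referee"): 0.1}
ESA_REL = {("place", "location"): 0.9, ("place", "name"): 0.4}

scores = property_relatedness(
    "place",
    ["type", "name", "referee", "team", "location"],
    sim=lambda a, b: JC_SIM.get((a, b), 0.0),
    esa=lambda a, b: ESA_REL.get((a, b), 0.5),  # hypothetical default score
    m=2,
)
print(scores)
```

With m = 2, only "location" and "name" survive the cut, so "team", whose toy ESA score would otherwise be 0.5, contributes nothing to the mapping of "place". The rank filter thus uses the thesaurus measure to prune candidates and the distributional measure to score them.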
17. Proposed Approach Instantiation
Types & properties: possible mappings
  Event properties: type, name, referee, team, location
  Subscription properties: type, place, team
Candidate property mappings (event ↔ subscription):
  Top 1 (90%): type ↔ type, location ↔ place, team ↔ team
  Top 2 (40%): type ↔ type, name ↔ place, referee ↔ team
18. Proposed Approach Instantiation
Values: possible mappings
  Event values: Football Match, Howard Webb, Spain National Football Team,
  Netherlands National Football Team, Johannesburg, FNB stadium
  Subscription values: Soccer Match, South Africa, Spain
Measure value relatedness:
  $f_V(v_s, v_e) = \mathrm{ESA}_{\mathrm{Wikipedia}}(v_s, v_e)$
19. Proposed Approach Instantiation
Values: possible mappings
  Event values: Football Match, Howard Webb, Spain National Football Team,
  Netherlands National Football Team, Johannesburg, FNB stadium
  Subscription values: Soccer Match, South Africa, Spain
Example value relatedness scores:
  Spain National Football Team ↔ Spain: 95%
  Netherlands National Football Team ↔ Spain: 30%
20. Proposed Approach Instantiation
Event properties: type, name, referee, team, location
Subscription properties: type, place, team
Event values: Football Match, Howard Webb, Spain National Football Team,
Netherlands National Football Team, Johannesburg, FNB stadium
Subscription values: Soccer Match, South Africa, Spain
Calculate statement relatedness:
  $f_{STMT} = f_P(p_s, p_e) \cdot f_V(v_s, v_e)$
21. Proposed Approach Instantiation
Event properties: type, name, referee, team, location
Subscription properties: type, place, team
Event values: Football Match, Howard Webb, Spain National Football Team,
Netherlands National Football Team, Johannesburg, FNB stadium
Subscription values: Soccer Match, South Africa, Spain
Determine the corresponding event statement:
  for each subscription statement, the event statement with the maximal $f_{STMT}$
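The statement-level steps can be sketched in the same way: the relatedness of a subscription statement (p_s, v_s) and an event statement (p_e, v_e) is the product f_P · f_V, and each subscription statement is paired with the event statement that maximizes it. The f_P and f_V lookups below are hypothetical stand-ins; the two value scores 0.95 and 0.30 echo the Spain example above, while the others are invented. How the per-statement scores are combined into one event-level score is not shown on these slides, so the sketch stops at the correspondences.

```python
Statement = tuple[str, str]  # (property, value)


def statement_relatedness(s_stmt: Statement, e_stmt: Statement, f_p, f_v) -> float:
    """f_STMT = f_P(p_s, p_e) * f_V(v_s, v_e)."""
    (p_s, v_s), (p_e, v_e) = s_stmt, e_stmt
    return f_p(p_s, p_e) * f_v(v_s, v_e)


def correspondences(subscription, event, f_p, f_v):
    """Map each subscription statement to (best event statement, its f_STMT)."""
    result = {}
    for s_stmt in subscription:
        best = max(event, key=lambda e_stmt: statement_relatedness(s_stmt, e_stmt, f_p, f_v))
        result[s_stmt] = (best, statement_relatedness(s_stmt, best, f_p, f_v))
    return result


# Hypothetical relatedness tables (0.95 and 0.30 mirror the Spain example).
F_P = {("team", "team"): 1.0, ("team", "referee"): 0.2}
F_V = {("Spain", "Spain National Football Team"): 0.95,
       ("Spain", "Netherlands National Football Team"): 0.30,
       ("Spain", "Howard Webb"): 0.1}

event = [("team", "Spain National Football Team"),
         ("team", "Netherlands National Football Team"),
         ("referee", "Howard Webb")]
subscription = [("team", "Spain")]

matches = correspondences(subscription, event,
                          f_p=lambda a, b: F_P.get((a, b), 0.0),
                          f_v=lambda a, b: F_V.get((a, b), 0.0))
print(matches[("team", "Spain")])
```

Because f_STMT is a product, a statement only scores well when both its property and its value are related; here "Spain" pairs with the Spain team statement rather than the Netherlands one or the referee.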
22. Proposed Approach Instantiation
Post-matching event processing
  Rank within a window
  Complex Event Processing
  …
23. Experiments Overview
Methodology
  Prepare an event set that reflects the required semantic heterogeneity
  (Wikipedia events)
  Prepare a gold standard set of subscriptions that stress multiple
  aspects of semantic coupling
  Validate the suitability of semantic approximation from a precision
  perspective
  Use a different event set and the same subscriptions to validate the low
  maintenance cost (Freebase events)
Evaluation Criteria
  Average interpolated precision-recall curve at 11 recall points
  Maximal F1 score over the average curve
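The two evaluation criteria can be sketched with the standard 11-point interpolation: interpolated precision at recall level r is the highest precision observed at any recall ≥ r, curves are averaged per recall level across subscriptions, and the maximal F1 is taken over the averaged curve. The sketch assumes each subscription's matches are ranked by matcher score and judged against the gold standard; the function names are illustrative.

```python
LEVELS = [i / 10 for i in range(11)]  # recall levels 0.0, 0.1, ..., 1.0


def interpolated_precision(ranked_relevance, n_relevant):
    """11-point interpolated precision for one ranked list of relevance flags."""
    points = []  # (recall, precision) after each retrieved event
    hits = 0
    for k, relevant in enumerate(ranked_relevance, start=1):
        hits += int(relevant)
        points.append((hits / n_relevant, hits / k))
    return [max((p for r, p in points if r >= level), default=0.0) for level in LEVELS]


def average_curve(curves):
    """Average several 11-point curves level by level."""
    return [sum(values) / len(values) for values in zip(*curves)]


def max_f1(curve):
    """Maximal F1 over a curve, with the recall and precision where it occurs."""
    scores = [2 * p * r / (p + r) if p + r else 0.0 for r, p in zip(LEVELS, curve)]
    best = max(range(len(LEVELS)), key=lambda i: scores[i])
    return scores[best], LEVELS[best], curve[best]


# One subscription with 2 relevant events; the matcher ranks 5 candidates.
curve = interpolated_precision([True, False, True, False, False], n_relevant=2)
f1, recall, precision = max_f1(average_curve([curve]))
print(f1, recall, precision)
```

In practice `average_curve` would receive one curve per subscription (7 in these experiments), and the reported maximal F1 is read off the averaged curve rather than averaged per subscription.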
24. Experiment 1- Wikipedia Events
Event Set Statistics
  Source: structured Wikipedia infoboxes (DBpedia), 31 August 2011
  Collection: triples directly associated with instances of the dbpedia-owl:Event class
  Data model: RDF
  Total # of events: 20,156
  Total # of distinct event types: 4,950
  Total # of distinct event properties: 1,459
  Total # of distinct event values: 500,717
  Total # of triples: 1,502,599
  Average # of distinct types per event: 7.42
  Average # of distinct properties per event: 30.52
  Average # of distinct values per event: 54.16
  Average # of triples per event: 64.67
25. Experiment 1- Wikipedia Events
Example Event Types
Football Match
Race
Music Festival
Space Mission
Election
10th-Century BC Conflicts
Academic Conference
Aviation Accident
…
26. Experiment 1- Subscription Set
Manually created gold standard set of subscriptions
ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | NO | NO | NO
2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 2 | 2 | NO | YES | NO
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic
4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 29 | 6 | YES | YES | Semantic + Syntactic
5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 2 | 2 | YES | YES | Semantic + Syntactic
6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 505 | 603 | NO | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 20 | 123,774 | NO | YES | Background Knowledge
27. Experiment 1- Subscription Set
Manually created gold standard set of subscriptions (same table as the previous slide), with subscription 3 expanded:
  Description: Events taking place in Wembley stadium
  Subscription template:
    event type "Event"
    event place "Wembley Stadium"
  Exact rules needed: 5, expressed as SPARQL patterns, e.g.:
    SPARQL pattern 1:
      ?event rdf:type dbpedia-owl:Event.
      ?event dbpprop:stadium dbpedia:Wembley_Stadium.
    SPARQL pattern 2:
      ?event rdf:type dbpedia-owl:Event.
      ?event dbpedia-owl:location dbpedia:Wembley_Stadium.
    …
28. Experiment 1- Results
[Figure: precision-recall curves of Jiang&Conrath and Wikipedia ESA for the subscription "Events taking place in Wembley stadium"]
[Figure: frequency distribution (%) of semantic similarity or relatedness scores (log scale) for Jiang&Conrath and Wikipedia ESA]
[Figure: precision-recall curves of Jiang&Conrath and Wikipedia ESA for the subscription "Football matches played in the UK"]
Need for a hybrid matcher that combines both
29. Experiment 1- Results
The hybrid matcher outperforms a matcher based on a single similarity or
relatedness measure.
Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid
Maximal F1 score | 70.06% | 44.26% | 75.45%
Recall at maximal F1 | 80% | 80% | 90%
Precision at maximal F1 | 62.31% | 30.59% | 64.94%
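As a check on the table, each maximal F1 score follows from its precision and recall via the harmonic mean; for the Jiang&Conrath column:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.6231 \times 0.80}{0.6231 + 0.80} = \frac{0.99696}{1.4231} \approx 0.7006
```

The same computation reproduces the Wikipedia ESA and Hybrid columns to within rounding.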
[Figure: precision-recall curves of the Jiang&Conrath, Wikipedia ESA and Hybrid matchers]
30. Experiment 2- Freebase Event Set
Event Set Statistics
  Source: Freebase events dump, 1 December 2011 (current triples)
  Collection: triples directly associated with instances of the "fbase:time.event" class
  Data model: RDF
  Total # of events: 84,529
  Total # of distinct event types: 858
  Total # of distinct event properties: 1,242
  Total # of distinct event values: 1,199,627
  Total # of triples: 1,859,338
  Average # of distinct types per event: 3.33
  Average # of distinct properties per event: 10.67
  Average # of distinct values per event: 21.66
  Average # of triples per event: 21.99
31. Experiment 2- Subscription Set
The same subscriptions as in Experiment 1.
ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | YES | YES | NO
2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 8 | 2 | YES | YES | NO
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 29 | 5 | NO | YES | NO
4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 0 | - | - | - | -
5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 0 | - | - | - | -
6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 34 | 1,398 | YES | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 2 | 219,600 | YES | YES | Background Knowledge
32. Experiment 2- Results
The hybrid matcher gives similar results on Freebase as on
DBpedia.
Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid
Maximal F1 score | 44.60% | 70.73% | 76.33%
Recall at maximal F1 | 60% | 80% | 80%
Precision at maximal F1 | 35.49% | 63.39% | 72.98%
[Figure: precision-recall curves of the Jiang&Conrath, Wikipedia ESA and Hybrid matchers]
33. Conclusions
The approximate semantic matcher reduces the maintenance cost of
subscriptions/rules in heterogeneous and dynamic environments
The approximate semantic matcher is suitable when less than
100% precision is acceptable
Metric | Exact Matcher | Approximate Semantic Matcher
Number of required subscriptions | 345,000 | 7
Maximal F1 score | 100% | 75.89%
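Both figures in the table are consistent with the experiment slides: the 345,000 subscriptions roughly equal the total number of needed exact rules across the two subscription tables, and 75.89% is the mean of the two hybrid maximal F1 scores:

```latex
\begin{aligned}
\text{Exp. 1 rules} &= 1 + 2 + 5 + 6 + 2 + 603 + 123{,}774 = 124{,}393\\
\text{Exp. 2 rules} &= 1 + 2 + 5 + 1{,}398 + 219{,}600 = 221{,}006\\
\text{Total} &= 124{,}393 + 221{,}006 = 345{,}399 \approx 345{,}000\\
\overline{F_1} &= \tfrac{75.45\% + 76.33\%}{2} = 75.89\%
\end{aligned}
```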
A hybrid matcher outperforms a single similarity or
relatedness measure matcher.
34. Future Work
The subscription set needs to be enhanced to be more
representative.
The approximate semantic matcher generates “uncertain”
results whose impact on further event processing
functions, such as CEP, needs to be studied