Aggregated, Interoperable and Multi-Domain User Profiles for the Social Web
1. Digital Enterprise Research Institute www.deri.ie
Fabrizio Orlandi, John G. Breslin, Alexandre Passant
I-Semantics – Graz, Austria – 5-7 Sept. 2012
Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Enabling Networked Knowledge
2. User Profiling on the Social Web
Disconnected social websites
Isolated data silos
http://www.w3.org
4. Our Solution
Integration & User Modelling
– Interlink social websites
– Merge and model user data
User Profile
– Personalise users’ experience using their profile
– Recommendations, Adaptive Systems, Search Personalisation
5. Linking Open Data
The Web of Data: a continuously evolving “open corpus”
LOD Cloud by R. Cyganiak and A. Jentzsch
6. Representing User Profiles of Interest
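Graph model (subject -> predicate -> object):
foaf:Person -> foaf:topic_interest -> dbp:Semantic_Web
foaf:Person -> wi:preference -> wi:Weighted_Interest
wi:Weighted_Interest -> wi:topic -> dbp:Semantic_Web
wi:Weighted_Interest -> wo:weight -> wo:Weight
wi:Weighted_Interest -> opm:wasDerivedFrom -> sioc:UserAccount
wo:Weight -> wo:weight_value -> 0.7
wo:Weight -> wo:scale -> wo:Scale
wo:Scale -> wo:min_weight -> 0.0
wo:Scale -> wo:max_weight -> 1.0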
8. Service-specific Data Collector
Facebook and Twitter sources
OAuth 2.0 user authentication system
PHP libraries: Facebook PHP-SDK, Twitter-async
Data collected from APIs (up to 1 year back):
– User messages, posts, comments
– Likes
– Check-ins
– Profile information
10. Data Analyser & Profile Generator
Natural Language Processing tool: Zemanta
Used to spot entities in the collected data and link them to DBpedia
List of entities as interests
Named entities (DBpedia URIs), their occurrences and metadata (provenance) are recorded.
Interest Weighting Strategy
Based on frequency and time distance.
– Frequency => counting the number of occurrences
– Time Distance => using Exponential Time Decay function
$x(t) = x_0 \, e^{-t/\tau}$, where $\tau$ is the mean lifetime
RDF representation of interests and weights
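The weighting strategy above can be sketched in a few lines of Python. The slides name the two ingredients (frequency and exponential time decay) but not how they are combined, so this sketch makes two assumptions: each occurrence contributes exp(-t/τ) with x0 = 1, and the per-entity sums are normalised so the strongest interest gets weight 1.0. The function name `decayed_weights` and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def decayed_weights(occurrences, tau_days):
    """Weight entities by frequency and recency.

    occurrences: iterable of (entity, age_in_days) pairs.
    Assumption: every occurrence contributes x0 * exp(-t / tau) with x0 = 1,
    and contributions are summed per entity before normalising.
    """
    raw = defaultdict(float)
    for entity, age_days in occurrences:
        raw[entity] += math.exp(-age_days / tau_days)
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {}
    # Normalise so the strongest interest has weight 1.0 (assumed 0..1 scale).
    return {entity: value / top for entity, value in raw.items()}


# Hypothetical data: entity names and ages are made up for illustration.
sample = [
    ("dbp:Semantic_Web", 10),
    ("dbp:Semantic_Web", 200),
    ("dbp:The_Clash", 5),
]
weights_120 = decayed_weights(sample, tau_days=120)  # short mean lifetime
weights_360 = decayed_weights(sample, tau_days=360)  # long mean lifetime
```

With a longer mean lifetime, the 200-day-old mention keeps more of its value, so the entity that was mentioned twice pulls further ahead of the one mentioned once.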
12. Profiles Aggregator
Aggregation of the different platform-specific profiles into one global user profile of interests
Easy aggregation of the interests using RDF
Triples merged in the triplestore
Provenance of the interests preserved
Aggregation of the weights
$G_i = \sum_{s} w_s \, w_{is}$
where $G_i$ is the global weight of interest $i$, $w_s$ is the weight of source $s$, and $w_{is}$ is the weight of $i$ in $s$
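As a rough illustration of the weight aggregation, the sketch below sums, for each interest, its per-source weight multiplied by the weight of that source. The source names, the equal source weights of 0.5, and the interest weights are hypothetical values chosen for the example; the slides do not state which source weights were used.

```python
from collections import defaultdict


def aggregate_profiles(profiles, source_weights):
    """Merge platform-specific profiles into one global profile.

    profiles: {source: {interest: weight}}
    source_weights: {source: weight}
    Computes G_i as the sum over sources s of w_s * w_is; an interest
    missing from a source contributes nothing for that source.
    """
    global_profile = defaultdict(float)
    for source, interests in profiles.items():
        w_s = source_weights[source]
        for interest, w_is in interests.items():
            global_profile[interest] += w_s * w_is
    return dict(global_profile)


# Hypothetical values, for illustration only.
profiles = {
    "facebook": {"dbp:The_Clash": 0.8, "dbp:Semantic_Web": 0.2},
    "twitter": {"dbp:Semantic_Web": 0.6},
}
source_weights = {"facebook": 0.5, "twitter": 0.5}
global_profile = aggregate_profiles(profiles, source_weights)
```

In this example both interests end up with a global weight of about 0.4: one is strong on a single platform, the other is moderate on both.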
13. DBpedia Resources vs. Categories
A user profile as a ranked list of DBpedia Resources or Categories
DBpedia Resources   Weight   DBpedia Categories   Weight
The_Clash           0.82     Buzzwords            0.48
Alternative_rock    0.71     Semantic_Web         0.87
Semantic_Web        0.48     Web_Services         0.48
Social_media        0.42     World_Wide_Web       0.39
Linked_Data         0.39     Hypermedia           0.39
…                   …        …                    …
14. Categories weighting-schemes
1st Strategy (Cat1):
Weights of the Resources/Interests propagated to the related Categories
Cat1 Weight = Sum of the weights of the Category’s Resources
2nd Strategy (Cat2):
Same as 1st Strategy but with discount for “broad” Categories
$\mathrm{CatDiscount} = \frac{1}{\log(SP)} \cdot \frac{1}{\log(SC)}$
where: SP = Set of Pages belonging to the Category,
SC = Set of Sub-Categories.
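The two category weighting schemes can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: SP and SC are taken as set sizes, the logarithm is natural, and counts below 2 are clamped to 2 so the logarithm stays positive (the slides do not say how empty or single-element sets are handled). The resource-to-category memberships and the page and sub-category counts are hypothetical.

```python
import math


def cat_discount(num_pages, num_subcategories):
    """Return 1/log(SP) * 1/log(SC), using set sizes for SP and SC.

    Assumption: natural log, and counts below 2 are clamped to 2 so the
    logarithm is never zero or negative.
    """
    sp = max(num_pages, 2)
    sc = max(num_subcategories, 2)
    return (1.0 / math.log(sp)) * (1.0 / math.log(sc))


def category_weights(resource_weights, resource_categories, category_stats=None):
    """Cat1 when category_stats is None, Cat2 otherwise.

    resource_categories: {resource: [category, ...]}
    category_stats: {category: (num_pages, num_subcategories)}
    """
    weights = {}
    # Cat1: propagate each resource weight to all of its categories and sum.
    for resource, weight in resource_weights.items():
        for category in resource_categories.get(resource, []):
            weights[category] = weights.get(category, 0.0) + weight
    # Cat2: additionally scale each category by its discount.
    if category_stats is not None:
        for category in weights:
            weights[category] *= cat_discount(*category_stats[category])
    return weights


# Hypothetical memberships and category sizes, for illustration only.
resource_weights = {"Semantic_Web": 0.48, "Linked_Data": 0.39}
resource_categories = {
    "Semantic_Web": ["Semantic_Web", "Buzzwords"],
    "Linked_Data": ["Semantic_Web", "Hypermedia"],
}
category_stats = {
    "Semantic_Web": (100, 10),
    "Buzzwords": (1000, 50),
    "Hypermedia": (50, 5),
}
cat1 = category_weights(resource_weights, resource_categories)
cat2 = category_weights(resource_weights, resource_categories, category_stats)
```

Under Cat1 the shared category accumulates 0.48 + 0.39 = 0.87, while under Cat2 the larger a category is, the more strongly its weight is reduced.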
15. Experiment
6 types of user profiles evaluated:
2 types of DBpedia entities
– Categories vs. Resources
2 types of weighting-scheme for category-based methods
– Cat1: Interests Weight Propagation
– Cat2: Interests Weight Propagation w/ Cat. Discount
2 types of exponential Time Decay function
– Short mean lifetime 120 days
– Long mean lifetime 360 days
16. Experiment
6 types of user profiles evaluated:
Res: Res-120, Res-360
Cat:
– Cat1: Cat1-120, Cat1-360
– Cat2: Cat2-120, Cat2-360
17. User-based Evaluation
21 users:
21 to 45 years old – 76% IT students/researchers
Average User Activity:
18. User-based Evaluation
We asked users to rate the top 10 interests generated for each of the 6 profiling strategies
Question:
“Please rate how relevant is each concept for representing your personal interests and context…”
Rating:
0 (not at all or don't know), 1 (low), 2, 3, 4, 5 (high)
Rating converted to a (0…10) scale
Performance evaluated with:
MRR (Mean Reciprocal Rank)
P@10 (Precision at K = 10)
Comparison with a Baseline
A traditional approach based on “keyword frequency”
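The two evaluation measures can be sketched as below. The slides do not say how ratings were converted to the 0…10 scale or which rating counts as relevant, so both are assumptions here: the conversion is a linear doubling, and a rating of 6 or more is treated as relevant. All function names and the sample ratings are hypothetical.

```python
def to_ten_scale(rating_0_to_5):
    """Assumed linear conversion from the 0..5 rating to a 0..10 scale."""
    return rating_0_to_5 * 2


def precision_at_k(ratings, k=10, threshold=6):
    """Fraction of the top-k ranked items rated at or above the threshold."""
    relevant = sum(1 for rating in ratings[:k] if rating >= threshold)
    return relevant / k


def reciprocal_rank(ratings, threshold=6):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rating in enumerate(ratings, start=1):
        if rating >= threshold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ratings_per_user, threshold=6):
    """Average of the reciprocal ranks over all users."""
    if not ratings_per_user:
        return 0.0
    total = sum(reciprocal_rank(r, threshold) for r in ratings_per_user)
    return total / len(ratings_per_user)


# Hypothetical ratings for two users, listed in ranked order (top interest first).
user_a = [to_ten_scale(r) for r in [5, 0, 4, 3, 1, 2, 5, 0, 3, 4]]
user_b = [to_ten_scale(r) for r in [1, 2, 4, 0, 0, 5, 1, 3, 0, 2]]
p10_a = precision_at_k(user_a)
p10_b = precision_at_k(user_b)
mrr = mean_reciprocal_rank([user_a, user_b])
```

Here the first user's top-ranked interest is already relevant, while the second user's first relevant interest sits at rank 3, which pulls the mean reciprocal rank down.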
21. Cat1 vs. Cat2 (Cat.Discount)
22. t120 vs. t360
23. Evaluation
On average, from 200 tweets and 200 Facebook posts and items:
~106 interests as DBpedia Resources
~720 interests as DBpedia Categories (~7 times more)
Statistical significance (t-Test & Wilcoxon’s test) for:
Resources vs. Categories (p<0.05)
Any method vs. Baseline (p<0.05)
Not significant for time decay (p~0.2) or for Cat1 vs. Cat2
24. Conclusions
User profiles generated with DBpedia Resources are more accurate than those generated with Categories.
Using Categories generates 7 times more entities than using Resources (with comparable accuracy).
Useful for Recommendation Systems.
Semantics + disambiguation + time decay function outperform traditional keyword-based methods.
Insight:
Sometimes Resources are “too specific” and Categories “too broad”:
=> Mixed approach to be explored.
TODO: Evaluation in different scenarios (e.g. Recommendations)
25. Thanks
Contacts:
Fabrizio Orlandi
http://bit.ly/orlandi
fabrizio.orlandi@deri.org
@BadmotorF