This paper was presented at the Fifth International Workshop on Resource Discovery (RED 2012: http://www.labf.usb.ve/RED2012/), held at the ESWC 2012 conference (http://2012.eswc-conferences.org/) in Heraklion, Crete, Greece, on 27 May 2012.
The full paper can be found at: http://ceur-ws.org/Vol-862/REDp5.pdf
Discovering Semantic Equivalence of People behind Online Profiles (RED 2012 - ESWC 2012)
1. Digital Enterprise Research Institute
www.deri.ie
Discovering Semantic Equivalence of People behind Online Profiles
Keith Cortis, Simon Scerri, Ismael Rivera, Siegfried Handschuh
REsource Discovery (RED), Workshop at ESWC 2012
27th May 2012
Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Enabling Networked Knowledge
2. Motivation
Current situation:
Personal data is unnecessarily duplicated over different platforms
No possibility to merge or port such data
Separate handling of this data
[Image: "Social Networking Sites as Walled Gardens", David Simonds]
3. Problem Specification
No common standards exist for modelling profile data in online accounts
Personal data (known contacts and presence information) is dynamic and continuously changing
4. Objectives
Aim: User represented through one digital identity
Main Challenge: Discovery of semantic equivalence between contacts described in online profiles
Proposal: Use a comprehensive ontology framework for handling online profile data
6. Related Work Comparison
Existing Profile Linking Approaches based on:
o Specific Inverse Functional Properties (e.g. email address)
o Syntactic matching of all profile attributes
o User’s friends
o Semantic relatedness between text, depending on Knowledge Bases (KB) such as Wikipedia
Our Approach: Similarity measure based on user’s Personal Information Model (PIM)
7. Approach (1)
[Figure: online profile matching pipeline]
A. User Profile Data
B. Ontology Mapping
C. Matching Attributes
   1. Linguistic Analysis
   2. Syntactic Matching: Value Matching, Direct String Matching, Indirect String Matching
   3. Semantic Search Extension
   4. Ontology-enhanced Attribute Weighting
D. Online Profile Resolution
8. Approach (2)
[Figure: the profile matching pipeline diagram shown on slide 7, repeated]
9. Approach (3)
[Figure: the profile matching pipeline diagram shown on slide 7, repeated]
10. Approach (3)
Identity-related online profile information - NCO
Presence and online post data for the user – DLPO
11. Approach (3)
Account Ontology (DAO) – for modelling service account representations
[Figure: class diagram relating the DLPO, DAO and NCO ontologies.
Classes and datatypes shown: LivePost, MultimediaPost, PresencePost, WebDocumentPost, Message, Account, Credentials, Contact, PersonContact, OrganizationContact, ContactGroup, EmailAddress, PostalAddress, PhoneNumber, IMAccount, Name, rdfs:Resource, nie:DataObject, geo:Point, xsd:string.
Properties shown: representative, source, hasCredentials, hasCustomAttribute, nao:externalIdentifier, rdfs:label, userID, password, photo, key, sound, foafUrl, websiteUrl, blogUrl, hasEmailAddress, hasPostalAddress, hasPhoneNumber, hasIMAccount, hasName, hasLocation, belongsToGroup.]
12. Approach (4)
[Figure: the profile matching pipeline diagram shown on slide 7, repeated]
13. Approach (4)
[Figure: the profile matching pipeline diagram shown on slide 7, repeated]
14. Approach (4)
[Figure: the profile matching pipeline diagram shown on slide 7, repeated]
15. Approach (4)
[Figure: the profile matching pipeline diagram shown on slide 7, repeated]
16. Approach (4)
[Figure: the profile matching pipeline diagram shown on slide 7, repeated]
17. Approach (5)
[Figure: the profile matching pipeline diagram shown on slide 7, repeated]
18. Implementation
[Figure: implementation pipeline. Transformation feeds Linguistic Analysis, which uses the ANNIE Information Extraction System and a Large KB Gazetteer lookup against the PIM. Example: “DERI, Lower Dangan, Galway, Ireland” is annotated as Organisation, Street, City, Country.]
20. Summary
Objectives
o Aggregated profile data is lifted onto a unique PIM representation and integrated in a super profile
o Semantic extension to the syntactic-based profile attribute matching
o Determination of semantic equivalence between contacts described in online profiles
Future Work
o Integration of further online accounts
o Definition of a metric
o Analysis of online posts from multiple accounts
o Evaluation of artefact
Thank you for your attention
keith.cortis@deri.org
Editor's notes
- Users are currently required to create and separately manage duplicated personal data in numerous, heterogeneous online account services.
- Walled garden: handling data separately builds a wall around connections and personal data, as reflected in the image. This raises issues of portability, identity, linkability and privacy.
- Personal data in these accounts ranges from static identity-related information to more dynamic information, as well as physical and online presence.
- The focus of the study is not a straightforward task:
  1. No common standards exist for modelling profile data in online accounts, so retrieving and integrating federated, heterogeneous personal data is hard from the outset.
  2. Some personal data is dynamic (known contacts and presence information), so dealing with a user's multiple digital identities can become a complex task.
Aim: enable the user to create, aggregate and merge multiple online profiles into one digital identity.
- One digital identity through the Digital.Me userware: (i) a single access point to the user's personal information sphere; (ii) it refers to personal data on a user's multiple devices such as laptops, tablets and smartphones.
- (After the challenge:) online profiles, their attributes and shared posts.
Focus: integration of multiple user online profiles (e.g. health, bank, government, social related); currently our focus is on social networks.
Proposal: a comprehensive ontology framework, which serves as a standard format for handling static and dynamic profile data (a set of re-used, extended and new vocabularies).
- Pyramid of the OSCAF ontologies, adopted by the di.me framework (reused, extended, new). The PIM representation uses these ontologies and is based on PIMO, NCO and DLPO.
- For the problem in question (multiple identity integration), of particular relevance are:
  - NCO: modelling profile attributes.
  - PIMO: modelling the user's interests and who knows whom; it glues together knowledge represented by all the other domain upper-level ontologies. (NCO and PIMO are both established.)
  - LivePost Ontology: modelling online posts (just one of a number of new ontologies being engineered).
- Other ontologies target further domains: user presence (DPO), context (DCON), history (DUHO), rules (DRMO), devices (DDO) and accounts (DAO).
- In di.me, a number of established ontologies have been brought together to offer a representation solution tailored to the project's objectives (reused, extended, new).
- IFP: a property which uniquely identifies a user. Linking based on IFPs alone is shallow, since users can create multiple accounts within the same social network with a different email address.
- Personal Information Model: an instance of the PIMO ontology. It is the main KB for semantic matching, complemented by knowledge from external KBs.
- PIM: initially populated with any personal information integrated from a particular online account or crawled from a device. If there is no match for a particular entity, a new instance is created. (There will be one user profile initially.)
- Advantage of the PIM: it contains information of direct interest to the user, so it is more relevant to the user than an external KB and is expected to yield more accurate results.
- Remote KBs such as DBpedia, or any other dataset that is part of the LOD cloud, will be accessed to determine any possible semantic relationship if no data exists in the PIM.
The online profile matching approach involves four successive processes, as outlined in the image presented.
- Retrieve the user's profile information available through the service account APIs. Information targeted: the user's own identity-related information, online posts and contacts' information.
- All crawled information is aggregated into what we refer to as the user's 'super profile'.
- Mapping of the attributes of each represented online profile to the equivalent attributes of the super profile.
- Ontologies and RDF are the main data representation, so the mapping we pursue considers both syntactic and semantic similarities between online profile data.
- Identity-related online profile information is stored as an instance of the NCO ontology, which represents information related to a particular contact.
- Presence and online post data for the user is stored as instances of DLPO, which represents personal presence information that is popularly shared in online accounts, e.g. status messages, check-ins, etc.
Contacts (NCO instances) and LivePosts (DLPO instances) are linked to account instances (dao:Account), each referring to a particular account, e.g. di.me, LinkedIn, Facebook or Twitter.
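The attribute mapping described in these notes can be illustrated with a minimal sketch. It is not the XSPARQL transformation the authors used: the JSON field names, the `ex:contact1` identifier and the `map_profile` helper are assumptions made for the example, and only the property and class names are taken from the NCO figure. The figure shows EmailAddress, PhoneNumber and similar as classes of their own; the sketch flattens them to plain values for brevity.

```python
# Hypothetical field-to-property table: the JSON keys on the left are invented
# for this example; the property names on the right appear in the NCO figure.
FIELD_TO_PROPERTY = {
    "name": "nco:hasName",
    "email": "nco:hasEmailAddress",
    "phone": "nco:hasPhoneNumber",
    "address": "nco:hasPostalAddress",
}


def map_profile(contact_id, raw_profile):
    """Return (subject, predicate, object) triples for the fields we can map."""
    triples = [(contact_id, "rdf:type", "nco:PersonContact")]
    for field, value in raw_profile.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None or not value:
            continue  # unmapped or empty attributes are skipped
        triples.append((contact_id, prop, value))
    return triples


# "shoe_size" has no counterpart in the table, so it is dropped.
triples = map_profile(
    "ex:contact1",
    {"name": "Keith Cortis", "email": "keith.cortis@deri.org", "shoe_size": "43"},
)
for triple in triples:
    print(triple)
```

Keeping the mapping in a table rather than in code means a further online account could be supported by adding a table for its field names.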
- Matching the user profile attributes: we consider the data at both a semantic and a syntactic level. It involves four successive processes, as outlined in (C).
1. Linguistic Analysis: performed on profile attributes that may contain complex or unstructured information, such as a postal address, unlike those with an atomic value (a person's name, a phone number). It is required for discovering further knowledge from a particular value. Hyperlink resolution is also applied if the profile does not contain enough information.
2. Syntactic Matching:
- Value Matching: for attributes of a non-string literal type (e.g. date of birth or geographical position), since these have a strict, predefined structure.
- Direct String Matching: for attributes of type 'string' whose ontology type (e.g. name, address) is either known beforehand or discovered through NER.
- Indirect String Matching: applied over all PIM instances, regardless of their type, if the attribute's entity type remains unknown even after NER is performed.
- String matching metric: Monge and Elkan, comparing user profile attribute values found online against attributes stored in the PIM KB.
3. Semantic Search Extension:
- Finds whether two attributes are semantically related, given that they do not match syntactically.
- The user's PIM is the main KB used; remote KBs, e.g. DBpedia or any other dataset in the LOD cloud, are also used to determine any possible semantic relationship if the required data is not found within the PIM.
4. Ontology-enhanced Attribute Weighting: an appropriate metric is required for weighting the attributes that were syntactically and/or semantically matched.
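The Monge and Elkan string metric named under Syntactic Matching can be sketched as follows. The metric averages, over the tokens of one string, the best similarity each token achieves against the tokens of the other string. The notes do not say which inner token similarity was used, so `difflib.SequenceMatcher` from the Python standard library stands in here as an assumption; the function names are also illustrative.

```python
from difflib import SequenceMatcher


def token_similarity(a, b):
    """Inner similarity in [0, 1]; SequenceMatcher is a stand-in choice."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def monge_elkan(left, right):
    """Average, over tokens of `left`, of the best match among tokens of `right`."""
    left_tokens, right_tokens = left.split(), right.split()
    if not left_tokens or not right_tokens:
        return 0.0
    best_scores = [
        max(token_similarity(a, b) for b in right_tokens) for a in left_tokens
    ]
    return sum(best_scores) / len(best_scores)


# Token order does not matter, which suits names written in different orders.
print(monge_elkan("Keith Cortis", "Cortis Keith"))
print(monge_elkan("Keith Cortis", "K. Cortis"))
```

The score is not symmetric in general, because it averages over the tokens of the first argument only; a caller that needs symmetry could take the mean of both directions.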
- Based on the ontology attribute weighting metric, we establish a threshold that determines semantic equivalence between a user's online profile and their personal identity, which is already known and represented at the PIM level.
- Given that two profiles are semantically equivalent, the user can be prompted to merge profile information that is known over multiple online accounts.
- Integrating semantically equivalent personal information across distributed sources creates a unique user representation in the PIM.
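One way to picture the weighting and threshold step is a weighted average of per-attribute match scores. The slides list the definition of the actual metric as future work, so everything below is an assumption made for illustration: the attribute names, the weights, the threshold value and the choice of a weighted average.

```python
# Hypothetical weights and threshold, chosen only to make the example concrete.
ATTRIBUTE_WEIGHTS = {"name": 0.4, "email": 0.4, "address": 0.2}
THRESHOLD = 0.75


def profile_score(match_scores):
    """Weighted average of per-attribute match scores in [0, 1].

    Attributes missing from `match_scores` are left out of the average, so a
    profile is not penalised for fields it does not expose.
    """
    known = {a: s for a, s in match_scores.items() if a in ATTRIBUTE_WEIGHTS}
    total_weight = sum(ATTRIBUTE_WEIGHTS[a] for a in known)
    if total_weight == 0:
        return 0.0
    weighted = sum(ATTRIBUTE_WEIGHTS[a] * s for a, s in known.items())
    return weighted / total_weight


def is_same_person(match_scores):
    """Apply the threshold to decide whether to suggest a merge."""
    return profile_score(match_scores) >= THRESHOLD


print(is_same_person({"name": 1.0, "email": 1.0, "address": 0.5}))
print(is_same_person({"name": 0.5, "address": 0.5}))
```

The per-attribute scores could come from the syntactic or semantic matching steps; the sketch only shows how they might be combined into one decision.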
- XSPARQL: the transformation of the XML social data into our RDF representation (Turtle) is expressed declaratively in an XSPARQL query.
- JSONLib: used to translate JSON into XML.
- ANNIE: contains several main processing resources for common NLP tasks, such as a tokeniser, sentence splitter, POS tagger, gazetteer, finite state transducer, orthomatcher and coreference resolver. It comes with pre-defined gazetteers for common entity types (e.g. locations, organizations), which we extended with acronyms or abbreviations where necessary.
- Large KB Gazetteer: makes use of the information stored within the user's PIM, since it can be populated dynamically by loading any ontology from RDF data.
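The gazetteer lookup idea can be illustrated without ANNIE itself. The sketch below is a plain dictionary lookup, not the GATE/ANNIE API: the gazetteer entries and the `annotate_address` helper are assumptions, and the input string is the example shown on the Implementation slide.

```python
# Toy gazetteer: surface form -> entity type. Entries are illustrative only;
# in the described system the lookup lists come from ANNIE and the user's PIM.
GAZETTEER = {
    "deri": "Organisation",
    "galway": "City",
    "ireland": "Country",
}


def annotate_address(value):
    """Split a comma-separated value and label each part via the gazetteer."""
    annotations = []
    for part in (p.strip() for p in value.split(",")):
        if not part:
            continue
        entity_type = GAZETTEER.get(part.lower(), "Unknown")
        annotations.append((part, entity_type))
    return annotations


result = annotate_address("DERI, Lower Dangan, Galway, Ireland")
for part, entity_type in result:
    print(part, "->", entity_type)
```

With this toy gazetteer, "Lower Dangan" stays untyped; in the approach described in the notes, a value whose type remains unknown after NER would fall through to indirect string matching.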
- The user's Personal Information Model (PIM) glues together personal information from different sources, in this case from an online account (OnlineAccountX) and the user's super profile (Digital.MeAccount).
- Attributes of the user's online profiles are mapped to their corresponding properties within the di.me ontology framework.
- Five identity-related profile attributes are mapped within NCO: affiliation, organization, phone number, person name and postal address.
- E.g. the label of the organization within the nco:org property, 'Digital Enterprise Research Institute', is matched against other organization instances within the PIM. The super profile instance 'DERI' is one example of another PIM instance with the same type.
- Presence-related profile information, available in the form of a complex type 'livepost', is composed of… E.g. "Having a beer with Anna @ESWC12 in Iraklion" -> Status, Checkin and Event Post, the result of linguistic analysis on the online post.
- Semantic search example:
  - The user's address in the super profile is listed as 'Iraklion', which is related to a pimo:City instance, 'Heraklion'.
  - The user's address in the online profile is 'GR', which is related to a pimo:Country instance, 'Greece'.
  - The two addresses do not match syntactically but are semantically related.
  - Through the PIM KB, the system knows that the city and country instances related to the two addresses are connected through the 'locatedWithin' property -> partial semantic search.
- Advantage of using ontologies: resources can be linked at the semantic level, rather than the syntactic or format level.
- pimo:groundingOccurrence property: relates an 'abstract' but unique subject to one or more of its occurrences.
- Upper part of the figure (T-Box): the ontological classes and attributes. Lower part of the figure (A-Box): examples of how the ontologies can be used in practice.
- Straight lines between the A-Box and T-Box denote an instance-of relationship.
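The semantic search example in these notes (an address of 'Iraklion' against one of 'GR') can be sketched with a toy triple list standing in for the PIM. Only the 'locatedWithin' relation and the city and country instances come from the notes; the alias table, the in-memory representation and the function names are assumptions for illustration, not the di.me implementation.

```python
# Toy PIM as a list of (subject, predicate, object) triples.
PIM_TRIPLES = [
    ("Heraklion", "type", "pimo:City"),
    ("Greece", "type", "pimo:Country"),
    ("Heraklion", "locatedWithin", "Greece"),
]

# Hypothetical alias table: surface forms seen in profiles -> PIM instance.
ALIASES = {
    "iraklion": "Heraklion",
    "heraklion": "Heraklion",
    "gr": "Greece",
    "greece": "Greece",
}


def resolve(value):
    """Map a raw attribute value to a PIM instance, or None if unknown."""
    return ALIASES.get(value.strip().lower())


def semantically_related(value_a, value_b):
    """True if both values resolve to the same or to directly linked instances."""
    a, b = resolve(value_a), resolve(value_b)
    if a is None or b is None:
        return False
    if a == b:
        return True
    return any(
        pred == "locatedWithin" and {subj, obj} == {a, b}
        for subj, pred, obj in PIM_TRIPLES
    )


print(semantically_related("Iraklion", "GR"))
print(semantically_related("Galway", "GR"))
```

'Galway' is not in the toy PIM, so the second call finds no relation; in the described approach, that is the point where a remote KB such as DBpedia would be consulted.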
- Integration of further online service accounts into our current system, e.g. health (RunKeeper), bank, government and social related accounts (Foursquare, Dropbox, Flickr).
- Metric: takes into account all the resulting weighted matches which were syntactically and/or semantically matched or partially matched.
- Threshold: determines whether two or more online profiles refer to the same person.
- Evaluation: performed on 3 levels: (i) syntactic matching, (ii) semantic matching, and (iii) a combination of
- Overall di.me objective: integrating all personal data in a personal information sphere through a single, user-controlled point of access: the di.me userware.
- Our part in di.me: WP3; objectives and tasks mentioned in the slide.