Rough Set Semantics for Identity Management on the Web
1. Rough Set Semantics for Identity Management on the Web
Wouter Beek
(wouterbeek.com)
Stefan Schlobach
Frank van Harmelen
2. Problems of identity
• Statements hold only in certain contexts (no substitution salva veritate).
• Identity is mistaken for representation.
• Identity is mistaken for (close) relatedness.
But more importantly:
• Semantics: identity assertion (claim about meaning)
• Pragmatics: data linking (import additional properties)
• Due to: Open World Assumption
4. Can Leibniz help?
• Indiscernibility of identicals (Leibniz’ principle)
• $a = b \rightarrow \forall \phi\, (\phi(a) = \phi(b))$
• Identity of indiscernibles
• $\forall \phi\, (\phi(a) = \phi(b)) \rightarrow a = b$
• Trivially true, since $\lambda x.\,(x = b)$ is one of the $\phi$’s
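A minimal Lean 4 sketch of both principles, reading the slide's $\phi(a) = \phi(b)$ as logical equivalence of propositions ($\phi\,a \leftrightarrow \phi\,b$); the theorem names are ours. The second proof makes the triviality explicit: instantiating $\phi$ with $\lambda x.\,(x = b)$ turns the hypothesis into $a = b \leftrightarrow b = b$, and $b = b$ holds by reflexivity.

```lean
-- Indiscernibility of identicals: a = b → ∀φ, φ a ↔ φ b.
theorem indiscernibility_of_identicals {α : Type} (a b : α) (h : a = b) :
    ∀ φ : α → Prop, φ a ↔ φ b := by
  subst h
  intro φ
  exact Iff.rfl

-- Identity of indiscernibles: trivial once φ may be λx. (x = b).
theorem identity_of_indiscernibles {α : Type} (a b : α)
    (h : ∀ φ : α → Prop, φ a ↔ φ b) : a = b :=
  (h (fun x => x = b)).mpr rfl
```

Because $\phi$ ranges over all predicates, including identity itself, the principle gives no criterion beyond identity; this is what motivates restricting $\phi$ to a limited set of predicates.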
5. Solutions (as identified in the literature) [1/2]
1) Weaken owl:sameAs
E.g. skos:closeMatch
2) Extend owl:sameAs
Annotate with fuzziness or uncertainty.
3) Make contexts explicit
E.g. use named graphs
E.g. use namespaces
“That is the star that can be seen in the morning, but not in the evening”@geolocation
6. Solutions (as identified in the literature) [2/2]
4) Use domain-specific identity relations
“x and y have the same medical use” @medicine
“x and y are the same molecule” @chemistry
5) Change modeling practice
Notification upon read.
Require reciprocal confirmation upon change.
“On the Web of Data, anybody can say anything about anything.”
[Van Harmelen]
7. Indiscernibility
Identity is the smallest equivalence relation (on a given domain).
Indiscernibility: resources are the same w.r.t. a limited set of predicates.
Indiscernibility is an equivalence relation (which makes it usable for reasoning), although not necessarily the smallest one.
Every indiscernibility relation is also an identity relation, but over a different domain:
• Example: Take the set of people and the property $P_i \subseteq \mathit{People} \times \mathit{Income}$. The context $\{P_i\}$ induces the identity relation on income groups.
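Both claims on this slide can be checked on toy data. The sketch below is a minimal Python illustration under assumed data: the people (ann, bob, cas, dan) and their income groups are hypothetical, and each person is assumed to have exactly one income value for $P_i$.

```python
from itertools import product

# Hypothetical toy data: each person has exactly one income group (P_i).
income = {"ann": "low", "bob": "high", "cas": "low", "dan": "high"}
people = set(income)

# Indiscernibility w.r.t. the context {P_i}: same income value.
ind = {(x, y) for x, y in product(people, repeat=2) if income[x] == income[y]}

# It is an equivalence relation (reflexive, symmetric, transitive) ...
assert all((x, x) in ind for x in people)
assert all((y, x) in ind for x, y in ind)
assert all((x, z) in ind for x, y in ind for y2, z in ind if y == y2)

def income_group(x):
    """The equivalence class of x: everyone indiscernible from x."""
    return frozenset(y for y in people if (x, y) in ind)

# ... and on the domain of classes (income groups) it is plain identity.
assert all((income_group(x) == income_group(y)) == ((x, y) in ind)
           for x, y in product(people, repeat=2))

income_groups = {income_group(x) for x in people}
print(sorted(sorted(g) for g in income_groups))  # [['ann', 'cas'], ['bob', 'dan']]
```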
8. Indiscernibility 1
Two resources are indiscernible w.r.t. a set of predicates
$P \subseteq P_G$ (predicate terms in $G$), if they share the predicate-object
pairs for $P$:
$IND(P) = \{\, \langle x, y \rangle \in S_G^2 \mid \forall p \in P\; (f_p(x) = f_p(y)) \,\}$
where $f_p(x) = \{\, y \mid \langle I(x), y \rangle \in \mathrm{Ext}(I(p)) \,\}$
Example: “Wouter and Stefan have the same employer, so they are
indiscernible w.r.t. the predicate hasEmployer.”
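A minimal Python sketch of $IND(P)$, assuming a graph is a set of (subject, predicate, object) triples and simplifying the semantics by letting every term denote itself (so $I(x) = x$ and $\mathrm{Ext}(I(p))$ is just the set of subject-object pairs of $p$). The employer value employerA is a hypothetical placeholder.

```python
from itertools import product

# Toy graph G as (subject, predicate, object) triples; terms denote themselves.
G = {
    ("wouter", "hasEmployer", "employerA"),
    ("stefan", "hasEmployer", "employerA"),
    ("wouter", "employedAs", "PhD"),
    ("stefan", "employedAs", "AssistantProfessor"),
}

def f(graph, p, x):
    """f_p(x): the set of objects y such that (x, p, y) is in the graph."""
    return frozenset(o for s, q, o in graph if s == x and q == p)

def ind(graph, preds):
    """IND(P): pairs of subjects in S_G that agree on f_p for every p in P."""
    subjects = {s for s, _, _ in graph}
    return {(x, y) for x, y in product(subjects, repeat=2)
            if all(f(graph, p, x) == f(graph, p, y) for p in preds)}

print(("wouter", "stefan") in ind(G, {"hasEmployer"}))                # True
print(("wouter", "stefan") in ind(G, {"hasEmployer", "employedAs"}))  # False
```

Adding employedAs to $P$ makes the relation finer: Wouter and Stefan share an employer but not a position, so the pair drops out.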
9. Indiscernibility 2
• We take a given identity relation and partition it into subsets (i.e. identity sub-relations) which are described in terms of the vocabulary.
• Subsets of the given identity relation are $P^*$-indiscernible, for sets of predicates $P^* \subseteq \wp(P_G)$
Example:
• “(Wouter and Albert) and (Stefan and Paul) belong to the same identity sub-relation, since they are indiscernible w.r.t. the same collections of properties.”
• Wouter and Albert are “employedAs PhD”; Stefan and Paul are
“employedAs Assistant Professor”.
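The partitioning step can be sketched in Python under the assumption that a pair's "signature" is the set of all predicates in $G$ w.r.t. which its two members are indiscernible; pairs with the same signature form one identity sub-relation. The livesIn triples, the values cityA to cityE, and the extra pair r1/r2 are hypothetical.

```python
from collections import defaultdict

# Toy graph; livesIn values and the pair r1/r2 are hypothetical.
G = {
    ("wouter", "employedAs", "PhD"), ("albert", "employedAs", "PhD"),
    ("stefan", "employedAs", "AssistantProfessor"),
    ("paul", "employedAs", "AssistantProfessor"),
    ("wouter", "livesIn", "cityA"), ("albert", "livesIn", "cityB"),
    ("stefan", "livesIn", "cityC"), ("paul", "livesIn", "cityD"),
    ("r1", "employedAs", "PhD"), ("r2", "employedAs", "AssistantProfessor"),
    ("r1", "livesIn", "cityE"), ("r2", "livesIn", "cityE"),
}
# A given identity relation (e.g. owl:sameAs links) to be partitioned.
identity = {("wouter", "albert"), ("stefan", "paul"), ("r1", "r2")}

def f(graph, p, x):
    """f_p(x): objects y with (x, p, y) in the graph."""
    return frozenset(o for s, q, o in graph if s == x and q == p)

def signature(graph, x, y):
    """All predicates of the graph w.r.t. which x and y are indiscernible."""
    preds = {p for _, p, _ in graph}
    return frozenset(p for p in preds if f(graph, p, x) == f(graph, p, y))

def sub_relations(graph, relation):
    """Group the pairs of `relation` by their signature."""
    parts = defaultdict(set)
    for x, y in relation:
        parts[signature(graph, x, y)].add((x, y))
    return dict(parts)

for sig, pairs in sub_relations(G, identity).items():
    print(sorted(sig), sorted(pairs))
```

Here (wouter, albert) and (stefan, paul) land in the same sub-relation, with signature {employedAs}, while (r1, r2) forms its own sub-relation with signature {livesIn}.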
14. Quality
$\alpha(\approx) = \frac{|\approx_L|}{|\approx_H|}$
• Based on the rough-set lower and upper approximations $\approx_L$ and $\approx_H$ of the identity relation $\approx$.
• Since a consistently applied identity relation has relatively many partition sets that contain either no identity pairs (small value for $|\approx_H|$) or only identity pairs (large value for $|\approx_L|$), a more consistently applied identity relation scores higher on this quality metric.
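A minimal Python sketch of $\alpha(\approx)$, assuming the partition is taken over unordered pairs of distinct resources (reflexive pairs and orientation are ignored) and that each block collects the pairs that are indiscernible w.r.t. exactly the same predicates. The resources a to d are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Hypothetical toy graph: a and b are PhDs, c and d assistant professors.
G = {
    ("a", "employedAs", "PhD"), ("b", "employedAs", "PhD"),
    ("c", "employedAs", "AssistantProfessor"),
    ("d", "employedAs", "AssistantProfessor"),
}

def f(graph, p, x):
    """f_p(x): objects y with (x, p, y) in the graph."""
    return frozenset(o for s, q, o in graph if s == x and q == p)

def signature(graph, x, y):
    """Predicates w.r.t. which x and y are indiscernible."""
    preds = {p for _, p, _ in graph}
    return frozenset(p for p in preds if f(graph, p, x) == f(graph, p, y))

def quality(graph, relation):
    """alpha = |lower approximation| / |upper approximation| of `relation`."""
    rel = {tuple(sorted(pair)) for pair in relation}  # unordered pairs
    if not rel:
        raise ValueError("quality is undefined for an empty relation")
    # Partition all unordered pairs of distinct subjects by signature.
    blocks = defaultdict(set)
    for x, y in combinations(sorted({s for s, _, _ in graph}), 2):
        blocks[signature(graph, x, y)].add((x, y))
    lower = set().union(*(b for b in blocks.values() if b <= rel))  # only identity pairs
    upper = set().union(*(b for b in blocks.values() if b & rel))   # some identity pair
    return Fraction(len(lower), len(upper))

print(quality(G, {("a", "b"), ("c", "d")}))              # 1
print(quality(G, {("a", "b")}))                          # 0
print(quality(G, {("a", "b"), ("c", "d"), ("a", "c")}))  # 1/3
```

The first relation coincides with a whole block, so it is applied consistently; the second covers only half of its block, and the third mixes a complete block with a partial one, which lowers the score.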
15. Generalizations
• This works for any binary relation (not only owl:sameAs).
• We only discussed the identity of non-property resources, but properties
can also be identical.
• We skipped the treatment of blank nodes and typed literals (which have
special identity criteria).
• The indiscernibility ‘language’ can be made much stronger, allowing more fine-grained identity sub-relations:
  • Length-1 paths, e.g. “Wouter lives in the Netherlands.”
  • Length-2 paths, e.g. “Wouter lives in a country which borders Germany.”
  • Length-$n$ paths.
  • Intervals in the value space of typed literals, e.g. “was published between 1901 and 1905”
• Natural language translation, e.g. “lives in Germany” and “lives in Deutschland”
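Length-$n$ paths can be handled by replacing $f_p$ with a function that follows a sequence of predicates. A minimal Python sketch, assuming the same triple representation; the helper name follow and the toy triples are ours.

```python
# Toy triples (subject, predicate, object); illustrative only.
G = {
    ("wouter", "livesIn", "Netherlands"),
    ("Netherlands", "borders", "Germany"),
    ("Netherlands", "borders", "Belgium"),
}

def follow(graph, path, x):
    """Return the objects reachable from x along the path (p1, ..., pn)."""
    frontier = {x}
    for p in path:
        # One step: objects of p-triples whose subject is in the frontier.
        frontier = {o for s, q, o in graph if s in frontier and q == p}
    return frozenset(frontier)

print(sorted(follow(G, ("livesIn",), "wouter")))            # ['Netherlands']
print(sorted(follow(G, ("livesIn", "borders"), "wouter")))  # ['Belgium', 'Germany']

# "Wouter lives in a country which borders Germany":
print("Germany" in follow(G, ("livesIn", "borders"), "wouter"))  # True
```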
17. Indiscernibility 1 (generalized)
Two resources are indiscernible w.r.t. a set of PPMs $P \subset P_G^n$ if
they share the properties denoted by $P$.
$IND(P) = \{\, \langle x, y \rangle \in S_G^2 \mid \forall p \in P\; (f_p(x) \asymp f_p(y)) \,\}$
Example: “Wouter and Stefan have the same employer, so they are
indiscernible w.r.t. hasEmployer.”
Details:
• $P = \bigcup_{p_1,\ldots,p_n \in P} p_1 \times \cdots \times p_n$
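A sketch of the generalized definition, assuming each PPM in $P$ is modeled as a property path (a tuple of predicates) and that $\asymp$ is passed in as a function on value sets. The translation-based matcher, its table, and the resources hans and anna are hypothetical illustrations of the natural-language case, not the authors' definition of $\asymp$.

```python
from itertools import product

def follow(graph, path, x):
    """Objects reachable from x along the property path (p1, ..., pn)."""
    frontier = {x}
    for p in path:
        frontier = {o for s, q, o in graph if s in frontier and q == p}
    return frozenset(frontier)

def ind(graph, paths, matches):
    """IND(P) with a pluggable comparison `matches` standing in for ≍."""
    subjects = {s for s, _, _ in graph}
    return {(x, y) for x, y in product(subjects, repeat=2)
            if all(matches(follow(graph, path, x), follow(graph, path, y))
                   for path in paths)}

def equal(a, b):
    """Plain set equality: the non-generalized case."""
    return a == b

# Hypothetical translation table for the natural-language case.
TRANSLATE = {"Deutschland": "Germany"}

def same_after_translation(a, b):
    """Compare value sets after mapping each value through TRANSLATE."""
    def norm(values):
        return frozenset(TRANSLATE.get(v, v) for v in values)
    return norm(a) == norm(b)

G = {
    ("wouter", "hasEmployer", "employerA"),
    ("stefan", "hasEmployer", "employerA"),
    ("hans", "livesIn", "Deutschland"),
    ("anna", "livesIn", "Germany"),
}

lives_in = [("livesIn",)]
print(("hans", "anna") in ind(G, lives_in, equal))                   # False
print(("hans", "anna") in ind(G, lives_in, same_after_translation))  # True
print(("wouter", "stefan") in ind(G, [("hasEmployer",)], equal))     # True
```

Keeping $\asymp$ as a parameter separates the choice of vocabulary (which paths) from the choice of comparison (equality, translation, intervals).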
18. Indiscernibility 2 (generalized)
We take a given set of pairs (e.g. an identity relation) and partition it
into subsets which are described in terms of the schema.
Subsets of the given (identity) relation are $P^*$-indiscernible, for sets of
PPMs $P^* \subseteq \wp(P_G^n)$
20. Conclusion
Problem:
• There is a conflict between semantics and pragmatics of identity.
• This will not be fixed in the short term by using extensions to existing
logics (e.g. contexts, fuzziness, probability).
Solution:
• Identify different identity relations automatically, and in terms of the
domain predicates (no extra constructs are needed!).
• Define the meaning of a specific identity relation in terms of its
indiscernibility criteria.