Stardog Linked Data Catalog
1. Stardog
Linked Data Catalog
Héctor Pérez-Urbina
Edgar Rodríguez-Díaz
Clark & Parsia, LLC
{hector, edgar}@clarkparsia.com
2. Who are we?
● Clark & Parsia is a semantic software startup
● HQ in Washington, DC & office in Boston
● Provides software development and integration
services
● Specializing in Semantic Web, web services, and
advanced AI technologies for federal and
enterprise customers
http://clarkparsia.com/
Twitter: @candp
3. What's SLDC?
● Stardog Linked Data Catalog
● A catalog of data sources
○ Semi-structured
○ Relational
○ Object-oriented
○ ...
● Provides a coherent view over existing data
repositories so that users and/or
applications can easily find them and query
them
7. Semantic Technologies
● W3C standards
○ RDF(S), OWL, SPARQL
● Lower operational costs and raise productivity
○ Cooperation without coordination
○ Appropriate abstractions
○ Declarative is better than imperative
○ Correctness when it matters; sloppiness
when it doesn’t
8. Data Model
● Similar to DCAT from W3C
○ Catalog entries
● Enhanced with
○ SSD
○ VoID datasets
○ SKOS background models
○ Axioms & rules
9. Modeling the Domain
● Use of axioms to model
relationships between
classes
○ :Query subClassOf :Resource
○ :Entry subClassOf :Resource
● Retrieve the resources
user :u can see
○ SELECT ?resource
WHERE { ?resource type :Resource . }
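A minimal sketch of what the subclass axioms buy us, assuming a toy in-memory model rather than Stardog's actual reasoner. The individuals :q1 and :e1 are hypothetical; only the two subClassOf axioms come from the slide. Asking for instances of :Resource returns both, although neither is asserted to be a :Resource.

```python
# Axioms from the slide: :Query and :Entry are subclasses of :Resource.
SUBCLASS_OF = {
    ":Query": ":Resource",
    ":Entry": ":Resource",
}

# Hypothetical asserted types (not from the slides).
ASSERTED_TYPES = {
    ":q1": ":Query",
    ":e1": ":Entry",
}


def superclasses(cls):
    """Return cls followed by every class reachable via subClassOf."""
    seen = []
    while cls is not None and cls not in seen:
        seen.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return seen


def instances_of(target):
    """Mimic SELECT ?resource WHERE { ?resource type target } with subclass reasoning."""
    return sorted(
        individual
        for individual, cls in ASSERTED_TYPES.items()
        if target in superclasses(cls)
    )


print(instances_of(":Resource"))
```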
10. Security
● Authentication
○ Shiro-Based implementation
○ Extensible to LDAP and/or AD
● Authorization
○ Eat-your-own-dog-food approach
○ Reasoning-Based
○ Use of axioms & rules
12. Deriving Permissions
● If a user has a permission role containing a
read permission associated with a resource,
then the user has the same permission over
the resource
:permissionRole(?user,?role),
:readPermission(?role,?resource) ->
:readUserPermission(?user,?resource)
● Everybody has read access to public
resources
:User(?user),
:PublicResource(?resource) ->
:readUserPermission(?user,?resource)
13. Deriving Permissions
● User :user1 has delete permissions over any
source
○ :deleteUserPermission(?user,:anySource),
:DataSource(?source) ->
:deleteUserPermission(?user,?source)
○ :user1 :deleteUserPermission :anySource
● Everybody has all permissions to the resources
they created
○ :resourceCreator(?user,?resource) ->
:allUserPermissions(?user,?resource)
○ :allUserPermissions(?user,?resource) ->
:readUserPermission(?user,?resource)
○ ...
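The rules on the two "Deriving Permissions" slides can be sketched as naive forward chaining over a set of triples. This is an illustration only, not how Stardog evaluates rules. The facts about :user2, :role1 and :query1 are hypothetical, and the rules elided with "..." on the slide are deliberately left out.

```python
def derive(facts):
    """Apply the slides' rules until no new triple appears."""
    facts = set(facts)
    while True:
        new = set()
        users = {s for (s, p, o) in facts if p == "type" and o == ":User"}
        public = {s for (s, p, o) in facts if p == "type" and o == ":PublicResource"}
        sources = {s for (s, p, o) in facts if p == "type" and o == ":DataSource"}
        # :User(?user), :PublicResource(?resource) -> :readUserPermission
        new |= {(u, ":readUserPermission", r) for u in users for r in public}
        for (s, p, o) in facts:
            # :permissionRole(?user,?role), :readPermission(?role,?resource)
            #   -> :readUserPermission(?user,?resource)
            if p == ":permissionRole":
                new |= {(s, ":readUserPermission", r)
                        for (role, q, r) in facts
                        if role == o and q == ":readPermission"}
            # :deleteUserPermission(?user,:anySource), :DataSource(?source)
            #   -> :deleteUserPermission(?user,?source)
            if p == ":deleteUserPermission" and o == ":anySource":
                new |= {(s, ":deleteUserPermission", src) for src in sources}
            # :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource)
            if p == ":resourceCreator":
                new.add((s, ":allUserPermissions", o))
            # :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource)
            if p == ":allUserPermissions":
                new.add((s, ":readUserPermission", o))
        if new <= facts:
            return facts
        facts |= new


FACTS = {
    (":user1", ":deleteUserPermission", ":anySource"),  # from the slide
    (":source1", "type", ":DataSource"),
    (":user2", "type", ":User"),                        # hypothetical
    (":user2", ":permissionRole", ":role1"),            # hypothetical
    (":role1", ":readPermission", ":source1"),          # hypothetical
    (":user2", ":resourceCreator", ":query1"),          # hypothetical
    (":doc1", "type", ":PublicResource"),               # hypothetical
}

DERIVED = derive(FACTS)
print((":user1", ":deleteUserPermission", ":source1") in DERIVED)
```

Once the facts are saturated, a permission check is a single membership test, which is the point the "Impact of Reasoning" slides make with the one-pattern ASK query.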
14. Impact of Reasoning
Can user :user1 delete resource :source1?
ASK WHERE {
{ :user1 :deleteUserPermission :source1 . }
UNION
{ :user1 :permissionRole ?role .
?role :deletePermission :source1 . }
UNION
{ :user1 :resourceCreator :source1 . }
UNION
{ :user1 :deleteUserPermission :anyResource . }
UNION
{ :user1 :allUserPermissions :source1 . }
UNION
{ ... }
UNION
...
15. Impact of Reasoning
● Are you sure you're not missing anything?
● What about the new awesome way of getting delete
permissions you came up with yesterday?
● Model knowledge where it belongs and let the
reasoner do the work for you:
ASK WHERE {
{ :user1 :deleteUserPermission :source1 . }
}
16. Too much Inference?
When I say
:deleteUserPermission domain :User
:deleteUserPermission range :Resource
I mean that for every triple
:user1 :deleteUserPermission :resource1
the individual :user1 must be an instance of :User
and :resource1 of :Resource.
But the reasoner doesn't find the error!!
18. Typing Constraint
Only users can have delete user permissions
● :deleteUserPermission domain :User
● :user1 :deleteUserPermission :resource1
             OWA                  CWA
Consistent   true                 false
Reason       Infer that           Assume that
             :user1 type :User    :user1 type not :User
19. CWA or OWA?
● Which one?
○ Of course use both!
● Some axioms should be interpreted under
CWA
:deleteUserPermission domain :User
● And others under OWA
:SuperUser subClassOf :User
● So the right thing happens
:user1 :deleteUserPermission :resource1
:user1 type :SuperUser
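Mixing the two readings can be sketched as follows: the subClassOf axiom is used for inference (OWA), and the domain axiom is then checked as a constraint against the inferred types (CWA). This is a toy illustration under those assumptions, not Stardog's validation API; the function names are made up for the example.

```python
SUBCLASS_OF = {":SuperUser": ":User"}                    # read under OWA (inference)
DOMAIN_CONSTRAINTS = {":deleteUserPermission": ":User"}  # read under CWA (validation)


def infer_types(triples):
    """OWA step: add every type entailed by the subClassOf axioms."""
    types = {(s, o) for (s, p, o) in triples if p == "type"}
    changed = True
    while changed:
        changed = False
        for (individual, cls) in list(types):
            parent = SUBCLASS_OF.get(cls)
            if parent is not None and (individual, parent) not in types:
                types.add((individual, parent))
                changed = True
    return types


def violations(triples):
    """CWA step: report subjects that are not known to have the required type."""
    types = infer_types(triples)
    return sorted(
        (s, p, DOMAIN_CONSTRAINTS[p])
        for (s, p, o) in triples
        if p in DOMAIN_CONSTRAINTS and (s, DOMAIN_CONSTRAINTS[p]) not in types
    )


# Data from the slide: the constraint holds because :SuperUser entails :User.
VALID = [
    (":user1", ":deleteUserPermission", ":resource1"),
    (":user1", "type", ":SuperUser"),
]
# Without any type assertion, the same triple violates the constraint.
INVALID = [(":user1", ":deleteUserPermission", ":resource1")]

print(violations(VALID))
print(violations(INVALID))
```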
20. SLDC for Data Integration
● SLDC provides descriptions of data sources,
relationships between them, and information
to query them
● We can treat data sources as an integrated
single data source
○ Distributed querying
○ AI analytics
● Virtual, materialized, hybrid
24. Summing Up
● SLDC is a linked data catalog
○ Manage a variety of sources
○ Find sources
○ Query sources
● Implemented using Semantic Technologies
○ Reasoning
■ Axioms & Rules
○ Data validation
○ Data integration
26. Why?
● Large organizations
○ Disparate departments
○ Independent, isolated sources
● Where is what?
○ Do we have a data source about clients?
○ Where is it?
● Who created what?
○ Who owns it?
● Who has access to what?
○ Do I have access to it?
○ Who do I talk to to get it?
27. Source Management
● Management
○ Create, delete, update, clone
● Import
○ RDF, HTML, XML
● Subscription
○ Endpoint location
● Categorization
○ Categories
○ External vocabularies
● Sharing
○ To specific users
○ Public
28. Querying Sources
● Querying metadata
○ Queries about the catalog itself
● External query
○ Querying a particular source
● Integrated query
○ Querying a set of integrated sources
● Query management
● Query sharing
● Results export
30. Last but not least
● Natural language processing (NLP)
○ Entity/Event extraction from natural language
source descriptions
○ Better source classification & search
● Graph algorithms
○ What's the shortest path between these
resources?
● Clustering
○ Can we discover similar sources based on
given criteria?
31. Axioms
● It's not always about simple taxonomies...
● What about domain/range axioms?
○ :someProperty domain :SomeClass
○ :a :someProperty :b
○ :SomeClass(x)?
● What about complex subclass chains?
○ :SomeClass subClassOf :someProperty
some :OtherClass
○ :someProperty some :OtherClass subClassOf
:AnotherClass
○ :a type :SomeClass
○ :AnotherClass(x)?
● What about cardinality constraints, universal
quantification, datatype reasoning, ...?
32. Data Validation
● Fundamental data management problem
○ Verify data integrity and correctness
○ Data corruption can lead to failures in applications, errors
in decision making, security vulnerabilities, etc.
● Relevant in many scenarios
○ Storing data for stand-alone applications
○ Exchanging data in distributed settings
● For some use cases, data validation is critical but
we still want to do it intelligently
33. Participation Constraint
Each resource must have been created by a user
● :Resource subClassOf inv(:resourceCreator) some :User
● :resource1 type :Resource
             OWA                                  CWA
Consistent   true                                 false
Reason       Infer that                           Assume that
             ● _:b :resourceCreator :resource1    _:b :resourceCreator :resource1
             ● _:b type :User                     is false
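Read under CWA, the participation constraint becomes a check for resources with no known creator. The sketch below is a plain-Python illustration of that check, not Stardog's mechanism; the function name and the :user1 facts are hypothetical.

```python
def resources_without_creator(triples):
    """Return resources with no known :resourceCreator triple from a :User."""
    users = {s for (s, p, o) in triples if p == "type" and o == ":User"}
    resources = {s for (s, p, o) in triples if p == "type" and o == ":Resource"}
    created = {o for (s, p, o) in triples if p == ":resourceCreator" and s in users}
    return sorted(resources - created)


# Data from the slide: :resource1 has no creator, so the constraint is violated.
ONLY_RESOURCE = [(":resource1", "type", ":Resource")]

# Hypothetical repair: a known user is recorded as the creator.
WITH_CREATOR = ONLY_RESOURCE + [
    (":user1", "type", ":User"),
    (":user1", ":resourceCreator", ":resource1"),
]

print(resources_without_creator(ONLY_RESOURCE))
print(resources_without_creator(WITH_CREATOR))
```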
34. Uniqueness Constraint
Each data source must belong to at most one
catalog entry
● :dataSource inverseFunctional
● :entry1 :dataSource :dataSource1
● :entry2 :dataSource :dataSource1
             OWA                       CWA
Consistent   true                      false
Reason       Infer that                Assume that
             :entry1 sameAs :entry2    :entry1 sameAs :entry2
                                       is false
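Under CWA, and assuming that differently named entries are distinct individuals, the uniqueness constraint reduces to finding data sources claimed by more than one entry. A small sketch, with a hypothetical function name:

```python
from collections import defaultdict


def shared_data_sources(triples):
    """Map each data source to its entries when more than one entry claims it."""
    entries = defaultdict(set)
    for (s, p, o) in triples:
        if p == ":dataSource":
            entries[o].add(s)
    return {
        source: sorted(owners)
        for source, owners in entries.items()
        if len(owners) > 1
    }


# Data from the slide: two entries point at the same data source.
TRIPLES = [
    (":entry1", ":dataSource", ":dataSource1"),
    (":entry2", ":dataSource", ":dataSource1"),
]

print(shared_data_sources(TRIPLES))
```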