SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
Stardog
Linked Data Catalog
      Héctor Pérez-Urbina
     Edgar Rodríguez-Díaz

       Clark & Parsia, LLC
 {hector, edgar}@clarkparsia.com
Who are we?
● Clark & Parsia is a semantic software startup
● HQ in Washington, DC & office in Boston
● Provides software development and integration
  services
● Specializing in Semantic Web, web services, and
  advanced AI technologies for federal and
  enterprise customers

        http://clarkparsia.com/
        Twitter: @candp
What's SLDC?
● Stardog Linked Data Catalog
● A catalog of data sources
    ○ Semi structured
    ○ Relational
    ○ Object-oriented
    ○ ...
● Provides a coherent view over existing data
  repositories so that users and/or
  applications can easily find them and query
  them
Use Cases
● Sources
   ○ Management, import, subscription,
     categorization, sharing
● Query
   ○ Management, sharing, results export
   ○ Querying
      ■ Metadata, external sources, integration
● Locating sources
   ○ Search, browse
● NLP/AI
   ○ Entity extraction, graph algorithms, clustering
     analysis
Application layer




  Middleware layer




NLP/AI analytics layer




     Data layer
Demo
Semantic Technologies
● W3C standards
   ○ RDF(S), OWL, SPARQL
● Lower operational costs and raise productivity
   ○ Cooperation without coordination
   ○ Appropriate abstractions
   ○ Declarative is better than imperative
   ○ Correctness when it matters; sloppiness
     when it doesn’t
Data Model
● Similar to DCAT from W3C
   ○ Catalog entries
● Enhanced with
   ○ SSD
   ○ VoID datasets
   ○ SKOS background models
   ○ Axioms & rules
Modeling the Domain
● Use of axioms to model
  relationships between
  classes
   ○ :Query subClassOf :
     Resource
   ○ :Entry subClassOf :
     Resource
● Retrieve the resources
  user :u can see
   ○ SELECT ?resource
     WHERE { ?resource
     type :Resource . }
Security
● Authentication
   ○ Shiro-Based implementation
   ○ Extensible to LDAP and/or AD
● Authorization
   ○ Eat-your-own-food approach
   ○ Reasoning-Based
   ○ Use of axioms & rules
Deriving Permissions
● Users have permission
  roles
● Permission roles have
  permission relations with
  resources
Deriving Permissions
● If a user has a permission role containing a
  read permission associated to a resource,
  then the user has the same permission over
  the resource
     :permissionRole(?user,?role),
     :readPermission(?role,?resource) ->
     :readUserPermission(?user,?resource)
● Everybody has read access to public
  resources
     :User(?user),
     :PublicResource(?resource) ->
     :readUserPermission(?user,?resource)
Deriving Permissions
● User :user1 has delete permissions over any
  source
   ○ :deleteUserPermission(?user,:anySource),
     :DataSource(?source) ->
     :deleteUserPermission(?user,?source)
   ○ :user1 :deleteUserPermission :anySource
● Everybody has all permissions to the resources
  they created
   ○ :resourceCreator(?user,?resource) ->
     :allUserPermissions(?user,?resource)
   ○ :allUserPermissions(?user,?resource) ->
     :readUserPermission(?user,?resource)
   ○ ...
Impact of Reasoning
Can user :user1 delete resource :source1?
     ASK WHERE {
         { :user1 :deleteUserPermission :source1 . }
         UNION
         { :user1 :permissionRole ?role .
           ?role :deletePermission :source1 . }
         UNION
         { :user1 :resourceCreator :source1 . }
         UNION
         { :user1 :deleteUserPermission :anyResource . }
         UNION
         { :user1 :allUserPermissions :source1 . }
         UNION
         { ... }
         UNION
         ...
Impact of Reasoning
● Are you sure you're not missing anything?
● New awesome way of getting delete permissions
  you came up with yesterday
● Model knowledge where it belongs and let the
  reasoner do the work for you:
    ASK WHERE {
        { :user1 :deleteUserPermission :source1 . }
    }
Too much Inference?
When I say
   :deleteUserPermission domain :User
   :deleteUserPermission range :Resource
I mean that for every triple
  :user1 :deleteUserPermission :resource1
the individual :user1 must be an instance of :
User and :resource1 of :Resource.

But the reasoner doesn't find the error!!
Typing Constraint
Only users can have delete user permissions
 ● :deleteUserPermission domain :User
 ● :user1 :deleteUserPermission :resource1
Typing Constraint
Only users can have delete user permissions
  ● :deleteUserPermission domain :User
  ● :user1 :deleteUserPermission :resource1


                     OWA                  CWA
Consistent            true                 false

             Infer that          Assume that
Reason       :user1 type :User   :user1 type not :User
CWA or OWA?
● Which one?
   ○ Of course use both!
● Some axioms should be interpreted under
  CWA
        :deleteUserPermission domain :User
● And others under OWA
        :SuperUser subClassOf   :User
● So the right thing happens
        :user1 :deleteUserPermission :resource1
        :user1 type :SuperUser
SLDC for Data Integration
● SLDC provides descriptions of data sources,
  relationships between them, and information
  to query them
● We can treat data sources as an integrated
  single data source
    ○ Distributed querying
    ○ AI analytics
● Virtual, materialized, hybrid
Stardog Linked Data Catalog
Stardog Linked Data Catalog
Mappings
● Simple
   ○ pops:Employee subClassOf foaf:Person
   ○ pops:Project equivalentTo foaf:Project
   ○ pops:hasEmployee subPropertyOf foaf:member
● SWRL-Based
   ○ pops:firstName(?person, ?first),
     pops:lastName(?person, ?last),
     swrlb:concat(?name, ?first, " ", ?last) ->
     foaf:name(?person, ?name)
   ○ pops:worksOnProject(?person,?project),
     pops:ActiveProject(?project) ->
     foaf:currentProject(?person,?project)
Summing Up
● SLDC is a linked data catalog
    ○ Manage a variety of sources
    ○ Find sources
    ○ Query sources
● Implemented using Semantic Technologies
    ○ Reasoning
       ■ Axioms & Rules
    ○ Data validation
    ○ Data integration
Questions?
Why?
● Large organizations
   ○ Disparate departments
   ○ Independent, isolated sources
● Where is what?
   ○ Do we have a data source about clients?
   ○ Where is it?
● Who created what?
   ○ Who owns it?
● Who has access to what?
   ○ Do I have access to it?
   ○ Who do I talk to to get it?
Source Management
● Management
    ○ Create, delete, update, clone
● Import
    ○ RDF, HTML, XML
● Subscription
    ○ Endpoint location
● Categorization
    ○ Categories
    ○ External vocabularies
● Sharing
    ○ To specific users
    ○ Public
Querying Sources
● Querying metadata
    ○ Queries about the catalog itself
● External query
    ○ Querying a particular source
● Integrated query
    ○ Querying a set of integrated sources
● Query management
● Query sharing
● Results export
Finding Sources
● Browse
   ○ Facets
   ○ Pelorus
● Search
   ○ Text-based search
   ○ Rich query language
Last but not least
● NLP processing
   ○ Entity/Event extraction from natural language
     source descriptions
   ○ Better source classification & search
● Graph algorithms
   ○ What's the shortest path between these
     resources?
● Clustering
   ○ Can we discover similar sources based on a
     given criteria?
Axioms
● It's not always about simple taxonomies...
● What about domain/range axioms?
   ○ :someProperty domain :SomeClass
   ○ :a :someProperty :b
   ○ :SomeClass(x)?
● What about complex subclass chains?
   ○ :SomeClass subClassOf :someProperty
     some :OtherClass
   ○ :someProperty some :OtherClass subClassOf
     :AnotherClass
   ○ :a type :SomeClass
   ○ :AnotherClass(x)?
● What about cardinality constraints, universal
  quantification, datatype reasoning, ...?
Data Validation
● Fundamental data management problem
   ○ Verify data integrity and correctness
   ○ Data corruption can lead to failures in applications, errors
     in decision making, security vulnerabilities, etc.
● Relevant in many scenarios
   ○ Storing data for stand-alone applications
   ○ Exchanging data in distributed settings
● For some use cases, data validation is critical but
  we still want to do it intelligently
Participation Constraint
Each resource must have been created by a user
 ● :Resource subClassOf inv(resourceCreator) some
   :User
 ● :resource1 type :Resource


                     OWA                         CWA
Consistent             true                       false

             Infer that
                                        Assume that
                 ● _:b :                _:b :resourceCreator :
Reason             resourceCreator :
                                        resource1
                   resource1
                                        is false
                 ● _:b type :Resource
Uniqueness Constraint
Each data source must belong to at most one
catalog entry
 ● :dataSource inverseFunctional
 ● :entry1 :dataSource :dataSource1
 ● :entry2 :dataSource :dataSource1

                     OWA                      CWA
Consistent            true                    false

                                    Assume that
             Infer that
Reason       :entry1 sameAs :entry2
                                    :entry1 sameAs :entry2
                                    is false

Más contenido relacionado

La actualidad más candente

Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsApproaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsLucidworks
 
Access Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked DataAccess Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked DataLuca Costabello
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relationJay Bharat
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksA Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksLucidworks
 
Gerry McNicol Graph Databases
Gerry McNicol Graph DatabasesGerry McNicol Graph Databases
Gerry McNicol Graph DatabasesGerry McNicol
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch BasicsShifa Khan
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionLucidworks
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013Roy Russo
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...Olaf Hartig
 
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...Charlie Hull
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchclintongormley
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionRrubaa Panchendrarajan
 
A Survey of Elasticsearch Usage
A Survey of Elasticsearch UsageA Survey of Elasticsearch Usage
A Survey of Elasticsearch UsageGreg Brown
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionDavid Pilato
 
NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword Haitham El-Ghareeb
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Olaf Hartig
 

La actualidad más candente (20)

Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsApproaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
 
Access Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked DataAccess Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked Data
 
Solr vs ElasticSearch
Solr vs ElasticSearchSolr vs ElasticSearch
Solr vs ElasticSearch
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relation
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksA Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
 
Gerry McNicol Graph Databases
Gerry McNicol Graph DatabasesGerry McNicol Graph Databases
Gerry McNicol Graph Databases
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with Fusion
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearch
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
 
A Survey of Elasticsearch Usage
A Survey of Elasticsearch UsageA Survey of Elasticsearch Usage
A Survey of Elasticsearch Usage
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English version
 
NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
 

Similar a Stardog Linked Data Catalog

SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 CareerBuilder.com
 
AI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analyticsAI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analyticsDataWorks Summit
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"Pinar Alper
 
Instant search - A hands-on tutorial
Instant search  - A hands-on tutorialInstant search  - A hands-on tutorial
Instant search - A hands-on tutorialGanesh Venkataraman
 
Cloudera Federal Forum 2014: Tracking Provenance in Hadoop Clusters
Cloudera Federal Forum 2014: Tracking Provenance in Hadoop ClustersCloudera Federal Forum 2014: Tracking Provenance in Hadoop Clusters
Cloudera Federal Forum 2014: Tracking Provenance in Hadoop ClustersCloudera, Inc.
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysDemi Ben-Ari
 
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentAre Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentHarsh Thakkar
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadatamarkgrover
 
Beyond Kaggle: Solving Data Science Challenges at Scale
Beyond Kaggle: Solving Data Science Challenges at ScaleBeyond Kaggle: Solving Data Science Challenges at Scale
Beyond Kaggle: Solving Data Science Challenges at ScaleTuri, Inc.
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systemsTrey Grainger
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartMukesh Singh
 
Amundsen: From discovering to security data
Amundsen: From discovering to security dataAmundsen: From discovering to security data
Amundsen: From discovering to security datamarkgrover
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineTrey Grainger
 
API Design & Security in django
API Design & Security in djangoAPI Design & Security in django
API Design & Security in djangoTareque Hossain
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017MLconf
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
Democratizing Data within your organization - Data Discovery
Democratizing Data within your organization - Data DiscoveryDemocratizing Data within your organization - Data Discovery
Democratizing Data within your organization - Data DiscoveryMark Grover
 
A Framework for Dynamic Data Source Identification and Orchestration on the Web
A Framework for Dynamic Data Source Identification and Orchestration on the WebA Framework for Dynamic Data Source Identification and Orchestration on the Web
A Framework for Dynamic Data Source Identification and Orchestration on the Webmashups
 

Similar a Stardog Linked Data Catalog (20)

SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
AI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analyticsAI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analytics
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
 
Instant search - A hands-on tutorial
Instant search  - A hands-on tutorialInstant search  - A hands-on tutorial
Instant search - A hands-on tutorial
 
Cloudera Federal Forum 2014: Tracking Provenance in Hadoop Clusters
Cloudera Federal Forum 2014: Tracking Provenance in Hadoop ClustersCloudera Federal Forum 2014: Tracking Provenance in Hadoop Clusters
Cloudera Federal Forum 2014: Tracking Provenance in Hadoop Clusters
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentAre Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
Beyond Kaggle: Solving Data Science Challenges at Scale
Beyond Kaggle: Solving Data Science Challenges at ScaleBeyond Kaggle: Solving Data Science Challenges at Scale
Beyond Kaggle: Solving Data Science Challenges at Scale
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Amundsen: From discovering to security data
Amundsen: From discovering to security dataAmundsen: From discovering to security data
Amundsen: From discovering to security data
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
API Design & Security in django
API Design & Security in djangoAPI Design & Security in django
API Design & Security in django
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
 
Democratizing Data within your organization - Data Discovery
Democratizing Data within your organization - Data DiscoveryDemocratizing Data within your organization - Data Discovery
Democratizing Data within your organization - Data Discovery
 
A Framework for Dynamic Data Source Identification and Orchestration on the Web
A Framework for Dynamic Data Source Identification and Orchestration on the WebA Framework for Dynamic Data Source Identification and Orchestration on the Web
A Framework for Dynamic Data Source Identification and Orchestration on the Web
 

Más de Clark & Parsia LLC

Stardog 1.1: Easier, Smarter, Faster RDF Database
Stardog 1.1: Easier, Smarter, Faster RDF DatabaseStardog 1.1: Easier, Smarter, Faster RDF Database
Stardog 1.1: Easier, Smarter, Faster RDF DatabaseClark & Parsia LLC
 
Validating Linked Data with OWL
Validating Linked Data with OWLValidating Linked Data with OWL
Validating Linked Data with OWLClark & Parsia LLC
 
PelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic TechnologiesPelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic TechnologiesClark & Parsia LLC
 
PelletDb: Scalable Reasoning for Enterprise Semantics
PelletDb: Scalable Reasoning for Enterprise SemanticsPelletDb: Scalable Reasoning for Enterprise Semantics
PelletDb: Scalable Reasoning for Enterprise SemanticsClark & Parsia LLC
 
Automated Planning as a Semantic Technology
Automated Planning as a Semantic TechnologyAutomated Planning as a Semantic Technology
Automated Planning as a Semantic TechnologyClark & Parsia LLC
 
SemTech 2010: Pelorus Platform
SemTech 2010: Pelorus PlatformSemTech 2010: Pelorus Platform
SemTech 2010: Pelorus PlatformClark & Parsia LLC
 

Más de Clark & Parsia LLC (9)

Stardog 1.1: Easier, Smarter, Faster RDF Database
Stardog 1.1: Easier, Smarter, Faster RDF DatabaseStardog 1.1: Easier, Smarter, Faster RDF Database
Stardog 1.1: Easier, Smarter, Faster RDF Database
 
Stardog talk-dc-march-17
Stardog talk-dc-march-17Stardog talk-dc-march-17
Stardog talk-dc-march-17
 
Validating Linked Data with OWL
Validating Linked Data with OWLValidating Linked Data with OWL
Validating Linked Data with OWL
 
Terp: An OWL-friendly SPARQL
Terp: An OWL-friendly SPARQLTerp: An OWL-friendly SPARQL
Terp: An OWL-friendly SPARQL
 
PelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic TechnologiesPelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic Technologies
 
PelletDb: Scalable Reasoning for Enterprise Semantics
PelletDb: Scalable Reasoning for Enterprise SemanticsPelletDb: Scalable Reasoning for Enterprise Semantics
PelletDb: Scalable Reasoning for Enterprise Semantics
 
Automated Planning as a Semantic Technology
Automated Planning as a Semantic TechnologyAutomated Planning as a Semantic Technology
Automated Planning as a Semantic Technology
 
Empire: JPA for RDF & SPARQL
Empire: JPA for RDF & SPARQLEmpire: JPA for RDF & SPARQL
Empire: JPA for RDF & SPARQL
 
SemTech 2010: Pelorus Platform
SemTech 2010: Pelorus PlatformSemTech 2010: Pelorus Platform
SemTech 2010: Pelorus Platform
 

Último

Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 

Último (20)

Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 

Stardog Linked Data Catalog

  • 1. Stardog Linked Data Catalog Héctor Pérez-Urbina Edgar Rodríguez-Díaz Clark & Parsia, LLC {hector, edgar}@clarkparsia.com
  • 2. Who are we? ● Clark & Parsia is a semantic software startup ● HQ in Washington, DC & office in Boston ● Provides software development and integration services ● Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers http://clarkparsia.com/ Twitter: @candp
  • 3. What's SLDC? ● Stardog Linked Data Catalog ● A catalog of data sources ○ Semi structured ○ Relational ○ Object-oriented ○ ... ● Provides a coherent view over existing data repositories so that users and/or applications can easily find them and query them
  • 4. Use Cases ● Sources ○ Management, import, subscription, categorization, sharing ● Query ○ Management, sharing, results export ○ Querying ■ Metadata, external sources, integration ● Locating sources ○ Search, browse ● NLP/AI ○ Entity extraction, graph algorithms, clustering analysis
  • 5. Application layer Middleware layer NLP/AI analytics layer Data layer
  • 7. Semantic Technologies ● W3C standards ○ RDF(S), OWL, SPARQL ● Lower operational costs and raise productivity ○ Cooperation without coordination ○ Appropriate abstractions ○ Declarative is better than imperative ○ Correctness when it matters; sloppiness when it doesn’t
  • 8. Data Model ● Similar to DCAT from W3C ○ Catalog entries ● Enhanced with ○ SSD ○ VoID datasets ○ SKOS background models ○ Axioms & rules
  • 9. Modeling the Domain ● Use of axioms to model relationships between classes ○ :Query subClassOf : Resource ○ :Entry subClassOf : Resource ● Retrieve the resources user :u can see ○ SELECT ?resource WHERE { ?resource type :Resource . }
  • 10. Security ● Authentication ○ Shiro-Based implementation ○ Extensible to LDAP and/or AD ● Authorization ○ Eat-your-own-food approach ○ Reasoning-Based ○ Use of axioms & rules
  • 11. Deriving Permissions ● Users have permission roles ● Permission roles have permission relations with resources
  • 12. Deriving Permissions ● If a user has a permission role containing a read permission associated to a resource, then the user has the same permission over the resource :permissionRole(?user,?role), :readPermission(?role,?resource) -> :readUserPermission(?user,?resource) ● Everybody has read access to public resources :User(?user), :PublicResource(?resource) -> :readUserPermission(?user,?resource)
  • 13. Deriving Permissions ● User :user1 has delete permissions over any source ○ :deleteUserPermission(?user,:anySource), :DataSource(?source) -> :deleteUserPermission(?user,?source) ○ :user1 :deleteUserPermission :anySource ● Everybody has all permissions to the resources they created ○ :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource) ○ :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource) ○ ...
  • 14. Impact of Reasoning Can user :user1 delete resource :source1? ASK WHERE { { :user1 :deleteUserPermission :source1 . } UNION { :user1 :permissionRole ?role . ?role :deletePermission :source1 . } UNION { :user1 :resourceCreator :source1 . } UNION { :user1 :deleteUserPermission :anyResource . } UNION { :user1 :allUserPermissions :source1 . } UNION { ... } UNION ...
  • 15. Impact of Reasoning ● Are you sure you're not missing anything? ● New awesome way of getting delete permissions you came up with yesterday ● Model knowledge where it belongs and let the reasoner do the work for you: ASK WHERE { { :user1 :deleteUserPermission :source1 . } }
  • 16. Too much Inference? When I say :deleteUserPermission domain :User :deleteUserPermission range :Resource I mean that for every triple :user1 :deleteUserPermission :resource1 the individual :user1 must be an instance of : User and :resource1 of :Resource. But the reasoner doesn't find the error!!
  • 17. Typing Constraint Only users can have delete user permissions ● :deleteUserPermission domain :User ● :user1 :deleteUserPermission :resource1
  • 18. Typing Constraint Only users can have delete user permissions ● :deleteUserPermission domain :User ● :user1 :deleteUserPermission :resource1 OWA CWA Consistent true false Infer that Assume that Reason :user1 type :User :user1 type not :User
  • 19. CWA or OWA? ● Which one? ○ Of course use both! ● Some axioms should be interpreted under CWA :deleteUserPermission domain :User ● And others under OWA :SuperUser subClassOf :User ● So the right thing happens :user1 :deleteUserPermission :resource1 :user1 type :SuperUser
  • 20. SLDC for Data Integration ● SLDC provides descriptions of data sources, relationships between them, and information to query them ● We can treat data sources as an integrated single data source ○ Distributed querying ○ AI analytics ● Virtual, materialized, hybrid
  • 23. Mappings ● Simple ○ pops:Employee subClassOf foaf:Person ○ pops:Project equivalentTo foaf:Project ○ pops:hasEmployee subPropertyOf foaf:member ● SWRL-Based ○ pops:firstName(?person, ?first), pops:lastName(?person, ?last), swrlb:concat(?name, ?first, " ", ?last) -> foaf:name(?person, ?name) ○ pops:worksOnProject(?person,?project), pops:ActiveProject(?project) -> foaf:currentProject(?person,?project)
  • 24. Summing Up ● SLDC is a linked data catalog ○ Manage a variety of sources ○ Find sources ○ Query sources ● Implemented using Semantic Technologies ○ Reasoning ■ Axioms & Rules ○ Data validation ○ Data integration
  • 26. Why? ● Large organizations ○ Disparate departments ○ Independent, isolated sources ● Where is what? ○ Do we have a data source about clients? ○ Where is it? ● Who created what? ○ Who owns it? ● Who has access to what? ○ Do I have access to it? ○ Who do I talk to to get it?
  • 27. Source Management ● Management ○ Create, delete, update, clone ● Import ○ RDF, HTML, XML ● Subscription ○ Endpoint location ● Categorization ○ Categories ○ External vocabularies ● Sharing ○ To specific users ○ Public
  • 28. Querying Sources ● Querying metadata ○ Queries about the catalog itself ● External query ○ Querying a particular source ● Integrated query ○ Querying a set of integrated sources ● Query management ● Query sharing ● Results export
  • 29. Finding Sources ● Browse ○ Facets ○ Pelorus ● Search ○ Text-based search ○ Rich query language
  • 30. Last but not least ● NLP processing ○ Entity/Event extraction from natural language source descriptions ○ Better source classification & search ● Graph algorithms ○ What's the shortest path between these resources? ● Clustering ○ Can we discover similar sources based on a given criteria?
  • 31. Axioms ● It's not always about simple taxonomies... ● What about domain/range axioms? ○ :someProperty domain :SomeClass ○ :a :someProperty :b ○ :SomeClass(x)? ● What about complex subclass chains? ○ :SomeClass subClassOf :someProperty some :OtherClass ○ :someProperty some :OtherClass subClassOf :AnotherClass ○ :a type :SomeClass ○ :AnotherClass(x)? ● What about cardinality constraints, universal quantification, datatype reasoning, ...?
  • 32. Data Validation ● Fundamental data management problem ○ Verify data integrity and correctness ○ Data corruption can lead to failures in applications, errors in decision making, security vulnerabilities, etc. ● Relevant in many scenarios ○ Storing data for stand-alone applications ○ Exchanging data in distributed settings ● For some use cases, data validation is critical but we still want to do it intelligently
  • 33. Participation Constraint Each resource must have been created by a user ● :Resource subClassOf inv(resourceCreator) some :User ● :resource1 type :Resource OWA CWA Consistent true false Infer that Assume that ● _:b : _:b :resourceCreator : Reason resourceCreator : resource1 resource1 is false ● _:b type :Resource
  • 34. Uniqueness Constraint Each data source must belong to at most one catalog entry ● :dataSource inverseFunctional ● :entry1 :dataSource :dataSource1 ● :entry2 :dataSource :dataSource1 OWA CWA Consistent true false Assume that Infer that Reason :entry1 sameAs :entry2 :entry1 sameAs :entry2 is false