1. Semantic Automated Discovery and Integration
SADI Services Tutorial
Mark Wilkinson
Isaac Peral Senior Researcher in Biological Informatics
Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
Adjunct Professor of Medical Genetics, University of British Columbia
Vancouver, BC, Canada.
3. A lot of important information cannot be represented
on the Semantic Web
For example, all of the data that results from
analytical algorithms and statistical analyses
(I’m purposely excluding databases from the list of examples
for reasons I will discuss in a moment)
8. Traditional definitions of The Deep Web
include databases that have Web FORM interfaces.
HOWEVER
The Life Science Semantic Web community
is encouraging the establishment of SPARQL endpoints
as the way to serve that same data to the world
(i.e. NOT through Web Services)
11. “We need to commit specific hardware for
that [mySQL] service. We don’t use the
same servers for mySQL as for the
Website...”
“...we resolve the situation by asking the
user to stop hammering the server. This
might involve temporary ban on the IP...”
- ENSEMBL Helpdesk
12. So... there appear to be good reasons
why most data providers do not expose
their databases for public query!
16. A message posted to the Bio2RDF mailing list last week from Jerven Bolleman, one of the team members behind UniProt's push for RDF...

Subject: SPARQL or not?
Date: Tue, 19 Feb 2013 13:11:22 +0100

Hi Bio2RDF maintainers,
I keep on noticing this rather expensive query.
CONSTRUCT
{ <http://bio2rdf.org/search/Paget> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://bio2rdf.org/bio2rdf_resource:SearchResults> .
<http://bio2rdf.org/search/Paget> <http://bio2rdf.org/bio2rdf_resource:hasSearchResult> ?s .
<http://bio2rdf.org/search/Paget> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?s .
?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
?s <http://purl.org/dc/elements/1.1/title> ?title .
?s <http://purl.org/dc/terms/title> ?dctermstitle .
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel .
?s ?p ?o .}
WHERE
{ ?s ?p ?o
FILTER contains(str(?o), "Paget")
OPTIONAL
{ ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
OPTIONAL
{ ?s <http://purl.org/dc/elements/1.1/title> ?title }
OPTIONAL
{ ?s <http://purl.org/dc/terms/title> ?dctermstitle }
OPTIONAL
{ ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }
OPTIONAL
{ ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel }
}
OFFSET 0
LIMIT 500
It comes from the example queries on the bio2rdf landing page.
It's extremely resource consuming and totally useless as it will never ever run in time.
Can you please change this query to something useful and workable. And at least cache the results if you ever get them.
Regards,
Jerven
17. (the same message; emphasis on: "I keep noticing this rather expensive query")
18. (the same message; emphasis on: "It comes from THE EXAMPLE QUERIES on the Bio2RDF landing page" (my emphasis added))
19. (the same message; emphasis on: "It's extremely resource-consuming and totally useless as it will never run in time")
20. So even people who are world-leaders in RDF and SPARQL
write “expensive” and “useless” queries
that (already!) are making life difficult for
SPARQL endpoint providers
I believe that situation will only get worse
as more people begin to use the Semantic Web
and as SPARQL itself becomes richer and more SQL-like
21. In My Opinion
History tells us, and this story IMO supports,
that SPARQL endpoints might not be widely adopted
by source bioinformatics data providers
Historically, the majority of bioinformatics data hosts
have opted for API/Service-based
access to their resources
22. In My Opinion
Moreover, I am still obsessed with interoperability!
Having a unified way to discover, and access,
bioinformatics resources
whether they be databases or algorithms
just seems like a Good Thing™
23. In My Opinion
So we need to find a way to make Web Services
play nicely with the Semantic Web
28. causally related with
http://semanticscience.org/resource/SIO_000243
SIO_000243:
<owl:ObjectProperty rdf:about="&resource;SIO_000243">
    <rdfs:label xml:lang="en">is causally related with</rdfs:label>
    <rdf:type rdf:resource="&owl;SymmetricProperty"/>
    <rdf:type rdf:resource="&owl;TransitiveProperty"/>
    <dc:description xml:lang="en">A transitive, symmetric, temporal relation
    in which one entity is causally related with another non-identical entity.</dc:description>
    <rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/>
</owl:ObjectProperty>
30. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
OWL-S
SAWSDL
WSDL-S
Others...
31. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
Describe how the system manipulates the data
Describe how the world changes as a result
32. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(usually through "semantic annotation" of XML Schema)
Describe how the system manipulates the data
Describe how the world changes as a result
33. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(in the least-semantic case, the input and output data is "vanilla" XML)
Describe how the system manipulates the data
Describe how the world changes as a result
34. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(in the "most semantic" case (WSDL), RDF is converted into XML, then back to RDF again)
Describe how the system manipulates the data
Describe how the world changes as a result
35. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(the rigidity of XML Schema is the antithesis of the Semantic Web!)
Describe how the system manipulates the data
Describe how the world changes as a result
36. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
(so... perhaps we shouldn't be using XML Schema at all...??)
Describe how the system manipulates the data
Describe how the world changes as a result
37. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
Describe how the system manipulates the data (HARD!)
Describe how the world changes as a result
38. There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)
Describe input data
Describe output data
Describe how the system manipulates the data
Describe how the world changes as a result (Unnecessary?)
40. Scientific Web Services
are DIFFERENT!
Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
41. “The service interfaces within bioinformatics are relatively
simple. An extensible or constrained interoperability framework
is likely to suffice for current demands: a fully generic
framework is currently not necessary.”
Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
42. Scientific Web Services are DIFFERENT
They’re simpler!
Rather than waiting for a solution to the more general problem
(which may be years away... or more!)
can we solve the Semantic Web Service problem
within the scientific domain
while still being fully standards-compliant?
44. Vis-à-vis being Semantic-Webby,
what is missing from this list?
Describe input data
Describe output data
Describe how the system manipulates the data
Describe how the world changes as a result
46. causally related with
http://semanticscience.org/resource/SIO_000243
The Semantic Web works because of relationships!
47. causally related with
http://semanticscience.org/resource/SIO_000243
The Semantic Web works because of relationships!
In 2008 I proposed that, in the Semantic Web world,
algorithms should be viewed as “exposing” relationships
between the input and output data
49. [Diagram: a BLAST-based SADI service. The input sequence (has_seq_string "AACTCTTCGTAGTG...") is linked by the predicate "has homology to" to the Terminal Flower gene of A. thaliana.]
SADI requires you to explicitly declare, as part of your analytical output, the biological relationship that your algorithm "exposed".
50. Another “philosophical” decision was
to abandon XML Schema
In a world that is moving towards
RDF representations of all data
it makes no sense to convert semantically rich RDF
into semantic-free Schema-based XML
then back into RDF again
51. The final philosophical decision was
to abandon SOAP
The bioinformatics community seems to be
very receptive to pure-HTTP interfaces
(e.g. the popularity of REST-like APIs)
So SADI uses simple HTTP POST
of just the RDF input data
(no message scaffold whatsoever)
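Invoking a SADI service is then just that POST. A minimal sketch in Python, where the service URL, the `ex:`/`patient:` vocabulary, and the input graph are all hypothetical stand-ins, not a real SADI deployment:

```python
from urllib.request import Request  # urlopen(req) would actually send it

# Hypothetical SADI service endpoint and input graph -- illustrative only
SERVICE_URL = "http://example.org/sadi/getBMI"
input_rdf = """@prefix patient: <http://example.org/patient/> .
@prefix ex: <http://example.org/ont#> .
patient:24601 ex:hasHeight "1.8" ; ex:hasWeight "84" .
"""

# SADI invocation: a plain HTTP POST of the RDF input -- no SOAP envelope,
# no message scaffold, just the graph itself
req = Request(
    SERVICE_URL,
    data=input_rdf.encode("utf-8"),
    headers={"Content-Type": "text/rdf+n3", "Accept": "text/rdf+n3"},
    method="POST",
)
# urllib.request.urlopen(req) would return the service's RDF response
```

The response, in turn, is just RDF about the same input URIs, which is what makes the chaining described below possible.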
54. ID Name Height Weight Age
24601 Jean Valjean 1.8m 84kg 45
7474505B Jake Blues 1.73m 101kg 31
6 — 1.88m 75kg 39
... ... ... ... ...
56. OWL-DL Classes
(same table as above)
57. Property restrictions in OWL Class definition
(same table as above)
59. A reasoner determines that Patient #24601
is an OWL Individual of the service's Input Class
(same table as above)
60. NOTE THE URI OF THE INPUT INDIVIDUAL: Patient:24601
(same table as above)
61. ID Name Height Weight Age BMI
24601 Jean Valjean 1.8m 84kg 45 25.9
7474505B Jake Blues 1.73m 101kg 31
6 — 1.88m 75kg 39
... ... ... ... ...
62. NOTE THE URI OF THE OUTPUT INDIVIDUAL: Patient:24601
(same table as above, now including the BMI column)
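The new BMI value in the table is simply weight divided by height squared (84 / 1.8² ≈ 25.9). The computational core of such a service, purely for illustration:

```python
def bmi(height_m: float, weight_kg: float) -> float:
    """Body-mass index: weight (kg) divided by the square of height (m)."""
    return round(weight_kg / height_m ** 2, 1)

# Patient 24601: height 1.8 m, weight 84 kg
print(bmi(1.8, 84))  # 25.9
```

In SADI, of course, this value would be attached to the input URI (Patient:24601) as a new property, rather than returned bare.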
63.
64. The URI of the input is linked by a
meaningful predicate to the output
(either literal output or another URI)
65. Therefore, by connecting SADI services
together in a workflow, you end up with an
unbroken chain of Linked Data
68. The SHARE registry
indexes all of the input/output/relationship
triples that can be generated by all known services
This is how SHARE discovers services
69. We wanted to duplicate
a real, peer-reviewed, bioinformatics analysis
simply by building a model in the Web
describing what the answer
(if one existed)
would look like
72. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
73. Original Study Simplified
Using what is known about interactions in fly & yeast
predict new interactions with your
protein of interest
74. “Pseudo-code” Abstracted Workflow
Given a protein P in Species X
Find proteins similar to P in Species Y
Retrieve interactors in Species Y
Sequence-compare Y-interactors with Species X genome
(1) Keep only those with homologue in X
Find proteins similar to P in Species Z
Retrieve interactors in Species Z
Sequence-compare Z-interactors with (1)
Putative interactors in Species X
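As a sanity check of the logic (not SADI code), the abstract workflow can be sketched with stub look-up tables standing in for the real BLAST and interaction-database services. All names and mappings below are invented for illustration:

```python
# Stub "services"; in SADI these would be discovered and invoked over HTTP.
SIMILAR = {("P", "yeast"): ["yP1"], ("P", "fly"): ["fP1"]}
INTERACTORS = {"yP1": ["yA", "yB"], "fP1": ["fA"]}
HOMOLOGUE_IN_X = {"yA": "xA", "yB": "xB", "fA": "xA"}  # map back to Species X

def find_similar(protein, species):
    return SIMILAR.get((protein, species), [])

def interactors(protein):
    return INTERACTORS.get(protein, [])

def map_to_species_x(proteins):
    # "Sequence-compare with the Species X genome; keep those with a homologue"
    return {HOMOLOGUE_IN_X[p] for p in proteins if p in HOMOLOGUE_IN_X}

def putative_interactors(protein_p):
    from_y = map_to_species_x(
        i for s in find_similar(protein_p, "yeast") for i in interactors(s))
    from_z = map_to_species_x(
        i for s in find_similar(protein_p, "fly") for i in interactors(s))
    # A probable interactor appears in BOTH comparator model organisms
    return from_y & from_z

# putative_interactors("P") -> {"xA"}
```

The final intersection is exactly the "Probable Interactor" class defined in OWL on the next slide.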
76. Modeling the science...
ProbableInteractor:
  is homologous to (Potential Interactor from ModelOrganism1…)
  and (Potential Interactor from ModelOrganism2…)
Probable Interactor is defined in OWL as a subclass: something that appears as a potential interactor in both comparator model organisms.
77. Running the Web Science Experiment
In a local data-file
provide the protein we are interested in
and the two species we wish to use in our comparison
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
78. The tricky bit is...
In the abstract, the search for homology is "generic" – ANY protein, ANY model system.
But when the machine does the experiment, it will need to use (at least) two organism-specific resources, because the answer requires information from the two declared species:
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
79. This is the question we ask:
(the query language here is SPARQL)
PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
?protein a i:ProbableInteractor .
}
The URL of our OWL model (ontology) defining Probable Interactors
80. Each relationship (property-restriction)
in the OWL Class is then matched
with a SADI Service
The matched SADI Service can
generate data that fulfils that
property restriction
(i.e. produces triples with that S/P/O pattern)
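That matching step is essentially a lookup from predicate (plus input class) to services. A toy sketch of such a registry, with invented service names and predicates (the real SADI registry indexes full OWL descriptions, not flat strings):

```python
# Toy SADI-style registry: each service advertises the S/P/O pattern
# its output can generate (names here are invented for illustration)
REGISTRY = [
    {"service": "getBMI", "input": "Patient", "predicate": "hasBMI"},
    {"service": "getInteractors", "input": "Protein", "predicate": "interactsWith"},
    {"service": "blastHomology", "input": "Sequence", "predicate": "isHomologousTo"},
]

def services_for_restriction(predicate, input_class=None):
    """Find services whose output fulfils a property restriction on `predicate`."""
    return [r["service"] for r in REGISTRY
            if r["predicate"] == predicate
            and (input_class is None or r["input"] == input_class)]

# services_for_restriction("isHomologousTo") -> ["blastHomology"]
```

Each property restriction in the OWL class definition drives one such lookup, and the matched services become candidate workflow steps.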
81. SHARE chains these SADI services
into an analytical workflow...
...the outputs from that workflow are
Instances (OWL Individuals) of
Probable Interactors
82. SHARE derived (and executed) the following workflow automatically
These are different
SADI Web Services...
...selected at run-time
based on the same model
83.
84. Keys to Success:
1: Use standards
2: Focus on predicates, not classes
3: Use these predicates to define, rather than assert, classes
4: Make sure all URIs resolve, and resolve to something useful
5: Never leave the RDF world... (abandon vanilla XML,
even for Web Services!)
6: Use reasoners... Everywhere... Always!
90. Taverna
• Contextual service discovery
• Automatic RDF serialization and deserialization between SADI and non-SADI services
• Note that Taverna is not as rich a client as SHARE: SHARE will aggregate and re-reason after every service invocation, while there is no (automatic) data aggregation in Taverna.
91. Using SADI services – building a workflow
The next step in the workflow is to find a SADI service that takes the
genes from getKEGGGenesByPathway and returns the proteins
that those genes code for.
92. Using SADI services – building a workflow
Right-click on the service output port and click Find services that
consume KEGG_Record…
93. Using SADI services – building a workflow
Select getUniprotByKeggGene from the list of SADI services and
click Connect.
94. Using SADI services – building a workflow
The getUniprotByKeggGene service is added to the workflow and
automatically connected to the output from
getKEGGGenesByPathway.
95. Using SADI services – building a workflow
Add a new workflow output called protein and connect the output
from the getUniprotByKeggGene service to it.
96. Using SADI services – building a workflow
The next step in the workflow is to find a SADI service that takes the
proteins and returns sequences of those proteins. Right-click on the
encodes output port and click Find services that consume
UniProt_Record…
97. Using SADI services – building a workflow
The UniProt info service attaches the property hasSequence so
select this service and click Connect.
98. Using SADI services – building a workflow
The UniProt info service is added to the workflow and automatically
connected to the output from getUniprotByKeggGene.
99. Using SADI services – building a workflow
Add a new workflow output called sequence and connect the hasSequence output from the UniProt info service to it.
100. Using SADI services – building a workflow
The KEGG pathway we're interested in is "hsa00232", so we'll add it as
a constant value. Right-click on the KEGG_PATHWAY_Record
input port and click Constant value.
101. Using SADI services – building a workflow
Enter the value hsa00232 and click OK.
102. Using SADI services – building a workflow
The workflow is now complete and ready to run.
103. IO Informatics Knowledge Explorer plug-in
• "Bootstrapping" of semantics using known URI schemes (identifiers.org, LSRN, Bio2RDF, etc.)
• Contextual service discovery
• Automatic packaging of appropriate data from your data-store, and automated service invocation using that data.
• This uses some not-widely-known services and metadata that are in the SHARE registry!!
104. The SADI plug-in to the
IO Informatics’
Knowledge Explorer
...a quick explanation of how
we “boot-strap” semantics...
106. Sentient Knowledge Explorer is a retrieval, integration,
visualization, query, and exploration environment for semantically
rich data
107. Most imported data-sets will already have
properties (e.g. “encodes”)
…and the data will already be typed
(e.g. “Gene” or “Protein”)
…so finding SADI Services to consume that
data is ~trivial
112. In the case of LSRN URIs, they resolve to:
<lsrn:DragonDB_Locus_Record rdf:about="http://lsrn.org/DragonDB_Locus:CHO">
  <dc:identifier>CHO</dc:identifier>
  <sio:SIO_000671> <!-- has identifier -->
    <lsrn:DragonDB_Locus_Identifier>
      <sio:SIO_000300>CHO</sio:SIO_000300> <!-- has value -->
    </lsrn:DragonDB_Locus_Identifier>
  </sio:SIO_000671>
</lsrn:DragonDB_Locus_Record>
113. (the same LSRN record as above)
The Semantic Science Integrated Ontology (Dumontier) has a model for how to describe database records, including explicitly making the record identifier an attribute of that record; in our LSRN metadata, we also explicitly rdf:type both records and identifiers.
114. Now we have enough information to start exploring global data...
123. HTTP POST the URI to the SHARE Resolver Service
It will (try to) return you SIO-compliant RDF metadata about that URI
(this is a typical SADI service)
The resolver currently recognizes a few different shared-URI schemes
(e.g. Bio2RDF, Identifiers.org)
and can be updated with new patterns
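Recognizing a shared-URI scheme is essentially pattern matching. A minimal sketch; the patterns below are abbreviated illustrations, not the resolver's actual table:

```python
import re

# Illustrative patterns only -- the real resolver's table is richer
# and can be updated with new schemes
URI_PATTERNS = [
    (re.compile(r"^http://bio2rdf\.org/(?P<ns>[^:/]+):(?P<id>.+)$"), "Bio2RDF"),
    (re.compile(r"^http://identifiers\.org/(?P<ns>[^/]+)/(?P<id>.+)$"), "Identifiers.org"),
    (re.compile(r"^http://lsrn\.org/(?P<ns>[^:/]+):(?P<id>.+)$"), "LSRN"),
]

def recognize(uri):
    """Return (scheme, namespace, identifier) for a known shared-URI scheme,
    or None if the URI matches no registered pattern."""
    for pattern, scheme in URI_PATTERNS:
        m = pattern.match(uri)
        if m:
            return scheme, m.group("ns"), m.group("id")
    return None
```

Once the namespace and identifier are extracted, the resolver can emit the SIO-compliant record/identifier metadata shown for the LSRN example above.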
124. Next problem:
Knowledge Explorer, and therefore the plug-in, are written in C#.
All of our interfaces are described in OWL.
C# reasoners are extremely limited at this time.
125. This problem manifests itself in two ways:
1. An individual on the KE canvas has all the properties required by a Service in the registry, but is not rdf:typed as that Service's input type. How do you discover that Service so that you can add it to the menu?
2. For a selected Service from the menu, how does the plug-in know which data-elements it needs to extract from KE to send to that service in order to fulfil its input property-restrictions?
126. If I select a canvas node, and ask SADI to
find services, it will...
128. Nevertheless:
(a) The service can be discovered based on JUST this node selection
(b) The service can be invoked based on JUST this node selection
129. Voila!
How did the plug-in discover the service,
and determine which data was required to
access that service based on an OWL Class
definition, without a reasoner?
130. Convert the Input OWL Class definition into an ~equivalent SPARQL query, and store the two together, with an index, in the service Registry:

SELECT ?x ?y
FROM knowledge_explorer_database
WHERE {
  ?x foaf:name ?y
}

Service Description:
INPUT OWL Class – NamedIndividual: things with a "name" property from the "foaf" ontology
OUTPUT OWL Class – GreetedIndividual: things with a "greeting" property from the "hello" ontology
(the service provides a "greeting" property based on a "name" property)
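For simple property restrictions, that OWL-to-SPARQL conversion can be approximated as below. This is a deliberately naive sketch (real OWL class expressions are far richer than a flat conjunction of restrictions), using the "name"/"greeting" example from the slide:

```python
def class_to_sparql(restrictions):
    """Approximate an OWL class (a conjunction of property restrictions)
    as a SPARQL query over individuals ?x satisfying all of them."""
    patterns = "\n".join(f"  ?x {pred} ?{var} ." for pred, var in restrictions)
    select = " ".join("?" + var for _, var in restrictions)
    return f"SELECT ?x {select}\nWHERE {{\n{patterns}\n}}"

# NamedIndividual: things with a "name" property from the foaf ontology
query = class_to_sparql([("foaf:name", "y")])
print(query)
```

The generated query can then be run against the Knowledge Explorer store directly, which is how class membership can be tested without a C# reasoner.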
131. Just to ensure that I don’t over-trivialize this point,
the REAL SPARQL query that extracts the input for this service is...
133. Summary
While the Knowledge Explorer plug-in has similar functionality to other tools we have built for SADI, it takes advantage of some features of the SADI Registry, and of SADI in general, that are not widely known.
We hope that the availability of these features encourages development of SADI tooling in other languages that have limited access to reasoning.