Semantic Automated Discovery and Integration
           SADI Services Tutorial


                     Mark Wilkinson
    Isaac Peral Senior Researcher in Biological Informatics
    Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
    Adjunct Professor of Medical Genetics, University of British Columbia
    Vancouver, BC, Canada.
Part I

MOTIVATION
A lot of important information cannot be represented

                   on the Semantic Web




    For example, all of the data that results from

    analytical algorithms and statistical analyses



    (I’m purposely excluding databases from the list of examples
               for reasons I will discuss in a moment)
Varying estimates
put the size of the
Deep Web between
500 and 800 times
larger than the
surface Web
On the WWW
“automation” of
access to Deep Web
data happens through

“Web Services”
Traditional definitions of The Deep Web
 include databases that have Web FORM interfaces.

                     HOWEVER

       The Life Science Semantic Web community
is encouraging the establishment of SPARQL endpoints
    as the way to serve that same data to the world
            (i.e. NOT through Web Services)
I am quite puzzled by this...
Historically, most* bio/informatics
     databases do not allow
     direct public SQL access

                  *yes, I know there are some exceptions!
“We need to commit specific hardware for
that [mySQL] service. We don’t use the
same servers for mySQL as for the
Website...”

“...we resolve the situation by asking the
user to stop hammering the server. This
might involve temporary ban on the IP...”


                         - ENSEMBL Helpdesk
So... There appear to be good reasons
why most data providers do not expose
   their databases for public query!
Are SPARQL endpoints somehow
      “safer” or “better”?
One of the early-adopters of RDF/SPARQL
in the bioinformatics domain was UniProt
How are things going for them?
A message posted to the Bio2RDF mailing list last week from Jerven
Bolleman, one of the team-members behind UniProt’s push for RDF...

Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
To: Mark <markw@illuminae.com>
Date: Tue, 19 Feb 2013 13:11:22 +0100
Subject: SPARQL or not?
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: "Mark Wilkinson" <markw@illuminae.com>
Message-ID: <op.wsq5g8jenbznux@bioinformatica-mark>
User-Agent: Opera Mail/12.14 (Linux)
X-Antivirus: AVG for E-mail 2012.0.2238 [2639/5614]
X-AVG-ID: ID798D8A94-2992BC71

Hi Bio2RDF maintainers,

I keep on noticing this rather expensive query.

CONSTRUCT
 { <http://bio2rdf.org/search/Paget> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://bio2rdf.org/bio2rdf_resource:SearchResults> .
   <http://bio2rdf.org/search/Paget> <http://bio2rdf.org/bio2rdf_resource:hasSearchResult> ?s .
   <http://bio2rdf.org/search/Paget> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?s .
   ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
   ?s <http://purl.org/dc/elements/1.1/title> ?title .
   ?s <http://purl.org/dc/terms/title> ?dctermstitle .
   ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
   ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel .
   ?s ?p ?o .}
WHERE
 { ?s ?p ?o
   FILTER contains(str(?o), "Paget")
   OPTIONAL
    { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
   OPTIONAL
    { ?s <http://purl.org/dc/elements/1.1/title> ?title }
   OPTIONAL
    { ?s <http://purl.org/dc/terms/title> ?dctermstitle }
   OPTIONAL
    { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }
   OPTIONAL
    { ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel }
 }
OFFSET 0
LIMIT 500

It comes from the example queries on the bio2rdf landing page.
Its extremely resource consuming and totally useless as it will never ever run in time.

Can you please change this query to something useful and workable. And at least cache the results if you ever get them.

Regards,
Jerven
        “I keep noticing this rather expensive query”
        “It comes from THE EXAMPLE QUERIES on the Bio2RDF landing page”
                         (my emphasis added)
        “It’s extremely resource-consuming and totally useless as
                     it will never run in time”
So even people who are world-leaders in RDF and SPARQL
         write “expensive” and “useless” queries
        that (already!) are making life difficult for
                SPARQL endpoint providers


       I believe that situation will only get worse
    as more people begin to use the Semantic Web
 and as SPARQL itself becomes richer and more SQL-like
In My Opinion


    History tells us, and this story IMO supports,
that SPARQL endpoints might not be widely adopted
       by source bioinformatics data providers


Historically, the majority of bioinformatics data hosts
           have opted for API/Service-based
                access to their resources
In My Opinion


Moreover, I am still obsessed with interoperability!



   Having a unified way to discover, and access,
             bioinformatics resources

    whether they be databases or algorithms

          just seems like a Good Thing™
In My Opinion



So we need to find a way to make Web Services
      play nicely with the Semantic Web
Design Pattern for
Web Services on the Semantic Web
Part II

SADI “PHILOSOPHY” AND DESIGN
The Semantic Web



    causally related to
The important bit
The link is explicitly labeled




        causally related to




                   ???
causally related with
                      http://semanticscience.org/resource/SIO_000243




SIO_000243:

<owl:ObjectProperty rdf:about="&resource;SIO_000243">
    <rdfs:label xml:lang="en">is causally related with</rdfs:label>
    <rdf:type rdf:resource="&owl;SymmetricProperty"/>
    <rdf:type rdf:resource="&owl;TransitiveProperty"/>
    <dc:description xml:lang="en">A transitive, symmetric, temporal relation in which one entity is causally related with another non-identical entity.</dc:description>
    <rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/>
</owl:ObjectProperty>
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)


                        OWL-S

                        SAWSDL

                        WSDL-S

                       Others...
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)



                  Describe input data

                  Describe output data

     Describe how the system manipulates the data

       Describe how the world changes as a result
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)



                  Describe input data

                  Describe output data

        (usually through “semantic annotation” of XML Schema)

     Describe how the system manipulates the data

       Describe how the world changes as a result
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)



                  Describe input data

                  Describe output data

   (in the least-semantic case, the input and output data are “vanilla” XML)

     Describe how the system manipulates the data

       Describe how the world changes as a result
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)



                  Describe input data

                  Describe output data

   (in the “most semantic” case (WSDL), RDF is converted into XML,
                    then back to RDF again)

     Describe how the system manipulates the data

       Describe how the world changes as a result
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)



                  Describe input data

                  Describe output data

   (the rigidity of XML Schema is the antithesis of the Semantic Web!)

     Describe how the system manipulates the data

       Describe how the world changes as a result
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)



                  Describe input data

                  Describe output data

   (so... perhaps we shouldn’t be using XML Schema at all...??)

     Describe how the system manipulates the data

       Describe how the world changes as a result
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)



                  Describe input data

                  Describe output data

     Describe how the system manipulates the data
                                                           HARD!
       Describe how the world changes as a result
There are many suggestions for how to bring the Deep Web

into the Semantic Web using Semantic Web Services (SWS)



                  Describe input data

                  Describe output data

     Describe how the system manipulates the data
                                                      Unnecessary?
       Describe how the world changes as a result
Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
Scientific Web Services are DIFFERENT!




“The service interfaces within bioinformatics are relatively simple. An
extensible or constrained interoperability framework is likely to suffice
for current demands: a fully generic framework is currently not necessary.”




Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
Scientific Web Services are DIFFERENT

                     They’re simpler!




Rather than waiting for a solution to the more general problem

            (which may be years away... or more!)

       can we solve the Semantic Web Service problem
                 within the scientific domain
         while still being fully standards-compliant?
Other “philosophical”
   considerations
With respect to being “Semantic Webby”,
 what is missing from this list?



            Describe input data

            Describe output data

Describe how the system manipulates the data

 Describe how the world changes as a result
causally related with
    http://semanticscience.org/resource/SIO_000243




  The Semantic Web works because of relationships!




  In 2008 I proposed that, in the Semantic Web world,
algorithms should be viewed as “exposing” relationships
           between the input and output data
Web Service

[Figure: the sequence string AACTCTTCGTAGTG... is sent to BLAST]

SADI

[Figure: a “sequence” node with has_seq_string AACTCTTCGTAGTG... is sent to BLAST;
 in the output, the same “sequence” node (has_seq_string AACTCTTCGTAGTG...)
 “has homology to” Terminal Flower (type: gene; species: A. thal.)]

 SADI requires you to explicitly declare,
 as part of your analytical output,
 the biological relationship that your
 algorithm “exposed”.
Another “philosophical” decision was
           to abandon XML Schema


         In a world that is moving towards
          RDF representations of all data
it makes no sense to convert semantically rich RDF
      into semantic-free Schema-based XML
              then back into RDF again
The final philosophical decision was
             to abandon SOAP


The bioinformatics community seems to be
  very receptive to pure-HTTP interfaces
        (e.g. the popularity of REST-like APIs)



     So SADI uses simple HTTP POST
        of just the RDF input data
    (no message scaffold whatsoever)
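
A minimal sketch of what such a call could look like from the client side, using only the Python standard library. The endpoint URL, the `ex:` vocabulary and the choice of `text/rdf+n3` as the media type are assumptions made for illustration; the request is built but deliberately not sent.

```python
import urllib.request

# Hypothetical endpoint (illustration only, not a real SADI service).
SERVICE_URL = "http://example.org/sadi/calculateBMI"

# The body is nothing but the RDF describing the input individual:
# no envelope, no message scaffold. The ex: terms are made up.
INPUT_RDF = """\
@prefix ex: <http://example.org/ont#> .

<http://example.org/patient/24601> a ex:PatientWithHeightAndWeight ;
    ex:height_m "1.8" ;
    ex:weight_kg "84" .
"""


def build_request(url: str, rdf: str) -> urllib.request.Request:
    """Build (but do not send) a plain HTTP POST whose body is the RDF itself."""
    return urllib.request.Request(
        url,
        data=rdf.encode("utf-8"),
        headers={"Content-Type": "text/rdf+n3", "Accept": "text/rdf+n3"},
        method="POST",
    )


request = build_request(SERVICE_URL, INPUT_RDF)
# urllib.request.urlopen(request) would perform the call against a live service.
```

The point of the sketch is what is absent: there is no SOAP envelope and no XML Schema, just RDF in and (presumably) RDF out.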
Part III

SADI SERVICE DISCOVERY
AND INVOCATION
In slightly more detail...
ID         Name           Height   Weight   Age
24601      Jean Valjean   1.8m     84kg     45
7474505B   Jake Blues     1.73m    101kg    31
6          —              1.88m    75kg     39
...        ...            ...      ...      ...
OWL-DL Classes




  ID            Name         Height   Weight   Age
        24601 Jean Valjean   1.8m     84kg     45
  7474505B Jake Blues        1.73m    101kg    31
            6          —     1.88m    75kg     39
  ...                  ...    ...      ...     ...
Property restrictions
in OWL Class definition



     ID            Name         Height   Weight   Age
           24601 Jean Valjean   1.8m     84kg     45
     7474505B Jake Blues        1.73m    101kg    31
               6          —     1.88m    75kg     39
     ...                  ...    ...      ...     ...
A reasoner determines that Patient #24601
  is an OWL Individual of the Input service Class




ID            Name         Height   Weight   Age
      24601 Jean Valjean   1.8m     84kg     45
7474505B Jake Blues        1.73m    101kg    31
          6          —     1.88m    75kg     39
...                  ...    ...      ...     ...
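
As a rough illustration of that classification step, here is a toy stand-in for the reasoner in plain Python. It only checks that an individual carries every property the input class requires; a real OWL reasoner does far more. The predicate names `height` and `weight`, the `Patient:` prefix on the second row, and the extra incomplete row are assumptions added for the example.

```python
# An input class is *defined* by the properties an individual must have,
# not by an asserted rdf:type. "height" and "weight" are hypothetical names.
BMI_INPUT_CLASS = {"height", "weight"}

individuals = {
    "Patient:24601": {"name": "Jean Valjean", "height": "1.8m", "weight": "84kg", "age": "45"},
    "Patient:7474505B": {"name": "Jake Blues", "height": "1.73m", "weight": "101kg", "age": "31"},
    # Hypothetical extra row (not in the table above): it lacks a weight.
    "Patient:example": {"name": "Example", "height": "1.70m"},
}


def is_instance_of(properties: dict, required: set) -> bool:
    """True when the individual carries every property the class requires."""
    return required.issubset(properties)


# Individuals that qualify as service input, without anyone having typed them.
valid_inputs = [uri for uri, props in individuals.items()
                if is_instance_of(props, BMI_INPUT_CLASS)]
```

Here `valid_inputs` contains the two complete rows; the incomplete one is simply not a member of the input class.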
NOTE THE URI OF THE INPUT INDIVIDUAL
             Patient:24601




ID            Name         Height   Weight   Age
      24601 Jean Valjean   1.8m     84kg     45
7474505B Jake Blues        1.73m    101kg    31
          6          —     1.88m    75kg     39
...                  ...    ...      ...     ...
ID         Name           Height   Weight   Age   BMI
24601      Jean Valjean   1.8m     84kg     45    25.9
7474505B   Jake Blues     1.73m    101kg    31
6          —              1.88m    75kg     39
...        ...            ...      ...      ...
NOTE THE URI OF THE OUTPUT INDIVIDUAL
             Patient:24601




     ID            Name         Height   Weight   Age   BMI
           24601 Jean Valjean   1.8m     84kg     45    25.9
     7474505B Jake Blues        1.73m    101kg    31
               6          —     1.88m    75kg     39
     ...                  ...    ...      ...     ...
The URI of the input is linked by a
 meaningful predicate to the output
(either literal output or another URI)
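
The BMI example can be sketched with triples held as plain Python tuples. The URIs and predicate names below are hypothetical; the only thing the sketch is meant to show is that the output triple has the same subject URI as the input.

```python
# Hypothetical URIs: a stand-in for Patient:24601 and made-up predicates.
PATIENT = "http://example.org/patient/24601"
HEIGHT = "http://example.org/ont#height_m"
WEIGHT = "http://example.org/ont#weight_kg"
BMI = "http://example.org/ont#hasBMI"

input_triples = [(PATIENT, HEIGHT, 1.8), (PATIENT, WEIGHT, 84.0)]


def bmi_service(triples):
    """Toy SADI-style service: the output is about the SAME subject as the input."""
    properties = {}
    for subject, predicate, obj in triples:
        properties.setdefault(subject, {})[predicate] = obj
    return [
        (subject, BMI, round(values[WEIGHT] / values[HEIGHT] ** 2, 1))
        for subject, values in properties.items()
    ]


output_triples = bmi_service(input_triples)

# Input and output merge into one graph because they share a subject URI.
linked_data = input_triples + output_triples
```

For the row in the table, 84 / 1.8² rounds to 25.9, and that value is attached to the patient's own URI through the `hasBMI` predicate.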
Therefore, by connecting SADI services
together in a workflow you end-up with an
     unbroken chain of Linked Data
Part IV

SADI TO THE EXTREME:
“WEB SCIENCE 2.0”
A proof-of-concept query engine & registry

 Objective: answer biologists’ questions
The SHARE registry

   indexes all of the input/output/relationship

triples that can be generated by all known services



       This is how SHARE discovers services
We wanted to duplicate
a real, peer-reviewed, bioinformatics analysis


   simply by building a model in the Web
       describing what the answer
                  (if one existed)
               would look like
...the machine had to make
     every other decision
         on its own
This is the study we chose:
Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
Original Study Simplified




Using what is known about interactions in fly & yeast


         predict new interactions with your
                 protein of interest
“Pseudo-code” Abstracted Workflow

Given a protein P in Species X

   Find proteins similar to P in Species Y
   Retrieve interactors in Species Y
   Sequence-compare Y-interactors with Species X genome
        (1)  Keep only those with homologue in X


   Find proteins similar to P in Species Z
   Retrieve interactors in Species Z
   Sequence-compare Z-interactors with (1)



              Putative interactors in Species X
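
A hedged Python sketch of the abstracted workflow above. The lookup tables are toy data and every protein name is made up; real homology searches and interaction databases are replaced by dictionary lookups, and the final sequence comparison is simplified to a set intersection.

```python
# Toy lookup tables; every protein name here is invented for illustration.
# (protein, species) -> proteins in that species considered similar/homologous
HOMOLOGUES = {
    ("P", "Y"): {"Py"},
    ("P", "Z"): {"Pz"},
    ("Y1", "X"): {"X1"},
    ("Y2", "X"): {"X2"},
    ("Z1", "X"): {"X1"},
    ("Z3", "X"): {"X3"},
}
# protein -> known interactors within the same species
INTERACTORS = {"Py": {"Y1", "Y2"}, "Pz": {"Z1", "Z3"}}


def homologues(protein, species):
    """Stand-in for a sequence-similarity search against one species."""
    return HOMOLOGUES.get((protein, species), set())


def candidates_via(protein, model_species, target_species):
    """Interactors found in a model species, mapped back to the target species."""
    found = set()
    for similar in homologues(protein, model_species):
        for partner in INTERACTORS.get(similar, set()):
            found |= homologues(partner, target_species)
    return found


via_y = candidates_via("P", "Y", "X")  # step (1): candidates with a homologue in X
via_z = candidates_via("P", "Z", "X")
putative_interactors = via_y & via_z   # supported by both model organisms
```

With this toy data only `X1` is reachable through both model species, so it is the single putative interactor.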
Modeling the science...




                          OWL
Modeling the science...


       ProbableInteractor
           is homologous to
               (Potential Interactor from ModelOrganism1…)
               and
               (Potential Interactor from ModelOrganism2…)




Probable Interactor is defined in OWL as a subClass - something that appears
        as a potential interactor in both comparator model organisms.
Running the Web Science Experiment


                       In a local data-file

           provide the protein we are interested in

    and the two species we wish to use in our comparison




  taxon:9606       a      i:OrganismOfInterest . # human
  uniprot:Q9UK53   a      i:ProteinOfInterest . # ING1
  taxon:4932       a      i:ModelOrganism1 . # yeast
  taxon:7227       a      i:ModelOrganism2 . # fly
The tricky bit is...

 In the abstract, the search
 for homology is “generic” –
  ANY Protein, ANY model
           system



But when the machine does
 the experiment, it will need
     to use (at least) two
organism-specific resources
because the answer requires
    information from two
       declared species

  taxon:4932   a   i:ModelOrganism1 . # yeast
  taxon:7227   a   i:ModelOrganism2 . # fly
This is the question we ask:
                  (the query language here is SPARQL)




PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>

SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {

     ?protein        a       i:ProbableInteractor .
}




          The URL of our OWL model (ontology) defining Probable Interactors
Each relationship (property-restriction)
in the OWL Class is then matched
with a SADI Service

The matched SADI Service can
generate data that fulfils that
property restriction
(i.e. produces triples with that S/P/O pattern)
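
One way to picture this matching step, as a sketch only: treat the registry as an index from predicate to the services that can generate it, and look up each property restriction in turn. The index contents are an assumption (the two service names are borrowed from the Taverna walkthrough in this tutorial, `hasBMI` is made up), and SHARE's actual matching is not shown in these slides.

```python
# Hypothetical registry index: predicate -> services able to generate it.
REGISTRY_INDEX = {
    "encodes": ["getUniprotByKeggGene"],
    "hasSequence": ["UniProt info"],
}


def match_services(restrictions, index):
    """Pair each property restriction with the services that can fulfil it."""
    matched, unmatched = {}, []
    for predicate in restrictions:
        services = index.get(predicate)
        if services:
            matched[predicate] = services
        else:
            unmatched.append(predicate)
    return matched, unmatched


# "hasBMI" is a made-up predicate with no registered service.
matched, unmatched = match_services(
    ["encodes", "hasSequence", "hasBMI"], REGISTRY_INDEX
)
```

Any restriction left in `unmatched` is one the machine cannot satisfy with the services it knows about.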
SHARE chains these SADI services
into an analytical workflow...

...the outputs from that workflow are
Instances (OWL Individuals) of
Probable Interactors
SHARE derived (and executed) the following workflow automatically




                                              These are different
                                              SADI Web Services...

                                              ...selected at run-time
                                              based on the same model
Keys to Success:

1: Use standards

2: Focus on predicates, not classes

3: Use these predicates to define, rather than assert, classes

4: Make sure all URIs resolve, and resolve to something useful

5: Never leave the RDF world... (abandon vanilla XML,
                            even for Web Services!)

6: Use reasoners... Everywhere... Always!
Part V

THE TOOLS AVAILABLE
Part V - A

SERVICE PROVISION
Libraries
     • Perl
     • Java
     • Python



Plug-in to Protege
     • Perl service scaffolds
     • Java service scaffolds
Part V - B

CLIENTS
SHARE


 • you’ve already seen how SHARE works...
Taverna

  • Contextual service discovery

  • Automatic RDF serialization and
  deserialization between SADI and non-SADI
  services

  • Note that Taverna is not as rich a client as
  SHARE. The reason is that SHARE will
  aggregate and re-reason after every service
  invocation. There is no (automatic) data
  aggregation in Taverna.
Using SADI services – building a workflow
The next step in the workflow is to find a SADI service that takes the
genes from getKEGGGenesByPathway and returns the proteins
that those genes code for.
Using SADI services – building a workflow
Right-click on the service output port and click Find services that
consume KEGG_Record…
Using SADI services – building a workflow
Select getUniprotByKeggGene from the list of SADI services and
click Connect.
Using SADI services – building a workflow
The getUniprotByKeggGene service is added to the workflow and
automatically connected to the output from
getKEGGGenesByPathway.
Using SADI services – building a workflow
Add a new workflow output called protein and connect the output
from the getUniprotByKeggGene service to it.
Using SADI services – building a workflow
The next step in the workflow is to find a SADI service that takes the
proteins and returns sequences of those proteins. Right-click on the
encodes output port and click Find services that consume
UniProt_Record…
Using SADI services – building a workflow
The UniProt info service attaches the property hasSequence so
select this service and click Connect.
Using SADI services – building a workflow
The UniProt info service is added to the workflow and automatically
connected to the output from getUniprotByKeggGene .
Using SADI services – building a workflow
Add a new workflow output called sequence and connect the output
from the hasSequence output from the UniProt info service to it.
Using SADI services – building a workflow
The KEGG pathway we’re interested in is “hsa00232”, so we’ll add it as
a constant value. Right-click on the KEGG_PATHWAY_Record
input port and click Constant value.
Using SADI services – building a workflow
Enter the value hsa00232 and click OK.
Using SADI services – building a workflow
The workflow is now complete and ready to run.
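
The same three-step workflow can be sketched outside Taverna as a chain of functions that each return triples. The functions below are toy stand-ins, not the real services: they return placeholder identifiers, and the `hasGene` predicate is an assumption (`encodes` and `hasSequence` are the properties named in the steps above).

```python
# Toy stand-ins for the three services; all returned identifiers are placeholders.
def get_kegg_genes_by_pathway(pathway):
    return [(pathway, "hasGene", "kegg:GENE_PLACEHOLDER")]


def get_uniprot_by_kegg_gene(gene):
    return [(gene, "encodes", "uniprot:PROTEIN_PLACEHOLDER")]


def uniprot_info(protein):
    return [(protein, "hasSequence", "SEQUENCE_PLACEHOLDER")]


def run_chain(start, services):
    """Feed the objects produced by one service into the next, keeping all triples."""
    graph, frontier = [], [start]
    for service in services:
        new_triples = [triple for node in frontier for triple in service(node)]
        graph.extend(new_triples)
        frontier = [obj for _, _, obj in new_triples]
    return graph


graph = run_chain(
    "hsa00232",
    [get_kegg_genes_by_pathway, get_uniprot_by_kegg_gene, uniprot_info],
)
```

Each triple's subject is the previous triple's object, which is the unbroken chain of Linked Data that connecting SADI services is meant to give you.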
IO Informatics Knowledge Explorer plug-in

  • “Bootstrapping” of semantics using known
  URI schema (identifiers.org, LSRN, Bio2RDF,
  etc.)

  • Contextual service discovery

  • Automatic packaging of appropriate data
  from your data-store and automated service
  invocation using that data.

  • This uses some not-widely-known services and
  metadata that are in the SHARE registry!!
The SADI plug-in to the
    IO Informatics’
  Knowledge Explorer




...a quick explanation of how
we “boot-strap” semantics...
The Knowledge Explorer
   Personal Edition,
and the SADI plug-in, are
     freely available.
Sentient Knowledge Explorer is a retrieval, integration,
visualization, query, and exploration environment for semantically
                             rich data
Most imported data-sets will already have
       properties (e.g. “encodes”)

   …and the data will already be typed
       (e.g. “Gene” or “Protein”)

…so finding SADI Services to consume that
              data is ~trivial
Now what...??

No properties...

No rdf:type...

How do I find a service using that node?

What *is* that node anyway??
In the case of LSRN URIs, they resolve to:

<lsrn:DragonDB_Locus_Record rdf:about="http://lsrn.org/DragonDB_Locus:CHO">
 <dc:identifier>CHO</dc:identifier>
 <sio:SIO_000671> <!-- has identifier -->
  <lsrn:DragonDB_Locus_Identifier>
    <sio:SIO_000300>CHO</sio:SIO_000300> <!-- has value -->
  </lsrn:DragonDB_Locus_Identifier>
 </sio:SIO_000671>
</lsrn:DragonDB_Locus_Record>
</rdf:RDF>
The Semanticscience Integrated Ontology (SIO; Dumontier) has a model for
how to describe database records, including explicitly making the record
identifier an attribute of that record; in our LSRN metadata, we also
explicitly rdf:type both records and identifiers.
Now we have enough information to start exploring global data...
Menu option provided by the plugin
Discovered the (only)
service that consumes
these kinds of records
Output is added to the graph (with
some extra logic to make visualization
of complex data structures a bit easier)
Lather, rinse,
repeat...
...and of course,
these links are
“live”
What about URIs other than LSRN?
HTTP POST the URI to the SHARE
              Resolver Service
It will (try to) return you SIO-compliant
      RDF metadata about that URI
      (this is a typical SADI service)

The resolver currently recognizes a few
   different shared-URI schemes
    (e.g. Bio2RDF, Identifiers.org)
and can be updated with new patterns
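As a rough sketch, the POST could be assembled in Python as follows. The endpoint address, the example Bio2RDF URI, and the shape of the request body (a bare node with no rdf:type) are all assumptions; the slides do not show the resolver's address or its input class. The request is only built here, not sent.

```python
import urllib.request
from xml.sax.saxutils import quoteattr

# Hypothetical endpoint: the slides do not give the resolver's address.
RESOLVER_URL = "http://example.org/share/resolver"


def build_resolver_request(uri, endpoint=RESOLVER_URL):
    """Build (but do not send) an HTTP POST asking the resolver about `uri`."""
    body = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        "  <rdf:Description rdf:about=%s/>\n"
        "</rdf:RDF>\n" % quoteattr(uri)  # quoteattr escapes and quotes the URI
    )
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/rdf+xml",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


# Illustrative URI only; any URI in a scheme the resolver recognizes would do.
request = build_resolver_request("http://bio2rdf.org/geneid:15275")
# urllib.request.urlopen(request) would send it and return the RDF response.
```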
Next problem:
  Knowledge Explorer
and therefore the plug-in
    are written in C#

All of our interfaces are
  described in OWL

    C# reasoners are
extremely limited at this
          time
This problem manifests itself in two ways:


1. An individual on the KE canvas has all the
   properties required by a Service in the registry, but
   is not rdf:typed as that Service’s input type. How
   do you discover that Service so that you can add it
   to the menu?

2. For a selected Service from the menu, how does the
   plug-in know which data-elements it needs to
   extract from KE to send to that service in order to
   fulfil its input property-restrictions?
If I select a canvas node, and ask SADI to
find services, it will...
The get_sequence_for_region service
required ALL of this (hidden) information
Nevertheless:
(a) The service can be discovered based on JUST this node selection

(b) The service can be invoked based on JUST this node selection
Voila!
 How did the plug-in discover the service,
and determine which data was required to
access that service based on an OWL Class
      definition, without a reasoner?
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?x ?y
FROM <knowledge_explorer_database>
WHERE {
     ?x    foaf:name   ?y
}

Convert the Input OWL Class definition
into an ~equivalent SPARQL query
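To make the conversion concrete, here is a deliberately simplified Python sketch. It assumes the input class has already been reduced to a flat list of required property IRIs, which covers only the "things with property P" style of restriction; the function name is hypothetical and this is not the registry's actual converter.

```python
def input_class_to_sparql(required_properties):
    """Turn a list of required property IRIs into a SELECT query.

    Each property restriction becomes one triple pattern on the
    same subject variable, ?input.
    """
    patterns = [
        "    ?input <%s> ?value%d ." % (prop, i)
        for i, prop in enumerate(required_properties)
    ]
    variables = " ".join(
        "?value%d" % i for i in range(len(required_properties))
    )
    return "SELECT ?input %s\nWHERE {\n%s\n}" % (variables, "\n".join(patterns))


query = input_class_to_sparql(["http://xmlns.com/foaf/0.1/name"])
print(query)
```

Because the query is computed once, when the service is registered, a client with no OWL reasoner only needs to be able to run SPARQL.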
Service Description

INPUT OWL Class
NamedIndividual: things with a “name” property from the “foaf” ontology

OUTPUT OWL Class
GreetedIndividual: things with a “greeting” property from the “hello” ontology

INDEX
The service provides a “greeting” property based on a “name” property

Store together with index in the Registry
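The registry idea can be sketched in a few lines of Python: each entry keeps the properties its input class requires alongside the indexed "what it provides" property, and discovery becomes a subset test on the selected node's properties rather than an rdf:type check. The entry layout, the "hello" ontology IRI, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    required_properties: frozenset  # derived from the input OWL class
    added_property: str             # the index: what the service provides


def discover(registry, triples, node):
    """Return names of services whose required properties `node` has."""
    node_properties = {p for (s, p, o) in triples if s == node}
    return [
        entry.name
        for entry in registry
        if entry.required_properties <= node_properties
    ]


FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
HELLO_GREETING = "http://example.org/hello#greeting"  # hypothetical IRI

REGISTRY = [ServiceEntry("hello", frozenset([FOAF_NAME]), HELLO_GREETING)]
TRIPLES = [
    ("http://example.org/people#guy", FOAF_NAME, "Guy Incognito"),
    ("http://example.org/people#anon", "http://example.org/other#age", "42"),
]

found = discover(REGISTRY, TRIPLES, "http://example.org/people#guy")
not_found = discover(REGISTRY, TRIPLES, "http://example.org/people#anon")
```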
Just to ensure that I don’t over-trivialize this point,

the REAL SPARQL query that extracts the input for this service is...
CONSTRUCT {
                     ?input a <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#BiopolymerRegion> .
                     ?input <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#position> ?position .
                     ?position a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#RangedSequencePosition> .
                     ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?start .
                     ?start a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#StartPosition> .
                     ?start <http://semanticscience.org/resource/SIO_000300> ?startValue .
                     ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?end .
                     ?end a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#EndPosition> .
                     ?end <http://semanticscience.org/resource/SIO_000300> ?endValue .
                     ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#in_relation_to> ?sequence .
                     ?sequence <http://semanticscience.org/resource/SIO_000210> ?feature .
                     ?feature <http://semanticscience.org/resource/SIO_000008> ?identifier .
                     ?identifier <http://semanticscience.org/resource/SIO_000300> ?featureID .

                     ?sequence <http://semanticscience.org/resource/SIO_000210> ?strand .
                     ?strand <http://semanticscience.org/resource/SIO_000093> ?strandFeature .
                     ?strandFeature a ?strandFeatureType .
                     ?strandFeature <http://semanticscience.org/resource/SIO_000008> ?strandFeatureIdentifier .
                     ?strandFeatureIdentifier <http://semanticscience.org/resource/SIO_000300> ?strandFeatureID .
                     ?strand a ?strandType .
} WHERE {
                     ?input <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#position> ?position .
                     ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?start .
                     ?start a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#StartPosition> .
                     ?start <http://semanticscience.org/resource/SIO_000300> ?startValue .
                     ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?end .
                     ?end a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#EndPosition> .
                     ?end <http://semanticscience.org/resource/SIO_000300> ?endValue .
                     ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#in_relation_to> ?sequence .
                     {
                                             ?sequence <http://semanticscience.org/resource/SIO_000210> ?feature .
                                             ?feature <http://semanticscience.org/resource/SIO_000008> ?identifier .

                                         ?identifier <http://semanticscience.org/resource/SIO_000300> ?featureID .
                     } UNION {
                                         ?sequence <http://semanticscience.org/resource/SIO_000210> ?strand .
                                         ?strand <http://semanticscience.org/resource/SIO_000093> ?strandFeature .
                                         {
                                                              ?strandFeature a <http://sadiframework.org/ontologies/GMOD/Feature.owl#Feature> .
                                         } UNION {
                                                              ?strandFeature <http://semanticscience.org/resource/SIO_000008> ?strandFeatureIdentifier .
                                                              ?strandFeatureIdentifier <http://semanticscience.org/resource/SIO_000300> ?strandFeatureID .
                                         }.
                                         {
                                                              ?strand a <http://sadiframework.org/ontologies/GMOD/Strand.owl#PlusStrand> .
                                                              ?strand a ?strandType .
                                         } UNION {
                                                              ?strand a <http://sadiframework.org/ontologies/GMOD/Strand.owl#MinusStrand> .
                                                              ?strand a ?strandType .
                                         }.
                     }.
}
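The effect of such a CONSTRUCT query, pulling out of the canvas only the triples a service needs, can be illustrated with a much smaller Python sketch. It follows a single chain of properties from the selected node over an in-memory list of triples; the real query above also handles UNION branches and type checks, which this sketch ignores. The function name and the `ex:` node names are hypothetical; the property IRIs are taken from the query.

```python
GMOD = "http://sadiframework.org/ontologies/GMOD/"
POSITION = GMOD + "BiopolymerRegion.owl#position"
COORDINATE = GMOD + "RangedSequencePosition.owl#coordinate"
HAS_VALUE = "http://semanticscience.org/resource/SIO_000300"


def extract_input(triples, node, property_chain):
    """Follow `property_chain` from `node`, collecting the triples traversed."""
    frontier = {node}
    collected = []
    for prop in property_chain:
        step = [(s, p, o) for (s, p, o) in triples
                if s in frontier and p == prop]
        if not step:
            return []  # the pattern does not match: nothing to send
        collected.extend(step)
        frontier = {o for (_, _, o) in step}
    return collected


CANVAS = [
    ("ex:region", POSITION, "ex:pos"),
    ("ex:pos", COORDINATE, "ex:start"),
    ("ex:start", HAS_VALUE, "100"),
    ("ex:pos", COORDINATE, "ex:end"),
    ("ex:end", HAS_VALUE, "200"),
    ("ex:region", "ex:label", "not needed by the service"),
]

payload = extract_input(CANVAS, "ex:region", [POSITION, COORDINATE, HAS_VALUE])
```

Only the five position-related triples end up in `payload`; the unrelated label stays on the canvas.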
Summary

  While the Knowledge Explorer plug-in has similar
 functionality to other tools we have built for SADI, it
takes advantage of some features of the SADI Registry,
   and SADI in general, that are not widely-known.


    We hope that the availability of these features
  encourages development of SADI tooling in other
   languages that have limited access to reasoning.
Luke McCarthy
Lead Developer, SADI project




Benjamin VanderValk
Developer, SADI project

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Último (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Bio2RDF Query Example Resource Consuming

  • 1. Semantic Automated Discovery and Integration SADI Services Tutorial Mark Wilkinson Isaac Peral Senior Researcher in Biological Informatics Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain Adjunct Professor of Medical Genetics, University of British Columbia Vancouver, BC, Canada.
  • 3. A lot of important information cannot be represented on the Semantic Web. For example, all of the data that results from analytical algorithms and statistical analyses. (I’m purposely excluding databases from the list of examples for reasons I will discuss in a moment.)
  • 6. Varying estimates put the size of the Deep Web between 500 and 800 times larger than the surface Web
  • 7. On the WWW “automation” of access to Deep Web data happens through “Web Services”
  • 8. Traditional definitions of the Deep Web include databases that have Web FORM interfaces. HOWEVER, the Life Science Semantic Web community is encouraging the establishment of SPARQL endpoints as the way to serve that same data to the world (i.e. NOT through Web Services).
  • 9. I am quite puzzled by this...
  • 10. Historically, most* bio/informatics databases do not allow direct public SQL access. (*Yes, I know there are some exceptions!)
  • 11. “We need to commit specific hardware for that [mySQL] service. We don’t use the same servers for mySQL as for the Website...” “...we resolve the situation by asking the user to stop hammering the server. This might involve temporary ban on the IP...” - ENSEMBL Helpdesk
  • 12. So... There appear to be good reasons why most data providers do not expose their databases for public query!
  • 13. Are SPARQL endpoints somehow “safer” or “better”?
  • 14. One of the early-adopters of RDF/SPARQL in the bioinformatics domain was UniProt
  • 15. How are things going for them?
  • 16. A message posted to the Bio2RDF mailing list last week from Jerven Bolleman, one of the team-members behind UniProt’s push for RDF... [E-mail headers on the slide: Date: Tue, 19 Feb 2013 13:11:22 +0100; Subject: SPARQL or not?]
      Hi Bio2RDF maintainers,
      I keep on noticing this rather expensive query.
      CONSTRUCT {
        <http://bio2rdf.org/search/Paget> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://bio2rdf.org/bio2rdf_resource:SearchResults> .
        <http://bio2rdf.org/search/Paget> <http://bio2rdf.org/bio2rdf_resource:hasSearchResult> ?s .
        <http://bio2rdf.org/search/Paget> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?s .
        ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
        ?s <http://purl.org/dc/elements/1.1/title> ?title .
        ?s <http://purl.org/dc/terms/title> ?dctermstitle .
        ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
        ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel .
        ?s ?p ?o .
      }
      WHERE {
        ?s ?p ?o FILTER contains(str(?o), "Paget")
        OPTIONAL { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
        OPTIONAL { ?s <http://purl.org/dc/elements/1.1/title> ?title }
        OPTIONAL { ?s <http://purl.org/dc/terms/title> ?dctermstitle }
        OPTIONAL { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }
        OPTIONAL { ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel }
      }
      OFFSET 0 LIMIT 500
      It comes from the example queries on the bio2rdf landing page. Its extremely resource consuming and totally useless as it will never ever run in time. Can you please change this query to something useful and workable. And at least cache the results if you ever get them.
      Regards, Jerven
  • 17. (Same message, with this passage highlighted:) “I keep noticing this rather expensive query”
  • 18. (Same message, with this passage highlighted:) “It comes from THE EXAMPLE QUERIES on the Bio2RDF landing page” (my emphasis added)
  • 19. (Same message, with this passage highlighted:) “It’s extremely resource-consuming and totally useless as it will never run in time”
  • 20. So even people who are world-leaders in RDF and SPARQL write “expensive” and “useless” queries that (already!) are making life difficult for SPARQL endpoint providers. I believe that situation will only get worse as more people begin to use the Semantic Web and as SPARQL itself becomes richer and more SQL-like.
  • 21. In My Opinion: History tells us, and this story IMO supports, that SPARQL endpoints might not be widely adopted by source bioinformatics data providers. Historically, the majority of bioinformatics data hosts have opted for API/Service-based access to their resources.
  • 22. In My Opinion: Moreover, I am still obsessed with interoperability! Having a unified way to discover, and access, bioinformatics resources, whether they be databases or algorithms, just seems like a Good Thing™.
  • 23. In My Opinion: So we need to find a way to make Web Services play nicely with the Semantic Web.
  • 24. Design Pattern for Web Services on the Semantic Web
  • 26. The Semantic Web causally related to
  • 27. The important bit: the link is explicitly labeled (“causally related to”). ???
  • 28. causally related with http://semanticscience.org/resource/SIO_000243
      SIO_000243:
      <owl:ObjectProperty rdf:about="&resource;SIO_000243">
        <rdfs:label xml:lang="en">is causally related with</rdfs:label>
        <rdf:type rdf:resource="&owl;SymmetricProperty"/>
        <rdf:type rdf:resource="&owl;TransitiveProperty"/>
        <dc:description xml:lang="en">A transitive, symmetric, temporal relation in which one entity is causally related with another non-identical entity.</dc:description>
        <rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/>
      </owl:ObjectProperty>
  • 30. There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS): OWL-S, SAWSDL, WSDL-S, others...
  • 31. There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS):
      Describe input data
      Describe output data
      Describe how the system manipulates the data
      Describe how the world changes as a result
  • 32. (Same list.) Describe input data / Describe output data: usually through “semantic annotation” of XML Schema.
  • 33. (Same list.) Describe input data / Describe output data: in the least-semantic case, the input and output data is “vanilla” XML.
  • 34. (Same list.) Describe input data / Describe output data: in the “most semantic” case (WSDL), RDF is converted into XML, then back to RDF again.
  • 35. (Same list.) Describe input data / Describe output data: the rigidity of XML Schema is the antithesis of the Semantic Web!
  • 36. (Same list.) Describe input data / Describe output data: so... perhaps we shouldn’t be using XML Schema at all...??
  • 37. (Same list.) Describe how the system manipulates the data: HARD!
  • 38. (Same list.) Describe how the world changes as a result: Un-necessary?
  • 39. Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
  • 40. Scientific Web Services are DIFFERENT! Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
  • 41. “The service interfaces within bioinformatics are relatively simple. An extensible or constrained interoperability framework is likely to suffice for current demands: a fully generic framework is currently not necessary.” Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
  • 42. Scientific Web Services are DIFFERENT. They’re simpler! Rather than waiting for a solution to the more general problem (which may be years away... or more!), can we solve the Semantic Web Service problem within the scientific domain while still being fully standards-compliant?
  • 43. Other “philosophical” considerations
  • 44. Vis-à-vis being “Semantic Webby”, what is missing from this list?
      Describe input data
      Describe output data
      Describe how the system manipulates the data
      Describe how the world changes as a result
  • 46. causally related with (http://semanticscience.org/resource/SIO_000243): The Semantic Web works because of relationships!
  • 47. causally related with (http://semanticscience.org/resource/SIO_000243): The Semantic Web works because of relationships! In 2008 I proposed that, in the Semantic Web world, algorithms should be viewed as “exposing” relationships between the input and output data.
  • 49. SADI requires you to explicitly declare, as part of your analytical output, the biological relationship that your algorithm “exposed”. [Diagram: an input sequence node (has_seq_string AACTCTTCGTAGTG...) is sent to BLAST; in the output, the same sequence node is linked by “has homology to” to Terminal Flower (type: gene; species: A. thal.).]
  • 50. Another “philosophical” decision was to abandon XML Schema. In a world that is moving towards RDF representations of all data, it makes no sense to convert semantically rich RDF into semantic-free Schema-based XML, then back into RDF again.
  • 51. The final philosophical decision was to abandon SOAP. The bioinformatics community seems to be very receptive to pure-HTTP interfaces (e.g. the popularity of REST-like APIs). So SADI uses simple HTTP POST of just the RDF input data (no message scaffold whatsoever).
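    To make the "simple HTTP POST of just the RDF input data" idea concrete, here is a minimal Python sketch. It only builds the request and does not send it. The service URL, the patient URI, the property names and the N-Triples content type are all assumptions made for illustration; they are not taken from the SADI specification.

```python
import urllib.request

# Hypothetical service endpoint, used for illustration only.
SERVICE_URL = "http://example.org/sadi/calculateBMI"

# The request body is nothing but RDF describing the input individual
# (written as N-Triples here); there is no SOAP envelope or other scaffold.
# The patient URI and property URIs are invented for this sketch.
INPUT_RDF = (
    '<http://example.org/patient/24601> '
    '<http://example.org/ontology#height_m> "1.8" .\n'
    '<http://example.org/patient/24601> '
    '<http://example.org/ontology#weight_kg> "84" .\n'
)


def build_sadi_request(service_url: str, rdf_body: str) -> urllib.request.Request:
    """Build (but do not send) a plain HTTP POST carrying RDF input data."""
    return urllib.request.Request(
        service_url,
        data=rdf_body.encode("utf-8"),
        headers={
            # Assumed serialization; a real service may expect another one.
            "Content-Type": "application/n-triples",
            "Accept": "application/n-triples",
        },
        method="POST",
    )


request = build_sadi_request(SERVICE_URL, INPUT_RDF)
# Sending it would be: urllib.request.urlopen(request).read()
```

    The point of the sketch is what is absent: the body is the RDF itself, with no wrapper message around it.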
  • 52. Part III SADI SERVICE DISCOVERY AND INVOCATION
  • 53. In slightly more detail...
  • 54. (Patient data table)
      ID        Name          Height  Weight  Age
      24601     Jean Valjean  1.8m    84kg    45
      7474505B  Jake Blues    1.73m   101kg   31
      6         —             1.88m   75kg    39
      ...       ...           ...     ...     ...
  • 56. (Same table.) OWL-DL Classes
  • 57. (Same table.) Property restrictions in OWL Class definition
  • 59. (Same table.) A reasoner determines that Patient #24601 is an OWL Individual of the Input service Class
  • 60. (Same table.) NOTE THE URI OF THE INPUT INDIVIDUAL: Patient:24601
  • 61. (Same table, with a BMI column added.)
      ID        Name          Height  Weight  Age  BMI
      24601     Jean Valjean  1.8m    84kg    45   25.9
  • 62. (Same table, with BMI column.) NOTE THE URI OF THE OUTPUT INDIVIDUAL: Patient:24601
  • 64. The URI of the input is linked by a meaningful predicate to the output (either literal output or another URI)
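    The BMI example from the preceding slides can be sketched in a few lines of Python. This is an illustrative toy, not SADI library code: the namespace, the has_BMI predicate and the function name are hypothetical. What it tries to show is that the output triple reuses the input individual's URI as its subject and attaches the new value through a named predicate.

```python
# Hypothetical namespace and predicate, used for illustration only.
PATIENT = "http://example.org/patient/"
BMI_PREDICATE = "http://example.org/ontology#has_BMI"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def bmi_service(patient_id: str, height_m: float, weight_kg: float) -> str:
    """Return one N-Triples line linking the input URI to its computed BMI."""
    bmi = round(weight_kg / height_m ** 2, 1)
    # The subject of the output is the SAME URI that identified the input.
    subject = f"<{PATIENT}{patient_id}>"
    return f'{subject} <{BMI_PREDICATE}> "{bmi}"^^<{XSD_DECIMAL}> .'


# Patient 24601 from the table: 1.8 m, 84 kg.
output_triple = bmi_service("24601", 1.8, 84.0)
```

    Because the subject URI is unchanged, the output can be merged directly with the input graph, which is what makes chained services yield connected Linked Data.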
  • 65. Therefore, by connecting SADI services together in a workflow you end-up with an unbroken chain of Linked Data
  • 66. Part IV SADI TO THE EXTREME: “WEB SCIENCE 2.0”
  • 67. A proof-of-concept query engine & registry Objective: answer biologists’ questions
  • 68. The SHARE registry indexes all of the input/output/relationship triples that can be generated by all known services This is how SHARE discovers services
  • 69. We wanted to duplicate a real, peer-reviewed, bioinformatics analysis simply by building a model in the Web describing what the answer (if one existed) would look like
  • 70. ...the machine had to make every other decision on its own
  • 71. This is the study we chose:
  • 72. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
  • 73. Original Study Simplified Using what is known about interactions in fly & yeast predict new interactions with your protein of interest
  • 74. “Pseudo-code” Abstracted Workflow
      Given a protein P in Species X:
        Find proteins similar to P in Species Y
        Retrieve interactors in Species Y
        Sequence-compare Y-interactors with Species X genome
        (1) → Keep only those with homologue in X
        Find proteins similar to P in Species Z
        Retrieve interactors in Species Z
        Sequence-compare Z-interactors with (1)
        → Putative interactors in Species X
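    The abstracted workflow above can be sketched in Python with toy data. Everything here is invented for illustration: the lookup tables stand in for real similarity, interaction and sequence-comparison services, and comparing the Z-interactors with set (1) is simplified to a set intersection of Species X homologues. It is a sketch of the logic, not of the original study's implementation.

```python
# Toy lookup tables standing in for real services. All identifiers are invented.
# (protein, target species) -> similar/homologous proteins in that species
HOMOLOGUES = {
    ("P", "Y"): {"y1"},
    ("P", "Z"): {"z1"},
    ("y2", "X"): {"x2"},
    ("y3", "X"): set(),      # y3 has no homologue in Species X
    ("z2", "X"): {"x2"},
    ("z4", "X"): {"x4"},
}
# protein -> known interaction partners within the same species
INTERACTORS = {"y1": {"y2", "y3"}, "z1": {"z2", "z4"}}


def candidates_via(protein: str, species_x: str, model_species: str) -> set:
    """Interactors found in a model species, mapped to homologues in Species X."""
    found = set()
    for similar in HOMOLOGUES.get((protein, model_species), set()):
        for partner in INTERACTORS.get(similar, set()):
            # Keep only those interactors that have a homologue in Species X.
            found |= HOMOLOGUES.get((partner, species_x), set())
    return found


def putative_interactors(protein: str, species_x: str,
                         species_y: str, species_z: str) -> set:
    """Candidates supported by BOTH model organisms."""
    via_y = candidates_via(protein, species_x, species_y)  # set (1)
    via_z = candidates_via(protein, species_x, species_z)
    return via_y & via_z


result = putative_interactors("P", "X", "Y", "Z")
```

    With this toy data, x2 is supported by both model species, while x4 is supported only by Species Z and is therefore dropped.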
  • 76. Modeling the science... ProbableInteractor: is homologous to (Potential Interactor from ModelOrganism1…) and (Potential Interactor from ModelOrganism2…). Probable Interactor is defined in OWL as a subClass: something that appears as a potential interactor in both comparator model organisms.
  • 77. Running the Web Science Experiment. In a local data-file, provide the protein we are interested in and the two species we wish to use in our comparison:
      taxon:9606 a i:OrganismOfInterest . # human
      uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
      taxon:4932 a i:ModelOrganism1 . # yeast
      taxon:7227 a i:ModelOrganism2 . # fly
  • 78. The tricky bit is... In the abstract, the search for homology is “generic”: ANY Protein, ANY model system. But when the machine does the experiment, it will need to use (at least) two organism-specific resources, because the answer requires information from two declared species:
      taxon:4932 a i:ModelOrganism1 . # yeast
      taxon:7227 a i:ModelOrganism2 . # fly
  • 79. This is the question we ask (the query language here is SPARQL):
      PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
      SELECT ?protein
      FROM <file:/local/workflow.input.n3>
      WHERE {
        ?protein a i:ProbableInteractor .
      }
      (The PREFIX URL is the URL of our OWL model (ontology) defining Probable Interactors.)
  • 80. Each relationship (property-restriction) in the OWL Class is then matched with a SADI Service. The matched SADI Service can generate data that fulfils that property restriction (i.e. produces triples with that S/P/O pattern).
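    The matching step can be pictured as a lookup from required predicates to the services that advertise them. The sketch below is a toy under stated assumptions: the registry contents, service names and predicate names are all hypothetical, and a real registry such as SHARE's would match on full input-class/predicate/output-class patterns using a reasoner rather than on string equality.

```python
from typing import NamedTuple


class Service(NamedTuple):
    """What a service advertises: the S/P/O pattern of the triples it produces."""
    name: str
    input_class: str
    predicate: str
    output_class: str


# Hypothetical registry entries, invented for this sketch.
REGISTRY = [
    Service("findHomologues", "Protein", "isHomologousTo", "Protein"),
    Service("findInteractors", "Protein", "interactsWith", "Protein"),
    Service("getSequence", "Protein", "hasSequence", "Sequence"),
]


def services_for(restrictions: list) -> dict:
    """Map each required predicate to the names of services that can generate it."""
    return {
        predicate: [s.name for s in REGISTRY if s.predicate == predicate]
        for predicate in restrictions
    }


# Property restrictions taken from a (hypothetical) class definition.
plan = services_for(["isHomologousTo", "interactsWith"])
```

    Each entry in `plan` names the services whose output would satisfy one restriction; chaining them in dependency order gives a workflow.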
  • 81. SHARE chains these SADI services into an analytical workflow... ...the outputs from that workflow are Instances (OWL Individuals) of Probable Interactors
  • 82. SHARE derived (and executed) the following workflow automatically. These are different SADI Web Services... ...selected at run-time based on the same model.
  • 84. Keys to Success:
      1: Use standards
      2: Focus on predicates, not classes
      3: Use these predicates to define, rather than assert, classes
      4: Make sure all URIs resolve, and resolve to something useful
      5: Never leave the RDF world... (abandon vanilla XML, even for Web Services!)
      6: Use reasoners... Everywhere... Always!
  • 85. Part V THE TOOLS AVAILABLE
  • 86. Part V - A SERVICE PROVISION
  • 87. Libraries: • Perl • Java • Python. Plug-in to Protege: • Perl service scaffolds • Java service scaffolds
  • 88. Part V - B CLIENTS
  • 89. SHARE • you’ve already seen how SHARE works...
  • 90. Taverna • Contextual service discovery • Automatic RDF serialization and deserialization between SADI and non-SADI services • Note that Taverna is not as rich a client as SHARE. The reason is that SHARE will aggregate and re-reason after every service invocation. There is no (automatic) data aggregation in Taverna.
  • 91. Using SADI services – building a workflow: The next step in the workflow is to find a SADI service that takes the genes from getKEGGGenesByPathway and returns the proteins that those genes code for.
  • 92. Right-click on the service output port and click Find services that consume KEGG_Record…
  • 93. Select getUniprotByKeggGene from the list of SADI services and click Connect.
  • 94. The getUniprotByKeggGene service is added to the workflow and automatically connected to the output from getKEGGGenesByPathway.
  • 95. Add a new workflow output called protein and connect the output from the getUniprotByKeggGene service to it.
  • 96. The next step in the workflow is to find a SADI service that takes the proteins and returns sequences of those proteins. Right-click on the encodes output port and click Find services that consume UniProt_Record…
  • 97. The UniProt info service attaches the property hasSequence, so select this service and click Connect.
  • 98. The UniProt info service is added to the workflow and automatically connected to the output from getUniprotByKeggGene.
  • 99. Add a new workflow output called sequence and connect the hasSequence output from the UniProt info service to it.
  • 100. The KEGG pathway we’re interested in is “hsa00232”, so we’ll add it as a constant value. Right-click on the KEGG_PATHWAY_Record input port and click Constant value.
  • 101. Enter the value hsa00232 and click OK.
  • 102. The workflow is now complete and ready to run.
  • 103. IO Informatics Knowledge Explorer plug-in • “Bootstrapping” of semantics using known URI schemes (identifiers.org, LSRN, Bio2RDF, etc.) • Contextual service discovery • Automatic packaging of appropriate data from your data-store and automated service invocation using that data • This uses some not-widely-known services and metadata that are in the SHARE registry!!
  • 104. The SADI plug-in to the IO Informatics’ Knowledge Explorer ...a quick explanation of how we “boot-strap” semantics...
  • 105. The Knowledge Explorer Personal Edition, and the SADI plug-in, are freely available.
  • 106. Sentient Knowledge Explorer is a retrieval, integration, visualization, query, and exploration environment for semantically rich data
  • 107. Most imported data-sets will already have properties (e.g. “encodes”) …and the data will already be typed (e.g. “Gene” or “Protein”) …so finding SADI Services to consume that data is ~trivial
  • 108.
  • 109.
  • 110.
  • 111. Now what...?? No properties... No rdf:type... How do I find a service using that node? What *is* that node anyway??
  • 112. In the case of LSRN URIs, they resolve to: <lsrn:DragonDB_Locus_Record rdf:about="http://lsrn.org/DragonDB_Locus:CHO"> <dc:identifier>CHO</dc:identifier> <sio:SIO_000671> <!-- has identifier --> <lsrn:DragonDB_Locus_Identifier> <sio:SIO_000300>CHO</sio:SIO_000300> <!-- has value --> </lsrn:DragonDB_Locus_Identifier> </sio:SIO_000671> </lsrn:DragonDB_Locus_Record> </rdf:RDF>
  • 113. In the case of LSRN URIs, they resolve to: <lsrn:DragonDB_Locus_Record rdf:about="http://lsrn.org/DragonDB_Locus:CHO"> <dc:identifier>CHO</dc:identifier> <sio:SIO_000671> <!-- has identifier --> <lsrn:DragonDB_Locus_Identifier> <sio:SIO_000300>CHO</sio:SIO_000300> <!-- has value --> </lsrn:DragonDB_Locus_Identifier> </sio:SIO_000671> </lsrn:DragonDB_Locus_Record> </rdf:RDF> The Semanticscience Integrated Ontology (SIO; Dumontier) has a model for how to describe database records, including explicitly making the record identifier an attribute of that record. In our LSRN metadata, we also explicitly rdf:type both records and identifiers.
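To make the record/identifier model concrete, here is a minimal sketch that produces the same five statements as the RDF/XML above, as plain (subject, predicate, object) tuples. The http://lsrn.org/ URI pattern, SIO_000671 (has identifier) and SIO_000300 (has value) are taken from the slides. The namespace bound to the lsrn: prefix and the Dublin Core namespace are assumptions, since the slide does not show its prefix declarations, and the blank-node label is arbitrary.

```python
SIO = "http://semanticscience.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumptions: the slide does not show what lsrn: and dc: expand to.
LSRN_TYPES = "http://purl.oclc.org/SADI/LSRN/"
DC_IDENTIFIER = "http://purl.org/dc/elements/1.1/identifier"


def lsrn_record_triples(namespace, identifier):
    """Describe one database record following the SIO record/identifier model."""
    record = f"http://lsrn.org/{namespace}:{identifier}"
    id_node = "_:id0"  # blank node standing for the identifier itself
    literal = f'"{identifier}"'
    return [
        # the record is explicitly typed ...
        (record, RDF_TYPE, f"{LSRN_TYPES}{namespace}_Record"),
        (record, DC_IDENTIFIER, literal),
        # ... and carries its identifier as an attribute ...
        (record, SIO + "SIO_000671", id_node),
        # ... which is itself typed and holds the literal value.
        (id_node, RDF_TYPE, f"{LSRN_TYPES}{namespace}_Identifier"),
        (id_node, SIO + "SIO_000300", literal),
    ]


if __name__ == "__main__":
    for triple in lsrn_record_triples("DragonDB_Locus", "CHO"):
        print(" ".join(triple), ".")
```

Because both the record and the identifier carry an rdf:type, a client can look for services by input class without any reasoning step.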
  • 114. Now we have enough information to start exploring global data...
  • 115. Menu option provided by the plugin
  • 116. Discovered the (only) service that consumes these kinds of records
  • 117. Output is added to the graph (with some extra logic to make visualization of complex data structures a bit easier)
  • 119. ...and of course, these links are “live”
  • 120.
  • 121. What about URIs other than LSRN?
  • 122.
  • 123. HTTP POST the URI to the SHARE Resolver Service. It will (try to) return you SIO-compliant RDF metadata about that URI (this is a typical SADI service). The resolver currently recognizes a few different shared-URI schemes (e.g. Bio2RDF, Identifiers.org) and can be updated with new patterns.
  • 124. Next problem: Knowledge Explorer, and therefore the plug-in, are written in C#. All of our interfaces are described in OWL. C# reasoners are extremely limited at this time.
  • 125. This problem manifests itself in two ways: 1. An individual on the KE canvas has all the properties required by a Service in the registry, but is not rdf:typed as that Service’s input type → how do you discover that Service so that you can add it to the menu? 2. For a selected Service from the menu, how does the plug-in know which data-elements it needs to extract from KE to send to that service in order to fulfil its input property-restrictions?
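One way to picture "recognizes a few schemes and can be updated with new patterns" is a table of regular expressions that split a URI into a namespace and an identifier. The sketch below is not the SHARE resolver's implementation; the regular expressions are assumptions based on the commonly seen URI shapes for LSRN, Bio2RDF and Identifiers.org, and recognize is a hypothetical helper name.

```python
import re

# Assumed URI shapes; supporting a new scheme means appending one entry.
PATTERNS = [
    ("LSRN", re.compile(r"^http://lsrn\.org/(?P<ns>[^:/]+):(?P<id>.+)$")),
    ("Bio2RDF", re.compile(r"^http://bio2rdf\.org/(?P<ns>[^:/]+):(?P<id>.+)$")),
    ("Identifiers.org",
     re.compile(r"^https?://identifiers\.org/(?P<ns>[^:/]+)/(?P<id>.+)$")),
]


def recognize(uri):
    """Return (scheme, namespace, identifier), or None if no pattern matches."""
    for scheme, pattern in PATTERNS:
        match = pattern.match(uri)
        if match:
            return scheme, match.group("ns"), match.group("id")
    return None


if __name__ == "__main__":
    print(recognize("http://lsrn.org/DragonDB_Locus:CHO"))
```

Once the namespace and identifier are known, typed record metadata of the kind shown for LSRN URIs can be generated for the node.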
  • 126. If I select a canvas node, and ask SADI to find services, it will...
  • 127. The get_sequence_for_region service required ALL of this (hidden) information
  • 128. Nevertheless: (a) The service can be discovered based on JUST this node selection (b) The service can be invoked based on JUST this node selection
  • 129. Voila! How did the plug-in discover the service, and determine which data was required to access that service based on an OWL Class definition, without a reasoner?
  • 130. Service Description: INPUT OWL Class NamedIndividual: things with a “name” property from “foaf” ontology. OUTPUT OWL Class GreetedIndividual: things with a “greeting” property from “hello” ontology. Registry INDEX: The service provides a “greeting” property based on a “name” property. Convert Input OWL Class def’n into an ~equivalent SPARQL query and store together with index: SELECT ?x, ?y FROM knowledge_explorer_database WHERE { ?x foaf:name ?y }
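The idea on this slide, turning an input class definition into a query that any SPARQL-capable client can run without a reasoner, can be sketched for the simplest case only: a class defined as "things that have some value for each of these properties". This is an illustration under that assumption, not the registry's actual conversion code. The function name class_to_sparql is hypothetical, and the output uses standard SPARQL syntax rather than the slide's shorthand.

```python
def class_to_sparql(required_properties):
    """Build a SELECT query matching nodes that have every required property.

    Covers only 'property some value' restrictions; nested restrictions,
    unions and typed fillers (as in the real query shown in this deck)
    are out of scope for this sketch.
    """
    count = len(required_properties)
    variables = " ".join(["?x"] + [f"?v{i}" for i in range(count)])
    patterns = [f"  ?x <{prop}> ?v{i} ." for i, prop in enumerate(required_properties)]
    return "SELECT " + variables + "\nWHERE {\n" + "\n".join(patterns) + "\n}"


if __name__ == "__main__":
    # NamedIndividual: things with a "name" property from the foaf ontology
    print(class_to_sparql(["http://xmlns.com/foaf/0.1/name"]))
```

Running such a query against the local data answers both questions at once: any node bound to ?x can be offered the service, and the remaining bindings are the data to send.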
  • 131. Just to ensure that I don’t over-trivialize this point, the REAL SPARQL query that extracts the input for this service is...
  • 132. CONSTRUCT { ?input a <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#BiopolymerRegion> . ?input <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#position> ?position . ?position a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#RangedSequencePosition> . ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?start . ?start a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#StartPosition> . ?start <http://semanticscience.org/resource/SIO_000300> ?startValue . ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?end . ?end a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#EndPosition> . ?end <http://semanticscience.org/resource/SIO_000300> ?endValue . ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#in_relation_to> ?sequence . ?sequence <http://semanticscience.org/resource/SIO_000210> ?feature . ?feature <http://semanticscience.org/resource/SIO_000008> ?identifier . ?identifier <http://semanticscience.org/resource/SIO_000300> ?featureID . ?sequence <http://semanticscience.org/resource/SIO_000210> ?strand . ?strand <http://semanticscience.org/resource/SIO_000093> ?strandFeature . ?strandFeature a ?strandFeatureType . ?strandFeature <http://semanticscience.org/resource/SIO_000008> ?strandFeatureIdentifier . ?strandFeatureIdentifier <http://semanticscience.org/resource/SIO_000300> ?strandFeatureID . ?strand a ?strandType . } WHERE { ?input <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#position> ?position . ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?start . ?start a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#StartPosition> . ?start <http://semanticscience.org/resource/SIO_000300> ?startValue . 
?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?end . ?end a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#EndPosition> . ?end <http://semanticscience.org/resource/SIO_000300> ?endValue . ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#in_relation_to> ?sequence . { ?sequence <http://semanticscience.org/resource/SIO_000210> ?feature . ?feature <http://semanticscience.org/resource/SIO_000008> ?identifier . ?identifier <http://semanticscience.org/resource/SIO_000300> ?featureID . } UNION { ?sequence <http://semanticscience.org/resource/SIO_000210> ?strand . ?strand <http://semanticscience.org/resource/SIO_000093> ?strandFeature . { ?strandFeature a <http://sadiframework.org/ontologies/GMOD/Feature.owl#Feature> . } UNION { ?strandFeature <http://semanticscience.org/resource/SIO_000008> ?strandFeatureIdentifier . ?strandFeatureIdentifier <http://semanticscience.org/resource/SIO_000300> ?strandFeatureID . }. { ?strand a <http://sadiframework.org/ontologies/GMOD/Strand.owl#PlusStrand> . ?strand a ?strandType . } UNION { ?strand a <http://sadiframework.org/ontologies/GMOD/Strand.owl#MinusStrand> . ?strand a ?strandType . }. }. }
  • 133. Summary While the Knowledge Explorer plug-in has similar functionality to other tools we have built for SADI, it takes advantage of some features of the SADI Registry, and SADI in general, that are not widely-known. We hope that the availability of these features encourages development of SADI tooling in other languages that have limited access to reasoning.
  • 134. Luke McCarthy, Lead Developer, SADI project; Benjamin VanderValk, Developer, SADI project