Science in the Web, from hypothesis to result. Publishing in silico experiments IN the Web allows us to immediately and precisely disseminate new knowledge that can affect other Web Science experiments. This is the "singularity" where a new discovery is immediately put into practice.
Web Science, SADI, and the Singularity
1. Web Science 2.0
Conducting in silico research in the Web
from hypothesis to publication
Mark Wilkinson
Isaac Peral Senior Researcher in Biological Informatics
Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
Adjunct Professor of Medical Genetics, University of British Columbia
Vancouver, BC, Canada.
2. Context
“While it took 2,300 years after the first
report of angina for the condition to be
commonly taught in medical curricula,
modern discoveries are being
disseminated at an increasingly rapid
pace. Focusing on the last 150 years,
the trend still appears to be linear,
approaching the axis around 2025.”
The Healthcare Singularity and the Age of Semantic Medicine,
Michael Gillam, et al, The Fourth Paradigm: Data-Intensive
Scientific Discovery Tony Hey (Editor), 2009
Slide adapted with permission from Joanne Luciano, Presentation
at Health Web Science Workshop 2012, Evanston IL, USA
June 22, 2012.
3. “The Singularity”
The X-intercept is where, the moment a discovery is made,
it is immediately put into practice
(not only medical practice, but any research endeavour...)
10. We wanted to duplicate
a real, peer-reviewed, bioinformatics analysis
simply by building a model in the Web
describing what the answer
(if one existed)
would look like
15. By clicking here you cause this incredibly
powerful computational tool called The Web
to retrieve a chunk of text and images that
can only be understood by a human...
22. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
23. Original Study Simplified
Using what is known about interactions in fly & yeast
predict new interactions with your
human protein of interest
24. Abstracted
Given a protein P in Species X
Find proteins similar to P in Species Y
Retrieve interactors in Species Y
Sequence-compare Y-interactors with Species X genome
(1) Keep only those with homologue in X
Find proteins similar to P in Species Z
Retrieve interactors in Species Z
Sequence-compare Z-interactors with (1)
Putative interactors in Species X
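The abstracted steps above can be sketched in Python. This is only an illustration over toy in-memory data: the protein names, the HOMOLOGS and INTERACTIONS tables, and all helper names are hypothetical stand-ins for real sequence-similarity searches and interaction databases, and the final set intersection is a simplification of "sequence-compare Z-interactors with (1)". It is not the code of the original study.

```python
# Hypothetical toy data standing in for similarity searches and interaction databases.
# HOMOLOGS[(protein, species)] -> proteins in that species similar to the given protein
HOMOLOGS = {
    ("P", "Y"): {"Py"},
    ("P", "Z"): {"Pz"},
    ("Ay", "X"): {"A"},
    ("By", "X"): {"B"},
    ("Az", "X"): {"A"},
    ("Cz", "X"): {"C"},
}
# INTERACTIONS[protein] -> known interactors within the same species
INTERACTIONS = {"Py": {"Ay", "By"}, "Pz": {"Az", "Cz"}}


def similar_in(protein, species):
    """Find proteins similar to `protein` in `species` (toy lookup)."""
    return HOMOLOGS.get((protein, species), set())


def interactors_of(protein):
    """Retrieve known interactors of `protein` (toy lookup)."""
    return INTERACTIONS.get(protein, set())


def candidates_via(protein, home_species, model_species):
    """Home-species homologues of the interactors found in one model species."""
    found = set()
    for homolog in similar_in(protein, model_species):
        for partner in interactors_of(homolog):
            found |= similar_in(partner, home_species)
    return found


def putative_interactors(protein, home_species, model1, model2):
    kept = candidates_via(protein, home_species, model1)  # set (1)
    # Keep only candidates that are also supported by the second model species.
    return kept & candidates_via(protein, home_species, model2)


result = putative_interactors("P", "X", "Y", "Z")
print(sorted(result))
```

In this toy data only protein A is supported by both model species, so it is the only putative interactor returned.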
25. Modeling the answer...
OWL
Web Ontology Language (OWL) is the
language approved by the W3C
for representing knowledge in the Web
26. Modeling the answer...
Note that every word in
this diagram is, in reality, a
URL (because it is OWL)
The model of the answer is
published in The Web
and borrows ideas from
other models published in
The Web
27. Modeling the answer...
ProbableInteractor
is homologous to
(Potential Interactor from ModelOrganism1…)
and
(Potential Interactor from ModelOrganism2…)
Probable Interactor is defined in OWL as a subclass of Potential Interactor
that requires homologous pairs of interacting proteins to exist in both
comparator model organisms.
(Effectively, an intersection)
29. Running the Web Science Experiment
In a local data-file
provide the protein we are interested in
and the two species we wish to use in our comparison
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
30. The tricky bit is...
In the abstract, the
search for homology is
“generic” – ANY model
organism.
But when the machine
attempts to do the
experiment, it will have
to use a variety of
resources because the
answer requires
information from two
different species
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
31. This is the question we ask:
(the query language here is SPARQL)
PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
?protein a i:ProbableInteractor .
}
i:ProbableInteractor is the reference (URL) to our OWL model of the answer
32. Our system then derives (and executes) the following workflow automatically
These are different
Web services!
...selected at run-time
based on the same model
47. What is the phenotype of every allele of the
Antirrhinum majus DEFICIENS gene?
SELECT ?allele ?image ?desc
WHERE {
locus:DEF genetics:hasVariant ?allele .
?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc
}
Note that there is no “FROM” clause!
We don’t tell it where it should get the information.
The machine has to figure that out by itself...
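One way to picture how a query with no FROM clause can still be answered: a registry maps each predicate to a service able to generate it, and the engine resolves the triple patterns one at a time. The sketch below is a toy illustration in Python, not SHARE's actual implementation; the registry, the three service functions, and the allele, image and description values are all hypothetical.

```python
# Hypothetical services: each takes a subject and returns objects for one predicate.
def variants_service(locus):
    return {"locus:DEF": ["allele:def-1", "allele:def-2"]}.get(locus, [])


def image_service(allele):
    return {"allele:def-1": ["img:1"], "allele:def-2": ["img:2"]}.get(allele, [])


def description_service(image):
    return {"img:1": ["description one"], "img:2": ["description two"]}.get(image, [])


# Toy registry: predicate -> service that can produce that predicate.
REGISTRY = {
    "genetics:hasVariant": variants_service,
    "info:visualizedByImage": image_service,
    "info:hasDescription": description_service,
}


def solve(patterns):
    """Resolve (subject, predicate, ?variable) patterns in order."""
    rows = [{}]
    for subject, predicate, variable in patterns:
        service = REGISTRY[predicate]  # discovery step: chosen by predicate, not by a FROM clause
        next_rows = []
        for row in rows:
            # The subject is either an already-bound variable or a constant.
            bound_subject = row.get(subject, subject)
            for value in service(bound_subject):
                next_rows.append({**row, variable: value})
        rows = next_rows
    return rows


query = [
    ("locus:DEF", "genetics:hasVariant", "?allele"),
    ("?allele", "info:visualizedByImage", "?image"),
    ("?image", "info:hasDescription", "?desc"),
]
for row in solve(query):
    print(row["?allele"], row["?image"], row["?desc"])
```

The point of the design is that the query author names only the relationships of interest; which provider answers each one is decided at run-time by looking up the predicate.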
52. The query results are live hyperlinks
to the respective Database or images
53. Neither SADI nor SHARE
know anything about
plant biology or genetics
54. What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
uniprot:P47989 pred:isEncodedBy ?gene .
?gene ont:isParticipantIn ?pathway .
}
Note again that there is no “FROM” clause…
I have not told SHARE where to look for the
answer; I am simply asking my question
60. Two different providers of gene information (KEGG & GO) were found & accessed
Two different providers of pathway information (KEGG and NCBI) were found & accessed
65. Show me the latest Blood Urea Nitrogen (BUN) and Creatinine levels
of patients who appear to be rejecting their transplants
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
?patient rdf:type patient:LikelyRejecter .
?patient l:latestBUN ?bun .
?patient l:latestCreatinine ?creat .
}
67. Likely Rejecter:
A patient who has creatinine levels
that are increasing over time
- Mark D Wilkinson’s definition
68. Likely Rejecter:
…but there is no “likely rejecter”
column or table in our database…
only blood chemistry measurements
at various time-points
73. The machine decides
by itself
that it needs to do a
Linear Regression analysis
on the blood creatinine measurements
in order to answer your question
74. The machine decides
by itself
how and where that analysis
can be done
and does it automatically!
81. Ontologies explicitly define the kinds of
things that (can) exist…
…and what those things are “like”
i.e. what properties they have
(color, weight, shape, texture, temperature, “state”)
and what relationships they have to one another
(inside-of, adjacent-to, part-of, binds-to, controls, inhibits,
degrades, etc.)
82. So we create...
ontologies about biology
and health
We* publish them on the Web
* We… or anybody! Anybody can publish an ontology!
83. My definition of a Likely Rejecter is encoded in
a machine-readable document written in the OWL Ontology language
Basically:
“the regression line over creatinine measurements should have an increasing slope”
84. Our ontology refers to other ontologies (possibly published by other people)
to learn about what the properties of “regression models” are
e.g. that regression models have slopes and intercepts
and that slopes and intercepts have decimal values
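The "increasing slope" criterion can be illustrated with an ordinary least-squares fit. The Python sketch below is an assumption about how such a check might look, not the analysis service the system actually invokes; the patient identifiers and the (day, creatinine) values are invented for the example.

```python
def regression_line(points):
    """Ordinary least-squares fit over (x, y) points; returns (slope, intercept).

    Needs at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_likely_rejecter(creatinine_series):
    """Likely Rejecter: the regression line over creatinine has a positive slope."""
    slope, _intercept = regression_line(creatinine_series)
    return slope > 0


# Hypothetical (day, creatinine level) measurements per patient.
patients = {
    "patient:1": [(0, 1.0), (7, 1.4), (14, 1.9)],
    "patient:2": [(0, 1.2), (7, 1.1), (14, 1.0)],
}
likely = [p for p, series in patients.items() if is_likely_rejecter(series)]
print(likely)
```

Here the regression model exposes exactly the two properties the ontology talks about, a slope and an intercept with decimal values, and the class definition only inspects the sign of the slope.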
85. SHARE examines the query
Looks on the Web for ontologies that describe the
problem it is trying to solve, and “reads” them
then uses that “knowledge” to figure out which
data-sources and analytical tools it needs
to answer the query
86. The way SHARE “interprets” data varies
depending on the context of the query
(i.e. which ontologies it reads – Mine? Yours?)
and on what part of the query
it is trying to answer at any given moment
(which ontological concept is relevant to that clause)
91. Example?
The data had the ‘qualities/properties’ that
allowed one machine to interpret
that they were Blood Creatinine measurements
(e.g. to determine which patients were rejecting)
92. Example?
But the data also had the ‘qualities/properties’ that
allowed another machine to interpret them as
Simple X/Y coordinate data
(e.g. the Linear Regression calculation tool)
93. Benefit
of late binding
Data is amenable to
constant re-interpretation
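A small Python sketch of late binding: the same records are accepted by two consumers because each one looks only at the properties it cares about. The record layout and both functions are hypothetical; in the real system this matching is expressed through OWL classes rather than Python checks.

```python
# Hypothetical records: each measurement is just a bag of properties.
measurements = [
    {"patient": "patient:1", "analyte": "creatinine", "x": 0, "y": 1.0},
    {"patient": "patient:1", "analyte": "creatinine", "x": 7, "y": 1.4},
]


def as_creatinine_measurement(record):
    """Clinical view: accept records whose analyte property is creatinine."""
    if record.get("analyte") == "creatinine" and "y" in record:
        return {"patient": record["patient"], "level": record["y"]}
    return None


def as_xy_point(record):
    """Statistical view: accept anything with numeric x and y, ignore the rest."""
    x, y = record.get("x"), record.get("y")
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return (x, y)
    return None


clinical_view = [as_creatinine_measurement(r) for r in measurements]
geometric_view = [as_xy_point(r) for r in measurements]
print(clinical_view)
print(geometric_view)
```

Neither interpretation is fixed in the data itself; it is decided at the moment a consumer reads the record, which is what leaves the data open to later re-interpretation.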
103. Every component of the model
Every component of the input data
Every component of the output data
is a URL
Therefore the question, the experiment, and the
answer, are immediately published IN the Web
The answer, and the knowledge derived from it,
is immediately available to search engines
and moreover, can affect the outcome of other
Web Science experiments
113. In Web Science 2.0
Model what the world would “look like”
if your hypothesis were true
Then ask “is there any data that
fits that model?”
114. Like the blind men examining an elephant
Seemingly different aspects of Web Science research
are embodied in/derived from the same “thing”
The OWL Model
115. Our vision of Web Science 2.0
[Diagram: Hypothesis, Query, Workflow, Ontology, Result, Materials & Methods]
These can be automatically derived through
provenance information during workflow execution
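Assuming "these" refers to the Materials & Methods, the idea can be sketched as follows: every service call is logged while the workflow runs, and the log is then rendered as a methods list. The Python below is a toy illustration only; the service names findHomologs and getInteractors, the lambda bodies, and the record layout are hypothetical and do not reflect the SADI/SHARE provenance format.

```python
provenance = []  # filled in as the workflow runs


def run_step(service_name, func, inputs):
    """Invoke one workflow step and record what ran, on what, and what came back."""
    outputs = func(inputs)
    provenance.append({"service": service_name, "inputs": inputs, "outputs": outputs})
    return outputs


# Hypothetical two-step workflow standing in for real Web service calls.
homologs = run_step(
    "findHomologs", lambda ids: [i + "-homolog" for i in ids], ["uniprot:Q9UK53"]
)
partners = run_step(
    "getInteractors", lambda ids: [i + "-partner" for i in ids], homologs
)


def methods_section(records):
    """Render the provenance log as a numbered methods list."""
    lines = []
    for number, record in enumerate(records, start=1):
        lines.append(
            f"{number}. {record['service']} was applied to {len(record['inputs'])} "
            f"input(s), yielding {len(record['outputs'])} output(s)."
        )
    return "\n".join(lines)


print(methods_section(provenance))
```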
117. Please join us!
SADI and SHARE are Open-Source projects
http://sadiframework.org
119. University of British Columbia
Luke McCarthy – Lead Dev. Everything...
Edward Kawas – SADI Service auto-generator
Benjamin VanderValk Ian Wood
SHARE & SADI & Experimental modeling & Experimental modeling project
myHeath Button
Soroush Samadian
Cardiovascular data modeling and queries
120. C-BRASS Collaborators at other sites
U of New Brunswick: Dr. Chris Baker, Alexandre Riazanov
Carleton University: Dr. Michel Dumontier, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz