Ontology-Based Data Access: Why It is So Cool!
Josef Hardi
josef.hardi@stanford.edu
September 4, 2015
Ontology-Based Data Access is a concept developed by Diego Calvanese and
Mariano Rodriguez-Muro at the KRDB Research Centre, Free University of Bozen-Bolzano.
Outline
● What is Ontology-based Data Access, or OBDA?
○ Motivation
○ System Black Box
○ Process Illustration
● Project -ontop- and Quest
● Experiment
○ Query Answering Performance
○ -ontop- vs Semantika
● Conclusion
● Q&A
Acknowledgement
Parts of the slides in this presentation are taken from
tutorial or lecture slides by:
Diego Calvanese,
Mariano Rodriguez-Muro, and
Martin Rezk
What is Ontology-based Data Access?
Think of a scenario
Data Layer
Data Service
conceptual view
Image source: (various sources)
Data Access Bottleneck
Image source: Rezk, Martin. Ontologies Ontop Databases http://www.slideshare.net/MartnRezk/slides-swat4-ls
Query Answering
tbl_patient+2015
PatientId  Name  Cell_type  cStage
1          Mary  true       7
2          John  false      6
3          Bill  false      4
Cancer type is:
● NSCLC when Cell_type is false,
● SCLC when Cell_type is true.
Cancer stage is:
● I, II, III, IIIa, IIIb, IV for NSCLC, corresponding to cStage 1 - 6,
● Limited and Extensive for SCLC, corresponding to cStage 7 and 8.
The table carries “hidden logic” that only the application knows how to interpret. It was not designed for querying the data directly!
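The hidden logic described above can be written down explicitly. The following is a minimal Python sketch, assuming the stage codes map to the stage labels in the order listed on the slide; the names `NSCLC_STAGES`, `SCLC_STAGES`, and `decode_patient` are hypothetical and only serve the illustration.

```python
# Code tables read off the slide; the helper names are made up for this sketch.
NSCLC_STAGES = {1: "I", 2: "II", 3: "III", 4: "IIIa", 5: "IIIb", 6: "IV"}
SCLC_STAGES = {7: "Limited", 8: "Extensive"}


def decode_patient(cell_type: bool, c_stage: int) -> tuple[str, str]:
    """Translate the raw (Cell_type, cStage) pair into (cancer type, stage label)."""
    if cell_type:  # Cell_type = true means SCLC
        return "SCLC", SCLC_STAGES[c_stage]
    return "NSCLC", NSCLC_STAGES[c_stage]


# The three rows of tbl_patient+2015.
rows = [(1, "Mary", True, 7), (2, "John", False, 6), (3, "Bill", False, 4)]
decoded = {name: decode_patient(cell_type, stage) for _, name, cell_type, stage in rows}

for name, (cancer, stage) in decoded.items():
    print(f"{name}: {cancer}, stage {stage}")
```

Every application that reads the table has to carry a copy of this decoding, which is the kind of knowledge an ontology and its mappings are meant to hold in one place.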
Query Answering
tbl_patient+2015
PatientId  Name  Cell_type  cStage
1          Mary  true       7
2          John  false      6
3          Bill  false      4
RESULT
Name  cStage
John  6
Bill  4
select Name, cStage
from tbl_patient+2015
where Cell_type = false
and cStage >= 4;
Can we do better?
Show me the name and stage of every patient who has a large tumor at stage IIIa or higher.
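The query on this slide can be tried end to end against a throwaway database. This is a minimal sketch using Python's built-in `sqlite3` module, which is an assumption (the slides do not say which engine holds this table). Booleans are stored as 0/1, and the table name is quoted because it contains a `+` sign.

```python
import sqlite3

# In-memory stand-in for the patient table shown on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "tbl_patient+2015" '
    "(PatientId INTEGER, Name TEXT, Cell_type INTEGER, cStage INTEGER)"
)
conn.executemany(
    'INSERT INTO "tbl_patient+2015" VALUES (?, ?, ?, ?)',
    [(1, "Mary", 1, 7), (2, "John", 0, 6), (3, "Bill", 0, 4)],
)

# Same filter as the slide: Cell_type = false and cStage >= 4.
result = conn.execute(
    'SELECT Name, cStage FROM "tbl_patient+2015" '
    "WHERE Cell_type = 0 AND cStage >= 4 ORDER BY PatientId"
).fetchall()
print(result)
```

The caller still has to know that `Cell_type = 0` stands for NSCLC and that `cStage >= 4` stands for "IIIa or higher", neither of which appears in the question itself.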
Query Answering
Bridge the semantics
tbl_patient+2015
PatientId  Name  Cell_type  cStage
1          Mary  true       7
2          John  false      6
3          Bill  false      4
Cancer type is:
● NSCLC when Cell_type is false,
● SCLC when Cell_type is true.
Cancer stage is:
● I, II, III, IIIa, IIIb, IV for NSCLC,
● Limited and Extensive for SCLC.
[Figure: ontology diagram that links the table to SNOMED-CT concepts; the surviving labels are hasStage, hasNeoplasm, name, and ISA.]
*SCLC = Small Cell Lung Cancer, NSCLC = Non-Small Cell Lung Cancer
OBDA Answering
● (Data) Sources: the external, independent resources; the organization's existing assets.
● Ontology: provides a unified, common vocabulary; the conceptual view of the underlying data.
● Mappings: relate the terms in the ontology to a set of SQL views.
Image source: Rezk, Martin. Ontologies Ontop Databases http://www.slideshare.net/MartnRezk/slides-swat4-ls
OBDA Answering Black Box
● Rewriting: create a new query that expands the original query, using all the inclusion assertions defined in the ontology.
● Unfolding: substitute each part of the expanded query with the corresponding SQL views from the given mappings.
● Evaluation: execute the complete SQL against the target RDBMS.
Image source: Kontchakov, Roman, et al. Ontology-based Data Access: Ontop of Databases. http://www.dcs.bbk.ac.uk/~roman/papers/ISWC13.pdf
OBDA Answering Illustration
Q: Show me all the Person in the hospital?
Q’: Show me
all the Person UNION
all the Nurse UNION
all the Doctor UNION
all the Patient UNION
anyone who has
Neoplasm in the hospital?
Rewritten
Look up where the source(s) are, following the mappings (M)
Q’: Show me
all the Person UNION         (no source)
all the Nurse UNION          M: get the list from table Nurse
all the Doctor UNION         M: get the list from table Doctor
all the Patient UNION        M: get the list from table Patient
anyone who has Neoplasm      M: get the list from table Cancer Patient 2015
in the hospital?
Substitute with SQL views
Q’: Show me
all the Person UNION
select NurseId from tbl_nurse UNION
select doc_id from tbl_doctor UNION
select pid from tbl_patient UNION
select PatientId from tbl_patient+2015
in the hospital?
Unfolded
Execute the SQL
select NurseId from tbl_nurse
UNION
select doc_id from tbl_doctor
UNION
select pid from tbl_patient
UNION
select PatientId from tbl_patient+2015
Evaluated
42!
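The three steps illustrated above (rewrite, unfold, evaluate) can be sketched in a few lines of Python. This is a toy under stated assumptions, not how -ontop- or Semantika are implemented: the `INCLUSIONS` and `MAPPINGS` dictionaries, the `hasNeoplasm` term name, the string ids, and the sample rows are invented for illustration. Only the table and column names come from the slides. Python's built-in `sqlite3` stands in for the target RDBMS.

```python
import sqlite3

# Inclusion assertions (assumed shape): term -> terms that are included in it.
INCLUSIONS = {"Person": ["Nurse", "Doctor", "Patient", "hasNeoplasm"]}

# Mappings: ontology term -> SQL view over the sources. Person has no source.
MAPPINGS = {
    "Nurse": "SELECT NurseId AS id FROM tbl_nurse",
    "Doctor": "SELECT doc_id AS id FROM tbl_doctor",
    "Patient": "SELECT pid AS id FROM tbl_patient",
    "hasNeoplasm": 'SELECT PatientId AS id FROM "tbl_patient+2015"',
}


def rewrite(term):
    """Rewriting: expand a term into itself plus everything included in it."""
    expanded = [term]
    for sub in INCLUSIONS.get(term, []):
        expanded.extend(rewrite(sub))
    return expanded


def unfold(terms):
    """Unfolding: swap each term for its SQL view; unmapped terms drop out."""
    return "\nUNION\n".join(MAPPINGS[t] for t in terms if t in MAPPINGS)


def evaluate(conn, sql):
    """Evaluation: run the unfolded SQL on the database."""
    return sorted(row[0] for row in conn.execute(sql))


# Made-up sample data so the sketch can run end to end.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_nurse (NurseId TEXT);
    CREATE TABLE tbl_doctor (doc_id TEXT);
    CREATE TABLE tbl_patient (pid TEXT);
    CREATE TABLE "tbl_patient+2015" (PatientId TEXT);
    INSERT INTO tbl_nurse VALUES ('n1');
    INSERT INTO tbl_doctor VALUES ('d1');
    INSERT INTO tbl_patient VALUES ('p1'), ('p2');
    INSERT INTO "tbl_patient+2015" VALUES ('p2'), ('p3');
""")

terms = rewrite("Person")
sql = unfold(terms)
answers = evaluate(conn, sql)
print(sql)
print(answers)
```

Because `Person` has no mapping of its own, it disappears during unfolding, and the `UNION` removes the patient that appears in both patient tables.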
(Computational) Price to Pay
Query answering in the OBDA setting is:
● PTIME in the size of the ontology (efficiently tractable)
● AC⁰ in the size of the data (very efficiently tractable)
● NP-complete in the size of the query (exponential)
*Tractable problem: a problem for which there exists an algorithm that terminates in a reasonable amount of time and returns the result.
-ontop- Project
● A platform for querying relational databases with the SPARQL query language.
● Implementation started in 2010.
● Supports several database systems, such as MySQL, PostgreSQL, H2, SQL Server, Oracle, and IBM DB2.
● Distributed under an open-source license.
● Currently being developed within the context of the EU Optique project.
● Fantastic add-ons: efficient rewriting, query optimization, transitive queries, rule entailment, cross-linked datasets.
-ontop-
-ontop- for Protege
http://ontop.inf.unibz.it/
Experiment
Semantika Project
http://obidea.com/semantika/
Berlin SPARQL Benchmark (BSBM)
● A benchmark suite built around an e-commerce domain.
○ A set of products is offered by different vendors, and customers post reviews of the products.
● Consists of 12 different queries that emulate the search and navigation pattern of a consumer looking for a product.
● A query mix consists of 25 querying actions that simulate a product search scenario.
● No inference.
BSBM-100
● Dataset of 100 million triples,
● Transformed into a relational database schema:
Table                  Rows
offer                  > 5.7 million
person                 > 147 thousand
producer               > 5 thousand
product                > 288 thousand
productfeature         > 47 thousand
productfeatureproduct  > 5.5 million
producttype            > 2 thousand
producttypeproduct     > 1.4 million
review                 > 2.8 million
vendor                 > 2 thousand
Test Databases
● MySQL - v5.6
○ Vanilla
○ Optimized
■ CREATE INDEX
■ OPTIMIZE TABLE, ANALYZE TABLE
● PostgreSQL - v9.4.4
○ Vanilla
○ Optimized
■ CREATE INDEX
■ VACUUM ANALYZE on each table
Test Machine
● MacBook Pro
○ OS X Yosemite 64-bit
○ Java 8 (build 1.8.0_51-b16)
○ Intel Core i7 3 GHz
○ Memory 16 GB
○ Flash storage
○ Direct connection - no network cost
Benchmark Flow
for each obda-endpoint do:
    for each dbms do:
        for each dbms-variant do:
            start endpoint;
            start dbms;
            loop 2:
                run 'benchmark -runs 100 -w 10';
            stop dbms;
            stop endpoint;
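The nested loops above define a small run matrix. The Python sketch below only enumerates that matrix; it does not start or stop any service. The list values are read off the other slides (two endpoints, two DBMSs, vanilla and optimized variants) and the names `ENDPOINTS`, `DBMS`, `VARIANTS`, and `plan_runs` are hypothetical.

```python
from itertools import product

# Values taken from the deck; the variable names are made up for this sketch.
ENDPOINTS = ["-ontop-", "Semantika"]
DBMS = ["MySQL", "PostgreSQL"]
VARIANTS = ["vanilla", "optimized"]
TRIALS = 2  # "loop 2" in the benchmark flow


def plan_runs():
    """Return one (endpoint, dbms, variant, trial) tuple per benchmark invocation."""
    return [
        (endpoint, dbms, variant, trial)
        for endpoint, dbms, variant in product(ENDPOINTS, DBMS, VARIANTS)
        for trial in range(1, TRIALS + 1)
    ]


runs = plan_runs()
for run in runs:
    print(run)
print(len(runs), "benchmark invocations in total")
```

Each tuple corresponds to one invocation of the benchmark command with 100 runs and 10 warm-up runs, as given in the flow.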
Benchmark Result
Conclusion
● OBDA offers a non-invasive way to provide better data access on top of existing (legacy) database systems.
● Many interesting topics can be drawn from OBDA use case scenarios.
○ Health and clinical domain perhaps?
● OBDA performance relies heavily on the efficiency of the underlying data infrastructure (both hardware and software).
Thanks! Any Questions?
Appendix: Query Answering and Query Rewriting
Query Answering over Database
Image source: Calvanese, Diego. Ontology-Based Data Access and Integration. https://www.essi.upc.edu/docs/slides-obda-2010-02-08
An example
Query Answering over Ontology
Image source: Calvanese, Diego. Ontology-Based Data Access and Integration. https://www.essi.upc.edu/docs/slides-obda-2010-02-08
An example
Query Answering via Rewriting
Image source: Calvanese, Diego. Ontology-Based Data Access and Integration. https://www.essi.upc.edu/docs/slides-obda-2010-02-08
Query Rewriting
Appendix: -ontop- Add-ons
-ontop- Black Box
Image source: Kontchakov, Roman, et al. Ontology-based Data Access: Ontop of Databases. http://www.dcs.bbk.ac.uk/~roman/papers/ISWC13.pdf
● Tree witness rewriting technique
● T-mapping optimization
● Semantic Query Optimization (SQO)
Rule Entailment
Image source: Xiao, Guohui, et al. Rules and Ontology-based Data Access. https://www.inf.unibz.it/~calvanese/papers/xiao-rezk-rodr-calv-RR-2014.pdf
● SWRL rules are translated to relational algebra, expressed in SQL’99 Common Table Expressions (CTEs)
● T-Mapping extension
Appendix: Detailed Benchmark Report
Query Mixes per Hour
                      -ontop-  Semantika  Native
MySQL                     807        831     436
MySQL optimized         1,471      1,630   2,371
PostgreSQL              2,198      2,286     418
PostgreSQL optimized    7,576      9,204  15,500
Queries per Second - MySQL
Vanilla
            Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
-ontop-      1   95    1   --    1   88  100   --   75   --   --
Semantika    1  101    1   --    1   77  112   --   95   --   --
Optimized
            Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
-ontop-     30   73   26   --    1   48   63   --   49   --   --
Semantika   58   99   46   --    1   95  108   --  102   --   --
Queries per Second - PostgreSQL
Vanilla
            Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
-ontop-      4   89    4   --    2   73   77   --  100   --   --
Semantika    4   90    4   --    2   96  110   --  123   --   --
Optimized
            Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
-ontop-     75   77   79   --    9   47   60   --   76   --   --
Semantika   88   81   82   --    9   94  110   --  119   --   --
Semantika does cache better (query mixes per hour)
                      -ontop-                     Semantika
                      Trial 1  Trial 2  Delta%    Trial 1  Trial 2  Delta%
MySQL                     790      807     +2%        638      831    +30%
MySQL optimized         1,424    1,471     +3%        983    1,630    +66%
PostgreSQL              1,803    2,198    +22%      1,254    2,286    +82%
PostgreSQL optimized    5,678    7,576    +33%      2,028    9,204   +354%
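The Delta% column can be recomputed from the two trial figures. The short Python sketch below does so, assuming Delta% means the relative change from Trial 1 to Trial 2, rounded to a whole percent. The dictionary layout and the function name `delta_percent` are made up for this illustration.

```python
# (endpoint, database) -> (trial 1, trial 2), copied from the table above.
TRIALS = {
    ("-ontop-", "MySQL"): (790, 807),
    ("-ontop-", "MySQL optimized"): (1424, 1471),
    ("-ontop-", "PostgreSQL"): (1803, 2198),
    ("-ontop-", "PostgreSQL optimized"): (5678, 7576),
    ("Semantika", "MySQL"): (638, 831),
    ("Semantika", "MySQL optimized"): (983, 1630),
    ("Semantika", "PostgreSQL"): (1254, 2286),
    ("Semantika", "PostgreSQL optimized"): (2028, 9204),
}


def delta_percent(first: int, second: int) -> int:
    """Relative change from the first to the second trial, in whole percent."""
    return round((second - first) / first * 100)


deltas = {key: delta_percent(*pair) for key, pair in TRIALS.items()}
for (endpoint, database), delta in deltas.items():
    print(f"{endpoint:10} {database:22} {delta:+d}%")
```

Under that reading, the recomputed values agree with the percentages printed in the table.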
Ontop could answer ALL queries
            Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
-ontop-     83   80   78  112    9   75   78   83  105   91   83
Semantika   88   81   82   --    9   94  110   --  119   --   --
-ontop- supports almost all features in SPARQL 1.1
Appendix: Comparison: Mapping Syntax
-ontop- Mappings
mappingId  Reviewer
target     <"&bsbm-inst;dataFromRatingSite{$publisher}/Reviewer{$nr}"> a foaf:Person;
           foaf:name $name; foaf:mbox_sha1sum $mbox_sha1sum;
           bsbm:country <"&iso3166;{$country}">;
           dc:publisher <"&bsbm-inst;dataFromRatingSite{$publisher}/RatingSite{$publisher}">;
           dc:date $publishDate .
source     select nr, name, mbox_sha1sum, country, publisher, publishDate from person
mappingId  Producer
target     <"&bsbm-inst;dataFromProducer{$nr}/Producer{$nr}"> a bsbm:Producer;
           rdfs:label $label; rdfs:comment $comment; foaf:homepage $homepage;
           bsbm:country <"&iso3166;{$country}">;
           dc:publisher <"&bsbm-inst;dataFromProducer{$nr}/Producer{$nr}">;
           dc:date $publishDate .
source     select nr, label, comment, homepage, country, publisher, publishDate from producer
● Uses Turtle syntax.
● Specification: https://babbage.inf.unibz.it/trac/obdapublic/wiki/ObdalibObdaTurtlesyntax
● Supports R2RML syntax
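To make the role of the `{$column}` placeholders concrete, here is a rough Python sketch of how one row returned by the source query could be turned into a subject IRI. It is not -ontop-'s actual code. The `PREFIXES` table and the `expand_template` helper are hypothetical, and the expansion of `bsbm-inst` is assumed from the IRI that appears in the generated SQL later in the deck.

```python
import re

# Assumed prefix expansion, taken from the IRI visible in the generated SQL.
PREFIXES = {"bsbm-inst": "http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/"}


def expand_template(template: str, row: dict) -> str:
    """Resolve &prefix; entities, then fill every {$column} placeholder from the row."""
    resolved = re.sub(r"&([\w-]+);", lambda m: PREFIXES[m.group(1)], template)
    return re.sub(r"\{\$(\w+)\}", lambda m: str(row[m.group(1)]), resolved)


# One made-up row of the person table, reduced to the columns the template needs.
row = {"publisher": 1, "nr": 42}
iri = expand_template("&bsbm-inst;dataFromRatingSite{$publisher}/Reviewer{$nr}", row)
print(iri)
```

The same substitution applies to every template in the target part of a mapping, while plain `$column` references become literal values.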
Semantika Mappings
<mapping tml:id="Reviewer">
<logical-table rr:tableName="person"/>
<subject-map rr:class="foaf:Person" rr:template="Reviewer(publisher,nr)"/>
<predicate-object-map rr:predicate="foaf:name" rr:column="name"/>
<predicate-object-map rr:predicate="foaf:mbox_sha1sum" rr:column="mbox_sha1sum"/>
<predicate-object-map rr:predicate="bsbm:country" rr:template="Country(country)"/>
<predicate-object-map rr:predicate="dc:publisher" rr:template="ReviewerPublisher(publisher,publisher)"/>
<predicate-object-map rr:predicate="dc:date" rr:column="publishDate"/>
</mapping>
<mapping tml:id="Producer">
<logical-table rr:tableName="producer"/>
<subject-map rr:class="bsbm:Producer" rr:template="Producer(nr,nr)"/>
<predicate-object-map rr:predicate="rdfs:label" rr:column="label"/>
<predicate-object-map rr:predicate="rdfs:comment" rr:column="comment"/>
<predicate-object-map rr:predicate="foaf:homepage" rr:column="homepage"/>
<predicate-object-map rr:predicate="bsbm:country" rr:template="Country(country)"/>
<predicate-object-map rr:predicate="dc:publisher" rr:template="ProducerPublisher(nr,nr)"/>
<predicate-object-map rr:predicate="dc:date" rr:column="publishDate"/>
</mapping>
● Uses XML format.
● Specification: https://github.com/obidea/semantika/wiki/2.-Basic-RDB-RDF-Mapping
● Supports R2RML syntax
Appendix: Comparison: SQL Creation
Simple SPARQL Query
SELECT ?title ?publishDate
WHERE
{ ?review bsbm:reviewFor bsbm:Producer1245/Product62033> .
?review dc:title ?title .
?review dc:date ?publishDate .
}
Ontop SQL Creation
SELECT
3 AS `titleQuestType`, NULL AS `titleLang`, QVIEW1.`title` AS `title`,
10 AS `publishDateQuestType`, NULL AS `publishDateLang`, CAST
(QVIEW1.`publishDate` AS CHAR(8000) CHARACTER SET utf8) AS
`publishDate`
FROM review QVIEW1
WHERE
(QVIEW1.`product` = '62033') AND
(QVIEW1.`producer` = '1245') AND
QVIEW1.`publisher` IS NOT NULL AND
QVIEW1.`nr` IS NOT NULL AND
QVIEW1.`title` IS NOT NULL AND
QVIEW1.`publishDate` IS NOT NULL
Semantika SQL Creation
SELECT `OBDA_VIEW1`.`title` AS `title`,
`OBDA_VIEW1`.`publishDate` AS `publishDate`
FROM `bsbm100`.`review` AS `OBDA_VIEW1`
WHERE `OBDA_VIEW1`.`publisher` IS NOT NULL AND
`OBDA_VIEW1`.`product` = 62033 AND
`OBDA_VIEW1`.`publishDate` IS NOT NULL AND
`OBDA_VIEW1`.`nr` IS NOT NULL AND
`OBDA_VIEW1`.`title` IS NOT NULL AND
`OBDA_VIEW1`.`producer` = 1245
Let’s add something more...
SELECT ?review ?title ?publishDate ?rating1 ?rating2
WHERE
{ ?review bsbm:reviewFor bsbm:Producer1245/Product62033> .
?review dc:title ?title .
?review dc:date ?publishDate .
?review bsbm:rating1 ?rating1 .
OPTIONAL { ?review bsbm:rating2 ?rating2 . }
}
Ontop SQL Creation
SELECT
1 AS `reviewQuestType`, NULL AS `reviewLang`, CONCAT('http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromRatingSite', REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(CAST(QVIEW1.`publisher` AS CHAR
(8000) CHARACTER SET utf8),' ', '%20'),'!', '%21'),'@', '%40'),'#', '%23'),'$', '%24'),'&', '%26'),'*', '%42'), '(', '%28'), ')', '%29'), '[', '%5B'), ']', '%5D'),
',', '%2C'), ';', '%3B'), ':', '%3A'), '?', '%3F'), '=', '%3D'), '+', '%2B'), '''', '%22'), '/', '%2F'), '/Review', REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
(CAST(QVIEW1.`nr` AS CHAR(8000) CHARACTER SET utf8),' ', '%20'),'!', '%21'),'@', '%40'),'#', '%23'),'$', '%24'),'&', '%26'),'*', '%42'), '(', '%28'),
')', '%29'), '[', '%5B'), ']', '%5D'), ',', '%2C'), ';', '%3B'), ':', '%3A'), '?', '%3F'), '=', '%3D'), '+', '%2B'), '''', '%22'), '/', '%2F')) AS `review`,
3 AS `titleQuestType`, NULL AS `titleLang`, QVIEW1.`title` AS `title`,
10 AS `publishDateQuestType`, NULL AS `publishDateLang`, CAST(QVIEW1.`publishDate` AS CHAR(8000) CHARACTER SET utf8) AS
`publishDate`,
4 AS `rating1QuestType`, NULL AS `rating1Lang`, CAST(QVIEW1.`rating1` AS CHAR(8000) CHARACTER SET utf8) AS `rating1`,
4 AS `rating2QuestType`, NULL AS `rating2Lang`, CAST(QVIEW2.`rating2` AS CHAR(8000) CHARACTER SET utf8) AS `rating2`
FROM (
review QVIEW1
LEFT OUTER JOIN review QVIEW2
ON (QVIEW1.`nr` = QVIEW2.`nr`) AND
(QVIEW1.`publisher` = QVIEW2.`publisher`) AND
QVIEW2.`rating2` IS NOT NULL AND
QVIEW1.`publisher` IS NOT NULL AND
QVIEW1.`nr` IS NOT NULL
)
WHERE
QVIEW1.`title` IS NOT NULL AND
QVIEW1.`nr` IS NOT NULL AND
QVIEW1.`publishDate` IS NOT NULL AND
(QVIEW1.`product` = '62033') AND
QVIEW1.`publisher` IS NOT NULL AND
QVIEW1.`rating1` IS NOT NULL AND
(QVIEW1.`producer` = '1245')
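The long chain of nested REPLACE calls in the SQL above percent-encodes reserved characters in the `publisher` and `nr` values before they are spliced into the review IRI. A rough Python equivalent is sketched below, assuming standard percent-encoding via `urllib.parse.quote`. That function encodes a wider set of characters than the REPLACE chain lists, so this is an approximation; the helper name `review_iri` is hypothetical.

```python
from urllib.parse import quote

# Base IRI copied from the CONCAT call in the generated SQL.
BASE = "http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/"


def review_iri(publisher, nr) -> str:
    """Build the review IRI, percent-encoding both column values."""
    def encode(value) -> str:
        # safe="" makes quote() encode "/" as well, as the SQL does.
        return quote(str(value), safe="")

    return f"{BASE}dataFromRatingSite{encode(publisher)}/Review{encode(nr)}"


print(review_iri(7, 123))
print(review_iri("site a", "x/y"))
```

Doing this per row inside the SQL is what makes the -ontop- statement so much longer than the Semantika one.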
Semantika SQL Creation
SELECT CONCAT('http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromRatingSite{1}/Review{2}',' : ','"',
`OBDA_VIEW1`.`publisher`,'" "',`OBDA_VIEW1`.`nr`,'"') AS `review`,
`OBDA_VIEW1`.`title` AS `title`,
`OBDA_VIEW1`.`publishDate` AS `publishDate`,
`OBDA_VIEW1`.`rating1` AS `rating1`,
`OBDA_VIEW1`.`rating2` AS `rating2`
FROM `bsbm100_optimized`.`review` AS `OBDA_VIEW1`
WHERE `OBDA_VIEW1`.`publisher` IS NOT NULL AND
`OBDA_VIEW1`.`product` = 62033 AND
`OBDA_VIEW1`.`publishDate` IS NOT NULL AND
`OBDA_VIEW1`.`nr` IS NOT NULL AND
`OBDA_VIEW1`.`title` IS NOT NULL AND
`OBDA_VIEW1`.`rating1` IS NOT NULL AND
`OBDA_VIEW1`.`producer` = 1245

Más contenido relacionado

La actualidad más candente

2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata
Jun Zhao
 
Model-based Analysis of Large Scale Software Repositories
Model-based Analysis of Large Scale Software RepositoriesModel-based Analysis of Large Scale Software Repositories
Model-based Analysis of Large Scale Software Repositories
Markus Scheidgen
 
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsDeep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Valery Tkachenko
 
2009 Dils Flyweb
2009 Dils Flyweb2009 Dils Flyweb
2009 Dils Flyweb
Jun Zhao
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
Luis Daniel Ibáñez
 

La actualidad más candente (19)

2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata
 
Analysis of the “KDD Cup-1999” Datasets
Analysis of the  “KDD Cup-1999”  DatasetsAnalysis of the  “KDD Cup-1999”  Datasets
Analysis of the “KDD Cup-1999” Datasets
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...
 
Text Mining using LDA with Context
Text Mining using LDA with ContextText Mining using LDA with Context
Text Mining using LDA with Context
 
ECMFA 2016 slides
ECMFA 2016 slidesECMFA 2016 slides
ECMFA 2016 slides
 
Model-based Analysis of Large Scale Software Repositories
Model-based Analysis of Large Scale Software RepositoriesModel-based Analysis of Large Scale Software Repositories
Model-based Analysis of Large Scale Software Repositories
 
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceSQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in Bioinformatics
 
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
 
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsDeep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
 
Reference Representation in Large Metamodel-based Datasets
Reference Representation in Large Metamodel-based DatasetsReference Representation in Large Metamodel-based Datasets
Reference Representation in Large Metamodel-based Datasets
 
Text mining meets neural nets
Text mining meets neural netsText mining meets neural nets
Text mining meets neural nets
 
OntoMaven Repositories and OMG API4KP
OntoMaven Repositories and OMG API4KPOntoMaven Repositories and OMG API4KP
OntoMaven Repositories and OMG API4KP
 
2009 Dils Flyweb
2009 Dils Flyweb2009 Dils Flyweb
2009 Dils Flyweb
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paper
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
 
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSEVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
 

Similar a Ontology-based data access: why it is so cool!

Efficient top-k queries processing in column-family distributed databases
Efficient top-k queries processing in column-family distributed databasesEfficient top-k queries processing in column-family distributed databases
Efficient top-k queries processing in column-family distributed databases
Rui Vieira
 

Similar a Ontology-based data access: why it is so cool! (20)

Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
Big (chemical) data? No Problem!
Big (chemical) data? No Problem!Big (chemical) data? No Problem!
Big (chemical) data? No Problem!
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
 
Querying and reasoning over large scale building datasets: an outline of a pe...
Querying and reasoning over large scale building datasets: an outline of a pe...Querying and reasoning over large scale building datasets: an outline of a pe...
Querying and reasoning over large scale building datasets: an outline of a pe...
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
 
SQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query setSQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query set
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
 
Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...
 
Big Data for Testing - Heading for Post Process and Analytics
Big Data for Testing - Heading for Post Process and AnalyticsBig Data for Testing - Heading for Post Process and Analytics
Big Data for Testing - Heading for Post Process and Analytics
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflix
 
Efficient top-k queries processing in column-family distributed databases
Efficient top-k queries processing in column-family distributed databasesEfficient top-k queries processing in column-family distributed databases
Efficient top-k queries processing in column-family distributed databases
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
 
MONAI and Open Science for Medical Imaging Deep Learning: SIPAIM 2020
MONAI and Open Science for Medical Imaging Deep Learning: SIPAIM 2020MONAI and Open Science for Medical Imaging Deep Learning: SIPAIM 2020
MONAI and Open Science for Medical Imaging Deep Learning: SIPAIM 2020
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 

Último

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 

Último (20)

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Ontology-based data access: why it is so cool!

  • 1. Ontology-Based Data Access: Why It is So Cool! Josef Hardi josef.hardi@stanford.edu September 4, 2015 Ontology-Based Data Access is a concept developed by Diego Calvanese and Mariano Rodriguez-Muro at the KRDB Research Centre of the Free University of Bozen-Bolzano
  • 2. Outline ● What is Ontology-based Data Access, or OBDA? ○ Motivation ○ System Black Box ○ Process Illustration ● Project -ontop- and Quest ● Experiment ○ Query Answering Performance ○ -ontop- vs Semantika ● Conclusion ● Q&A
  • 3. Acknowledgement Parts of the slides in this presentation are taken from tutorial or lecture slides by: Diego Calvanese, Mariano Rodriguez-Muro, and Martin Rezk
  • 5. Think of a scenario Data Layer Data Service conceptual view Image source: (various sources) What is Ontology-based Data Access?
  • 6. Data Access Bottleneck Image source: Rezk, Martin. Ontologies Ontop Databases http://www.slideshare.net/MartnRezk/slides-swat4-ls What is Ontology-based Data Access?
  • 7. Query Answering
      Table tbl_patient+2015:
        PatientId  Name  Cell_type  cStage
        1          Mary  true       7
        2          John  false      6
        3          Bill  false      4
      Cancer type is:
        ● NSCLC when Cell_type is false,
        ● SCLC when Cell_type is true.
      Cancer stage is:
        ● I, II, III, IIIa, IIIb, IV for NSCLC, corresponding to cStage 1 - 6,
        ● Limited and Extensive for SCLC, corresponding to cStage 7 and 8.
      There is “hidden logic” inside the table that is used specifically by the application, not for querying the data!
  • 8. Query Answering
      Table tbl_patient+2015:
        PatientId  Name  Cell_type  cStage
        1          Mary  true       7
        2          John  false      6
        3          Bill  false      4
      Query:
        select Name, cStage
        from tbl_patient+2015
        where Cell_type = false
        and cStage >= 4;
      RESULT:
        Name  cStage
        John  6
        Bill  4
  • 9. Can we do it better? Show me the names and stage status of all patients who have a large tumor of at least stage IIIa. Query Answering
  • 10. Bridge the semantics
      Table tbl_patient+2015:
        PatientId  Name  Cell_type  cStage
        1          Mary  true       7
        2          John  false      6
        3          Bill  false      4
      Cancer type is:
        ● NSCLC when Cell_type is false,
        ● SCLC when Cell_type is true.
      Cancer stage is:
        ● I, II, III, IIIa, IIIb, IV for NSCLC,
        ● Limited and Extensive for SCLC.
      [Figure: ontology fragment with ISA links and the properties hasStage, name and hasNeoplasm, tied to SNOMED-CT]
      *SCLC = Small Cell Lung Cancer, NSCLC = Non-Small Cell Lung Cancer
      Query Answering
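The hidden logic on this slide can be written down explicitly. The following is a minimal sketch, not the mechanism used by any OBDA system: it loads the three example rows into an in-memory SQLite database, decodes `Cell_type` and `cStage` into the domain vocabulary, and answers the "at least stage IIIa" question in those terms. Storing the boolean as 0/1, the `decode` helper and the dictionary names are assumptions made for illustration.

```python
import sqlite3

# Hidden logic from the slide, made explicit (codes as given in the table notes).
NSCLC_STAGES = {1: "I", 2: "II", 3: "III", 4: "IIIa", 5: "IIIb", 6: "IV"}
SCLC_STAGES = {7: "Limited", 8: "Extensive"}


def decode(cell_type, c_stage):
    """Translate raw column values into (cancer type, stage label)."""
    if cell_type:
        return "SCLC", SCLC_STAGES[c_stage]
    return "NSCLC", NSCLC_STAGES[c_stage]


conn = sqlite3.connect(":memory:")
# The table name contains '+', so it has to be quoted as an identifier.
conn.execute(
    'CREATE TABLE "tbl_patient+2015" '
    "(PatientId INTEGER, Name TEXT, Cell_type INTEGER, cStage INTEGER)"
)
conn.executemany(
    'INSERT INTO "tbl_patient+2015" VALUES (?, ?, ?, ?)',
    [(1, "Mary", 1, 7), (2, "John", 0, 6), (3, "Bill", 0, 4)],
)

rows = conn.execute(
    'SELECT Name, Cell_type, cStage FROM "tbl_patient+2015" ORDER BY PatientId'
).fetchall()
patients = [(name, *decode(cell_type, c_stage)) for name, cell_type, c_stage in rows]

# "NSCLC patients at stage IIIa or later", asked in domain terms.
stage_order = list(NSCLC_STAGES.values())
answer = [
    (name, stage)
    for name, cancer_type, stage in patients
    if cancer_type == "NSCLC" and stage_order.index(stage) >= stage_order.index("IIIa")
]
print(answer)  # [('John', 'IV'), ('Bill', 'IIIa')]
```

The result matches the rows returned by the raw SQL query on the earlier slide, but the condition is now stated with stage names instead of the magic number 4.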
  • 11. OBDA Answering
      ● (Data) Sources: represent the external and independent resources, i.e. existing organization assets.
      ● Ontology: provides a unified common vocabulary; the conceptual view of the underlying data.
      ● Mappings: relate the terms in the ontology to a set of SQL views.
      Image source: Rezk, Martin. Ontologies Ontop Databases http://www.slideshare.net/MartnRezk/slides-swat4-ls
      Query Answering
  • 12. OBDA Answering Black Box
      ● Rewriting: Create a new query that is the expanded version of the original query, using all the inclusion assertions defined in the ontology.
      ● Unfolding: Substitute each part of the expanded query with the corresponding SQL views from the given mappings.
      ● Evaluation: Execute the complete SQL on a target RDBMS.
      Image source: Kontchakov, Roman, et al. Ontology-based Data Access: Ontop of Databases. http://www.dcs.bbk.ac.uk/~roman/papers/ISWC13.pdf
      Query Answering
  • 13. OBDA Answering Illustration
      Q: Show me all the Person in the hospital?
      Rewritten:
      Q’: Show me all the Person UNION all the Nurse UNION all the Doctor UNION all the Patient UNION anyone who has Neoplasm in the hospital?
  • 14. Look where is the source(s)
      Q’: Show me all the Person UNION all the Nurse UNION all the Doctor UNION all the Patient UNION anyone who has Neoplasm in the hospital?
      Mappings (M), one per part of Q’:
        all the Person           -> (No source)
        all the Nurse            -> Get the list from table Nurse
        all the Doctor           -> Get the list from table Doctor
        all the Patient          -> Get the list from table Patient
        anyone who has Neoplasm  -> Get the list from table Cancer Patient 2015
      OBDA Answering Illustration
  • 15. Substitute with SQL views
      Unfolded:
      Q’: Show me all the Person
          UNION select NurseId from tbl_nurse
          UNION select doc_id from tbl_doctor
          UNION select pid from tbl_patient
          UNION select PatientId from tbl_patient+2015
          in the hospital?
      OBDA Answering Illustration
  • 16. Execute the SQL
      Evaluated:
          select NurseId from tbl_nurse
          UNION select doc_id from tbl_doctor
          UNION select pid from tbl_patient
          UNION select PatientId from tbl_patient+2015
      OBDA Answering Illustration
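The rewriting, unfolding and evaluation steps illustrated in slides 13 to 16 can be sketched in a few lines. This is a toy model under stated assumptions, not how -ontop- or Semantika implement it: the `SUBSUMED_BY` dictionary stands in for the ontology, `MAPPINGS` stands in for the mapping file, the concept name `HasNeoplasm` and all sample rows are invented, and SQLite replaces the target RDBMS.

```python
import sqlite3

# Assumed ontology fragment: each listed concept is subsumed by Person.
SUBSUMED_BY = {"Person": ["Nurse", "Doctor", "Patient", "HasNeoplasm"]}

# Assumed mappings from ontology terms to SQL views; Person itself has no source.
MAPPINGS = {
    "Nurse": "SELECT NurseId AS id FROM tbl_nurse",
    "Doctor": "SELECT doc_id AS id FROM tbl_doctor",
    "Patient": "SELECT pid AS id FROM tbl_patient",
    "HasNeoplasm": 'SELECT PatientId AS id FROM "tbl_patient+2015"',
}


def rewrite(concept):
    """Rewriting: expand the queried concept with everything it subsumes."""
    return [concept] + SUBSUMED_BY.get(concept, [])


def unfold(concepts):
    """Unfolding: replace each mapped concept with its SQL view."""
    return " UNION ".join(MAPPINGS[c] for c in concepts if c in MAPPINGS)


def evaluate(conn, sql):
    """Evaluation: run the final SQL on the database."""
    return sorted(row[0] for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_nurse (NurseId TEXT);
    CREATE TABLE tbl_doctor (doc_id TEXT);
    CREATE TABLE tbl_patient (pid TEXT);
    CREATE TABLE "tbl_patient+2015" (PatientId TEXT);
    INSERT INTO tbl_nurse VALUES ('n1');
    INSERT INTO tbl_doctor VALUES ('d1');
    INSERT INTO tbl_patient VALUES ('p1'), ('p2');
    INSERT INTO "tbl_patient+2015" VALUES ('p2'), ('p3');
""")

expanded = rewrite("Person")
sql = unfold(expanded)
persons = evaluate(conn, sql)
print(persons)  # ['d1', 'n1', 'p1', 'p2', 'p3']
```

Note that `rewrite` only expands one level of the hierarchy; a real rewriter has to follow inclusion assertions transitively and handle properties as well as classes.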
  • 17. 42! (Computational) Price to Pay
      Query answering in OBDA setting:
        ● PTIME in the size of the ontology (efficiently tractable)
        ● AC0 in the size of the data (very efficiently tractable)
        ● NP-complete in the size of the query (exponential)
      *Tractable problem: there exists an algorithm that will eventually terminate in a reasonable amount of time and return you the result.
      OBDA Answering Illustration
  • 18.
  • 19. -ontop- Project
      ● A platform to query relational databases using the SPARQL language,
      ● The implementation started in 2010,
      ● Supports several database systems, such as MySQL, PostgreSQL, H2, SQL Server, Oracle, IBM DB2,
      ● Distributed under an open-source license,
      ● It is currently being developed within the context of the EU Optique project,
      ● Fantastic add-ons: Efficient rewriting, Query optimization, Transitive query, Rules entailment, Cross-linked datasets.
      -ontop-
  • 23. Berlin SPARQL Benchmark (BSBM) ● A benchmark suite built around e-commerce domain. ○ A set of products is offered by different vendors and customers are posting product reviews. ● Consists of 12 different queries, emulating the search and navigation pattern of a consumer looking for a product. ● A Query-Mix consists of 25 querying actions that simulate a product search scenario. ● No inference. Experiment
  • 24. BSBM-100
      ● Dataset of 100 million triples,
      ● Transformed into relational db schema:
          offer                  > 5.7 million rows
          person                 > 147 thousand rows
          producer               > 5 thousand rows
          product                > 288 thousand rows
          productfeature         > 47 thousand rows
          productfeatureproduct  > 5.5 million rows
          producttype            > 2 thousand rows
          producttypeproduct     > 1.4 million rows
          review                 > 2.8 million rows
          vendor                 > 2 thousand rows
      Experiment
  • 25. Test Databases ● MySQL - v5.6 ○ Vanilla ○ Optimized ■ CREATE INDEX ■ OPTIMIZE TABLE - ANALYZE ● PostgreSQL - v9.4.4 ○ Vanilla ○ Optimized ■ CREATE INDEX ■ VACUUM TABLE - ANALYZE Experiment
  • 26. Test Machine ● MacBook Pro ○ OS X Yosemite 64-bit ○ Java 8 (build 1.8.0_51-b16) ○ Intel Core i7 3 GHz ○ Memory 16 GB ○ Flash storage ○ Direct connection - no network cost Experiment
  • 27. Benchmark Flow
      for each obda-endpoint do:
        for each dbms do:
          for each dbms-variant do:
            start endpoint;
            start dbms;
            loop 2:
              run ‘benchmark -runs 100 -w 10’;
            stop dbms;
            stop endpoint;
      Experiment
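The benchmark flow above can be expressed as a small driver that produces the ordered list of steps. This is only a dry-run sketch: the endpoint, DBMS and variant names are taken from the surrounding slides, the start/stop step strings are placeholders rather than real commands, and the benchmark command line is copied verbatim from the slide without being executed.

```python
from itertools import product

# Names taken from the slides; how each one is started is not specified there.
ENDPOINTS = ["ontop", "semantika"]
DBMS = ["mysql", "postgresql"]
VARIANTS = ["vanilla", "optimized"]
BENCHMARK_CMD = "benchmark -runs 100 -w 10"  # command string as shown on the slide


def plan(trials=2):
    """Return the ordered steps of the benchmark flow without running them."""
    steps = []
    # product() iterates like the nested loops: endpoint outermost, variant innermost.
    for endpoint, dbms, variant in product(ENDPOINTS, DBMS, VARIANTS):
        steps.append(f"start endpoint {endpoint}")
        steps.append(f"start dbms {dbms} ({variant})")
        steps.extend([BENCHMARK_CMD] * trials)  # "loop 2" on the slide
        steps.append(f"stop dbms {dbms} ({variant})")
        steps.append(f"stop endpoint {endpoint}")
    return steps


steps = plan()
for step in steps[:6]:
    print(step)
```

Restarting both the endpoint and the DBMS for every configuration keeps caches from one configuration from leaking into the next, while the two consecutive runs inside a configuration are what the later Trial 1 / Trial 2 comparison relies on.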
  • 29. Conclusion
      ● OBDA offers a non-invasive way to provide a better data access service on top of existing (legacy) database systems.
      ● A lot of interesting topics can be harvested from OBDA use case scenarios.
        ○ Health and clinical domain perhaps?
      ● OBDA performance relies heavily on the efficiency of the underlying data infrastructure (both HW and SW).
  • 32. Query Answering over Database Image source: Calvanese, Diego. Ontology-Based Data Access and Integration. https://www.essi.upc.edu/docs/slides-obda-2010-02-08
  • 34. Query Answering over Ontology Image source: Calvanese, Diego. Ontology-Based Data Access and Integration. https://www.essi.upc.edu/docs/slides-obda-2010-02-08
  • 36. Query Answering via Rewriting Image source: Calvanese, Diego. Ontology-Based Data Access and Integration. https://www.essi.upc.edu/docs/slides-obda-2010-02-08
  • 39. -ontop- Black Box Image source: Kontchakov, Roman, et al. Ontology-based Data Access: Ontop of Databases. http://www.dcs.bbk.ac.uk/~roman/papers/ISWC13.pdf ● Tree witness rewriting technique ● T-mapping optimization ● Semantic Query Optimization (SQO)
  • 40. Rule Entailment Image source: Xiao, Guohui, et al. Rules and Ontology-based Data Access. https://www.inf.unibz.it/~calvanese/papers/xiao-rezk-rodr-calv-RR-2014.pdf ● SWRL rules translated to relational algebra, expressed as SQL’99 Common Table Expressions (CTEs) ● T-Mapping extension
  • 42. Query Mixes per Hour
                              -ontop-   Semantika   Native
      MySQL                       807         831      436
      MySQL optimized           1,471       1,630    2,371
      PostgreSQL                2,198       2,286      418
      PostgreSQL optimized      7,576       9,204   15,500
  • 43. Queries per Second - MySQL
      Vanilla
                     Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
      -ontop-         1   95    1   --    1   88  100   --   75   --   --
      Semantika       1  101    1   --    1   77  112   --   95   --   --
      Optimized
                     Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
      -ontop-        30   73   26   --    1   48   63   --   49   --   --
      Semantika      58   99   46   --    1   95  108   --  102   --   --
  • 44. Queries per Second - PostgreSQL
      Vanilla
                     Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
      -ontop-         4   89    4   --    2   73   77   --  100   --   --
      Semantika       4   90    4   --    2   96  110   --  123   --   --
      Optimized
                     Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
      -ontop-        75   77   79   --    9   47   60   --   76   --   --
      Semantika      88   81   82   --    9   94  110   --  119   --   --
  • 45. Semantika does cache better
                              -ontop-                        Semantika
                              Trial 1  Trial 2  Delta%       Trial 1  Trial 2  Delta%
      MySQL                       790      807     +2%           638      831    +30%
      MySQL optimized            1424     1471     +3%           983     1630    +66%
      PostgreSQL                 1803     2198    +22%          1254     2286    +82%
      PostgreSQL optimized       5678     7576    +33%          2028     9204   +354%
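The Delta% column appears to be the relative change from Trial 1 to Trial 2, rounded to a whole percent. The sketch below recomputes it from the trial values on the slide under that assumption; the `TRIALS` dictionary and the `delta_percent` helper are names introduced here for illustration.

```python
# (system, configuration) -> (Trial 1, Trial 2), values copied from the slide.
TRIALS = {
    ("-ontop-", "MySQL"): (790, 807),
    ("-ontop-", "MySQL optimized"): (1424, 1471),
    ("-ontop-", "PostgreSQL"): (1803, 2198),
    ("-ontop-", "PostgreSQL optimized"): (5678, 7576),
    ("Semantika", "MySQL"): (638, 831),
    ("Semantika", "MySQL optimized"): (983, 1630),
    ("Semantika", "PostgreSQL"): (1254, 2286),
    ("Semantika", "PostgreSQL optimized"): (2028, 9204),
}


def delta_percent(trial1, trial2):
    """Relative change from trial 1 to trial 2, as a rounded percentage."""
    return round((trial2 - trial1) / trial1 * 100)


deltas = {key: delta_percent(*values) for key, values in TRIALS.items()}
for (system, config), delta in deltas.items():
    print(f"{system:10s} {config:22s} {delta:+d}%")
```

Recomputed this way, every value agrees with the Delta% figures shown on the slide.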
  • 46. Ontop could answer ALL queries
                     Q1   Q2   Q3   Q4   Q5   Q7   Q8   Q9  Q10  Q11  Q12
      -ontop-        83   80   78  112    9   75   78   83  105   91   83
      Semantika      88   81   82   --    9   94  110   --  119   --   --
      -ontop- supports almost all features in SPARQL 1.1
  • 48. -ontop- Mappings
      mappingId  Reviewer
      target     <"&bsbm-inst;dataFromRatingSite{$publisher}/Reviewer{$nr}"> a foaf:Person; foaf:name $name; foaf:mbox_sha1sum $mbox_sha1sum; bsbm:country <"&iso3166;{$country}">; dc:publisher <"&bsbm-inst;dataFromRatingSite{$publisher}/RatingSite{$publisher}">; dc:date $publishDate .
      source     select nr, name, mbox_sha1sum, country, publisher, publishDate from person

      mappingId  Producer
      target     <"&bsbm-inst;dataFromProducer{$nr}/Producer{$nr}"> a bsbm:Producer; rdfs:label $label; rdfs:comment $comment; foaf:homepage $homepage; bsbm:country <"&iso3166;{$country}">; dc:publisher <"&bsbm-inst;dataFromProducer{$nr}/Producer{$nr}">; dc:date $publishDate .
      source     select nr, label, comment, homepage, country, publisher, publishDate from producer

      ● Uses Turtle syntax.
      ● Specification: https://babbage.inf.unibz.it/trac/obdapublic/wiki/ObdalibObdaTurtlesyntax
      ● Supports R2RML syntax
  • 49. Semantika Mappings
      <mapping tml:id="Reviewer">
        <logical-table rr:tableName="person"/>
        <subject-map rr:class="foaf:Person" rr:template="Reviewer(publisher,nr)"/>
        <predicate-object-map rr:predicate="foaf:name" rr:column="name"/>
        <predicate-object-map rr:predicate="foaf:mbox_sha1sum" rr:column="mbox_sha1sum"/>
        <predicate-object-map rr:predicate="bsbm:country" rr:template="Country(country)"/>
        <predicate-object-map rr:predicate="dc:publisher" rr:template="ReviewerPublisher(publisher,publisher)"/>
        <predicate-object-map rr:predicate="dc:date" rr:column="publishDate"/>
      </mapping>
      <mapping tml:id="Producer">
        <logical-table rr:tableName="producer"/>
        <subject-map rr:class="bsbm:Producer" rr:template="Producer(nr,nr)"/>
        <predicate-object-map rr:predicate="rdfs:label" rr:column="label"/>
        <predicate-object-map rr:predicate="rdfs:comment" rr:column="comment"/>
        <predicate-object-map rr:predicate="foaf:homepage" rr:column="homepage"/>
        <predicate-object-map rr:predicate="bsbm:country" rr:template="Country(country)"/>
        <predicate-object-map rr:predicate="dc:publisher" rr:template="ProducerPublisher(nr,nr)"/>
        <predicate-object-map rr:predicate="dc:date" rr:column="publishDate"/>
      </mapping>

      ● Uses XML format.
      ● Specification: https://github.com/obidea/semantika/wiki/2.-Basic-RDB-RDF-Mapping
      ● Supports R2RML syntax
  • 51. Simple SPARQL Query
      SELECT ?title ?publishDate
      WHERE {
        ?review bsbm:reviewFor bsbm:Producer1245/Product62033> .
        ?review dc:title ?title .
        ?review dc:date ?publishDate .
      }
  • 52. Ontop SQL Creation
      SELECT
        3 AS `titleQuestType`, NULL AS `titleLang`, QVIEW1.`title` AS `title`,
        10 AS `publishDateQuestType`, NULL AS `publishDateLang`,
        CAST(QVIEW1.`publishDate` AS CHAR(8000) CHARACTER SET utf8) AS `publishDate`
      FROM review QVIEW1
      WHERE (QVIEW1.`product` = '62033')
        AND (QVIEW1.`producer` = '1245')
        AND QVIEW1.`publisher` IS NOT NULL
        AND QVIEW1.`nr` IS NOT NULL
        AND QVIEW1.`title` IS NOT NULL
        AND QVIEW1.`publishDate` IS NOT NULL
  • 53. Semantika SQL Creation
      SELECT
        `OBDA_VIEW1`.`title` AS `title`,
        `OBDA_VIEW1`.`publishDate` AS `publishDate`
      FROM `bsbm100`.`review` AS `OBDA_VIEW1`
      WHERE `OBDA_VIEW1`.`publisher` IS NOT NULL
        AND `OBDA_VIEW1`.`product` = 62033
        AND `OBDA_VIEW1`.`publishDate` IS NOT NULL
        AND `OBDA_VIEW1`.`nr` IS NOT NULL
        AND `OBDA_VIEW1`.`title` IS NOT NULL
        AND `OBDA_VIEW1`.`producer` = 1245
  • 54. Let’s add something more...
      SELECT ?review ?title ?publishDate ?rating1 ?rating2
      WHERE {
        ?review bsbm:reviewFor bsbm:Producer1245/Product62033> .
        ?review dc:title ?title .
        ?review dc:date ?publishDate .
        ?review bsbm:rating1 ?rating1 .
        OPTIONAL { ?review bsbm:rating2 ?rating2 . }
      }
  • 55. Ontop SQL Creation
      SELECT
        1 AS `reviewQuestType`, NULL AS `reviewLang`,
        CONCAT('http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromRatingSite',
          REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(CAST(QVIEW1.`publisher` AS CHAR(8000) CHARACTER SET utf8),' ', '%20'),'!', '%21'),'@', '%40'),'#', '%23'),'$', '%24'),'&', '%26'),'*', '%42'), '(', '%28'), ')', '%29'), '[', '%5B'), ']', '%5D'), ',', '%2C'), ';', '%3B'), ':', '%3A'), '?', '%3F'), '=', '%3D'), '+', '%2B'), '''', '%22'), '/', '%2F'),
          '/Review',
          REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(CAST(QVIEW1.`nr` AS CHAR(8000) CHARACTER SET utf8),' ', '%20'),'!', '%21'),'@', '%40'),'#', '%23'),'$', '%24'),'&', '%26'),'*', '%42'), '(', '%28'), ')', '%29'), '[', '%5B'), ']', '%5D'), ',', '%2C'), ';', '%3B'), ':', '%3A'), '?', '%3F'), '=', '%3D'), '+', '%2B'), '''', '%22'), '/', '%2F')) AS `review`,
        3 AS `titleQuestType`, NULL AS `titleLang`, QVIEW1.`title` AS `title`,
        10 AS `publishDateQuestType`, NULL AS `publishDateLang`,
        CAST(QVIEW1.`publishDate` AS CHAR(8000) CHARACTER SET utf8) AS `publishDate`,
        4 AS `rating1QuestType`, NULL AS `rating1Lang`,
        CAST(QVIEW1.`rating1` AS CHAR(8000) CHARACTER SET utf8) AS `rating1`,
        4 AS `rating2QuestType`, NULL AS `rating2Lang`,
        CAST(QVIEW2.`rating2` AS CHAR(8000) CHARACTER SET utf8) AS `rating2`
      FROM (
        review QVIEW1
        LEFT OUTER JOIN review QVIEW2
          ON (QVIEW1.`nr` = QVIEW2.`nr`)
          AND (QVIEW1.`publisher` = QVIEW2.`publisher`)
          AND QVIEW2.`rating2` IS NOT NULL
          AND QVIEW1.`publisher` IS NOT NULL
          AND QVIEW1.`nr` IS NOT NULL
      )
      WHERE QVIEW1.`title` IS NOT NULL
        AND QVIEW1.`nr` IS NOT NULL
        AND QVIEW1.`publishDate` IS NOT NULL
        AND (QVIEW1.`product` = '62033')
        AND QVIEW1.`publisher` IS NOT NULL
        AND QVIEW1.`rating1` IS NOT NULL
        AND (QVIEW1.`producer` = '1245')
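The long nested REPLACE chains in this generated SQL escape reserved characters in the `publisher` and `nr` values before they are concatenated into the review IRI. The sketch below replays the same substitutions in Python and compares each pair with standard percent-encoding from `urllib.parse.quote`. The pair list is copied from the SQL above; the helper names `encode_like_slide` and `review_iri` and the sample arguments are assumptions for illustration.

```python
from urllib.parse import quote

# (character, replacement) pairs copied from the REPLACE chain, innermost first.
SLIDE_PAIRS = [
    (" ", "%20"), ("!", "%21"), ("@", "%40"), ("#", "%23"), ("$", "%24"),
    ("&", "%26"), ("*", "%42"), ("(", "%28"), (")", "%29"), ("[", "%5B"),
    ("]", "%5D"), (",", "%2C"), (";", "%3B"), (":", "%3A"), ("?", "%3F"),
    ("=", "%3D"), ("+", "%2B"), ("'", "%22"), ("/", "%2F"),
]

BASE = "http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromRatingSite"


def encode_like_slide(value):
    """Apply the substitutions in the same order as the nested REPLACE calls."""
    text = str(value)
    for char, replacement in SLIDE_PAIRS:
        text = text.replace(char, replacement)
    return text


def review_iri(publisher, nr):
    """Mirror the CONCAT(...) expression that builds the ?review IRI."""
    return BASE + encode_like_slide(publisher) + "/Review" + encode_like_slide(nr)


# Pairs where the generated SQL differs from standard percent-encoding.
differences = [
    (char, replacement, quote(char, safe=""))
    for char, replacement in SLIDE_PAIRS
    if replacement != quote(char, safe="")
]
print(review_iri(1, 42))
print(differences)  # [('*', '%42', '%2A'), ("'", '%22', '%27')]
```

Two of the nineteen pairs differ from standard percent-encoding: `*` is replaced by `%42` (which decodes to `B`) and the single quote by `%22` (which decodes to a double quote). The slide reproduces the SQL as generated, so the chain is left untouched in the transcript.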
  • 56. Semantika SQL Creation
      SELECT
        CONCAT('http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromRatingSite{1}/Review{2}',' : ','"', `OBDA_VIEW1`.`publisher`,'" "',`OBDA_VIEW1`.`nr`,'"') AS `review`,
        `OBDA_VIEW1`.`title` AS `title`,
        `OBDA_VIEW1`.`publishDate` AS `publishDate`,
        `OBDA_VIEW1`.`rating1` AS `rating1`,
        `OBDA_VIEW1`.`rating2` AS `rating2`
      FROM `bsbm100_optimized`.`review` AS `OBDA_VIEW1`
      WHERE `OBDA_VIEW1`.`publisher` IS NOT NULL
        AND `OBDA_VIEW1`.`product` = 62033
        AND `OBDA_VIEW1`.`publishDate` IS NOT NULL
        AND `OBDA_VIEW1`.`nr` IS NOT NULL
        AND `OBDA_VIEW1`.`title` IS NOT NULL
        AND `OBDA_VIEW1`.`rating1` IS NOT NULL
        AND `OBDA_VIEW1`.`producer` = 1245