Accessing legacy data as virtual RDF stores is a key issue in building the Web of Data. In recent years, MongoDB has become a popular player in the NoSQL market, making it a significant potential contributor to the Web of Linked Data. In this talk we therefore present an article published at the DEXA 2016 conference, which addresses the question of how to access arbitrary MongoDB documents with SPARQL.
We propose a two-step method to (i) translate a SPARQL query into a pivot abstract query under MongoDB-to-RDF mappings represented in the xR2RML language, then (ii) translate the pivot query into a concrete MongoDB query.
We elaborate on the discrepancy between the expressiveness of SPARQL and the MongoDB query language, and we show that we can always come up with a rewriting that produces all correct answers.
A Mapping-based Method to Query MongoDB Documents with SPARQL
1. 1
Franck Michel
A Mapping-based Method to Query
MongoDB Documents with SPARQL
F. Michel, C. Faron-Zucker, J. Montagnat
Université Côte d’Azur, CNRS, Inria, I3S, France
2. 2
Towards a Web of Data
From a Web of Documents
...to a Web of (Linked) Data
Interlinking of open datasets
in a common machine-readable format,
using common vocabularies
3. 3
Web-scale data integration
Exponential data growth
Heterogeneous data sources
Knowledge formalized as
Domain Ontologies, Thesauri,
Taxonomies…
A Web of
Linked Open Data
NoSQL: huge potential source of Linked Open Data,
yet largely ignored/despised so far
6. 6
SPARQL Access to Heterogeneous DBs
Previous SPARQL rewritings closely coupled with the
target QL expressiveness (SQL, XQuery):
• Support of joins, unions, nested queries, filtering, string functions, etc.
• Semantics-preserving 1-to-1 rewriting
How can we avoid defining yet another
rewriting method for each DB?
Two-step approach to deal with the general case [3]:
1. Translate SPARQL into a pivot Abstract Query Language (AQL)
under DB-to-RDF mappings, with no assumption on the target DB capabilities
2. Translate from the AQL to the QL of the target database
9. 9
The xR2RML mapping language [1]
Extends W3C R2RML and RML
Describe mappings from various types of DB to RDF
• Query the target database
• Pick data elements from query results
• Translate them to (subject, predicate, object) using arbitrary ontologies
Independent of any target database
• Allow any query language
• Allow any syntax to reference data elements within query results
(column name, JSONPath, XPath, attribute name...)
11. 11
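SPARQL-to-AQL rewriting [3]
SELECT ?mbox WHERE {
?x foaf:mbox "john@foo.com".
?x foaf:mbox ?mbox.
FILTER (?mbox != "john@foo.com")
}
Callouts on the query: Graph Pattern (the WHERE clause), Basic Graph Pattern (the two triple patterns), Triple Pattern (a single line such as ?x foaf:mbox ?mbox)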
Rewrite a well-designed SPARQL graph pattern
into an optimized Abstract Query, under a set of
xR2RML mappings.
In turn, it should be possible to translate the
Abstract Query into “any” target database QL.
14. 14
Why MongoDB?
Sources: db-engines.com
Popularity measure:
Number of results in search engines,
Google Trends, frequency of technical
discussions, job offers, profiles in
professional networks, number of tweets
15. 15
Why MongoDB?
MongoDB: very popular NoSQL database today
(Probably) increasingly adopted as a general-purpose DB
• Long tail of scientific data?
Non-tabular format
• E.g. with Cassandra, a regular SQL-to-RDF would “almost” do the job
A good candidate to experiment with what
it takes to publish NoSQL data on
the Web of Linked Open Data
16. 16
MongoDB Query Language
Find queries
Declarative retrieval of documents matching criteria
db.people.find({"age":{$gt: 30}}, {"code":1})
Limitations:
no join, restrictions on union and nested queries, limited comparisons
Aggregate queries
• Definition of processing pipelines:
• project, match, lookup, unwind, …
• Richer than find queries
• But more difficult to anticipate performance
We consider find queries in a first approach
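To make the semantics of the find query above concrete, here is a minimal in-memory sketch in Python. It is not MongoDB or a MongoDB driver: the `matches` and `find` helpers and the `people` sample documents are hypothetical, and only the `$gt` operator and inclusion projections are emulated.

```python
from typing import Any


def matches(doc: dict[str, Any], criteria: dict[str, Any]) -> bool:
    """Return True if doc satisfies every criterion (only equality and $gt are emulated)."""
    for field_name, cond in criteria.items():
        value = doc.get(field_name)
        if isinstance(cond, dict) and "$gt" in cond:
            if value is None or not value > cond["$gt"]:
                return False
        elif value != cond:
            return False
    return True


def find(collection: list[dict], criteria: dict, projection: dict) -> list[dict]:
    """Emulate find(criteria, projection): keep _id plus the fields flagged with 1."""
    kept = ["_id"] + [f for f, flag in projection.items() if flag]
    return [{k: d[k] for k in kept if k in d} for d in collection if matches(d, criteria)]


# Hypothetical sample collection, for illustration only.
people = [
    {"_id": 1, "age": 25, "code": "A"},
    {"_id": 2, "age": 42, "code": "B"},
    {"_id": 3, "age": 31, "code": "C"},
]

# Counterpart of db.people.find({"age": {$gt: 30}}, {"code": 1})
result = find(people, {"age": {"$gt": 30}}, {"code": 1})
print(result)
```

The query selects documents by a condition on a single collection and returns a projection of each match; nothing in it can relate two documents to each other, which is the "no join" limitation listed above.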
17. 17
SPARQL to MongoDB query rewriting
[Diagram: SPARQL Query + xR2RML mapping → Rewrite → Abstract Query]
AQL Operators:
INNER JOIN ON
LEFT JOIN ON
UNION
FILTER
LIMIT
Atomic AQ:
From: concrete MongoDB query
Where: conditions on JSONPath expressions
isNotNull, equals, sparqlFilter, OR, AND
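One possible way to picture an abstract query is as a small tree of operators over atomic queries. The Python sketch below is only an illustration: the class names, the `people` collection, and the `emails` field are assumptions made for the example, not the data model of the actual prototype, and only INNER JOIN and FILTER are modeled.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class AtomicAQ:
    from_query: str                                  # From: concrete MongoDB query
    where: list[str] = field(default_factory=list)   # Where: conditions on JSONPath expressions


@dataclass
class InnerJoin:
    left: "AQ"
    right: "AQ"
    on: str                                          # joined SPARQL variable


@dataclass
class Filter:
    child: "AQ"
    sparql_condition: str


AQ = Union[AtomicAQ, InnerJoin, Filter]


def atomic_queries(aq: AQ) -> list[AtomicAQ]:
    """Collect the atomic abstract queries found at the leaves of the tree."""
    if isinstance(aq, AtomicAQ):
        return [aq]
    if isinstance(aq, InnerJoin):
        return atomic_queries(aq.left) + atomic_queries(aq.right)
    return atomic_queries(aq.child)


# Hypothetical abstract query for the two triple patterns and the FILTER
# of the SPARQL-to-AQL example, assuming a "people" collection with an "emails" array.
tp1 = AtomicAQ("db.people.find({})", ['equals($.emails.*, "john@foo.com")'])
tp2 = AtomicAQ("db.people.find({})", ["isNotNull($.emails.*)"])
query = Filter(InnerJoin(tp1, tp2, on="?x"), '?mbox != "john@foo.com"')
print(len(atomic_queries(query)))
```

In this picture only the leaves refer to MongoDB; the operators above them stay database-independent.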
18. 18
SPARQL to MongoDB query rewriting
AQL Operators:
INNER JOIN ON
LEFT JOIN ON
UNION: $or as root of a query document
FILTER: restrictions (no comparison between fields), performance issues ($where)
LIMIT
Atomic AQ:
From: concrete MongoDB query
Where: conditions on JSONPath expressions
isNotNull, equals, sparqlFilter, OR, AND
19. 19
SPARQL to MongoDB query rewriting
[Diagram: a Query Processing Engine is added to the pipeline and takes on the processing burden]
20. 20
SPARQL to MongoDB query rewriting
[Diagram: SPARQL Query + xR2RML mapping → Rewrite → Abstract Query → Translate each atomic AQ → Abstract MongoDB Query → Optimize/Rewrite → Concrete MongoDB Query → Query Processing Engine]
AQL Operators: INNER JOIN ON, LEFT JOIN ON, UNION, FILTER, LIMIT
Atomic AQ:
From: concrete MongoDB query
Where: conditions on JSONPath expressions
Conditions on JSONPath expressions are translated into MongoDB operators
21. 21
Condition on JSONPath expression - to - MongoDB QL
Example query
isNotNull($.id)
→ "id": {$exists:true, $ne:null}
equals($.emails.*, "john@foo.com")
→ "emails": {$elemMatch: {$eq:"john@foo.com"}}
Field alternative
equals($.p.["q", "r"], 10)
→ $or: [ {"p.q":{$eq: 10}}, {"p.r":{$eq: 10}} ]
JavaScript calculated array index
equals($.staff[(@.length - 1)].name, "John")
→ $and:[
{"staff":{$exists: true}},
{$where:"this.staff[this.staff.length - 1].name == 'John'"} ]
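The first three translations above can be sketched as a pattern-matching function. This is a minimal illustration in Python, not the translation rules of the paper: the function names and regular expressions are assumptions, query documents are represented as Python dicts, and any JSONPath shape other than the three handled here is rejected.

```python
import re
from typing import Any

FIELD_PATH = r"\w+(?:\.\w+)*"  # dotted field path such as "p" or "p.q"


def is_not_null(path: str) -> dict:
    """isNotNull($.f) -> {"f": {"$exists": True, "$ne": None}} (simple field paths only)."""
    m = re.fullmatch(rf"\$\.({FIELD_PATH})", path)
    if not m:
        raise NotImplementedError(path)
    return {m.group(1): {"$exists": True, "$ne": None}}


def equals(path: str, value: Any) -> dict:
    """Translate equals(<JSONPath>, value) for three JSONPath shapes."""
    # $.a.b : plain field path
    m = re.fullmatch(rf"\$\.({FIELD_PATH})", path)
    if m:
        return {m.group(1): {"$eq": value}}
    # $.a.* : any element of array "a"
    m = re.fullmatch(rf"\$\.({FIELD_PATH})\.\*", path)
    if m:
        return {m.group(1): {"$elemMatch": {"$eq": value}}}
    # $.p.["q", "r"] : field alternative
    m = re.fullmatch(rf"\$\.({FIELD_PATH})\.\[(.+)\]", path)
    if m:
        names = re.findall(r'"(\w+)"', m.group(2))
        return {"$or": [{f"{m.group(1)}.{n}": {"$eq": value}} for n in names]}
    raise NotImplementedError(path)


print(is_not_null("$.id"))
print(equals("$.emails.*", "john@foo.com"))
print(equals('$.p.["q", "r"]', 10))
```

The JavaScript-calculated array index is deliberately left out: it needs a `$where` clause, which brings the placement constraints discussed on the following slides.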
22. 22
Condition on JSONPath expression - to - MongoDB QL
10 rules matching different JSONPath patterns
Translation is no piece of cake!
• JSONPath: non-standard, somewhat unclear, ambiguities
• MongoDB find queries
No join, restrictions on union, nested queries hardly supported, limited comparisons
Several potential translation issues
• Unnecessary complexity: nested $or/$and
• Unsupported translation of some JSONPath expressions
• “*” stands for any array element as well as any document field
• unsupported array slice notation, …
• Misplaced operator:
• $where not in top-level query (inside an $elemMatch, $and)
23. 23
Come up with a concrete MongoDB query
Unsupported translation of some JSONPath expressions
• Keep track of them during the translation process
• N stands for “Non-supported clause”
• Remove unsupported pieces
• C1 ∧ … ∧ Cn ∧ N → C1 ∧ … ∧ Cn
Widens the condition → all matching documents are returned, in addition to
possibly non-matching documents.
• C1 ∨… ∨ Cn ∨ N → N
The unsupported clause is raised to the parent clause iteratively, until it is
eventually removed, or it ends up in the top-level query: worst case = the query
retrieves all documents
• …
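The two rules above can be sketched as a recursive pruning function. In this hypothetical Python encoding, a condition is either a leaf string, the marker `N`, or a tuple `("AND", [...])` / `("OR", [...])`. Two choices are assumptions of the sketch rather than rules stated on the slide: an AND whose members are all unsupported collapses to `N`, and an AND left with a single member is replaced by that member.

```python
N = "N"  # marker for an unsupported clause


def prune(cond):
    """Remove unsupported clauses bottom-up; may return N if nothing usable remains."""
    if isinstance(cond, str):            # leaf condition or the N marker
        return cond
    op, members = cond
    pruned = [prune(m) for m in members]
    if op == "OR":
        # C1 v ... v Cn v N -> N : the unsupported clause is raised to the parent
        return N if N in pruned else ("OR", pruned)
    # C1 ^ ... ^ Cn ^ N -> C1 ^ ... ^ Cn : dropping N widens the condition
    kept = [m for m in pruned if m != N]
    if not kept:
        return N                         # assumption: nothing left to keep
    return kept[0] if len(kept) == 1 else ("AND", kept)


print(prune(("AND", ["C1", "C2", N])))
print(prune(("OR", ["C1", N])))
print(prune(("AND", ["C1", ("OR", ["C2", N])])))
```

If `prune` returns `N` for the whole condition, this corresponds to the worst case mentioned above, where the query retrieves all documents and the filtering is left to later processing.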
24. 24
Come up with a concrete MongoDB query
Misplaced operator: pull up $where operators to top-level
• C ∧ W → (C,W)
Top-level $and replaced with its members
• C ∨ W → UNION(C, W)
$or substituted with UNION (not a MongoDB operator).
UNION processed by the query processing engine
• C ∨ (D ∧ W) → UNION(C, D ∧ W)
• C ∧ (D ∨ W) → UNION( C ∧ D, C ∧ W)
• …
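The four rules listed above can be sketched as one recursive function that returns the members of a UNION. The encoding is hypothetical: leaves are strings, a `$where` clause is `("WHERE", js)`, and composite conditions are `("AND", [...])` or `("OR", [...])`. Only these four rules are covered; rules elided on the slide, such as a `$where` nested in an `$elemMatch`, are not modeled.

```python
from itertools import product


def is_where(c) -> bool:
    return isinstance(c, tuple) and c[0] == "WHERE"


def contains_where(c) -> bool:
    if isinstance(c, str):
        return False
    if is_where(c):
        return True
    return any(contains_where(m) for m in c[1])


def flatten_and(parts):
    """Build one conjunction, merging nested ANDs so that WHERE members stay at top level."""
    out = []
    for p in parts:
        if isinstance(p, tuple) and p[0] == "AND":
            out.extend(p[1])
        else:
            out.append(p)
    return out[0] if len(out) == 1 else ("AND", out)


def pull_up(c) -> list:
    """Return UNION members in which a WHERE only appears as a top-level conjunct."""
    if not contains_where(c) or is_where(c):
        return [c]
    op, members = c
    if op == "OR":
        # Members without WHERE stay in a single $or; the others become UNION members.
        plain = [m for m in members if not contains_where(m)]
        union = [plain[0] if len(plain) == 1 else ("OR", plain)] if plain else []
        for m in members:
            if contains_where(m):
                union.extend(pull_up(m))
        return union
    # AND: distribute over the alternatives of each member
    alternatives = [pull_up(m) for m in members]
    return [flatten_and(combo) for combo in product(*alternatives)]


W = ("WHERE", "this.a == this.b")  # hypothetical JavaScript condition
print(pull_up(("AND", ["C", ("OR", ["D", W])])))
```

A result with a single member needs no UNION; with several members, each one is a separate query whose results are merged by the query processing engine.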
25. 25
Come up with a concrete MongoDB query
Theorem.
Let C be an equality or not-null condition on a JSONPath expression.
Let Q = (Q1, …, Qn) be the abstract MongoDB query produced by trans(C).
Rewritability: It is always possible to rewrite Q into a query Q’ = UNION(Q’1, …, Q’m) such that ∀i ∈ [1, m], Q’i is a valid MongoDB query, i.e. Q’i does not contain any unsupported clause, and a $where clause only shows at the top-level of Q’i.
Completeness: Q’ retrieves all the certain answers, i.e. all the documents
matching condition C. If Q contains at least one unsupported clause, then Q’
may retrieve additional documents that do not match condition C.
27. 27
Conclusions & perspectives
Goal: foster the development of SPARQL interfaces to
heterogeneous databases, using domain ontologies
Approach based on a pivot Abstract Query Language:
• Generalize existing works on SQL and XQuery
• Encompass all DB-independent steps of the rewriting process
Apply the process in the case of MongoDB
• Not just an application: large gap between expressiveness of SPARQL
and MongoDB query languages
28. 28
Conclusions & perspectives
SW vs. NoSQL: two irreconcilable worlds?
Different paradigms:
• SW manages highly connected graphs,
• NoSQL DBs manage isolated documents, joins hardly supported
NoSQL DBs
• pragmatically gave up on consistency and rich query features
• as a trade-off for high throughput/availability and horizontal elasticity
Filling the gap between the two worlds is not straightforward
The MongoDB experience shows the challenges.
Huge potential source of LOD, can’t be ignored anymore
29. 29
Conclusions & perspectives
Working prototype Morph-xR2RML
• Use case in Digital Humanities [2]
• https://github.com/frmichel/morph-xr2rml/
Perspectives
• Perform benchmarking
• Evaluate the usability of MongoDB aggregate queries [Botoeva et al. 2016],
characterize mappings with respect to find/aggregate queries
[Botoeva et al. 2016] E. Botoeva, D. Calvanese, B. Cogrel, M. Rezk, and G. Xiao. OBDA beyond Relational DBs:
A Study for MongoDB. In proc. of the 29th Int. Workshop on Description Logics (DL 2016).
30. 30
Contacts:
Franck Michel
Catherine Faron-Zucker
Johan Montagnat
[1] F. Michel, L. Djimenou, C. Faron-Zucker, and J. Montagnat. Translation of Relational and Non-Relational
Databases into RDF with xR2RML. In proc. of WebIST 2015.
[2] C. Callou, F. Michel, C. Faron-Zucker, C. Martin, J. Montagnat. Towards a Shared Reference Thesaurus for
Studies on History of Zoology, Archaeozoology and Conservation Biology. In SW4SH workshop, ESWC’15.
[3] F. Michel, C. Faron-Zucker, and J. Montagnat. A Generic Mapping-Based Query Translation from SPARQL to
Various Target Database Query Languages. In proc. of WebIST 2016.
https://github.com/frmichel/morph-xr2rml/