SlideShare una empresa de Scribd logo
1 de 70
Descargar para leer sin conexión
Complex queries in a
distributed multi-model
database
Max Neunhöffer
Tech Talk Geekdom, 16 March 2015
www.arangodb.com
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
A “collection” is a set of
documents in a DB.
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
A “collection” is a set of
documents in a DB.
The DB can inspect the
values, allowing for
secondary indexes.
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
A “collection” is a set of
documents in a DB.
The DB can inspect the
values, allowing for
secondary indexes.
Or one can just treat the
DB as a key/value store.
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
A “collection” is a set of
documents in a DB.
The DB can inspect the
values, allowing for
secondary indexes.
Or one can just treat the
DB as a key/value store.
Sharding: the data of a
collection is distributed
between multiple servers.
1
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
Vertices and edges are
documents.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
Vertices and edges are
documents.
Every edge has a _from and a _to
attribute.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
Vertices and edges are
documents.
Every edge has a _from and a _to
attribute.
The database offers queries and
transactions dealing with graphs.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
Vertices and edges are
documents.
Every edge has a _from and a _to
attribute.
The database offers queries and
transactions dealing with graphs.
For example, paths in the graph
are interesting.
2
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
3
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
[ { "name": "Schmidt", "firstname": "Helmut",
"hobbies": ["Smoking"]},
{ "name": "Neunhöffer", "firstname": "Max",
"hobbies": ["Piano", "Golf"]},
...
]
3
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
[ { "name": "Schmidt", "firstname": "Helmut",
"hobbies": ["Smoking"]},
{ "name": "Neunhöffer", "firstname": "Max",
"hobbies": ["Piano", "Golf"]},
...
]
(Actually, a cursor is returned.)
3
Query 2
Use filtering, sorting and limit
FOR p IN people
FILTER p.age >= @minage
SORT p.name, p.firstname
LIMIT @nrlimit
RETURN { name: CONCAT(p.name, ", ",
p.firstname),
age : p.age }
4
Query 2
Use filtering, sorting and limit
FOR p IN people
FILTER p.age >= @minage
SORT p.name, p.firstname
LIMIT @nrlimit
RETURN { name: CONCAT(p.name, ", ",
p.firstname),
age : p.age }
[ { "name": "Neunhöffer, Max", "age": 44 },
{ "name": "Schmidt, Helmut", "age": 95 },
...
]
4
Query 3
Aggregation and functions
FOR p IN people
COLLECT a = p.age INTO L
FILTER a >= @minage
RETURN { "age": a, "number": LENGTH(L) }
5
Query 3
Aggregation and functions
FOR p IN people
COLLECT a = p.age INTO L
FILTER a >= @minage
RETURN { "age": a, "number": LENGTH(L) }
[ { "age": 18, "number": 10 },
{ "age": 19, "number": 17 },
{ "age": 20, "number": 12 },
...
]
5
Query 4
Joins
FOR p IN @@peoplecollection
FOR h IN houses
FILTER p._key == h.owner
SORT h.streetname, h.housename
RETURN { housename: h.housename,
streetname: h.streetname,
owner: p.name,
value: h.value }
6
Query 4
Joins
FOR p IN @@peoplecollection
FOR h IN houses
FILTER p._key == h.owner
SORT h.streetname, h.housename
RETURN { housename: h.housename,
streetname: h.streetname,
owner: p.name,
value: h.value }
[ { "housename": "Firlefanz",
"streetname": "Meyer street",
"owner": "Hans Schmidt", "value": 423000
},
...
]
6
Query 5
Modifying data
FOR e IN events
FILTER e.timestamp<"2014-09-01T09:53+0200"
INSERT e IN oldevents
FOR e IN events
FILTER e.timestamp<"2014-09-01T09:53+0200"
REMOVE e._key IN events
7
Query 6
Graph queries
FOR x IN GRAPH_SHORTEST_PATH(
"routeplanner", "germanCity/Cologne",
"frenchCity/Paris", {weight: "distance"} )
RETURN { begin : x.startVertex,
end : x.vertex,
distance : x.distance,
nrPaths : LENGTH(x.paths) }
8
Query 6
Graph queries
FOR x IN GRAPH_SHORTEST_PATH(
"routeplanner", "germanCity/Cologne",
"frenchCity/Paris", {weight: "distance"} )
RETURN { begin : x.startVertex,
end : x.vertex,
distance : x.distance,
nrPaths : LENGTH(x.paths) }
[ { "begin": "germanCity/Cologne",
"end" : {"_id": "frenchCity/Paris", ... },
"distance": 550,
"nrPaths": 10 },
...
]
8
Life of a query
1. Text and query parameters come from user
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
10. Instanciate “cheapest” plan, i.e. set up execution engine
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
10. Instanciate “cheapest” plan, i.e. set up execution engine
11. Distribute and link up engines on different servers
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
10. Instanciate “cheapest” plan, i.e. set up execution engine
11. Distribute and link up engines on different servers
12. Execute plan, provide cursor API
9
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
Think of a pipeline
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
Think of a pipeline
Each node provides
a cursor API
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
Think of a pipeline
Each node provides
a cursor API
Blocks of “Items”
travel through the
pipeline
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
Think of a pipeline
Each node provides
a cursor API
Blocks of “Items”
travel through the
pipeline
What is an “item”???
10
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET xx = a.x Items have vars a, xx
Items have no vars
Items are the thingies traveling through the pipeline.
11
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET xx = a.x Items have vars a, xx
Items have no vars
Items are the thingies traveling through the pipeline.
An item holds values of those variables in the current frame
11
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET xx = a.x Items have vars a, xx
Items have no vars
Items are the thingies traveling through the pipeline.
An item holds values of those variables in the current frame
Thus: Items look differently in different parts of the plan
11
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET xx = a.x Items have vars a, xx
Items have no vars
Items are the thingies traveling through the pipeline.
An item holds values of those variables in the current frame
Thus: Items look differently in different parts of the plan
We always deal with blocks of items for performance reasons
11
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
12
Move filters up
FOR a IN collA
FOR b IN collB
FILTER a.x == 10
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
Singleton
EnumColl a
EnumColl b
Calc a.x == 10
Return {u:a.u,w:b.w}
Filter a.u == b.v
Calc a.u == b.v
Filter a.x == 10
13
Move filters up
FOR a IN collA
FOR b IN collB
FILTER a.x == 10
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behaviour does not
change, if the first FILTER is pulled
out of the inner FOR.
Singleton
EnumColl a
EnumColl b
Calc a.x == 10
Return {u:a.u,w:b.w}
Filter a.u == b.v
Calc a.u == b.v
Filter a.x == 10
13
Move filters up
FOR a IN collA
FILTER a.x < 10
FOR b IN collB
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behaviour does not
change, if the first FILTER is pulled
out of the inner FOR.
However, the number of items trave-
ling in the pipeline is decreased.
Singleton
EnumColl a
Return {u:a.u,w:b.w}
Filter a.u == b.v
Calc a.u == b.v
Calc a.x == 10
EnumColl b
Filter a.x == 10
13
Move filters up
FOR a IN collA
FILTER a.x < 10
FOR b IN collB
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behaviour does not
change, if the first FILTER is pulled
out of the inner FOR.
However, the number of items trave-
ling in the pipeline is decreased.
Note that the two FOR statements
could be interchanged!
Singleton
EnumColl a
Return {u:a.u,w:b.w}
Filter a.u == b.v
Calc a.u == b.v
Calc a.x == 10
EnumColl b
Filter a.x == 10
13
Remove unnecessary calculations
FOR a IN collA
LET L = LENGTH(a.hobbies)
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
Singleton
EnumColl a
Calc L = ...
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
14
Remove unnecessary calculations
FOR a IN collA
LET L = LENGTH(a.hobbies)
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The Calculation of L is unnecessary!
Singleton
EnumColl a
Calc L = ...
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
14
Remove unnecessary calculations
FOR a IN collA
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The Calculation of L is unnecessary!
(since it cannot throw an exception).
Singleton
EnumColl a
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
14
Remove unnecessary calculations
FOR a IN collA
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The Calculation of L is unnecessary!
(since it cannot throw an exception).
Therefore we can just leave it out.
Singleton
EnumColl a
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
14
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Singleton
EnumColl a
Filter ...
Calc ...
Sort a.y, a.x
Return a
15
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA has a skiplist index on “y”
and “x” (in this order),
Singleton
EnumColl a
Filter ...
Calc ...
Sort a.y, a.x
Return a
15
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA has a skiplist index on “y”
and “x” (in this order), then we can read
off the half-open interval between
{ y: 10, x: 17 } and
{ y: 10, x: 23 }
from the skiplist index.
Singleton
Sort a.y, a.x
Return a
IndexRange a
15
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA has a skiplist index on “y”
and “x” (in this order), then we can read
off the half-open interval between
{ y: 10, x: 17 } and
{ y: 10, x: 23 }
from the skiplist index.
The result will automatically be sorted by
y and then by x.
Singleton
Return a
IndexRange a
15
Data distribution in a cluster
Requests
DBserver DBserver DBserver
CoordinatorCoordinator
4 2 5 3 11
The shards of a collection are distributed across the DB
servers.
16
Data distribution in a cluster
Requests
DBserver DBserver DBserver
CoordinatorCoordinator
4 2 5 3 11
The shards of a collection are distributed across the DB
servers.
The coordinators receive queries and organise their
execution
16
Scatter/gather
EnumerateCollection
17
Scatter/gather
Remote
EnumShard
Remote Remote
EnumShard
Remote
Concat/Merge
Remote
EnumShard
Remote
Scatter
17
Scatter/gather
Remote
EnumShard
Remote Remote
EnumShard
Remote
Concat/Merge
Remote
EnumShard
Remote
Scatter
17
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
18
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
Modifying nodes
The modifying node in a query
is executed on the DBservers,
18
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
Modifying nodes
The modifying node in a query
is executed on the DBservers,
to this end, we either scatter the items to all DBservers,
or, if possible, we distribute each item to the shard
that is responsible for the modification.
18
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
Modifying nodes
The modifying node in a query
is executed on the DBservers,
to this end, we either scatter the items to all DBservers,
or, if possible, we distribute each item to the shard
that is responsible for the modification.
Sometimes, we can even optimise away a gather/scatter
combination and parallelise completely.
18

Más contenido relacionado

La actualidad más candente

Poster GraphQL-LD: Linked Data Querying with GraphQL
Poster GraphQL-LD: Linked Data Querying with GraphQLPoster GraphQL-LD: Linked Data Querying with GraphQL
Poster GraphQL-LD: Linked Data Querying with GraphQLRuben Taelman
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFramesSpark Summit
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerSachin Aggarwal
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Databricks
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataSpark Summit
 
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkMaximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkFlink Forward
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0 Databricks
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability MahoutJake Mannix
 
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...jaxLondonConference
 
Robust and Scalable ETL over Cloud Storage with Apache Spark
Robust and Scalable ETL over Cloud Storage with Apache SparkRobust and Scalable ETL over Cloud Storage with Apache Spark
Robust and Scalable ETL over Cloud Storage with Apache SparkDatabricks
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Martin Junghanns
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWes McKinney
 
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIsPoster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIsRuben Taelman
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?Miklos Christine
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Pat Patterson
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
 
Patterns and Operational Insights from the First Users of Delta Lake
Patterns and Operational Insights from the First Users of Delta LakePatterns and Operational Insights from the First Users of Delta Lake
Patterns and Operational Insights from the First Users of Delta LakeDatabricks
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkDatabricks
 

La actualidad más candente (20)

Poster GraphQL-LD: Linked Data Querying with GraphQL
Poster GraphQL-LD: Linked Data Querying with GraphQLPoster GraphQL-LD: Linked Data Querying with GraphQL
Poster GraphQL-LD: Linked Data Querying with GraphQL
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And Data
 
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkMaximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability Mahout
 
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
 
Robust and Scalable ETL over Cloud Storage with Apache Spark
Robust and Scalable ETL over Cloud Storage with Apache SparkRobust and Scalable ETL over Cloud Storage with Apache Spark
Robust and Scalable ETL over Cloud Storage with Apache Spark
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial users
 
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIsPoster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIs
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Patterns and Operational Insights from the First Users of Delta Lake
Patterns and Operational Insights from the First Users of Delta LakePatterns and Operational Insights from the First Users of Delta Lake
Patterns and Operational Insights from the First Users of Delta Lake
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 

Destacado

Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)ArangoDB Database
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPMax Neunhöffer
 
GraphDatabases and what we can use them for
GraphDatabases and what we can use them forGraphDatabases and what we can use them for
GraphDatabases and what we can use them forMichael Hackstein
 
Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)ArangoDB Database
 
Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012 Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012 ArangoDB Database
 
Multi-model databases and node.js
Multi-model databases and node.jsMulti-model databases and node.js
Multi-model databases and node.jsMax Neunhöffer
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQLArangoDB Database
 
Extensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software ArchitectureExtensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software ArchitectureMax Neunhöffer
 
ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-ModelosArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-ModelosHelder Santana
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQLArangoDB Database
 
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...ArangoDB Database
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Max Neunhöffer
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...Big Data Spain
 
ArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the databaseArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the databaseArangoDB Database
 
Overhauling a database engine in 2 months
Overhauling a database engine in 2 monthsOverhauling a database engine in 2 months
Overhauling a database engine in 2 monthsMax Neunhöffer
 
guacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBguacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBMax Neunhöffer
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1ArangoDB Database
 
Domain driven design @FrOSCon
Domain driven design @FrOSConDomain driven design @FrOSCon
Domain driven design @FrOSConArangoDB Database
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelMax Neunhöffer
 

Destacado (20)

Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTP
 
GraphDatabases and what we can use them for
GraphDatabases and what we can use them forGraphDatabases and what we can use them for
GraphDatabases and what we can use them for
 
Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)
 
Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012 Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012
 
Multi-model databases and node.js
Multi-model databases and node.jsMulti-model databases and node.js
Multi-model databases and node.js
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
 
Extensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software ArchitectureExtensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software Architecture
 
ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-ModelosArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
 
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 
ArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the databaseArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the database
 
Overhauling a database engine in 2 months
Overhauling a database engine in 2 monthsOverhauling a database engine in 2 months
Overhauling a database engine in 2 months
 
guacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBguacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDB
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1
 
Domain driven design @FrOSCon
Domain driven design @FrOSConDomain driven design @FrOSCon
Domain driven design @FrOSCon
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
 

Similar a Complex queries in a distributed multi-model database

Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...NoSQLmatters
 
06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and AnalysisOpenThink Labs
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQLMapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQLGábor Szárnyas
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrKai Chan
 
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...Simon Price
 
Powerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation PipelinePowerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation PipelineMongoDB
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyAndré Ricardo Barreto de Oliveira
 
Dynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship groupDynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship groupReuven Lerner
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Paul Leclercq
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)""Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"MongoDB
 
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27Ruby Meditation
 
Ontop: Answering SPARQL Queries over Relational Databases
Ontop: Answering SPARQL Queries over Relational DatabasesOntop: Answering SPARQL Queries over Relational Databases
Ontop: Answering SPARQL Queries over Relational DatabasesGuohui Xiao
 
Search Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search APISearch Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search APIWillThompson78
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout source{d}
 
Advanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesAdvanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesTom Schreiber
 
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation PipelinesMongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation PipelinesMongoDB
 
3. python intro
3. python   intro3. python   intro
3. python introin4400
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go ProgrammingLin Yo-An
 

Similar a Complex queries in a distributed multi-model database (20)

Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
 
06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQLMapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQL
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
 
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
 
Powerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation PipelinePowerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation Pipeline
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
 
Dynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship groupDynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship group
 
Pgbr 2013 fts
Pgbr 2013 ftsPgbr 2013 fts
Pgbr 2013 fts
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)""Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
 
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
 
Ontop: Answering SPARQL Queries over Relational Databases
Ontop: Answering SPARQL Queries over Relational DatabasesOntop: Answering SPARQL Queries over Relational Databases
Ontop: Answering SPARQL Queries over Relational Databases
 
Easy R
Easy REasy R
Easy R
 
Search Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search APISearch Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search API
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 
Advanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesAdvanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation Pipelines
 
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation PipelinesMongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
 
3. python intro
3. python   intro3. python   intro
3. python intro
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go Programming
 

Más de Max Neunhöffer

Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSMax Neunhöffer
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSMax Neunhöffer
 
Experience with C++11 in ArangoDB
Experience with C++11 in ArangoDBExperience with C++11 in ArangoDB
Experience with C++11 in ArangoDBMax Neunhöffer
 

Más de Max Neunhöffer (6)

Deep Dive on ArangoDB
Deep Dive on ArangoDBDeep Dive on ArangoDB
Deep Dive on ArangoDB
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
 
Experience with C++11 in ArangoDB
Experience with C++11 in ArangoDBExperience with C++11 in ArangoDB
Experience with C++11 in ArangoDB
 
Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 

Último

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Último (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Complex queries in a distributed multi-model database

  • 1. Complex queries in a distributed multi-model database Max Neunhöffer Tech Talk Geekdom, 16 March 2015 www.arangodb.com
  • 2. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } 1
  • 3. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. 1
  • 4. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. 1
  • 5. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. 1
  • 6. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. 1
  • 7. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. Sharding: the data of a collection is distributed between multiple servers. 1
  • 10. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. 2
  • 11. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. 2
  • 12. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. 2
  • 13. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. 2
  • 14. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. For example, paths in the graph are interesting. 2
  • 15. Query 1 Fetch all documents in a collection FOR p IN people RETURN p 3
  • 16. Query 1 Fetch all documents in a collection FOR p IN people RETURN p [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] 3
  • 17. Query 1 Fetch all documents in a collection FOR p IN people RETURN p [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] (Actually, a cursor is returned.) 3
  • 18. Query 2 Use filtering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } 4
  • 19. Query 2 Use filtering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } [ { "name": "Neunhöffer, Max", "age": 44 }, { "name": "Schmidt, Helmut", "age": 95 }, ... ] 4
  • 20. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } 5
  • 21. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } [ { "age": 18, "number": 10 }, { "age": 19, "number": 17 }, { "age": 20, "number": 12 }, ... ] 5
  • 22. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } 6
  • 23. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } [ { "housename": "Firlefanz", "streetname": "Meyer street", "owner": "Hans Schmidt", "value": 423000 }, ... ] 6
  • 24. Query 5 Modifying data FOR e IN events FILTER e.timestamp<"2014-09-01T09:53+0200" INSERT e IN oldevents FOR e IN events FILTER e.timestamp<"2014-09-01T09:53+0200" REMOVE e._key IN events 7
  • 25. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } 8
  • 26. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } [ { "begin": "germanCity/Cologne", "end" : {"_id": "frenchCity/Paris", ... }, "distance": 550, "nrPaths": 10 }, ... ] 8
  • 27. Life of a query 1. Text and query parameters come from user 9
  • 28. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 9
  • 29. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 9
  • 30. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 9
  • 31. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 9
  • 32. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 9
  • 33. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 9
  • 34. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9
  • 35. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 9
  • 36. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 9
  • 37. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 11. Distribute and link up engines on different servers 9
  • 38. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 11. Distribute and link up engines on different servers 12. Execute plan, provide cursor API 9
  • 39. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP 10
  • 40. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies 10
  • 41. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline 10
  • 42. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API 10
  • 43. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API Blocks of “Items” travel through the pipeline 10
  • 44. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API Blocks of “Items” travel through the pipeline What is an “item”??? 10
  • 45. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. 11
  • 46. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame 11
  • 47. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame Thus: Items look differently in different parts of the plan 11
  • 48. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame Thus: Items look differently in different parts of the plan We always deal with blocks of items for performance reasons 11
  • 49. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x 12
  • 50. Move filters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} Singleton EnumColl a EnumColl b Calc a.x == 10 Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Filter a.x == 10 13
  • 51. Move filters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. Singleton EnumColl a EnumColl b Calc a.x == 10 Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Filter a.x == 10 13
  • 52. Move filters up FOR a IN collA FILTER a.x < 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. However, the number of items trave- ling in the pipeline is decreased. Singleton EnumColl a Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Calc a.x == 10 EnumColl b Filter a.x == 10 13
  • 53. Move filters up FOR a IN collA FILTER a.x < 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. However, the number of items trave- ling in the pipeline is decreased. Note that the two FOR statements could be interchanged! Singleton EnumColl a Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Calc a.x == 10 EnumColl b Filter a.x == 10 13
  • 54. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} Singleton EnumColl a Calc L = ... EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 55. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! Singleton EnumColl a Calc L = ... EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 56. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! (since it cannot throw an exception). Singleton EnumColl a EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 57. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! (since it cannot throw an exception). Therefore we can just leave it out. Singleton EnumColl a EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 58. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Singleton EnumColl a Filter ... Calc ... Sort a.y, a.x Return a 15
  • 59. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), Singleton EnumColl a Filter ... Calc ... Sort a.y, a.x Return a 15
  • 60. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index. Singleton Sort a.y, a.x Return a IndexRange a 15
  • 61. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index. The result will automatically be sorted by y and then by x. Singleton Return a IndexRange a 15
  • 62. Data distribution in a cluster Requests DBserver DBserver DBserver CoordinatorCoordinator 4 2 5 3 11 The shards of a collection are distributed across the DB servers. 16
  • 63. Data distribution in a cluster Requests DBserver DBserver DBserver CoordinatorCoordinator 4 2 5 3 11 The shards of a collection are distributed across the DB servers. The coordinators receive queries and organise their execution 16
  • 67. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. 18
  • 68. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, 18
  • 69. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, to this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modification. 18
  • 70. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, to this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modification. Sometimes, we can even optimise away a gather/scatter combination and parallelise completely. 18