Budapest University of Technology and Economics
Department of Measurement and Information Systems
Fault Tolerant Systems Research Group
INCQUERY-D:
INCREMENTAL QUERIES IN THE CLOUD
Gábor Szárnyas, Benedek Izsó,
István Ráth, Dániel Varró
Overview
▪ Introduction
▪ MDE scalability challenges for model queries
▪ Overview: scaling out in the cloud
▪ Evaluation: a feasibility study
▪ Conclusions and future work
SCALABILITY IN MDE
Scalability challenges in MDE
▪ Complex instance models and queries
▪ Instance model complexity
o Size
o Structure
▪ Query complexity
o MDE workloads involve much more complex queries than typical data-driven applications (e.g. model validation, transformations, …)
▪ Scalability challenges arise from the combination of model complexity and query complexity
Model sizes
▪ Instance models with several million elements
o AUTOSAR models [1]
o Source code models
o Sensor data
application          model size
software models      0 – 10^9
sensor data          10^9
geo-spatial models   10^9 – 10^12
Source: Markus Scheidgen, How Big are Models – An Estimation, 2012. [2]
[1] http://wiki.eclipse.org/Auto_IWG_WP2
[2] http://hwl.hu-berlin.de/fileadmin/user_upload/documents/howbig_techreport.pdf
EMF-IncQuery
▪ State-of-the-art incremental graph query engine
▪ Open source Eclipse project by BUTE and others
▪ Typical use cases
o Validation
o Incremental model transformation
o Model synchronization, view maintenance
Single workstation limitations
▪ Most tools handle fewer than 1M model elements because of algorithmic complexity
▪ The best tools handle fewer than 10M model elements because of the JVM’s limitations
o A JVM cannot handle 15+ GB of heap memory efficiently
o Long GC pauses
o Specialized JVMs (e.g. Azul Systems’ Zing)
• Commercial, experimental
• May require special hardware
▪ Proposed solution
o Scale out: distributed system
OVERVIEW OF THE INCQUERY-D APPROACH
Architecture
[Figure: EMF-IncQuery architecture. Labels: Transaction, in-memory EMF model (in-memory storage), indexer layer (indexing), Rete net (production network).]
Production network
• Stores intermediate query results
• Propagates changes
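The two roles of the production network, storing intermediate query results and propagating changes, can be illustrated with a minimal sketch of a single join node. This is not EMF-IncQuery's implementation: the class name `JoinNode`, the method `update`, and the sample tuples are hypothetical, and tuples are joined on their first element to keep the example short.

```python
from collections import defaultdict


class JoinNode:
    """Hypothetical Rete-style join node; joins left and right tuples on their first element."""

    def __init__(self):
        # Memories holding the tuples seen so far on each input, indexed by join key.
        self.memory = {"left": defaultdict(set), "right": defaultdict(set)}
        # The stored intermediate result of this node.
        self.results = set()

    def update(self, side, tup, added=True):
        """Apply one change on one input and return the resulting delta of join matches."""
        other = "right" if side == "left" else "left"
        key = tup[0]
        if added:
            self.memory[side][key].add(tup)
        else:
            self.memory[side][key].discard(tup)
        # Only the partners sharing the join key are touched, not the whole input.
        delta = set()
        for partner in self.memory[other][key]:
            left, right = (tup, partner) if side == "left" else (partner, tup)
            delta.add(left + right[1:])
        if added:
            self.results |= delta
        else:
            self.results -= delta
        return delta


node = JoinNode()
node.update("left", ("seg1", "switchA"))
node.update("right", ("seg1", "sensorX"))
print(node.results)   # {('seg1', 'switchA', 'sensorX')}
node.update("right", ("seg1", "sensorX"), added=False)
print(node.results)   # set()
```

Each call to `update` returns only the delta, which is what a node would forward to its children; the full result is never recomputed.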
Architecture
[Figure: EMF-IncQuery and IncQuery-D architectures side by side. EMF-IncQuery: Transaction, in-memory storage, indexer layer, Rete net. IncQuery-D: Transaction, DB shards 0–3 on Servers 0–3, indexer layer, IncQuery-D middleware, Rete net.]
Distributed persistent storage
Distributed indexing, notification
Distributed production network
• Each intermediate node can be allocated to a different host
• Remote internode communication
Rete net
▪ Asynchronous communication
▪ Consistency guaranteed by a termination protocol
[Figure: four indexers above DB shards 0–3, feeding a Rete net that ends in a production node.]
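Why asynchronous propagation needs a termination check can be shown with a small sketch. The slides do not describe the termination protocol itself, so the check below (joining the message queues in topological order) is only a stand-in, and it uses Python's asyncio rather than Akka. The node names and the "keep even values" constraint are hypothetical.

```python
import asyncio


async def run_network(changes):
    """Push changes through indexer -> filter -> production and wait until the network is quiet."""
    inboxes = {name: asyncio.Queue() for name in ("indexer", "filter", "production")}
    results = []

    async def indexer():
        while True:
            change = await inboxes["indexer"].get()
            await inboxes["filter"].put(change)        # forward before acknowledging
            inboxes["indexer"].task_done()

    async def filter_node():
        while True:
            change = await inboxes["filter"].get()
            if change % 2 == 0:                        # hypothetical constraint: keep even values
                await inboxes["production"].put(change)
            inboxes["filter"].task_done()

    async def production():
        while True:
            results.append(await inboxes["production"].get())
            inboxes["production"].task_done()

    workers = [asyncio.create_task(c()) for c in (indexer, filter_node, production)]
    for change in changes:
        await inboxes["indexer"].put(change)
    # Stand-in termination check: a node acknowledges a message only after forwarding it,
    # so joining the queues in topological order means no message is still in flight.
    for name in ("indexer", "filter", "production"):
        await inboxes[name].join()
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


print(asyncio.run(run_network([1, 2, 3, 4])))  # [2, 4]
```

Reading `results` before the joins complete could return a partial result set, which is the inconsistency a termination protocol is meant to rule out.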
IncQuery-D
▪ Scaling out by…
o Sharding the data
o Sharding the pattern matcher network → avoids the memory bottleneck
▪ Further advantages
o Agnostic to the representation of the graph
• Property graph, (EMF, RDF)
• Information from the metamodel is only used for indexing
o Query layer decoupled from the data storage
• Storage layer freely exchangeable
• Indexing is independent of storage features
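Sharding the data can be sketched as assigning each model element to one of the DB shards. The slides do not state which sharding strategy is used; hash partitioning is assumed here purely for illustration, and `shard_of`, `partition`, and the element ids are hypothetical names.

```python
import zlib

NUM_SHARDS = 4  # matches the four DB shards shown on the slides


def shard_of(element_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an element id to a shard index with a stable (process-independent) hash."""
    return zlib.crc32(element_id.encode("utf-8")) % num_shards


def partition(element_ids, num_shards: int = NUM_SHARDS):
    """Group element ids by the shard they are assigned to."""
    shards = {i: [] for i in range(num_shards)}
    for element_id in element_ids:
        shards[shard_of(element_id, num_shards)].append(element_id)
    return shards


ids = [f"element-{i}" for i in range(1000)]
for shard, members in partition(ids).items():
    print(shard, len(members))
```

A stable hash such as CRC32 is used instead of Python's built-in `hash`, which is randomized per process for strings and would send the same element to different shards on different hosts.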
Scalability considerations
▪ Construction process
1. Shard the data in the storage layer
2. Derive a Rete net layout from the query
3. Allocate the middleware indexers
4. Allocate the Rete nodes in the cloud
▪ Design aspects for scalability
o Local resource limitations
o Load balancing
o Minimize remote communication
• Given problem characteristics, global resource requirements can be calculated
• Approach intrinsically supports dynamic scaling
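Step 4 of the construction process, allocating Rete nodes to hosts, can be sketched as a placement problem under local memory limits. The greedy heuristic below is an assumption and not the allocation strategy of IncQuery-D; it addresses local resource limits and load balancing but ignores communication cost. The node names and memory figures are hypothetical.

```python
def allocate(node_memory_gb, host_capacity_gb):
    """Greedy placement: largest node first, onto the host with the most free memory."""
    free = dict(host_capacity_gb)
    placement = {}
    for node, need in sorted(node_memory_gb.items(), key=lambda kv: kv[1], reverse=True):
        host = max(free, key=free.get)   # least-loaded host (first one on ties)
        if free[host] < need:
            raise ValueError(f"no host has {need} GB free for {node}")
        placement[node] = host
        free[host] -= need
    return placement


# Hypothetical estimated memory needs of the Rete nodes, in GB.
nodes = {"join1": 6, "join2": 5, "antijoin": 4, "production": 1}
hosts = {"server0": 8, "server1": 8, "server2": 8, "server3": 8}
print(allocate(nodes, hosts))
# {'join1': 'server0', 'join2': 'server1', 'antijoin': 'server2', 'production': 'server3'}
```

A placement that also minimizes remote communication would have to weigh the edges of the Rete net, for example by preferring to co-locate a node with its parent when memory allows.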
EVALUATION
▪ Benchmark goal
o Evaluate the feasibility of the concept
o Measure the scalability characteristics
o Workload profile similar to real-world model validation
▪ Scenarios
o Batch – “traditional” batch graph search
o Incremental – Rete network
▪ Operations
o Simulates a user’s interaction with a model
o Load and first validation; transformation; revalidation
Evaluation of IncQuery-D
Batch graph scenario
▪ Load and first validation: load the graph into the databases and execute the query
▪ Transformation: query the graph and delete some elements
▪ Revalidation: execute the query
[Figure: batch graph scenario. Labels: GraphML, DB shards, result set; phases: load and first validation, transformation, revalidation.]
Incremental scenario – IncQuery-D
▪ Load and first validation: load the graph into the databases, initialize the Rete net and retrieve the results
▪ Transformation: incrementally query the graph, delete some elements and propagate the changes
▪ Revalidation: retrieve the results from the Rete net
[Figure: incremental scenario. Labels: GraphML, DB shards, Rete net, result set; phases: load and first validation, transformation, revalidation.]
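The difference between the two scenarios can be shown on a toy model. In this sketch the batch scenario rescans the whole model at every validation, while the incremental scenario updates a stored result set during the transformation, so revalidation only reads it. The constraint ("a switch without any sensor is a violation") and all names are hypothetical and stand in for the benchmark's actual query.

```python
def batch_query(sensors_of):
    """Batch scenario: rescan the whole model on every validation."""
    return {switch for switch, sensors in sensors_of.items() if not sensors}


class IncrementalQuery:
    """Incremental scenario: keep the result set up to date as the model changes."""

    def __init__(self, sensors_of):
        self.sensors_of = sensors_of
        self.results = batch_query(sensors_of)   # load and first validation

    def delete_sensor(self, switch, sensor):
        """Transformation step: apply the deletion and propagate the change."""
        self.sensors_of[switch].discard(sensor)
        if not self.sensors_of[switch]:
            self.results.add(switch)


model = {"sw1": {"s1"}, "sw2": {"s2", "s3"}, "sw3": set()}
query = IncrementalQuery(model)
print(query.results)              # {'sw3'}
query.delete_sensor("sw1", "s1")  # transformation
print(sorted(query.results))      # ['sw1', 'sw3']  (revalidation reads the stored set)
assert query.results == batch_query(model)
```

The cost of `delete_sensor` depends only on the changed element, whereas `batch_query` always visits every switch.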
Implementation
[Figure: IncQuery-D deployment. Servers 0–3, each hosting one DB shard (DB shards 0–3); in-memory EMF model; Transaction; indexer layer; IncQuery-D middleware; Rete net.]
▪ Storage: Neo4j
▪ Database access: Cypher through REST
▪ Indexer layer and Rete net: Akka (asynchronous communication)
▪ Hardware: 4 Ubuntu Linux servers, 16 GB RAM, 2×2.5 GHz Intel Xeon CPU
Detailed benchmark description: http://incquery.net/publications/incquery-d
Load and first validation phase
[Figure: time [s] (logarithmic axis, 1 to 4096) against model size [million elements / file size in GB] for Neo4j/Cypher (batch) and IncQuery-D (incremental). Model sizes: 0.1 / 0.008, 0.2 / 0.015, 0.5 / 0.03, 0.9 / 0.06, 1.7 / 0.114, 3.5 / 0.231, 7.1 / 0.47, 14.1 / 0.945, 28.0 / 1.907, 55.8 / 3.853.]
▪ Parallel loading of the graph from a GraphML representation
▪ Small overhead for the Rete network’s construction
▪ 50M+: approx. 30 minutes
Transformation phase
[Figure: time [s] (logarithmic axis, 1 to 4096) against model size [million elements / file size in GB], from 0.1 / 0.008 to 55.8 / 3.853, for Neo4j/Cypher (batch) and IncQuery-D (incremental).]
▪ Steps
1. Elementary model query
2. Model manipulation
▪ Neo4j/Cypher (batch)
o Both steps implemented with Cypher
o Query evaluation time dominates
▪ IncQuery-D (incremental)
o Query is supported by the Rete net
o Only the manipulation is implemented with Cypher
o Overhead due to change propagation is negligible
o About 1.5 orders of magnitude faster
o Performs a transformation over a 55M model in one minute
Revalidation phase
[Figure: time [s] (logarithmic axis, 0.25 to 4096) against model size [million elements / file size in GB], from 0.1 / 0.008 to 55.8 / 3.853, for Neo4j/Cypher (batch) and IncQuery-D (incremental).]
▪ Near instant response time for very large models
▪ Different characteristics: 4 orders of magnitude difference for the largest model
▪ Revalidation time is independent of node size
CONCLUSIONS
Conclusions
▪ Novel approach for the distributed execution of incremental graph queries
▪ Distributed Rete network
o Middleware for change propagation and indexing
o Incremental query layer decoupled from a sharded graph database
▪ Results
o Working proof of concept
o Near-instantaneous query evaluation up to 50M+ model elements
o Significantly improves the scalability of transformations (a transformation over a 55M model in one minute)
Future work
▪ Tooling and automation
o Evolve the prototype into a developer tool
▪ Explore optimization possibilities
o Allocation of Rete nodes
o Dynamic reallocation of Rete nodes
o Sharding strategy, resource usage, network communication overhead
▪ Cloud readiness
▪ Experiment with distributed EMF model stores
o CDO, MongoEMF, Morsa, …

Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Último (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

IncQuery-D: Incremental Queries in the Cloud

  • 1. INCQUERY-D: INCREMENTAL QUERIES IN THE CLOUD
      Gábor Szárnyas, Benedek Izsó, István Ráth, Dániel Varró
      Budapest University of Technology and Economics
      Department of Measurement and Information Systems
      Fault Tolerant Systems Research Group
  • 2. Overview
      - Introduction
      - MDE scalability challenges for model queries
      - Overview: scaling out in the cloud
      - Evaluation: a feasibility study
      - Conclusions and future work
  • 4. Scalability challenges in MDE
      - Complex instance models and queries
      - Instance model complexity
          o Size
          o Structure
      - Query complexity
          o MDE workloads (e.g. model validation, transformations, …) involve much more complex queries than typical data-driven applications
      - Scalability challenges arise from the combination of the two
  • 5. Model sizes
      - Instance models with several million elements
          o AUTOSAR models [1]
          o Source code models
          o Sensor data
      Application           Model size
      software models       0 – 10^9
      sensor data           10^9
      geo-spatial models    10^9 – 10^12
      Source: Markus Scheidgen, How Big are Models – An Estimation, 2012. [2]
      [1] http://wiki.eclipse.org/Auto_IWG_WP2
      [2] http://hwl.hu-berlin.de/fileadmin/user_upload/documents/howbig_techreport.pdf
  • 6. EMF-IncQuery
      - State-of-the-art incremental graph query engine
      - Open source Eclipse project by BUTE and others
      - Typical use cases
          o Validation
          o Incremental model transformation
          o Model synchronization, view maintenance
  • 7. Single workstation limitations
      - Most tools only work for models with <1M elements, due to algorithmic complexity
      - The best tools work for models with <10M elements, due to the JVM's limitations
          o A JVM cannot handle 15+ GB of heap memory efficiently
          o Long GC pauses
          o Specialized JVMs (e.g. Azul Systems' Zing)
              • Commercial, experimental
              • May require special hardware
      - Proposed solution
          o Scale out: distributed system
  • 9. Architecture: EMF-IncQuery
      [Diagram labels: in-memory storage (in-memory EMF model); transaction; indexing (indexer layer); production network (Rete net)]
      - Production network
          o Stores intermediate query results
          o Propagates changes
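      The two properties of the production network named on this slide (storing intermediate results, propagating changes) can be illustrated with a toy join node. This is a minimal sketch, not EMF-IncQuery's implementation: the class name `JoinNode`, its methods, and the tuple shapes are all assumptions made for the example.

```python
from collections import defaultdict


class JoinNode:
    """Toy Rete-style join node (hypothetical, for illustration only).

    Joins left tuples (a, b) with right tuples (b, c) on the shared key b.
    """

    def __init__(self):
        self.left = defaultdict(set)    # b -> {a}: indexed left input
        self.right = defaultdict(set)   # b -> {c}: indexed right input
        self.results = set()            # stored intermediate results (a, b, c)

    def update_left(self, a, b, added=True):
        """Apply one change on the left input; return the propagated delta."""
        delta = {(a, b, c) for c in self.right[b]}
        if added:
            self.left[b].add(a)
            self.results |= delta
        else:
            self.left[b].discard(a)
            self.results -= delta
        return delta

    def update_right(self, b, c, added=True):
        """Apply one change on the right input; return the propagated delta."""
        delta = {(a, b, c) for a in self.left[b]}
        if added:
            self.right[b].add(c)
            self.results |= delta
        else:
            self.right[b].discard(c)
            self.results -= delta
        return delta


node = JoinNode()
node.update_left("s1", "r1")
node.update_right("r1", "x")
print(node.results)
```

      Each update touches only the tuples that share the changed key, so the cost of a change depends on the size of the delta rather than on the size of the model.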
  • 10. Architecture: EMF-IncQuery vs. IncQuery-D
      [Diagram labels: Server 0 to Server 3, each with a DB shard (DB shard 0 to DB shard 3); transaction; in-memory storage; IncQuery-D middleware with indexer layer and Rete net]
      - Distributed persistent storage
      - Distributed indexing, notification
      - Distributed production network
          o Each intermediate node can be allocated to a different host
          o Remote internode communication
  • 11. Rete net
      - Asynchronous communication
      - Consistency guaranteed by a termination protocol
      [Diagram labels: DB shard 0 to DB shard 3; four indexer nodes; production node]
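      The slide does not describe the termination protocol itself. The sketch below only illustrates the general idea, under stated assumptions: results are read only after every in-flight change message has been processed. It uses Python's `queue.Queue.join()` as a stand-in for the protocol; `run_network`, the node names, and the message format are hypothetical.

```python
import queue
import threading


def run_network(downstream, initial, workers=4):
    """Deliver change messages asynchronously and return once none are in flight.

    downstream: {node: [successor nodes]} (hypothetical network layout)
    initial:    [(node, change)] messages injected at the input nodes
    """
    q = queue.Queue()
    received = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:          # shutdown sentinel
                q.task_done()
                return
            node, change = item
            with lock:
                received.setdefault(node, []).append(change)
            # Forward before task_done(), so the unfinished-message count
            # cannot reach zero while this change is still propagating.
            for successor in downstream.get(node, ()):
                q.put((successor, change))
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for message in initial:
        q.put(message)
    q.join()                          # termination: every message processed
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return received


layout = {"indexer0": ["join"], "indexer1": ["join"], "join": ["production"]}
state = run_network(layout, [("indexer0", "+a"), ("indexer1", "+b")])
print(sorted(state["production"]))
```

      A distributed implementation cannot share a counter this way; the point is only that the result set is consistent once the count of unprocessed messages is known to be zero.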
  • 12. IncQuery-D
      - Scaling out by…
          o Sharding the data
          o Sharding the pattern matcher network
          → Avoid memory bottleneck
      - Further advantages
          o Agnostic to the representation of the graph
              • Property graph, (EMF, RDF)
              • Information from the metamodel is only used for indexing
          o Query layer decoupled from the data storage
              • Storage layer freely exchangeable
              • Indexing is independent of storage features
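      As an illustration of "sharding the data", the sketch below assigns each edge to one of four shards by hashing its source vertex. The slides do not say which sharding strategy IncQuery-D uses, so hash partitioning, the function names, and the edge format are assumptions.

```python
import zlib


def shard_of(vertex_id, num_shards=4):
    """Map a vertex id to a shard with a hash that is stable across runs."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_shards


def shard_edges(edges, num_shards=4):
    """Place every (source, label, target) edge on the shard of its source."""
    shards = [[] for _ in range(num_shards)]
    for source, label, target in edges:
        shards[shard_of(source, num_shards)].append((source, label, target))
    return shards


edges = [("v1", "connectsTo", "v2"), ("v2", "connectsTo", "v3"), ("v1", "sensor", "v9")]
for index, shard in enumerate(shard_edges(edges)):
    print(index, shard)
```

      `zlib.crc32` is used instead of the built-in `hash()` because string hashes in Python are randomized per process, which would send the same vertex to different shards on different hosts.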
  • 13. Scalability considerations
      - Construction process
          1. Shard the data in the storage layer
          2. Derive a Rete net layout from the query
          3. Allocate the middleware indexers
          4. Allocate the Rete nodes in the cloud
      - Design aspects for scalability
          o Local resource limitations
          o Load balancing
          o Minimize remote communication
      - Given problem characteristics, global resource requirements can be calculated
      - Approach intrinsically supports dynamic scaling
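      The three design aspects above (local resource limits, load balancing, minimal remote communication) can be combined in a simple greedy placement. This is a sketch under assumptions, not the allocation used by IncQuery-D: the memory figures, node names, and the `allocate` heuristic are invented for the example.

```python
def allocate(nodes, edges, hosts):
    """Greedily place Rete nodes on hosts.

    nodes: {node: memory needed}, edges: [(u, v)], hosts: {host: capacity}.
    Prefers the host that already holds the most neighbours (fewer remote
    edges), then the host with the most free memory (load balancing).
    """
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    free = dict(hosts)
    placement = {}
    for node in sorted(nodes, key=nodes.get, reverse=True):
        candidates = [h for h in free if free[h] >= nodes[node]]
        if not candidates:
            raise ValueError(f"no host has enough free memory for {node}")

        def score(host):
            local = sum(1 for m in neighbours[node] if placement.get(m) == host)
            return (local, free[host])

        best = max(candidates, key=score)
        placement[node] = best
        free[best] -= nodes[node]
    return placement


def remote_edges(edges, placement):
    """Count edges whose endpoints were placed on different hosts."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


nodes = {"join": 6, "idx_a": 4, "idx_b": 4, "production": 1}   # GB, hypothetical
edges = [("idx_a", "join"), ("idx_b", "join"), ("join", "production")]
hosts = {"h0": 8, "h1": 8}
placement = allocate(nodes, edges, hosts)
print(placement, remote_edges(edges, placement))
```

      The global requirement mentioned on the slide corresponds here to checking `sum(nodes.values()) <= sum(hosts.values())` before attempting a placement; a greedy heuristic can still fail or be suboptimal even when that check passes.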
  • 15. Evaluation of IncQuery-D
      - Benchmark goal
          o Evaluate the feasibility of the concept
          o Measure the scalability characteristics
          o Workload profile similar to real-world model validation
      - Scenarios
          o Batch: "traditional" batch graph search
          o Incremental: Rete network
      - Operations
          o Simulate a user's interaction with a model
          o Load and first validation; transformation; revalidation
  • 16. Batch graph scenario
      - Load and first validation: load the graph into the databases and execute the query
      - Transformation: query the graph and delete some elements
      - Revalidation: execute the query
      [Diagram labels: batch graph scenario; incremental scenario – IncQuery-D; GraphML; DB shards; result set; load and first validation; transformation; revalidation]
  • 17. Incremental scenario – IncQuery-D
      - Load and first validation: load the graph into the databases, initialize the Rete net, and retrieve the results
      - Transformation: incrementally query the graph, delete some elements, and propagate the changes
      - Revalidation: retrieve the results from the Rete net
      [Diagram labels: batch graph scenario; incremental scenario – IncQuery-D; GraphML; DB shards; Rete net; result set; load and first validation; transformation; revalidation]
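      The difference between the two scenarios can be sketched on a toy model. The code below is not the benchmark: the validation rule (a segment must have positive length), the dictionary model, and the names `batch_validate` and `IncrementalValidator` are assumptions chosen to keep the example short.

```python
def batch_validate(model):
    """Batch scenario: re-run the whole query over the model."""
    return {segment for segment, length in model.items() if length <= 0}


class IncrementalValidator:
    """Incremental scenario: keep the result set and update it per change."""

    def __init__(self, model):
        # Load and first validation: one full evaluation to initialize.
        self.model = dict(model)
        self.violations = batch_validate(self.model)

    def delete(self, segment):
        # Transformation: delete an element and propagate the change.
        self.model.pop(segment, None)
        self.violations.discard(segment)

    def revalidate(self):
        # Revalidation: read the maintained result set, no re-evaluation.
        return set(self.violations)


model = {"s1": 10, "s2": 0, "s3": -5, "s4": 7}
validator = IncrementalValidator(model)
validator.delete("s2")
print(validator.revalidate())
```

      In the batch variant the revalidation cost grows with the model, because `batch_validate` scans every element; in the incremental variant it only copies the stored result set.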
  • 18. Implementation
      - Hardware: 4 Ubuntu Linux servers; 16 GB RAM; 2×2.5 GHz Intel Xeon CPU
      - Storage: Neo4j (DB shard 0 to DB shard 3 on Server 0 to Server 3), Cypher through REST
      - IncQuery-D middleware (indexer layer) and Rete net: Akka (asynchronous communication)
      - Detailed benchmark description: http://incquery.net/publications/incquery-d
  • 19. Load and first validation phase
      [Plot: time [s] (log scale, 1 to 4096) vs. model size [million elements / file size in GB], from 0.1 / 0.008 to 55.8 / 3.853; series: Neo4j/Cypher (batch), IncQuery-D (incremental)]
      - Parallel loading of the graph from a GraphML representation
      - Small overhead for the Rete network's construction
      - 50M+: approx. 30 minutes
  • 20. Transformation phase
      [Plot: time [s] (log scale, 1 to 4096) vs. model size [million elements / file size in GB], from 0.1 / 0.008 to 55.8 / 3.853; series: Neo4j/Cypher (batch), IncQuery-D (incremental)]
      - Neo4j/Cypher (batch)
          1. Elementary model query
          2. Model manipulation
          o Both implemented with Cypher
          o The query evaluation time dominates
      - IncQuery-D (incremental)
          o Query is supported by the Rete net
          o Only the manipulation is implemented with Cypher
          o Overhead due to change propagation is negligible
          o 1.5 orders of magnitude faster
          o Performs a transformation over a 55M model in one minute
  • 21. Revalidation phase
      [Plot: time [s] (log scale, 0.25 to 4096) vs. model size [million elements / file size in GB], from 0.1 / 0.008 to 55.8 / 3.853; series: Neo4j/Cypher (batch), IncQuery-D (incremental)]
      - Near instant response time for very large models
      - Different characteristics: 4 orders of magnitude difference for the largest model
      - Revalidation time is independent of node size
  • 23. Conclusions
      - Novel approach for the distributed execution of incremental graph queries
      - Distributed Rete network
          o Middleware for change propagation and indexing
          o Incremental query layer decoupled from a sharded graph database
      - Results
          o Working proof of concept
          o Near-instantaneous query evaluation for models of 50M+ elements
          o Improves scalability of transformations significantly
  • 24. Future work
      - Tooling and automation
          o Evolve the prototype into a developer tool
      - Explore optimization possibilities
          o Allocation of Rete nodes
          o Dynamic reallocation of Rete nodes
          o Sharding strategy, resource usage, network communication overhead
      - Cloud readiness
      - Experiment with distributed EMF model stores
          o CDO, MongoEMF, Morsa, …