Thesis Defense (Gwendal DANIEL) - Nov 2017

Efficient Persistence, Query, and
Transformation of Large Models
Gwendal DANIEL
Inria, IMT Atlantique, LS2N
gwendal.daniel@inria.fr
Supervisor: Jordi Cabot
Co-Supervisor: Gerson Sunyé
Co-Supervisor: Massimo Tisi

Context
“A model of a system or process is a theoretical description that can help you understand how the
system or process works, or how it might work.’’ (Collins 2017)
2
Credit: https://www.autodesk.com/solutions/bim

Context
• Software Modeling
3
Forward Engineering
Reverse Engineering

Context
4
Injection
Transformations
Queries
Generation
Model Driven
Reverse Engineering

Context
5
Injection
Transformations
Queries
Generation
Model Driven
Reverse Engineering
Execution Time
Latencies
Memory
Consumption
Limited adoption in the industry
• Scalability of existing solutions
• Large generated models
[J. Whittle et. al., 2014]

Research Challenges
1. Storing large models
• Efficient data structures
• Efficient persistence solutions
• Efficient memory management
6
2. Accessing stored models efficiently
• Data representation
• Caching and prefetching
3. Querying large models
• Complex model navigations
• Efficient constraint checking
4. Transforming large models
• Refactoring operations
• Code generation

Contributions
• A novel scalable modeling
ecosystem
7
MogwaïGremlin-ATL
NeoEMF PrefetchML
Scalable Model Persistence
Efficient
Transformations
Efficient Queries
1. A multi-database model
persistence solution
4. A scalable model transformation
engine
3. An OCL to database query
language approach
2. A model prefetching and caching
component

Research Process
8
State of the Art
Analysis
Problematic
Definition
Hypothesis &
Solution
Prototype
Implementation
Evaluation &
Analysis
Hypothesis
Validation and
Reformulation
Challenge
Identification

Efficient Model Persistence
9
NeoEMF
MogwaïGremlin-ATL
NeoEMF PrefetchML
Efficient
Transformations
Efficient Queries

Model Persistence
• Default serialization mechanism: XMI
• Scalable model persistence frameworks: CDO, Morsa
• Use databases to store models
• Lazy loading
• Low memory footprint
10
NeoEMF
State of the Art
Analysis

Problematic
• Performance & memory issues
• [Pagan et.al., 2011; Gomez et. al., 2015]
• Mostly relational solutions
• Generic scalability improvements
• Single storage solutions
• Not aware of the modeling scenario
• External (non-standard) APIs
11
NeoEMF
Problematic
Definition

NeoEMF
• A rich infrastructure that allows to choose the database and data
representation to use.
• Compliant with existing modeling solutions
• Extensible architecture
• Memory efficiency
12
NeoEMF
Hypothesis
& Solution

NeoEMF
• NoSQL persistence
• Specific purpose
• Efficient to process large amount of data
• Built-in scaling capabilities
• Advanced query language
13
NeoEMF
Hypothesis
& solution

NeoEMF
14
Model-Based Tools
Modeling Framework
Persistence
Framework
NeoEMF Core
/Graph
NeoEMF Advanced API
Importers
/Map /Column
Data-store
Connector
Model Access API
Persistence API
Backend Connector
Backend API
Caching
Blueprints MapDB
BerkeleyDB
HBase +
Zookeeper
NeoEMF
Prototype
Implementation

NeoEMF
• Implementation
• Eclipse plugin
• Maven central
• Open source (EPL v2)
• 1700 Unit Tests
• 80% code coverage
15
NeoEMF
Prototype
Implementation

Evaluation
• Reference Benchmarks
• Transformation Tool Contest (TTC)
• Grabats
• Model Driven Engineering integration
• MoDisco
• Persistence layer for the MONDO platform
• European project on MDE scalability
• JMH Benchmarks
• Reproducible using a Docker image
16
NeoEMF
Evaluation

Evaluation
17
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
set1 set2 set3
Invisible Methods Results (ms)
EMF-XMI CDO NeoEMF/Graph NeoEMF/Map
OutOfMemory
549892
693537
… …
NeoEMF
3 models (6000 to 1.5M elements)
Constrained memory environment
4 17
Evaluation
& Analysis

Model Caching and Prefetching
18
PrefetchML
MogwaïGremlin-ATL
NeoEMF PrefetchML
Efficient
Transformations
Efficient Queries

Prefetching & Caching
• Prefetching
• Bring objects in memory before they are requested
• Caching
• Retain objects in memory to speed-up their access
• Integrated in databases and file systems
• Speeds-up I/O intensive applications
19
PrefetchML
State of the Art
Analysis

Problematic
• Database prefetchers & caches
• Lack of fine-grained configuration
• Tightly coupled to data representation
• Not common in NoSQL stores
• Prefetching components in persistence frameworks
• Not expressive enough
• Lack of fine-grained strategies
20
PrefetchML
Problematic
Definition

Hypothesis
• We need to prefetch and cache models
• Scalable persistence frameworks
• Databases to store large models (relational, NoSQL)
• Latencies to bring elements from the database (lazy-loading)
• Metamodel information
• High-level prefetching and caching rules
• Decoupled from the data representation
21
PrefetchML
Hypothesis
& Solution

PrefetchML
• A prefetching and caching framework at the model level
• PrefetchML DSL (rule definition)
• Metamodel and model level
• Use case dependent
• Readable
• PrefetchML Engine (rule execution)
• Datastore independent
• Transparent to the persistence solution
22
PrefetchML
Hypothesis
& Solution

PrefetchML
Import ‘http://www.example.com/Java’
plan samplePlan {
rule r1: on starting
fetch Package.allInstances()
}
23
rule r2: on access type Class
fetch self.methods.modifier
use cache LRU[size=100, chunk=10]
(self.name = ‘MyClass’)
remove type Package
p1: Package
+ name = 'package1'
c1: Class
+ name = 'class1'
c2: Class
+ name = 'class2'
m1: Method
+ name = 'method1'
t1: Type
+ name = 'void'
mod1: Modifier
+ visibility = #private
classes
classes
imports
methods
returnType
modifier
PrefetchML
Prototype
Implementation

Evaluation
• Up to 18% improvement on NeoEMF/Graph
• Up to 95% improvement on NeoEMF/Map
• Invalid plans detected at runtime
• New rule suggestions
24
PrefetchML
Evaluation
& Analysis

Efficient Model Queries
25
Mogwaï
MogwaïGremlin-ATL
NeoEMF PrefetchML
Efficient
Transformations
Efficient Queries

State of the Art
26
API Call1
…
API Calln
OCL
Query
OCL
Interpreter
EMF API Database
Package.allInstances().name
get(p1)
get(p1,name)
…
get(pn)
get(pn,name)
Modeler
Point of View
Under the Hood
Low-level modeling API
→ Not aligned with the
database capabilities
Fragmented queries
→ Not efficient
→ Remote database
Intermediate objects
→ Memory consumption
→ Execution time overhead
Mogwaï
State of the Art
Analysis

Problematic
• Not efficient to compute model queries
• Database queries are more efficient but
• Modern persistence frameworks typically rely on NoSQL databases
• Multiple query languages
• Multiple data representations
• Low-level queries are hard to understand and maintain
• Modeling expertise vs. Database expertise
• Solution: generate them!
27
Mogwaï
Problematic
Definition

Mogwaï
28
OCL
Query
Model
Transformation
Gremlin
Traversal
Blueprints
Database
Reification
Modeler
v1
v3
v2
p1: Package
p2: Package
p3: Package
• Generate graph database
queries from OCL expressions
• Bypass modeling framework
API
• Gremlin Query Language
Mogwaï
Package.allInstances()
• Single execution of the queries
• Compatible with existing
modeling frameworks
Hypothesis
& Solution

Mogwai
• Gremlin metamodel
• ~ 100 classes
• ATL Transformation
• OCL to Gremlin mapping
• Query composition
• 70 rules and helpers
29
Mogwaï
Package.allInstances().classes
->select(c | c.name = ‘class1’)
g.idx(‘’metaclasses’’)[[name:’’Package’’]]
inE(‘’instanceof’’).outV.
outE(‘’classes’’).inV.
filter{it.name=‘class1’}
OCL Expression Gremlin Steps
Type g.Idx(‘’metaclasses’’)[[name:’Type’]]
AllInstances() inE(‘instanceof’).outV
Collect(att) Att
Collect(ref) outE(‘ref’).inV
Select(cond) Filter{cond}
oclIsTypeOf(Type) outE(‘instanceof’).inV.transform{it.next() ==
Type}
Prototype
Implementation

30
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
set2 set3
Invisible Methods Results (ms)
OCL IncQuery Mogwaï NeoEMF/Map
175000
……
606000
Mogwaï
2 models (80000 to 1.5M elements)
Constrained memory environment
NeoEMF/Graph
Evaluation
Evaluation
& Analysis

Scalable Model Transformations
31
Gremlin-ATL
MogwaïGremlin-ATL
NeoEMF PrefetchML
Efficient
Transformations
Efficient Queries

• Extension of Mogwaï for model transformations
• ATL to Gremlin transformation
• Multiple database support
• Advanced configuration
Gremlin-ATL
32
ATLtoGremlin
Transformation
Gremlin
Traversal
Output
DatabaseATL Transformation
Input
Database
• Evaluation: reference transformation
• Comparison with the standard ATL engine
• Reduces the execution time by a factor of 2 to 134
• Reduces the memory consumption by a factor of 10
Gremlin-ATL

Conclusion
• Novel modeling ecosystem
• Scalable persistence approach
• Multi-databases: adapted to the expected modeling scenario
• Compliant with existing modeling solutions
• On-demand loading
• Prefetching and caching component
• Improve large model access efficiency
• Define precise plans adapted to a given model access pattern
34
Scalable Model
Persistence

Conclusion
• Novel modeling ecosystem
• Translation-based approach for query and
transformation computation
• Standard language support
• Generate database-specific operations
• Bypasses modeling framework API
• Positive performances on large models
35
Efficient Model
Queries & Transformations

Perspectives & Future Work
• NeoEMF
• Combine multiple databases
• Multi-resource database
• Integrate collaborative modeling techniques
• PrefetchML
• Automatic plan generation
• Static analysis: existing code
• Dynamic analysis: execution logs
• Advanced monitoring component
• Rule creation / deletion
• Rule modifications
36

Perspectives & Future Work
• Application outside the MDE community
• UML to GraphDB
• Use NeoEMF internal mapping and Mogwaï to generate graph code from UML
• Gremlin-ATL to express data migration operations
• Promising results on the Neo4j panama paper database
• Native database prefetching with PrefetchML
• A model-based prefetcher for NoSQL databases
37

Publications
• Journals
• G. Daniel, G. Sunyé, A. Benelallam, M. Tisi, Y. Vernageau, A. Gomez & J. Cabot
NeoEMF, a Multi-Database Model Persistence Framework for Very Large Models. SCP 2017
• G. Daniel, G. Sunyé, J. Cabot
Advanced Prefetching and Caching of Models with PrefetchML. SoSym 2018 (accepted)
• International Conferences
Mogwaï, a Framework to Handle Complex Queries on Large Models. RCIS 2016
PrefetchML, a Framework for Prefetching and Caching Models. MoDELS 2016. Distinguished paper award
UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases. ER 2016
• G. Daniel, F. Jouault, G. Sunyé, J. Cabot
Gremlin-ATL: a Scalable Model Transformation Framework. ASE 2017
• 3 Workshops
38

Websites & Software Repositories
NeoEMF: neoemf.com
PrefetchML: github.com/atlanmod/Prefetching_Caching_DSL
Mogwaï/Gremlin-ATL: github.com/atlanmod/mogwai
UmltoGraphDB : github.com/atlanmod/uml2nosql
39

40
Question?
Thank you for your attention!
☺ P, F, M, A, P-Y, R, M, G, E, J, S

Thesis Defense (Gwendal DANIEL) - Nov 2017

Recomendados

Recomendados

Más contenido relacionado

Similar a Thesis Defense (Gwendal DANIEL) - Nov 2017

Similar a Thesis Defense (Gwendal DANIEL) - Nov 2017 (20)

Último

Último (20)

Thesis Defense (Gwendal DANIEL) - Nov 2017