Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.

Thesis Defense (Gwendal DANIEL) - Nov 2017

406 visualizaciones

Publicado el

The slides I used during my thesis defense at IMT Atlantique, Nantes, the 14th of November 2017.

  • Inicia sesión para ver los comentarios

  • Sé el primero en recomendar esto

Thesis Defense (Gwendal DANIEL) - Nov 2017

  1. 1. Efficient Persistence, Query, and Transformation of Large Models Gwendal DANIEL Inria, IMT Atlantique, LS2N gwendal.daniel@inria.fr Supervisor: Jordi Cabot Co-Supervisor: Gerson Sunyé Co-Supervisor: Massimo Tisi
  2. 2. Context “A model of a system or process is a theoretical description that can help you understand how the system or process works, or how it might work.’’ (Collins 2017) 2 Credit: https://www.autodesk.com/solutions/bim
  3. 3. Context • Software Modeling 3 Forward Engineering Reverse Engineering
  4. 4. Context 4 Injection Transformations Queries Generation Model Driven Reverse Engineering
  5. 5. Context 5 Injection Transformations Queries Generation Model Driven Reverse Engineering Execution Time Latencies Memory Consumption Limited adoption in the industry • Scalability of existing solutions • Large generated models [J. Whittle et. al., 2014]
  6. 6. Research Challenges 1. Storing large models • Efficient data structures • Efficient persistence solutions • Efficient memory management 6 2. Accessing stored models efficiently • Data representation • Caching and prefetching 3. Querying large models • Complex model navigations • Efficient constraint checking 4. Transforming large models • Refactoring operations • Code generation
  7. 7. Contributions • A novel scalable modeling ecosystem 7 MogwaïGremlin-ATL NeoEMF PrefetchML Scalable Model Persistence Efficient Transformations Efficient Queries 1. A multi-database model persistence solution 4. A scalable model transformation engine 3. An OCL to database query language approach 2. A model prefetching and caching component
  8. 8. Research Process 8 State of the Art Analysis Problematic Definition Hypothesis & Solution Prototype Implementation Evaluation & Analysis Hypothesis Validation and Reformulation Challenge Identification
  9. 9. Efficient Model Persistence 9 NeoEMF MogwaïGremlin-ATL NeoEMF PrefetchML Scalable Model Persistence Efficient Transformations Efficient Queries
  10. 10. Model Persistence • Default serialization mechanism: XMI • Scalable model persistence frameworks: CDO, Morsa • Use databases to store models • Lazy loading • Low memory footprint 10 NeoEMF State of the Art Analysis
  11. 11. Problematic • Performance & memory issues • [Pagan et.al., 2011; Gomez et. al., 2015] • Mostly relational solutions • Generic scalability improvements • Single storage solutions • Not aware of the modeling scenario • External (non-standard) APIs 11 NeoEMF Problematic Definition
  12. 12. NeoEMF • A rich infrastructure that allows to choose the database and data representation to use. • Compliant with existing modeling solutions • Extensible architecture • Memory efficiency 12 NeoEMF Hypothesis & Solution
  13. 13. NeoEMF • NoSQL persistence • Specific purpose • Efficient to process large amount of data • Built-in scaling capabilities • Advanced query language 13 NeoEMF Hypothesis & solution
  14. 14. NeoEMF 14 Model-Based Tools Modeling Framework Persistence Framework NeoEMF Core /Graph NeoEMF Advanced API Importers /Map /Column Data-store Connector Model Access API Persistence API Backend Connector Backend API Caching Blueprints MapDB BerkeleyDB HBase + Zookeeper NeoEMF Prototype Implementation
  15. 15. NeoEMF • Implementation • Eclipse plugin • Maven central • Open source (EPL v2) • 1700 Unit Tests • 80% code coverage 15 NeoEMF Prototype Implementation
  16. 16. Evaluation • Reference Benchmarks • Transformation Tool Contest (TTC) • Grabats • Model Driven Engineering integration • MoDisco • Persistence layer for the MONDO platform • European project on MDE scalability • JMH Benchmarks • Reproducible using a Docker image 16 NeoEMF Evaluation
  17. 17. Evaluation 17 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 set1 set2 set3 Invisible Methods Results (ms) EMF-XMI CDO NeoEMF/Graph NeoEMF/Map OutOfMemory 549892 693537 … … NeoEMF 3 models (6000 to 1.5M elements) Constrained memory environment 4 17 Evaluation & Analysis
  18. 18. Model Caching and Prefetching 18 PrefetchML MogwaïGremlin-ATL NeoEMF PrefetchML Scalable Model Persistence Efficient Transformations Efficient Queries
  19. 19. Prefetching & Caching • Prefetching • Bring objects in memory before they are requested • Caching • Retain objects in memory to speed-up their access • Integrated in databases and file systems • Speeds-up I/O intensive applications 19 PrefetchML State of the Art Analysis
  20. 20. Problematic • Database prefetchers & caches • Lack of fine-grained configuration • Tightly coupled to data representation • Not common in NoSQL stores • Prefetching components in persistence frameworks • Not expressive enough • Lack of fine-grained strategies 20 PrefetchML Problematic Definition
  21. 21. Hypothesis • We need to prefetch and cache models • Scalable persistence frameworks • Databases to store large models (relational, NoSQL) • Latencies to bring elements from the database (lazy-loading) • Metamodel information • High-level prefetching and caching rules • Decoupled from the data representation 21 PrefetchML Hypothesis & Solution
  22. 22. PrefetchML • A prefetching and caching framework at the model level • PrefetchML DSL (rule definition) • Metamodel and model level • Use case dependent • Readable • PrefetchML Engine (rule execution) • Datastore independent • Transparent to the persistence solution 22 PrefetchML Hypothesis & Solution
  23. 23. PrefetchML Import ‘http://www.example.com/Java’ plan samplePlan { rule r1: on starting fetch Package.allInstances() } 23 rule r2: on access type Class fetch self.methods.modifier use cache LRU[size=100, chunk=10] (self.name = ‘MyClass’) remove type Package p1: Package + name = 'package1' c1: Class + name = 'class1' c2: Class + name = 'class2' m1: Method + name = 'method1' t1: Type + name = 'void' mod1: Modifier + visibility = #private classes classes imports methods returnType modifier PrefetchML Prototype Implementation
  24. 24. Evaluation • Up to 18% improvement on NeoEMF/Graph • Up to 95% improvement on NeoEMF/Map • Invalid plans detected at runtime • New rule suggestions 24 PrefetchML Evaluation & Analysis
  25. 25. Efficient Model Queries 25 Mogwaï MogwaïGremlin-ATL NeoEMF PrefetchML Scalable Model Persistence Efficient Transformations Efficient Queries
  26. 26. State of the Art 26 API Call1 … API Calln OCL Query OCL Interpreter EMF API Database Package.allInstances().name get(p1) get(p1,name) … get(pn) get(pn,name) Modeler Point of View Under the Hood Low-level modeling API → Not aligned with the database capabilities Fragmented queries → Not efficient → Remote database Intermediate objects → Memory consumption → Execution time overhead Mogwaï State of the Art Analysis
  27. 27. Problematic • Not efficient to compute model queries • Database queries are more efficient but • Modern persistence frameworks typically rely on NoSQL databases • Multiple query languages • Multiple data representations • Low-level queries are hard to understand and maintain • Modeling expertise vs. Database expertise • Solution: generate them! 27 Mogwaï Problematic Definition
  28. 28. Mogwaï 28 OCL Query Model Transformation Gremlin Traversal Blueprints Database Reification Modeler v1 v3 v2 p1: Package p2: Package p3: Package • Generate graph database queries from OCL expressions • Bypass modeling framework API • Gremlin Query Language Mogwaï Package.allInstances() • Single execution of the queries • Compatible with existing modeling frameworks Hypothesis & Solution
  29. 29. Mogwai • Gremlin metamodel • ~ 100 classes • ATL Transformation • OCL to Gremlin mapping • Query composition • 70 rules and helpers 29 Mogwaï Package.allInstances().classes ->select(c | c.name = ‘class1’) g.idx(‘’metaclasses’’)[[name:’’Package’’]] inE(‘’instanceof’’).outV. outE(‘’classes’’).inV. filter{it.name=‘class1’} OCL Expression Gremlin Steps Type g.Idx(‘’metaclasses’’)[[name:’Type’]] AllInstances() inE(‘instanceof’).outV Collect(att) Att Collect(ref) outE(‘ref’).inV Select(cond) Filter{cond} oclIsTypeOf(Type) outE(‘instanceof’).inV.transform{it.next() == Type} Prototype Implementation
  30. 30. 30 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 set2 set3 Invisible Methods Results (ms) OCL IncQuery Mogwaï NeoEMF/Map 175000 …… 606000 Mogwaï 2 models (80000 to 1.5M elements) Constrained memory environment NeoEMF/Graph Evaluation Evaluation & Analysis
  31. 31. Scalable Model Transformations 31 Gremlin-ATL MogwaïGremlin-ATL NeoEMF PrefetchML Scalable Model Persistence Efficient Transformations Efficient Queries
  32. 32. • Extension of Mogwaï for model transformations • ATL to Gremlin transformation • Multiple database support • Advanced configuration Gremlin-ATL 32 ATLtoGremlin Transformation Gremlin Traversal Output DatabaseATL Transformation Input Database • Evaluation: reference transformation • Comparison with the standard ATL engine • Reduces the execution time by a factor of 2 to 134 • Reduces the memory consumption by a factor of 10 Gremlin-ATL
  33. 33. Conclusion
  34. 34. Conclusion • Novel modeling ecosystem • Scalable persistence approach • Multi-databases: adapted to the expected modeling scenario • Compliant with existing modeling solutions • On-demand loading • Prefetching and caching component • Improve large model access efficiency • Define precise plans adapted to a given model access pattern 34 Scalable Model Persistence
  35. 35. Conclusion • Novel modeling ecosystem • Translation-based approach for query and transformation computation • Standard language support • Generate database-specific operations • Bypasses modeling framework API • Positive performances on large models 35 Efficient Model Queries & Transformations
  36. 36. Perspectives & Future Work • NeoEMF • Combine multiple databases • Multi-resource database • Integrate collaborative modeling techniques • PrefetchML • Automatic plan generation • Static analysis: existing code • Dynamic analysis: execution logs • Advanced monitoring component • Rule creation / deletion • Rule modifications 36
  37. 37. Perspectives & Future Work • Application outside the MDE community • UML to GraphDB • Use NeoEMF internal mapping and Mogwaï to generate graph code from UML • Gremlin-ATL to express data migration operations • Promising results on the Neo4j panama paper database • Native database prefetching with PrefetchML • A model-based prefetcher for NoSQL databases 37
  38. 38. Publications • Journals • G. Daniel, G. Sunyé, A. Benelallam, M. Tisi, Y. Vernageau, A. Gomez & J. Cabot NeoEMF, a Multi-Database Model Persistence Framework for Very Large Models. SCP 2017 • G. Daniel, G. Sunyé, J. Cabot Advanced Prefetching and Caching of Models with PrefetchML. SoSym 2018 (accepted) • International Conferences • G. Daniel, G. Sunyé, J. Cabot Mogwaï, a Framework to Handle Complex Queries on Large Models. RCIS 2016 • G. Daniel, G. Sunyé, J. Cabot PrefetchML, a Framework for Prefetching and Caching Models. MoDELS 2016. Distinguished paper award • G. Daniel, G. Sunyé, J. Cabot UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases. ER 2016 • G. Daniel, F. Jouault, G. Sunyé, J. Cabot Gremlin-ATL: a Scalable Model Transformation Framework. ASE 2017 • 3 Workshops 38
  39. 39. Websites & Software Repositories NeoEMF: neoemf.com PrefetchML: github.com/atlanmod/Prefetching_Caching_DSL Mogwaï/Gremlin-ATL: github.com/atlanmod/mogwai UmltoGraphDB : github.com/atlanmod/uml2nosql 39
  40. 40. 40 Question? Thank you for your attention! ☺ P, F, M, A, P-Y, R, M, G, E, J, S

×