OCL'16 slides: Models from Code or Code as a Model?
1. Intro Proposal Evaluation Conclusion
Models from Code or Code as a Model?
Antonio García-Domínguez, Dimitris Kolovos
Aston University, University of York
OCL’16
October 2nd, 2016
A. García, D. S. Kolovos Models from Code or Code as a Model? 1 / 13
2. Intro Proposal Evaluation Conclusion
Using codebases to drive software engineering tasks
Usual sequence of events
We had a business need
We invested in developing a program that covered it
Now we may need to:
Extract architecture or underlying business process?
Find bugs before they hit us?
Improve its design?
Migrate it to a new technology?
Code is the most accurate description: let’s use it
How do we extract knowledge?
Regexps do not scale to complex tasks
We need something that understands the language
The “extractor” is embedded within a process
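A minimal sketch of why regular expressions fall short, using Python and its standard `ast` module as a stand-in for Java and a Java parser (an assumption made only so the example runs without extra libraries). The sample source and its class names are invented for illustration.

```python
import ast
import re

# Invented sample source: one real class, plus look-alikes in a docstring and a comment.
SOURCE = '''
class Invoice:
    """Docs mention: class Phantom: is not real code."""
    def total(self):
        return 0

# class Commented: pass
'''

# Naive approach: a regular expression over the raw text.
regex_hits = re.findall(r"class\s+(\w+)", SOURCE)

# Language-aware approach: parse first, then inspect the syntax tree.
tree = ast.parse(SOURCE)
parsed_hits = [n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]

print(regex_hits)   # the regexp also matches text inside the docstring and the comment
print(parsed_hits)  # the parser only reports the real declaration
```

The regexp cannot tell declarations from text that merely looks like one; the parser can, because it knows the grammar.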
3. Intro Proposal Evaluation Conclusion
Existing approaches
Some well-known reverse engineering tools (“extractors”)
Eclipse MoDisco: EMF-based, implements KDM/ASTM
JaMoPP: EMF-based, uses custom metamodel
Moose: FAMIX-based tool
Rascal: extracts partial JDT representation into a model
Issue: extractors are one-off processes
They produce standalone models: the code is no longer needed
However, current tools are not incremental: if the code is
changed, the extraction has to be redone from scratch
Issue: extractors are not query-aware
Some tasks will only access a small part of the codebase
Extracting the rest only adds overhead
4. Intro Proposal Evaluation Conclusion
Epsilon EMC JDT driver: use IDE indices as models
IDEs already extract representations all the time
Eclipse indexes Java projects in the background
Keeps fast pointers to classes / methods
Extra info available on demand through parsing
JDT indices are under active improvement:
See Stefan Xenos’ talk at EclipseCon NA’16
Cross-pollination from CDT project (C++)
Faster, more thorough indexing coming in future releases
Our proposal: Epsilon EMC JDT driver
Expose code as seen by IDE (JDT) as a model
On-demand loading + direct access to Java classes
Sources available on GitHub (epsilonlabs/emc-jdt)
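The idea of on-demand loading can be sketched as follows. This is not the driver's code: `LazyCodeModel` is a hypothetical class, and Python's `ast` module stands in for the Java parser. A file is parsed the first time a query touches it and the result is reused afterwards.

```python
import ast


class LazyCodeModel:
    """Hypothetical model that parses a source file only when it is first queried."""

    def __init__(self, sources):
        self._sources = sources   # file name -> source text (stand-in for a project)
        self._trees = {}          # file name -> parsed tree, filled on demand
        self.parse_count = 0      # how many files have actually been parsed

    def tree(self, filename):
        if filename not in self._trees:
            self.parse_count += 1
            self._trees[filename] = ast.parse(self._sources[filename])
        return self._trees[filename]

    def class_names(self, filename):
        nodes = ast.walk(self.tree(filename))
        return [n.name for n in nodes if isinstance(n, ast.ClassDef)]


model = LazyCodeModel({
    "a.py": "class A:\n    pass\n",
    "b.py": "class B:\n    pass\n",
    "c.py": "class C:\n    pass\n",
})
names = model.class_names("b.py")
names_again = model.class_names("b.py")  # served from the cached tree
print(names, model.parse_count)
```

Only one of the three files is ever parsed, which is the saving a query-aware approach is after.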
5. Intro Proposal Evaluation Conclusion
Making it possible: Epsilon architecture
Core:
Epsilon Object Language (EOL) = JavaScript + OCL
Epsilon Model Connectivity (EMC)
Task-specific languages (extend EOL):
Model Validation (EVL), Code Generation (EGL), Model-to-model Transformation (ETL), ...
Technology-specific drivers (implement EMC):
Eclipse Modeling Framework (EMF), Schema-less XML, Eclipse Java Development Tools (JDT), CSV, ...
All Epsilon languages are based on EOL
EOL accesses models through EMC interfaces
By implementing the EMC interfaces, all Epsilon languages
can use JDT indices as models
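A rough sketch of the layering, assuming a much smaller interface than the real one: `ModelDriver`, `all_of_type` and `get_property` are hypothetical names, not the EMC API. The point is only that a query written against the interface runs unchanged on any driver.

```python
import csv
import io
from abc import ABC, abstractmethod


class ModelDriver(ABC):
    """Hypothetical connectivity interface (illustrative only, not the EMC API)."""

    @abstractmethod
    def all_of_type(self, type_name): ...

    @abstractmethod
    def get_property(self, element, name): ...


class CsvDriver(ModelDriver):
    """Treats each CSV row as a model element; the 'type' column gives its type."""

    def __init__(self, text):
        self._rows = list(csv.DictReader(io.StringIO(text)))

    def all_of_type(self, type_name):
        return [r for r in self._rows if r["type"] == type_name]

    def get_property(self, element, name):
        return element[name]


class ObjectDriver(ModelDriver):
    """Treats plain in-memory objects as model elements."""

    def __init__(self, objects):
        self._objects = objects

    def all_of_type(self, type_name):
        return [o for o in self._objects if type(o).__name__ == type_name]

    def get_property(self, element, name):
        return getattr(element, name)


class TypeDeclaration:
    def __init__(self, name):
        self.name = name


def names_of(driver, type_name):
    """Driver-agnostic query: it only talks to the interface."""
    elements = driver.all_of_type(type_name)
    return sorted(driver.get_property(e, "name") for e in elements)


csv_model = CsvDriver("type,name\nTypeDeclaration,Chart\nMethodDeclaration,draw\n")
object_model = ObjectDriver([TypeDeclaration("Plot")])
print(names_of(csv_model, "TypeDeclaration"), names_of(object_model, "TypeDeclaration"))
```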
6. Intro Proposal Evaluation Conclusion
Configuration dialog for an EMC JDT model
7. Intro Proposal Evaluation Conclusion
allInstances() in EMC JDT
Reflection-based X.allInstances
1 Parse Java sources in the projects on the fly
2 Traverse JDT Document Object Model with ASTVisitor
3 Use Java reflection to fetch instances of X
4 Cache for later executions, if desired
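The four steps above can be sketched by analogy in Python, with `ast.parse` and `ast.NodeVisitor` standing in for the Java parser and ASTVisitor, and `getattr` on the `ast` module standing in for Java reflection. `ReflectiveModel` and its methods are hypothetical names; this is not the driver's implementation.

```python
import ast


class _Collector(ast.NodeVisitor):
    """Visits every node and keeps those that are instances of the wanted type."""

    def __init__(self, node_type):
        self.node_type = node_type
        self.found = []

    def generic_visit(self, node):
        if isinstance(node, self.node_type):
            self.found.append(node)
        super().generic_visit(node)  # keep descending into child nodes


class ReflectiveModel:
    def __init__(self, sources, cache=True):
        self._sources = sources                 # file name -> source text
        self._cache = {} if cache else None     # type name -> list of nodes

    def all_instances(self, type_name):
        if self._cache is not None and type_name in self._cache:
            return self._cache[type_name]
        node_type = getattr(ast, type_name)     # step 3: resolve the type by name
        collector = _Collector(node_type)
        for text in self._sources.values():
            collector.visit(ast.parse(text))    # steps 1 and 2: parse, then traverse
        if self._cache is not None:
            self._cache[type_name] = collector.found   # step 4: cache if desired
        return collector.found


model = ReflectiveModel({"m.py": "class A:\n    def f(self):\n        return 1\n"})
functions = model.all_instances("FunctionDef")
print([f.name for f in functions])
```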
Special case: TypeDeclaration.allInstances
Searchable, lazy (no parsing unless looping over it)
Supports two new operations:
c.select(it|it.name = expr) uses the JDT index to find the relevant
compilation unit, parses it and returns the matching DOM node
c.search(it|it.name = expr) works the same way, but returns the
raw index entry (a simpler JDT SourceType)
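A sketch of the difference between the two operations, under the assumption that `search` can be answered from the index alone while `select` has to parse. `SearchableTypes` is a hypothetical class, a plain dictionary plays the role of the IDE index, and Python's `ast` module stands in for the Java parser.

```python
import ast


class SearchableTypes:
    """Hypothetical lazy collection: an index of class names, parsed only on demand."""

    def __init__(self, sources, index):
        self._sources = sources   # file name -> source text
        self._index = index       # class name -> file name (stand-in for the IDE index)
        self.parse_count = 0

    def search(self, name):
        """Return the raw index entry without parsing anything."""
        filename = self._index.get(name)
        return None if filename is None else {"name": name, "file": filename}

    def select(self, name):
        """Locate the file through the index, parse it, return the syntax node."""
        entry = self.search(name)
        if entry is None:
            return None
        self.parse_count += 1
        tree = ast.parse(self._sources[entry["file"]])
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef) and node.name == name:
                return node
        return None


types = SearchableTypes(
    sources={"plot.py": "class Plot:\n    def draw(self):\n        pass\n"},
    index={"Plot": "plot.py"},
)
entry = types.search("Plot")
parses_after_search = types.parse_count
node = types.select("Plot")
print(entry, parses_after_search, type(node).__name__, types.parse_count)
```

The index entry is cheaper to obtain but carries less detail; the parsed node gives access to the full body of the declaration.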
8. Intro Proposal Evaluation Conclusion
Case study: validate code against UML models
Overview
We are maintaining a library, and we need to check its
compliance with a UML model as it changes
Here, “compliance” means “must have all the classes and
methods in the UML model” (code may have more)
Which is faster/more convenient:
extracting a model with MoDisco first, or
checking it directly with the EMC JDT driver?
9. Intro Proposal Evaluation Conclusion
Experiment setup
Inputs
Source code: JFreeChart 1.0.17, 1.0.18 and 1.0.19
UML model for 1.0.17 extracted by Modelio
Tools
MoDisco 0.13.2 discoverer extracted 1 .xmi per version
Epsilon interim (3d4408), emc-jdt interim (5b5ea)
Validation task: implemented in the Epsilon Validation Language
1 version for MoDisco, 2 for EMC JDT (select/search)
Rules:
1 Each UML class has a corresponding Java class
2 Each UML method is implemented in that Java class
3 Each nested class obeys the two rules above
Result: 4 validation errors found in 1.0.18 and 1.0.19
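The three rules can be illustrated with a small checker over plain dictionaries. This is a sketch in Python, not the EVL script used in the experiment; the `validate` function, the dictionary layout and the sample class names are all invented for illustration.

```python
def validate(uml_classes, code_classes, path=""):
    """Return a list of error messages; an empty list means the code complies."""
    errors = []
    for name, uml in uml_classes.items():
        qualified = path + name
        code = code_classes.get(name)
        if code is None:
            # Rule 1: every UML class needs a corresponding class in the code.
            errors.append(f"missing class {qualified}")
            continue
        for method in uml.get("methods", []):
            # Rule 2: every UML method must be implemented in that class.
            if method not in code.get("methods", []):
                errors.append(f"missing method {qualified}.{method}")
        # Rule 3: nested classes obey the same two rules.
        errors += validate(uml.get("nested", {}), code.get("nested", {}), qualified + ".")
    return errors


# Invented sample data. The code may have more than the UML model asks for.
uml = {"Chart": {"methods": ["draw", "title"],
                 "nested": {"Style": {"methods": ["apply"]}}}}
code = {"Chart": {"methods": ["draw", "export"],
                  "nested": {"Style": {"methods": []}}},
        "Extra": {"methods": []}}

errors = validate(uml, code)
print(errors)
```

Classes and methods that exist only in the code (`Extra`, `export`) raise no errors, matching the one-directional notion of compliance used in the case study.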
10. Intro Proposal Evaluation Conclusion
Performance results
MoDisco validates all versions in 112.2s, JDT/select in 71.92s, JDT/search in 36.52s
[Figure: bar chart of execution time (s) per JFreeChart version (1.0.17, 1.0.18, 1.0.19) for MoDisco, JDT w/select and JDT w/search, broken down into Extract, Load and Validation]
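As a quick check on the totals reported above, the relative speedups can be computed directly. Only the three totals come from the slide; the ratios are derived from them.

```python
# Total times (seconds) to validate all three versions, as reported on the slide.
totals = {"MoDisco": 112.2, "JDT/select": 71.92, "JDT/search": 36.52}

# Speedup of each approach relative to MoDisco.
speedup = {tool: totals["MoDisco"] / seconds for tool, seconds in totals.items()}

for tool, factor in speedup.items():
    print(f"{tool}: {totals[tool]:.2f} s, {factor:.2f}x vs MoDisco")
```

JDT with select is roughly 1.56 times faster than MoDisco overall, and JDT with search roughly 3.07 times faster.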
11. Intro Proposal Evaluation Conclusion
Performance discussion
MoDisco: slower loading for faster validation
When using .xmi files, we load entire model into memory
Pro: with everything in memory, validation is faster
Con: huge models won’t fit in memory
We’ll need a store with on-demand loading (e.g. CDO)
On-demand loading can change performance profile
Amortisation of extraction costs depends on codebase
Frozen codebase (e.g. legacy systems):
Full extraction is quickly amortised
MoDisco is a better choice
Quickly changing codebase (e.g. actively developed systems):
Extracting on demand is usually better (models don’t live long)
EMC JDT is a better choice
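The amortisation argument can be made concrete with a simple cost model. This is an assumption for illustration, not something measured in the experiment: every code change forces a full re-extraction, queries on an extracted model are cheap, and queries on the code itself are more expensive. All figures below are made up.

```python
def extract_first_cost(queries, changes, extract, query_on_model):
    """One initial extraction, one re-extraction per change, plus cheap queries."""
    return (changes + 1) * extract + queries * query_on_model


def on_demand_cost(queries, query_on_code):
    """No extraction step; each query pays the full price of reading the code."""
    return queries * query_on_code


# Illustrative (not measured) costs in seconds.
EXTRACT, QUERY_ON_MODEL, QUERY_ON_CODE = 30, 2, 10

# Frozen codebase: 20 queries, no changes.
frozen = (extract_first_cost(20, 0, EXTRACT, QUERY_ON_MODEL),
          on_demand_cost(20, QUERY_ON_CODE))

# Actively developed codebase: 20 queries, the code changes before each one.
active = (extract_first_cost(20, 20, EXTRACT, QUERY_ON_MODEL),
          on_demand_cost(20, QUERY_ON_CODE))

print("frozen (extract-first, on-demand):", frozen)
print("active (extract-first, on-demand):", active)
```

Under these assumptions, extracting first wins when the code is frozen, and querying on demand wins when each extracted model is thrown away after a single use.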
12. Intro Proposal Evaluation Conclusion
Conclusion and future lines of work
Summary
Codebases are a valuable input for many SE tasks
Two options to query codebases:
Extract standalone models (MoDisco)
Use code directly as a model (EMC JDT)
EMC JDT is faster for changing codebases
Future work
Further optimisations to improve performance
Evaluate impact of future JDT versions
More filtering fields for searchable collections
More shorthand properties for common scenarios
Port approach to other languages (e.g. C++ through CDT)
13. End of the presentation
Questions?
@antoniogado
14. Extra features in EMC JDT
Shorthand properties
Quick access to commonly needed information
EMC PropertyGetter computes value on demand
FieldDeclaration: “name”
BodyDeclaration: “public”, “static”...
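A sketch of how shorthand properties might be computed on demand. `ShorthandNode` is a hypothetical wrapper, not the driver's PropertyGetter, and it assumes that a field's "name" is taken from its first declared fragment.

```python
class ShorthandNode:
    """Hypothetical node wrapper that answers shorthand properties on demand."""

    MODIFIERS = {"public", "private", "protected", "static", "final", "abstract"}

    def __init__(self, modifiers, fragments):
        self.modifiers = list(modifiers)   # e.g. ["public", "static"]
        self.fragments = list(fragments)   # declared names, e.g. ["MAX_SIZE"]

    def __getattr__(self, prop):
        # Only called when normal attribute lookup fails, so the value is
        # computed at access time rather than stored on the node.
        if prop in self.MODIFIERS:
            return prop in self.modifiers
        if prop == "name":
            return self.fragments[0]       # assumption: first fragment names the field
        raise AttributeError(prop)


field = ShorthandNode(["public", "static"], ["MAX_SIZE"])
print(field.name, field.public, field.static, field.final)
```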