Slides of our OCL'16 paper on re-implementing Apache Thrift using EMF, Xtext and Epsilon. Full paper: http://oclworkshop.github.io/2016/papers/OCL16_paper_7.pdf
Re-Implementing Apache Thrift using Model-Driven Engineering Technologies: An Experience Report
1. Re-Implementing Apache Thrift using Model-Driven Engineering Technologies: An Experience Report
Sina Madani and Dimitris Kolovos
Department of Computer Science
University of York
{sm1748, dimitris.kolovos}@york.ac.uk
S. Madani et al. – OCL Workshop 2016, September 14, 2016 – Slide 1/9
2. Motivation
Several experience reports on applications of MDE in industry
Mostly based on interviews
Limited availability of “hard data”
e.g. measurements extracted by implementing the same software with/without MDE technologies
Such data can provide useful insights and/or act as a good marketing tool for MDE
3. In this work . . .
We re-implemented a subset of Apache Thrift using EMF, Xtext and Epsilon
Thrift is a web-services framework originally developed at Facebook (http://thrift.apache.org)
Re-implementation available at https://github.com/SMadani/ThriftMDE/
We performed a quantitative and qualitative comparison of the two implementations
4. Apache Thrift
Interface description language for modelling services and input/output data types/structures (similar to CORBA’s IDL)
Several transport protocols (e.g. JSON, binary, compressed)
“Compiler” that produces client/server stubs in Java, Ruby, JavaScript etc. from interface descriptions
Compiler implemented using flex, Bison and C++
Code generation through string concatenation (!!)
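To illustrate what concatenation-based code generation looks like, here is a minimal sketch in Python. It is not taken from Thrift's C++ compiler: the function name, its parameters and the shape of the emitted Ruby class are all hypothetical and chosen only to show the style.

```python
def generate_ruby_struct(name, fields):
    """Build the source of a Ruby class by appending string fragments.

    Hypothetical example: the output shape is illustrative and is not
    what the Thrift compiler emits.
    """
    out = ""
    out += "class " + name + "\n"
    out += "  attr_accessor " + ", ".join(":" + f for f in fields) + "\n"
    out += "\n"
    out += "  def initialize(" + ", ".join(f + " = nil" for f in fields) + ")\n"
    # One assignment line per field; indentation is managed by hand.
    for f in fields:
        out += "    @" + f + " = " + f + "\n"
    out += "  end\n"
    out += "end\n"
    return out


if __name__ == "__main__":
    print(generate_ruby_struct("Point", ["x", "y"]))
```

The structure of the generated file is hard to see here because it is interleaved with quoting, newlines and manual indentation.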
5. MDE re-implementation of Thrift’s Compiler
Hand-crafted abstract syntax → EMF
flex, Bison → Xtext
Model validation in C++ → OCL-like constraints
Implemented using the Epsilon Validation Language (EVL)
Concatenation-based code generators → Template-based code generators
Implemented using the Epsilon Generation Language (EGL)
Only re-implemented code generators for Ruby and Java
To enable automated equivalence testing, we produce code identical to that of the original compiler (including its bugs)
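The template-based approach and the equivalence check can be sketched as follows. This is an analogy in Python using the standard library's string.Template, not EGL; the template, the helper names and the reference output are hypothetical stand-ins for the real generators and for the original compiler's output.

```python
import difflib
from string import Template

# The template shows the shape of the generated file; only the
# placeholders ($name, $accessors, ...) are computed. Hypothetical shape.
STRUCT_TEMPLATE = Template(
    "class $name\n"
    "  attr_accessor $accessors\n"
    "\n"
    "  def initialize($params)\n"
    "$assignments"
    "  end\n"
    "end\n"
)


def render_ruby_struct(name, fields):
    """Fill the template from a struct name and its field names."""
    return STRUCT_TEMPLATE.substitute(
        name=name,
        accessors=", ".join(":" + f for f in fields),
        params=", ".join(f + " = nil" for f in fields),
        assignments="".join("    @%s = %s\n" % (f, f) for f in fields),
    )


def diff_against_reference(reference, candidate):
    """Return unified-diff lines; an empty list means identical output."""
    return list(
        difflib.unified_diff(
            reference.splitlines(),
            candidate.splitlines(),
            fromfile="original",
            tofile="mde",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Stand-in for a file produced by the original generator.
    reference = (
        "class Point\n"
        "  attr_accessor :x, :y\n"
        "\n"
        "  def initialize(x = nil, y = nil)\n"
        "    @x = x\n"
        "    @y = y\n"
        "  end\n"
        "end\n"
    )
    print(diff_against_reference(reference, render_ruby_struct("Point", ["x", "y"])))
```

Requiring byte-identical output keeps the check trivial to automate, which is why reproducing the original's bugs is part of the exercise.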
6. Quantitative Evaluation: Lines of Code
Implementation                               Original   MDE     Difference
Language definition (parsing & validation)   3,419      447     -87%
Language-neutral code                        712        1,036   +68%
Java generator                               5,129      2,224   -57%
Ruby generator                               1,231      422     -66%
Total                                        10,491     4,149   -60%
Remarks
We only counted hand-written code for both implementations (i.e. excluded code generated by flex, Bison, EMF and Xtext)
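The Difference column reads as the change relative to the original implementation. The formula below is an assumption, since the slide does not define it, but it reproduces the figures reported for the language definition, the Java and Ruby generators and the total.

```python
def relative_difference(original, mde):
    """Percentage change from the original size to the MDE size.

    Assumed definition (not stated on the slide):
    100 * (mde - original) / original, rounded to a whole percent.
    """
    return round(100 * (mde - original) / original)


if __name__ == "__main__":
    # Lines of code as reported on the slide: (original, MDE).
    rows = {
        "Language definition": (3419, 447),
        "Java generator": (5129, 2224),
        "Ruby generator": (1231, 422),
        "Total": (10491, 4149),
    }
    for label, (original, mde) in rows.items():
        print(label, relative_difference(original, mde))
```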
7. Quantitative Evaluation: KBs (characters)
Implementation                               Original   MDE     Difference
Language definition (parsing & validation)   105        14      -87%
Language-neutral code                        22         26      +15%
Java generator                               187        73      -61%
Ruby generator                               40         14      -65%
Total                                        395        128     -68%
8. Qualitative Evaluation
The MDE implementation is more modular
Code generation is split over a number of templates for each language vs. all in one class in the original implementation
Validation logic is contained in a set of EVL constraints vs. being spread across the codebase
Xtext is significantly more concise than flex and Bison
Using a template-based code generation language:
Reduced the accidental complexity of string concatenation
Made little difference in terms of the complexity of the output computation logic
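The modularity point about validation can be pictured with a small sketch: every rule is a named constraint kept in one place and evaluated against the model. This is Python, not EVL, and the model classes and the two constraints are hypothetical examples rather than rules taken from the re-implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    id: int
    name: str


@dataclass
class Struct:
    name: str
    fields: list = field(default_factory=list)


def unique_field_ids(struct):
    """Hypothetical rule: no two fields of a struct share an id."""
    ids = [f.id for f in struct.fields]
    return len(ids) == len(set(ids))


def unique_field_names(struct):
    """Hypothetical rule: no two fields of a struct share a name."""
    names = [f.name for f in struct.fields]
    return len(names) == len(set(names))


# All validation rules live in one list instead of being scattered
# through parsing and generation code.
CONSTRAINTS = [
    ("UniqueFieldIds", unique_field_ids),
    ("UniqueFieldNames", unique_field_names),
]


def validate(struct):
    """Return the names of the constraints that the struct violates."""
    return [name for name, check in CONSTRAINTS if not check(struct)]


if __name__ == "__main__":
    broken = Struct("Point", [Field(1, "x"), Field(1, "y")])
    print(validate(broken))
```

Adding or removing a rule touches only the constraint list, which is the property the slide attributes to the EVL-based validation.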
9. Conclusions and Further Work
The MDE re-implementation is significantly more concise (60-70%)
To argue about maintainability, we need to conduct experiments involving developers
Xtext is a significant improvement over flex and Bison
A template-based code generation language is more concise but does not reduce the complexity of the code generator
Irreducible essential complexity?
Our conclusions are based on one data point
Need to repeat this exercise with additional software to build confidence (e.g. Google Protocol Buffers?)