The document provides an overview of Stéphane Ducasse's expertise in software evolution and reengineering. It discusses challenges in maintaining large software projects over time. It introduces Moose, an open-source reengineering platform developed by Ducasse to help with tasks like program understanding, metrics analysis, visualization, and detecting duplicated code. The document provides examples of how Moose can be used to analyze software structure, identify patterns of change, understand class hierarchies and how they evolve, and characterize how properties spread across packages over multiple versions of a system.
Ducasse's Maintenance Expertise
1. LSE
A Portfolio of Software Evolution Expertise
Stéphane Ducasse
stephane.ducasse@inria.fr
http://stephane.ducasse.free.fr/
Stéphane Ducasse 1
2. A brief introduction
Co-author of Object-Oriented Reengineering Patterns
Co-developer of Moose (reengineering platform)
10 PhD Theses in reengineering
50+ articles
Grounded in reality
Was maintainer of Squeak 3.9
Worked with:
Harman-Becker AG
Bedag AG
Nokia, Daimler
3. Roadmap
• Some facts
• Our approach
• Supporting maintenance
• Moose, an open platform
• Some visual examples
• Conclusion
4. Software is complex.
29% Succeeded
18% Failed
53% Challenged
The Standish Group, 2004
13. How large is your project?
1’000’000 lines of code
* 2 seconds per line = 2’000’000 seconds
/ 3600 = about 560 hours
/ 8 hours per day = about 70 days
/ 20 working days per month = about 3.5 months
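The estimate on this slide can be replayed as a small sketch. The constants are the slide's own factors; the variable names are illustrative only.

```python
# Back-of-the-envelope estimate from the slide, at 2 seconds per line of code.
LINES_OF_CODE = 1_000_000
SECONDS_PER_LINE = 2      # the slide's factor "* 2"
HOURS_PER_DAY = 8         # the slide's divisor "/ 8"
DAYS_PER_MONTH = 20       # the slide's divisor "/ 20"

seconds = LINES_OF_CODE * SECONDS_PER_LINE
hours = seconds / 3600
days = hours / HOURS_PER_DAY
months = days / DAYS_PER_MONTH

print(f"{seconds:,} s = {hours:.0f} h = {days:.0f} days = {months:.1f} months")
```

Without the slide's intermediate rounding, the result is closer to three and a half months than to three.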
14. Maintenance is Continuous Development
Between 50% and 75% of global effort is spent on “maintenance”!
Relative Maintenance Effort
60.3% Perfective (new functionality)
18.2% Adaptive (new platforms or OS)
17.4% Corrective (fixing reported errors)
4.1% Other
The bulk of the maintenance cost is due to new functionality
Even with better requirements, it is hard to predict new functions
15. Lehman’s Software Evolution Laws
Continuous Change: “A program that is used in a
real-world environment must change, or become
progressively less useful in that environment.”
Software Entropy: “As a program evolves, it becomes
more complex, and extra resources are needed to
preserve and simplify its structure.”
17. Supporting the evolution of applications
A research goal and agenda grounded in reality
How can we help companies maintain their large software systems?
What is the X-ray for software?
code, people, practices
Which analyses?
How can you monitor your system (dashboards, ...)?
How should extracted information be presented?
19. [Diagram: research topics and related publications]
Areas: Reverse Engineering, Analyses, Representation, Transformations, Evolution
Topics: Software Metrics; Duplicated Code Identification; Understanding Large Systems; Group Identification; Static/Dynamic Information; Test Generation; Feature Analysis; Concept Identification; Class Understanding; Package Blueprints; Distribution Maps; Language Independent Refactorings; Language Independent Meta Model (FAMIX); Reengineering Patterns; Version Analyses; An Extensible Reengineering Environment (Moose); HISMO metamodel
Publications: [LMO99, OOPSLA00], [ICSM99, ICSM02], [WCRE99, TSI00, TSE03], [ASE03], [ICSM99], [CSMR 06], [JSME 06], [WCRE 06], [OOPSLA01, TSE04], [ICSM 07], [ICSM 06], [IWPSE 00], [UML99], [ICSM 05], [Models 06], [JSME 05]
20. One example: who is responsible for what?
(1) Extraction
(2) Model
(3) Analyses
(4) Visualization
Distribution Map of authors on JBoss
21. Moose is a reengineering tool that integrates multiple techniques
Metrics (Number of classes = 382, Number of methods = 4268, …)
Visualization
Queries and Navigation
Semantic Analysis (word1, word2, …)
Evolution Analysis
22. Moose is open and open-source
meta-described
meta-model aware
[Figure: meta-model entities Class, Method and Inheritance]
23. Designed to be extensible
[Figure: extended meta-model; entity labels: Class, Method, Inheritance, File, History, Version, Author, Duplication, Event, Trace]
25. Understanding large systems
Understanding code is difficult!
Systems are large
Code is abstract
Do I really need to convince you?
Some existing approaches
Metrics: the problem is that you often get meaningless results once they are combined
Visualization: often beautiful but without meaning
27. Polymetric views condense information
To get a feel for the inheritance semantics: adding vs. reusing
Classes + Inheritance
Width: # of added methods
Height: # of overridden methods
Color: # of extended methods
Methods
LOC
# statements
# parameters
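As a minimal sketch of the idea, the snippet below maps three class metrics onto the width, height and gray level of one box. The class names, the scaling unit and the linear gray scale are assumptions made for illustration; this is not Moose code.

```python
from dataclasses import dataclass

@dataclass
class ClassMetrics:
    name: str
    added: int        # number of added methods
    overridden: int   # number of overridden methods
    extended: int     # number of extended methods

@dataclass
class Box:
    name: str
    width: int
    height: int
    gray: int         # 0 = black, 255 = white

def to_box(m: ClassMetrics, max_extended: int, unit: int = 4) -> Box:
    """Map three metrics onto one rectangle: width, height and color."""
    # Hypothetical linear scale: the more extended methods, the darker the box.
    share = m.extended / max_extended if max_extended else 0.0
    return Box(m.name, unit * (1 + m.added), unit * (1 + m.overridden),
               round(255 * (1 - share)))

# Hypothetical sample data.
classes = [ClassMetrics("Shape", 12, 0, 0), ClassMetrics("Circle", 2, 5, 3)]
max_ext = max(c.extended for c in classes)
boxes = [to_box(c, max_ext) for c in classes]
for box in boxes:
    print(box)
```

With this data, a wide, flat, light box reads as a class that mostly adds behavior, while a tall, dark box reads as a class that mostly overrides and extends inherited behavior.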
36. How can we predict changes?
Common wisdom holds that what changed yesterday will change today, but is it true?
In the Sahara, the weather is constant: there is a 90% chance that tomorrow is the same as today
In Belgium, the weather changes really fast (sea influence): there is a 30% chance that tomorrow is the same as today
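37. With history analysis we can get the climate of a software system
[Figure: timeline with past versions (Late Changers), the present version and future versions (Early Changers); label: hit]
\[
YW_i(S, t_1, t_2) =
\begin{cases}
1, & \mathrm{TopLENOM}_{1..i}(S, t_1) \cap \mathrm{TopEENOM}_{i..n}(S, t_2) \neq \emptyset \\
0, & \mathrm{TopLENOM}_{1..i}(S, t_1) \cap \mathrm{TopEENOM}_{i..n}(S, t_2) = \emptyset
\end{cases}
\]
\[
YW(S, t_1, t_2) = \frac{\sum_{i} YW_i(S, t_1, t_2)}{n - 2}
\]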
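The YW definition on this slide can be sketched as follows. Several points are assumptions rather than content of the slide: LENOM and EENOM are read as scores over changes in the number of methods that favor the latest and the earliest changes respectively (with halving weights), t1 and t2 are read as the sizes of the two top lists, and the sum is taken over the present versions i = 2..n-1, which matches the n-2 divisor.

```python
def deltas(noms):
    """Absolute change in number of methods between consecutive versions."""
    return [abs(b - a) for a, b in zip(noms, noms[1:])]

def lenom(noms):
    """Assumed 'latest evolution' score: the most recent change weighs most."""
    d = deltas(noms)
    return sum(x * 2.0 ** (idx + 1 - len(d)) for idx, x in enumerate(d))

def eenom(noms):
    """Assumed 'earliest evolution' score: the first change weighs most."""
    return sum(x * 2.0 ** (-idx) for idx, x in enumerate(deltas(noms)))

def top(scores, size):
    """Names of the `size` highest-scoring classes that changed at all."""
    changed = [name for name, score in scores.items() if score > 0]
    return set(sorted(changed, key=lambda name: -scores[name])[:size])

def yesterdays_weather(history, t1, t2):
    """history maps a class name to its number of methods in versions 1..n."""
    n = len(next(iter(history.values())))
    hits = 0
    for i in range(2, n):  # present version i = 2 .. n-1 (1-based)
        past = {c: lenom(v[:i]) for c, v in history.items()}        # versions 1..i
        future = {c: eenom(v[i - 1:]) for c, v in history.items()}  # versions i..n
        if top(past, t1) & top(future, t2):
            hits += 1  # a late changer of the past is an early changer of the future
    return hits / (n - 2)

# Hypothetical histories: number of methods per version.
steady = {"A": [1, 2, 4, 7, 9], "B": [5, 5, 5, 5, 5], "C": [3, 3, 3, 4, 4]}
erratic = {"A": [1, 2, 2, 2], "B": [1, 1, 1, 5]}
print(yesterdays_weather(steady, 1, 1), yesterdays_weather(erratic, 1, 1))
```

A value close to 1 corresponds to the Sahara case (recent changers keep changing), a value close to 0 to a system where past changes say little about the next ones.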
38. How do developers develop?
• Is it more efficient to have people work together in the same office?
• How can we optimize software development?
40. Line colors show which author owned which files in which period
[Figure: ownership lines for File A and File B; labels: green author large commit, green author ownership, blue author small commit]
43. Based on similar commit signature
Patterns shown: Edit, Takeover, Monologue, Familiarization, Dialogue
44. Understanding evolution of large systems
• How old are the hierarchies?
• How did the classes change?
• How did the inheritance change?
45. Evolution holds useful information
[Figure: a class hierarchy with classes A, B, C, D and E shown over successive versions along a time axis]
A is persistent
B is stable
C was removed
D inherited from C and then from A
E is newborn
…
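The labels on this slide can be approximated with a few rules over a class history. The rules below are assumptions chosen for illustration (persistent: present in every version; removed: absent from the last version; newborn: present only in the last version; stable: its measurement never changes), and the sample data is hypothetical.

```python
def lifetime_label(presence):
    """presence holds one boolean per version: True when the class exists."""
    if all(presence):
        return "persistent"
    if not presence[-1]:
        return "removed"
    if not any(presence[:-1]):
        return "newborn"
    return "other"

def is_stable(measurements):
    """A class is taken to be stable when its measurement never changes."""
    return len(set(measurements)) <= 1

# Hypothetical presence of three classes over five versions.
presence = {
    "A": [True, True, True, True, True],
    "C": [True, True, True, False, False],
    "E": [False, False, False, False, True],
}
labels = {name: lifetime_label(p) for name, p in presence.items()}
print(labels)
print(is_stable([7, 7, 7, 7]))  # e.g. number of methods of B per version
```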
46. Hierarchy Evolution Complexity View characterizes class hierarchy histories
[Figure: hierarchy of the class histories A, B, C, D and E. Legend: Class History annotated with ENOM, ENOS and Age; Inheritance History annotated with Age; removed class and inheritance histories marked "Removed"]
A is persistent
B is stable
C was removed
D inherited from C and then from A
E is newborn
…
47. Class hierarchies over 40 versions of Jun, a 3D framework with 740 classes
48. Identifying Duplicated Code
“Parsing the program suite of interest requires a parser for the
language dialect of interest. While this is nominally an easy task, in
practice one must acquire a tested grammar for the dialect of the
language at hand. Often for legacy codes, the dialect is unique and the
developing organization will need to build their own parser. Worse,
legacy systems often have a number of languages and a parser is
needed for each. Standard tools such as Lex and Yacc are rather a
disappointment for this purpose, as they deal poorly with lexical
hiccups and language ambiguities.” [Baxter 98]
Problems
Unknown Duplicated Code
Scalability
Understanding
49. Language Independent
Language independent, textual [ICSM’99], M. Rieger’s PhD thesis
Duploc handled Pascal, Java, Smalltalk, Python, Cobol, C++, PDP-11, C
Slower than other approaches but...
Max 45 min to adapt our approach to a new language
Between 3% and 10% less identification than parametrized match
[Figure: Exact Copies (a b c d e f against a b c d e f); Copies with Variations (a b c d e f against a b x y e f)]
50. A Conceptual Matrix
[Figure: comparison matrix of File A against File B. Exact Copies: a b c d e f against a b c d e f. Copies with Variations: a b c d e f against a b x y e f]
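The matrix idea can be sketched in a few lines: compare every line of one file with every line of the other and look for diagonals of matches. This is an illustrative sketch, not Duploc's implementation; the whitespace-only normalization and the function names are assumptions.

```python
def normalize(line: str) -> str:
    """Remove all whitespace so that layout differences do not hide a copy."""
    return "".join(line.split())

def match_matrix(a, b):
    """matrix[i][j] is True when line i of a equals line j of b (non-empty)."""
    na = [normalize(x) for x in a]
    nb = [normalize(y) for y in b]
    return [[x != "" and x == y for y in nb] for x in na]

def diagonals(matrix, min_len=3):
    """Return (start_i, start_j, length) for each maximal diagonal run of matches."""
    rows = len(matrix)
    cols = len(matrix[0]) if matrix else 0
    found = []
    for i in range(rows):
        for j in range(cols):
            starts_run = matrix[i][j] and not (i > 0 and j > 0 and matrix[i - 1][j - 1])
            if not starts_run:
                continue
            length = 0
            while i + length < rows and j + length < cols and matrix[i + length][j + length]:
                length += 1
            if length >= min_len:
                found.append((i, j, length))
    return found

# The two cases from the slide, one letter standing for one source line.
exact = diagonals(match_matrix(list("abcdef"), list("abcdef")))
varied = diagonals(match_matrix(list("abcdef"), list("abxyef")), min_len=2)
print(exact)   # one unbroken diagonal
print(varied)  # a diagonal with a hole where x y replaced c d
```

An exact copy shows up as one long diagonal, while a copy with variations shows up as shorter diagonals on the same axis separated by a gap. The all-pairs comparison is quadratic in the number of lines, which is consistent with the slide's remark that the approach is slower than others.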
51. Entities that change together can reveal hidden dependencies
     v1  v2  v3  v4  v5  v6
A     2   3   3   3   4   6
B     6   6   6   5   6   7
C     3   3   5   5   8   9
D     1   3   3   4   4   6
E     4   5   5   6   6   6
Groups of entities and the versions in which they changed together:
(A,B,C,D,E) ()
(A,B,C,D) (v6)
(A,D,E) (v2)
(A,B,C) (v5,v6)
(D,E) (v2,v4)
(A,D) (v2,v6)
(D) (v2,v4,v6)
(C) (v3,v5,v6)
(A) (v2,v5,v6)
() (v1,v2,v3,v4,v5,v6)
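The grouping on this slide can be reproduced with a brute-force sketch. The values are copied from the slide; treating a difference between two consecutive values as "a change in that version" is an assumption, as are all function names. The enumeration may list more groups than the figure shows.

```python
from itertools import combinations

# Values per entity over versions v1..v6, copied from the slide.
history = {
    "A": [2, 3, 3, 3, 4, 6],
    "B": [6, 6, 6, 5, 6, 7],
    "C": [3, 3, 5, 5, 8, 9],
    "D": [1, 3, 3, 4, 4, 6],
    "E": [4, 5, 5, 6, 6, 6],
}
ALL_VERSIONS = frozenset(range(1, 7))

def changed_in(values):
    """Versions (1-based) whose value differs from the previous version."""
    pairs = zip(values, values[1:])
    return frozenset(i + 2 for i, (a, b) in enumerate(pairs) if a != b)

changes = {name: changed_in(values) for name, values in history.items()}

def co_change_groups(changes):
    """All (entities, versions) pairs where each side determines the other."""
    names = sorted(changes)
    groups = set()
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            # Versions in which every entity of the subset changed.
            shared = ALL_VERSIONS.intersection(*(changes[n] for n in subset))
            # Every entity that changed in all of those versions.
            entities = frozenset(n for n in names if shared <= changes[n])
            groups.add((entities, shared))
    return groups

found = co_change_groups(changes)
for entities, versions in sorted(found, key=lambda g: (-len(g[0]), sorted(g[0]))):
    print(sorted(entities), ["v%d" % v for v in sorted(versions)])
```

Enumerating all subsets is exponential in the number of entities, so this only suits toy inputs like the five entities above.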
52. How do properties spread in large systems?
Properties:
Metrics
People
Symbols/Concepts
Spread = how many packages does it touch?
Focus = do packages and properties match?
Distribution Map:
a generic visualization
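Spread and focus can be sketched over sets of classes. Spread follows the slide directly (the number of packages a property touches). The focus formula below, a sum over packages of how much of the package the property covers times how much of the property lies in that package, is an assumed definition used for illustration, and the sample data is hypothetical.

```python
def touch(a: set, b: set) -> float:
    """Share of b that also belongs to a."""
    return len(a & b) / len(b) if b else 0.0

def spread(prop: set, packages: dict) -> int:
    """Number of packages that contain at least one class with the property."""
    return sum(1 for members in packages.values() if prop & members)

def focus(prop: set, packages: dict) -> float:
    """Assumed definition: sum of touch(prop, pkg) * touch(pkg, prop)."""
    return sum(touch(prop, members) * touch(members, prop)
               for members in packages.values())

# Hypothetical system: two packages with two classes each.
packages = {"ui": {"Window", "Button"}, "core": {"Account", "Ledger"}}
gui_code = {"Window", "Button"}    # property matching one package exactly
logging = {"Window", "Account"}    # property cutting across both packages

print(spread(gui_code, packages), focus(gui_code, packages))
print(spread(logging, packages), focus(logging, packages))
```

A property with low spread and high focus matches the package structure; one with high spread and low focus cuts across it.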
59. Principle
[Figure: blueprint of the package under analysis, P1 (classes A1, B1, C1, D1, E1, F1, G1, H1, I1), and its relations to packages P2 (A2, B2), P3 (A3, B3) and P4 (A4). Labels: head, body, internal references, external references, internal referenced classes, external referenced classes, internal referencing classes, most to least]
61. Symbols contain domain information
• What are the concepts used in an application?
• How can we use symbolic information?
62. Looking at the Symbols
• Developers use meaningful names, which capture
the domain knowledge.
63. A cluster is a group of documents
which use the same terms
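A minimal sketch of grouping by shared terms: identifiers are split into words, each class becomes a set of terms (a "document"), and classes with overlapping vocabularies are grouped. The Jaccard similarity, the greedy grouping, the threshold and the sample classes are all assumptions made for illustration; the slides do not specify the clustering technique.

```python
import re

def terms(identifier: str) -> set:
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return {part.lower() for part in parts}

def jaccard(a: set, b: set) -> float:
    """Shared terms divided by all terms of both documents."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(documents: dict, threshold: float) -> list:
    """Greedy grouping: join the first cluster that holds a similar document."""
    clusters = []
    for name, vocabulary in documents.items():
        for group in clusters:
            if any(jaccard(vocabulary, documents[other]) >= threshold for other in group):
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical classes with their method names.
sources = {
    "AccountView": ["drawAccount", "accountBalance"],
    "AccountPrinter": ["printAccount", "formatBalance"],
    "XmlParser": ["parseXMLFile", "nextToken"],
}
documents = {
    cls: terms(cls).union(*(terms(method) for method in methods))
    for cls, methods in sources.items()
}
groups = cluster(documents, threshold=0.25)
print(groups)
```

In this toy data the two account classes share the terms "account" and "balance", which is the kind of domain vocabulary the previous slides refer to.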
64. Moose has been validated on real-life systems
Several large, industrial case studies (NDA)
Harman-Becker
Nokia
Daimler
Siemens
Different implementation languages (C++, Java, Smalltalk,
Cobol)
We use external C++ parsers
Different sizes
Moose is used in several research groups
65. Possible New Research Directions
• Remodularization
• Clustering analysis
• Open and Modular modules
• Service Identification in Service Oriented Architecture
• Architecture Extraction/Validation
• Software Quality
• Cost/Bugs prediction
• EJB evaluation
• Business rules extraction
• Model transformation
• Test
66. Evolution/Maintenance is a challenge
Understanding and maintaining large and complex
applications needs better tools/analyses
Moose is a platform for developing new analyses
Transfer to tool vendors