Concern-based Cohesion: Unveiling a Hidden Dimension of Cohesion Measurement
1. Concern-based Cohesion: Unveiling
a Hidden Dimension of Cohesion
Measurement
Bruno C. da Silva Cláudio Sant’Anna Christina Chavez Alessandro Garcia
brunocs@dcc.ufba.br santanna@dcc.ufba.br flach@dcc.ufba.br afgarcia@inf.puc-rio.br
Federal University of Bahia (UFBA)
Software Design and Evolution Group
aside.dcc.ufba.br
Software Engineering Lab – UFBA
Salvador-Bahia-Brazil - les.dcc.ufba.br
2. Cohesion can be defined as:
The degree to which a module represents an abstraction of a single concern of the software.
3. Structural Cohesion Metrics
E.g. LCOM, LCOM2, etc.
Almost all methods share the same instance variable.
Is it a highly cohesive class?
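The slide names LCOM2 without defining it. A minimal sketch, assuming the commonly cited Chidamber and Kemerer formulation (P = method pairs sharing no instance variable, Q = method pairs sharing at least one, LCOM2 = max(P - Q, 0)). The class layout and the method and field names below are illustrative assumptions, not taken from the study.

```python
from itertools import combinations


def lcom2(method_fields):
    """Return max(P - Q, 0) for a class.

    method_fields maps each method name to the set of instance
    variables that the method reads or writes.
    """
    p = 0  # pairs of methods with no shared instance variable
    q = 0  # pairs of methods with at least one shared instance variable
    for fields_a, fields_b in combinations(method_fields.values(), 2):
        if fields_a & fields_b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)


# Hypothetical facade-like class: every method touches the same field.
facade = {
    "getBufferSize": {"response"},
    "addCookie": {"response"},
    "sendError": {"response"},
    "sendRedirect": {"response"},
}

print(lcom2(facade))  # 0
```

Because every pair of methods shares the single field, the structural metric reports no lack of cohesion, no matter how many unrelated tasks the methods perform.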
4. Lack of Concern-Based Cohesion (LCbC)
How many concerns does this class address?
LCbC = 6
• http response buffer
• http response header
• URL encoding
• web cookies
• http redirecting
• Error sending
• and others…
Is it a highly cohesive class?
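LCbC counts how many concerns a class addresses. A minimal sketch of that count, assuming the concern-to-code mapping is available as a dictionary from concern name to a set of (class, member) pairs. That representation, the member names and the `CookieHelper` class are hypothetical; only the concern names come from the slide.

```python
def lcbc(concern_map, class_name):
    """Count the concerns that have at least one code element in class_name."""
    return sum(
        1
        for elements in concern_map.values()
        if any(owner == class_name for owner, _member in elements)
    )


# Concern names come from the slide; member names are placeholders.
concern_map = {
    "http response buffer": {("ResponseFacade", "m1")},
    "http response header": {("ResponseFacade", "m2")},
    "URL encoding": {("ResponseFacade", "m3")},
    "web cookies": {("ResponseFacade", "m4"), ("CookieHelper", "m1")},
    "http redirecting": {("ResponseFacade", "m5")},
    "Error sending": {("ResponseFacade", "m6")},
}

print(lcbc(concern_map, "ResponseFacade"))  # 6
```

The count depends entirely on the mapping: a class whose concerns were never mapped gets LCbC = 0 regardless of what it does.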
5. Cohesion: Structure-based vs. Concern-based
They capture different dimensions of cohesion:
• Different sources of information and counting mechanisms;
• Different interpretations of cohesion.
Example – ResponseFacade (Tomcat)
• LCOM2 = 0: low lack of cohesion, or high cohesion
• LCbC = 6: high lack of cohesion, or low cohesion
6. Empirical Study – First Goal
Provide empirical evidence about
whether the concern-driven nature of a
cohesion metric makes it significantly
different from structural cohesion
metrics.
7. Moreover…
[Figure: the concerns of a class (http response buffer, http response header, URL encoding, web cookies, http redirecting, error sending, …) linked to changes]
… the number of concerns a module realizes may positively influence the number of changes it is subject to.
8. Empirical Study – Second Goal
Investigate whether and how concern-based cohesion is associated with change-proneness.
9. Research Questions
RQ1: Does LCbC capture a dimension of
module cohesion that is not captured
by structural cohesion metrics?
10. Research Questions
RQ2: How strong is the correlation
between LCbC and module change-
proneness?
11. Research Questions
RQ3: Does the LCbC metric applied
together with structural cohesion
metrics enhance the prediction of
module changes?
14. Empirical Study Settings
LCbC needs a concern-to-code mapping
[Figure: code elements mapped to concern A, concern B, concern C, …]
15. Empirical Study Settings
Concern-to-code mapping procedure
Systems: JFreeChart, Freecol, jEdit, Tomcat, Findbugs, Rhino
• Concerns automatically mapped using the XScan tool
• Manual concern mapping provided by Eaddy et al. (2008)
16. RQ1: Does LCbC capture a dimension of module cohesion that is not captured by structural cohesion metrics?
Principal Component Analysis (PCA): loadings of each metric on the principal components (PCs)
Metric   JFreeChart              Rhino                   jEdit
          PC1   PC2   PC3   PC4   PC1   PC2   PC3   PC4   PC1   PC2   PC3   PC4
LCOM2    0.94  0.14 -0.11  0.11  0.96  0.04  0.11  0.07  0.06  0.98  0.08  0.04
LCOM3    0.02  0.72 -0.43  0.37  0.23  0.72  0.43  0.24  0.90  0.12  0.17  0.07
LCOM4    0.87  0.03 -0.04  0.37  0.94  0.19  0.07  0.09  0.18  0.09  0.97  0.08
LCOM5    0.14  0.94 -0.12 -0.04  0.11  0.21  0.94  0.19  0.87  0.12 -0.03  0.06
TCC     -0.12 -0.24  0.95 -0.09 -0.08 -0.95 -0.08 -0.02 -0.79  0.12 -0.21  0.06
LCbC     0.51  0.10 -0.13  0.81  0.11  0.11  0.19  0.97  0.04  0.04  0.08  0.99

Metric   Tomcat                  Findbugs                Freecol
          PC1   PC2   PC3   PC4   PC1   PC2   PC3   PC4   PC1   PC2   PC3   PC4
LCOM2    0.09  0.08  0.25  0.96  0.12  0.04  0.14  0.98 -0.12  0.72  0.41  0.34
LCOM3    0.89  0.15  0.12  0.08  0.90  0.12  0.10  0.12  0.64  0.20  0.53  0.15
LCOM4    0.11  0.09  0.95  0.25  0.12  0.00  0.98  0.14  0.16  0.09  0.03  0.96
LCOM5    0.88  0.04 -0.06  0.13  0.88  0.06 -0.02  0.07  0.27 -0.06  0.89  0.01
TCC     -0.85 -0.01 -0.16  0.03 -0.80  0.06 -0.14 -0.04 -0.89 -0.05 -0.16 -0.11
LCbC     0.10  0.99  0.08  0.07  0.05  0.99  0.00  0.04  0.22  0.89 -0.18 -0.03

LCbC was the major metric of at least one PC in every system, and in most of the systems it was the only metric contributing strongly to that PC.
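One way to read the loadings is to take, for each PC, the metric with the largest absolute loading. The sketch below applies that reading rule to the values in the table. The rule itself is an assumption made for illustration; the slide does not state how "major metric" was determined.

```python
SYSTEMS = ["JFreeChart", "Rhino", "jEdit", "Tomcat", "Findbugs", "Freecol"]

# Loadings copied from the table: four PCs per system, in SYSTEMS order.
LOADINGS = {
    "LCOM2": [0.94, 0.14, -0.11, 0.11, 0.96, 0.04, 0.11, 0.07,
              0.06, 0.98, 0.08, 0.04, 0.09, 0.08, 0.25, 0.96,
              0.12, 0.04, 0.14, 0.98, -0.12, 0.72, 0.41, 0.34],
    "LCOM3": [0.02, 0.72, -0.43, 0.37, 0.23, 0.72, 0.43, 0.24,
              0.90, 0.12, 0.17, 0.07, 0.89, 0.15, 0.12, 0.08,
              0.90, 0.12, 0.10, 0.12, 0.64, 0.20, 0.53, 0.15],
    "LCOM4": [0.87, 0.03, -0.04, 0.37, 0.94, 0.19, 0.07, 0.09,
              0.18, 0.09, 0.97, 0.08, 0.11, 0.09, 0.95, 0.25,
              0.12, 0.00, 0.98, 0.14, 0.16, 0.09, 0.03, 0.96],
    "LCOM5": [0.14, 0.94, -0.12, -0.04, 0.11, 0.21, 0.94, 0.19,
              0.87, 0.12, -0.03, 0.06, 0.88, 0.04, -0.06, 0.13,
              0.88, 0.06, -0.02, 0.07, 0.27, -0.06, 0.89, 0.01],
    "TCC": [-0.12, -0.24, 0.95, -0.09, -0.08, -0.95, -0.08, -0.02,
            -0.79, 0.12, -0.21, 0.06, -0.85, -0.01, -0.16, 0.03,
            -0.80, 0.06, -0.14, -0.04, -0.89, -0.05, -0.16, -0.11],
    "LCbC": [0.51, 0.10, -0.13, 0.81, 0.11, 0.11, 0.19, 0.97,
             0.04, 0.04, 0.08, 0.99, 0.10, 0.99, 0.08, 0.07,
             0.05, 0.99, 0.00, 0.04, 0.22, 0.89, -0.18, -0.03],
}


def dominant_metrics(system):
    """Return, for PC1..PC4 of a system, the metric with the largest |loading|."""
    start = SYSTEMS.index(system) * 4
    result = []
    for pc in range(4):
        column = {metric: values[start + pc] for metric, values in LOADINGS.items()}
        result.append(max(column, key=lambda metric: abs(column[metric])))
    return result


for system in SYSTEMS:
    print(system, dominant_metrics(system))
```

Under this reading, LCbC appears in the list for each of the six systems, which matches the statement on the slide.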
17. RQ2: How strong is the correlation between LCbC and module change-proneness?
Spearman correlation: each cohesion metric vs. change count (CC)
Metric   JFreeChart  Rhino  jEdit  Tomcat  Findbugs  Freecol
LCOM2    0.48        0.69   0.16   0.33    0.48      0.49
LCOM3    0.34        0.48   0.17   0.27    0.38      0.19
LCOM4    0.32        0.46   0.10   0.21    0.23      0.20
LCOM5    0.15        0.30   0.18   0.23    0.34      0.22
TCC      0.24        0.22   0.13   0.16    0.06*     0.30
LCbC     0.66        0.62   0.15   0.35    0.21      0.46
* not statistically significant
LCbC did not perform well in jEdit and Findbugs.
18. RQ2: How strong is the correlation between LCbC and module change-proneness?
Spearman correlation of each cohesion metric vs. change count (CC): same table as slide 17.
LCbC and LCOM2 were the metrics most correlated with change count.
19. RQ2: How strong is the correlation between LCbC and module change-proneness?
Spearman correlation of each cohesion metric vs. change count (CC): same table as slide 17.
In Rhino and Freecol, LCbC was the second most correlated metric (strong and moderate correlation, respectively), preceded by LCOM2.
20. RQ2: How strong is the correlation between LCbC and module change-proneness?
Spearman correlation of each cohesion metric vs. change count (CC): same table as slide 17.
LCbC was the metric most correlated with change count in JFreeChart (strong correlation) and Tomcat (moderate correlation).
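The RQ2 figures are Spearman rank correlations between a metric and the change count of each class. The slides do not say which tool computed them, so the following is only a pure-Python sketch of the technique (average ranks for ties, then Pearson correlation on the ranks). The sample values are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-class values, not data from the study.
lcbc_values = [1, 2, 2, 5, 6]
change_counts = [0, 3, 1, 8, 20]
print(round(spearman(lcbc_values, change_counts), 2))
```

Working on ranks rather than raw values makes the coefficient insensitive to the heavy skew that metrics such as LCOM2 and change count typically show.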
21. RQ3: Does the LCbC metric applied together with structural cohesion metrics enhance the prediction of module changes?
Linear Regression Analysis
System      Metrics in the final model (standardized coefficients)          R² (adj)
JFreeChart  (0.47)LCOM2 + (0.11)LCOM3 + (0.59)LCbC + (-0.27)LCOM4           0.63
Rhino       (0.63)LCOM2 + (0.37)LCOM3 + (0.18*)TCC                          0.59
Findbugs    (0.45)LCOM2 + (0.20)LCOM3 + (0.17)LCOM4                         0.37
Freecol     (0.44)LCOM2 + (0.21)LCOM3 + (0.11)LCbC                          0.35
Tomcat      (0.39)LCOM2 + (0.16)LCOM3 + (0.29)LCbC + (-0.07*)LCOM4          0.32
jEdit       (0.20)LCOM2 + (0.35)LCOM4 + (0.09*)LCOM5 + (0.17)LCbC           0.26
* not statistically significant
LCbC ended up in four of the six regression models.
22. RQ3: Does the LCbC metric applied together with structural cohesion metrics enhance the prediction of module changes?
Linear regression analysis: same final models as slide 21.
LCbC was the most important metric in the JFreeChart regression model.
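The two RQ3 observations can be checked directly against the reported final models. The sketch below encodes the standardized coefficients from the regression table and assumes that "most important" means the largest absolute standardized coefficient, which is an interpretation rather than something the slides state. Significance markers (*) are ignored here.

```python
# Standardized coefficients of the final models, copied from the table.
MODELS = {
    "JFreeChart": {"LCOM2": 0.47, "LCOM3": 0.11, "LCbC": 0.59, "LCOM4": -0.27},
    "Rhino": {"LCOM2": 0.63, "LCOM3": 0.37, "TCC": 0.18},
    "Findbugs": {"LCOM2": 0.45, "LCOM3": 0.20, "LCOM4": 0.17},
    "Freecol": {"LCOM2": 0.44, "LCOM3": 0.21, "LCbC": 0.11},
    "Tomcat": {"LCOM2": 0.39, "LCOM3": 0.16, "LCbC": 0.29, "LCOM4": -0.07},
    "jEdit": {"LCOM2": 0.20, "LCOM4": 0.35, "LCOM5": 0.09, "LCbC": 0.17},
}


def systems_with(metric):
    """List the systems whose final model includes the given metric."""
    return [system for system, model in MODELS.items() if metric in model]


def strongest_metric(system):
    """Return the metric with the largest absolute standardized coefficient."""
    model = MODELS[system]
    return max(model, key=lambda metric: abs(model[metric]))


print(systems_with("LCbC"))            # ['JFreeChart', 'Freecol', 'Tomcat', 'jEdit']
print(strongest_metric("JFreeChart"))  # LCbC
```

Standardized coefficients are comparable within one model because every predictor is expressed in standard-deviation units, which is why their magnitudes can be ranked this way.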
23. Examples that illustrate the differences in the dimensions of cohesion captured by LCbC and structural cohesion metrics
Class (System)                    LCbC (Rank)   LCOM2 (Rank)   CC (Rank)
ResponseFacade (Tomcat)           10 (top 2%)   0              5 (top 20%)
CombinedRangeXYPlot (JFreeChart)  11 (top 5%)   33 (top 35%)   11 (top 10%)
24. Examples that illustrate the differences in the dimensions of cohesion captured by LCbC and structural cohesion metrics
Same example classes as slide 23.
ResponseFacade (Tomcat): a facade class usually has methods related to different concerns because it serves as an entry point for different functionalities.
25. Examples that illustrate the differences in the dimensions of cohesion captured by LCbC and structural cohesion metrics
Same example classes as slide 23.
CombinedRangeXYPlot (JFreeChart): concerns related to drawing, zooming, axis space, click handling and plotting.
26. When concern-based cohesion fails in the association with changes
When the concern-to-code mapping fails to identify concerns!
Class (System)                  LCbC (Rank)   LCOM2 (Rank)   CC (Rank)
jEdit (jEdit)                   0             9351 (3rd)     24 (2nd)
JEditBuffer (jEdit)             0             5913 (4th)     17 (5th)
SortedBugCollection (Findbugs)  0             1889 (5th)     76 (4th)
27. Threats to Validity
Quality of concern-to-code mapping
Underlying tool for concern mapping
Change Count
28. Conclusions
LCbC defined a new, orthogonal dimension of module cohesion in the studied systems.
LCbC performed well in its association with change-proneness in most of the systems.
Concern-based cohesion has shown indications that it is worth investigating further.
29. Future Work
• How LCbC performs in comparison with topic-based cohesion metrics such as C3 and MWE
• The association between LCbC and fault-proneness
• Whether the type of class is a relevant factor to consider
• The application of different regression analysis techniques
• The search for more complete concern mappings