Dipenta msr2011-csbf
1. Social Interactions around
Cross-System Bug Fixings:
The Case of
FreeBSD and OpenBSD
Gerardo Canfora, Luigi Cerulo,
Marta Cimitile, Massimiliano Di Penta
dipenta@unisannio.it
2. Context
Source code is often reused across different systems
Unixes (FreeBSD, OpenBSD, Linux)
Office applications (NeoOffice, OpenOffice)
Desktop environment apps (KDE or GNOME apps)
Maintenance may require propagating bug fixes from one system to another
We call this “Cross-System Bug Fixing” (CSBF)
Example:
FreeBSD, 1996/01/19, file ip_icmp.h:
– “Added definitions for ICMP router discovery. Reviewed by: wollman”
OpenBSD, 1996/08/02, file ip_icmp.h:
– “ICMP Router Discovery definitions; from FreeBSD”
3. What we propose
A method to track CSBFs
A study on the social characteristics
and development activity of
CSBF committers
Social metrics: degree, betweenness, brokerage
Activity metrics: commits, lines changed
4. Detecting CSBF - I
Step 1: mine cross-referencing commits (commit notes that mention the other system)
openbsd,atphy.c,2008/09/25 20:47:16,brad,
Add a driver for the Attansic F1 PHY. From FreeBSD via
kevlo@
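Step 1 can be sketched as a filter over parsed log records. This is a minimal sketch, assuming records in the `system,file,date,author,note` layout shown on the slide and a hypothetical rule that a commit is cross-referencing when its note names the other system as a whole word; the actual matching rule used in the study is not given.

```python
import re
from typing import NamedTuple

class Commit(NamedTuple):
    system: str  # "freebsd" or "openbsd"
    path: str    # file name as it appears in the log record
    date: str    # "YYYY/MM/DD HH:MM:SS"
    author: str  # committer ID
    note: str    # commit note (message)

OTHER_SYSTEM = {"freebsd": "openbsd", "openbsd": "freebsd"}

def parse_record(line: str) -> Commit:
    """Parse one 'system,file,date,author,note' record (layout assumed from
    the slide); maxsplit=4 keeps any commas inside the note."""
    system, path, date, author, note = (f.strip() for f in line.split(",", 4))
    return Commit(system.lower(), path, date, author, note)

def is_cross_referencing(c: Commit) -> bool:
    """Hypothetical rule: the note names the other system as a whole word."""
    other = OTHER_SYSTEM[c.system]
    return re.search(rf"\b{other}\b", c.note, re.IGNORECASE) is not None

record = ("openbsd, atphy.c,2008/09/25 20:47:16,brad,"
          "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@")
print(is_cross_referencing(parse_record(record)))  # True
```

A keyword rule like this only catches explicit mentions; implicit CSBFs without such a note are out of its reach.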
Step 2: mine commits previously performed on files
with the same name in the other system
freebsd,atphy.c,2008/05/19 01:12:10,yongari,
Add Attansic/Atheros F1 PHY driver.
openbsd,atphy.c,2008/09/25 20:47:16,brad,
Add a driver for the Attansic F1 PHY. From FreeBSD via
kevlo@
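Step 2 amounts to selecting, for a cross-referencing commit, the earlier commits of the other system that touched a file with the same name. A minimal sketch, assuming files are matched by base name and that dates use the zero-padded `YYYY/MM/DD HH:MM:SS` format shown above (so string comparison is chronological):

```python
import os
from typing import List, NamedTuple

class Commit(NamedTuple):
    system: str
    path: str
    date: str   # "YYYY/MM/DD HH:MM:SS": lexicographic order == chronological order
    author: str
    note: str

def candidate_commits(referring: Commit, history: List[Commit]) -> List[Commit]:
    """Earlier commits, in the other system, on a file with the same name."""
    name = os.path.basename(referring.path)
    return [c for c in history
            if c.system != referring.system
            and os.path.basename(c.path) == name
            and c.date < referring.date]
```

With the example above, the FreeBSD commit by yongari (2008/05/19) is a candidate for the OpenBSD commit by brad (2008/09/25); later FreeBSD commits on `atphy.c` are not.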
5. Detecting CSBF - II
Step 3: compute file similarity with clone detection
CCFinder
Threshold: at least 10% of cloned lines
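The Step 3 threshold can be expressed as the fraction of a file's lines covered by at least one clone. This sketch does not run CCFinder; it assumes the clone ranges have already been extracted from a clone detector's report as inclusive `(start, end)` line pairs, a hypothetical input format.

```python
from typing import Iterable, Set, Tuple

CLONE_THRESHOLD = 0.10  # "at least 10% of cloned lines" (from the slide)

def cloned_fraction(total_lines: int, clone_ranges: Iterable[Tuple[int, int]]) -> float:
    """Fraction of a file's lines covered by at least one clone.

    clone_ranges holds inclusive, 1-based (start, end) line ranges.
    Overlapping ranges are counted once.
    """
    if total_lines <= 0:
        return 0.0
    covered: Set[int] = set()
    for start, end in clone_ranges:
        covered.update(range(max(1, start), min(total_lines, end) + 1))
    return len(covered) / total_lines

def passes_clone_filter(total_lines: int, clone_ranges) -> bool:
    return cloned_fraction(total_lines, clone_ranges) >= CLONE_THRESHOLD
```

Using a set of line numbers keeps overlapping clone ranges from inflating the percentage.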
Step 4: among the previous changes, take the one whose commit note
is the most textually similar
Vector Space Model
Cosine similarity; a threshold (0.20) filters out unrelated
commits
Example (cosine similarity = 0.72):
FreeBSD: “Add Attansic/Atheros F1 PHY driver.”
OpenBSD: “Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@”
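Step 4 can be sketched with a bag-of-words vector space model: each note becomes a term-frequency vector, and the candidate with the highest cosine similarity above 0.20 is kept. The slide does not state the tokenization, stop-word handling, or term weighting, so this plain term-frequency version is an assumption; on the example pair it yields about 0.59 rather than the reported 0.72.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple

SIM_THRESHOLD = 0.20  # from the slide

def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric terms; no stemming or stop-word removal (assumed).
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: str, b: str) -> float:
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(note: str, candidates: List[str]) -> Optional[Tuple[int, float]]:
    """Index and similarity of the most similar candidate note, or None
    if no candidate reaches the threshold."""
    scored = [(i, cosine(note, c)) for i, c in enumerate(candidates)]
    scored = [s for s in scored if s[1] >= SIM_THRESHOLD]
    return max(scored, key=lambda s: s[1]) if scored else None

openbsd_note = "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"
freebsd_note = "Add Attansic/Atheros F1 PHY driver."
print(best_match(openbsd_note, [freebsd_note, "Fix typo in man page."]))
```

Adding stop-word removal or tf-idf weighting would change the scores, so the 0.20 threshold should be tuned together with whatever preprocessing is chosen.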
6. Building Committers' Network
We extract communications from the mailing
lists
Bug-fixing mailing lists
A heuristic similar to that of Bird et al.
[2006] maps inconsistent names /
e-mail addresses
It also maps committer IDs to mailing list
names/e-mail addresses
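The identity-mapping step can be sketched as clustering aliases with a union-find structure. This is not Bird et al.'s heuristic, only a simplified stand-in under hypothetical rules: two aliases are merged when their normalized names match or their e-mail local parts match, and a bare committer ID is treated as an alias whose "e-mail" is just the ID. The names and addresses in the usage are made up.

```python
import re
from typing import Dict, List, Tuple

Alias = Tuple[str, str]  # (name, e-mail address or committer ID)

def normalize_name(name: str) -> str:
    # "Jane  Doe" -> "jane doe"
    return " ".join(re.findall(r"[a-z]+", name.lower()))

def merge_identities(aliases: List[Alias]) -> Dict[Alias, int]:
    """Map each alias to a cluster id (simplified, hypothetical rules)."""
    parent = list(range(len(aliases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    seen: Dict[str, int] = {}
    for i, (name, email) in enumerate(aliases):
        keys = []
        if normalize_name(name):
            keys.append("n:" + normalize_name(name))
        if email:
            keys.append("e:" + email.lower().split("@")[0])
        for k in keys:
            if k in seen:
                parent[find(i)] = find(seen[k])  # union
            else:
                seen[k] = i
    return {alias: find(i) for i, alias in enumerate(aliases)}

clusters = merge_identities([("Jane Doe", "jdoe@example.org"),
                             ("jane  doe", "jane@example.net"),
                             ("", "jdoe"),  # committer ID
                             ("John Roe", "jroe@example.org")])
```

Matching on e-mail local parts alone can wrongly merge generic addresses such as `root@`, which is why real heuristics add further checks.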
Nodes of the network are labeled as:
Committer / other mailing list contributor
CSBF committer
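A possible way to build and label the network from mailing list threads is sketched below. The edge definition is an assumption, since the slide does not state it: a directed edge u → v is added when u replies to a message written by v, and self-replies are ignored. Sender names are assumed to be already merged into identities.

```python
from typing import Dict, List, Optional, Set, Tuple

Message = Tuple[str, str, Optional[str]]  # (message_id, sender, in_reply_to)

def build_network(messages: List[Message]) -> Dict[str, Set[str]]:
    """Directed graph: sender -> authors of the messages they replied to."""
    author = {mid: sender for mid, sender, _ in messages}
    graph: Dict[str, Set[str]] = {}
    for mid, sender, parent in messages:
        graph.setdefault(sender, set())  # every sender is a node
        if parent in author and author[parent] != sender:
            graph[sender].add(author[parent])
    return graph

def label_nodes(graph: Dict[str, Set[str]], committers: Set[str],
                csbf_committers: Set[str]) -> Dict[str, str]:
    labels = {}
    for node in graph:
        if node in csbf_committers:
            labels[node] = "CSBF committer"
        elif node in committers:
            labels[node] = "committer"
        else:
            labels[node] = "other contributor"
    return labels

msgs = [("m1", "alice", None), ("m2", "bob", "m1"), ("m3", "carol", "m2"),
        ("m4", "alice", "m3"), ("m5", "alice", "m4")]
net = build_network(msgs)
labels = label_nodes(net, committers={"alice", "bob"}, csbf_committers={"alice"})
```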
7. Empirical Study
Goal: analyze the phenomenon of CSBFs
Purpose: understanding its relevance with
respect to the social characteristics of the
involved developers
Context: CVS repositories and mailing list
archives of FreeBSD and OpenBSD
Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)
Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)
8. Research Questions
RQ1: How do the source code committers
and contributors of the two systems
overlap?
RQ2: How frequent is the phenomenon of
CSBFs?
RQ3: Who are the contributors involved in
CSBFs?
RQ4: Are mailing list contributors involved
in CSBFs more active than others?
9. RQ1 – Team overlap
                                          FreeBSD  OpenBSD  Both
Committers                                    383      211    26
Mailing list contributors                    8035     3843   359
Committers and mailing list contributors      213      122    17
The two projects share less than 10% of their
contributors →
the FreeBSD and OpenBSD development teams
are largely distinct
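The "less than 10%" figure can be checked from the table. The slide does not say which denominator is used; this sketch assumes the per-project counts already include the shared people and divides the shared count by the number of distinct people in both projects (a Jaccard index).

```python
# (FreeBSD, OpenBSD, both) counts from the RQ1 table
TEAMS = {
    "Committers": (383, 211, 26),
    "Mailing list contributors": (8035, 3843, 359),
    "Committers and mailing list contributors": (213, 122, 17),
}

def shared_fraction(n_a: int, n_b: int, n_both: int) -> float:
    """Shared people over all distinct people, assuming n_a and n_b
    already include the n_both people active in both projects."""
    return n_both / (n_a + n_b - n_both)

for group, (fbsd, obsd, both) in TEAMS.items():
    print(f"{group}: {shared_fraction(fbsd, obsd, both):.1%}")
# Committers: 4.6%
# Mailing list contributors: 3.1%
# Committers and mailing list contributors: 5.3%
```

The reading depends on the denominator: relative to OpenBSD's 211 committers alone, the 26 shared committers would be about 12%.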
10. RQ2 – Commit filtering
[Figure: counts remaining after each filtering step (referring commits, cloned files, linked commits) for FreeBSD and OpenBSD; bar values: 933, 439, 296, 133, 120, 59]
After filtering, relatively few CSBFs remain, but...
11. RQ2 – Cloned lines in CSBF files
[Figure: percentage of cloned lines in CSBF files, for C source files and for header files]
The percentage is smaller for .h files
Preprocessor conditionals are used to make header files
system-dependent, e.g.:
#if defined(__FreeBSD__)
13. RQ3 – social characteristics
Importance in terms of:
(in/out) degree: number of (incoming/outgoing)
communication links
Betweenness: number of shortest communication paths
that pass through the node
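Degree and betweenness can be computed directly on the directed communication graph. This sketch uses standard, unnormalized betweenness centrality via Brandes' algorithm for unweighted graphs, which sums, over all node pairs, the fraction of shortest paths passing through a node; whether the study used exactly this variant (e.g. normalized or not) is not stated on the slide.

```python
from collections import deque
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> nodes it sends messages to (all nodes are keys)

def degrees(g: Graph) -> Tuple[Dict[str, int], Dict[str, int]]:
    """In-degree and out-degree of every node."""
    out_deg = {v: len(g[v]) for v in g}
    in_deg = {v: 0 for v in g}
    for v in g:
        for w in g[v]:
            in_deg[w] += 1
    return in_deg, out_deg

def betweenness(g: Graph) -> Dict[str, float]:
    """Unnormalized betweenness (Brandes' algorithm, unweighted directed graph)."""
    bc = {v: 0.0 for v in g}
    for s in g:
        # Forward BFS: count shortest paths from s and record predecessors.
        order = []
        preds = {v: [] for v in g}
        sigma = {v: 0 for v in g}
        dist = {v: -1 for v in g}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in g}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

chain = {"a": {"b"}, "b": {"c"}, "c": set()}
print(betweenness(chain))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

In the chain a → b → c, b lies on the only shortest path from a to c, hence a betweenness of 1.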
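Brokerage metrics: useful to analyze the
communication between two clusters
(for a communication path A → B → C)
B is a coordinator (A, B and C in the same cluster)
B is a gatekeeper (B and C in the same cluster, A in the other)
B is a representative (A and B in the same cluster, C in the other)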
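The three brokerage roles can be counted over all two-step paths A → B → C. This sketch follows the usual brokerage convention of counting only paths with no direct A → C link; the assignment of nodes to clusters (e.g. FreeBSD vs. OpenBSD community) is assumed to be given, and configurations not listed on the slide are simply not counted.

```python
from collections import Counter
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]  # directed communication links

def role(ga: str, gb: str, gc: str) -> Optional[str]:
    """Role of B on a path A -> B -> C, given the clusters of A, B and C."""
    if ga == gb == gc:
        return "coordinator"
    if gb == gc != ga:
        return "gatekeeper"
    if ga == gb != gc:
        return "representative"
    return None  # other configurations are not covered by the slide

def brokerage(g: Graph, cluster: Dict[str, str]) -> Dict[str, Counter]:
    """Count, for each node B, the paths A -> B -> C with no direct A -> C link."""
    scores = {b: Counter() for b in g}
    for a in g:
        for b in g[a]:
            for c in g[b]:
                if len({a, b, c}) == 3 and c not in g[a]:
                    r = role(cluster[a], cluster[b], cluster[c])
                    if r is not None:
                        scores[b][r] += 1
    return scores

g = {"x": {"y", "w"}, "y": {"z"}, "w": {"z"}, "z": set()}
cluster = {"x": "FreeBSD", "w": "FreeBSD", "y": "OpenBSD", "z": "OpenBSD"}
scores = brokerage(g, cluster)
# y relays a FreeBSD message into OpenBSD (gatekeeper);
# w relays a FreeBSD message out to OpenBSD (representative).
```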
14. RQ3 – social characteristics
[Figure: Degree, In-degree, Out-degree, Betweenness (/1000) and the Coordinator (/10), Gatekeeper and Representative brokerage scores, for CSBF contributors vs. others]
All differences are statistically significant
Large effect size (Cohen's d > 1)
Contributors involved in CSBFs have a higher importance both in
the communication and in the flow of communication
between the two systems
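The effect sizes reported here are Cohen's d values. The slide does not say which variant was used; this sketch assumes the common form with a pooled standard deviation, i.e. the difference of the group means divided by the pooled sample standard deviation.

```python
import math
from statistics import mean, variance
from typing import Sequence

def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Cohen's d with pooled standard deviation (needs at least 2 values per group)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)

print(round(cohens_d([2, 4, 6], [1, 2, 3]), 2))  # 1.26
```

By Cohen's conventional rule of thumb, |d| ≥ 0.8 counts as a large effect, so both d > 1 and d ≈ 1 fall in that range.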
16. RQ4 – change activity of CSBF committers and others
[Figure: LOC added/removed and number of commits, for CSBF committers vs. others, in FreeBSD and OpenBSD]
All differences are statistically significant
Large effect size (Cohen's d ≈ 1)
Contributors involved in CSBFs are more active
than others
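The activity comparison can be sketched as a per-committer aggregation of commits and changed lines, followed by group means for CSBF committers and the others. The input record shape (one tuple of committer ID, lines added, lines removed per commit) is an assumption; extracting it from the CVS logs is not shown.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, int, int]  # (committer ID, lines added, lines removed), one per commit

def activity(records: Iterable[Record]) -> Dict[str, Tuple[int, int]]:
    """Per-committer (number of commits, LOC added + removed)."""
    commits: Dict[str, int] = defaultdict(int)
    loc: Dict[str, int] = defaultdict(int)
    for who, added, removed in records:
        commits[who] += 1
        loc[who] += added + removed
    return {w: (commits[w], loc[w]) for w in commits}

def group_means(act: Dict[str, Tuple[int, int]], csbf: Set[str]):
    """Mean (commits, LOC) for CSBF committers and for the others."""
    def avg(vals):
        if not vals:
            return (0.0, 0.0)
        return (mean(c for c, _ in vals), mean(l for _, l in vals))
    return (avg([v for w, v in act.items() if w in csbf]),
            avg([v for w, v in act.items() if w not in csbf]))

act = activity([("a", 10, 2), ("a", 5, 5), ("b", 1, 0)])
print(group_means(act, csbf={"a"}))  # ((2, 22), (1, 1))
```

The per-committer values (rather than the means) are what a significance test and Cohen's d would be computed on.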
17. Conclusions and Work-in-Progress
We proposed a method to mine CSBFs
We reported a study on FreeBSD and OpenBSD, where:
The development teams are almost disjoint
There is a small, though not negligible, number of CSBFs
Committers involved in CSBFs have
– Higher social importance
– Higher brokerage level
– Higher activity in source code commits
Work-in-progress:
Better approaches to identify implicit CSBFs, by tracking and
linking changes occurring in both systems
A more extensive study of less obvious cases