Presentation of research goals and ongoing research in the joint ARC project "ECOS: Ecological Studies of Open Source Software Ecosystems", presented by Tom Mens (UMONS) during the projects track of the CSMR-WCRE 2014 Software Evolution Week. Collaborators: Philippe Grosjean and Maelick Claes.
ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014 Projects Track)
1. ECOS: Ecological Studies of Open Source Software Ecosystems
• Tom Mens, Maelick Claes: Software Engineering Lab
• Philippe Grosjean: Numerical Ecology Lab
informatique.umons.ac.be/genlog/projects/ecos
2. About ECOS
• "Action de Recherche Concertée" of University of Mons
  – Interdisciplinary project
• Combines research in biology (ecology) and computing science (empirical software engineering)
  – COMPLEXYS Research Institute
  – Oct 2012 to Sep 2017
  – 500K EUR funding
• Related EU project:
5 February 2014, CSMR-WCRE Software Evolution Week, Antwerp, Belgium
3. High-level project goal
• Improve understanding of, and support for, open source software ecosystems
  – Draw inspiration from biological evolution, ecology and natural ecosystems
• Determine the main factors of success and failure of OSS projects within their ecosystem
  – Provide better techniques and mechanisms to predict and improve survivability of OSS projects and resilience of their ecosystems
  – Provide guidelines and evolution dashboards to support software communities
4. Software ecosystem
Definition (business-oriented view)
• "a set of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them." (Jansen et al. 2009)
Examples
• Eclipse
• Android and iOS app store
5. Software ecosystem
Definition (development-centric view)
• "a collection of software products that have some given degree of symbiotic relationships."
  • Messerschmitt & Szyperski: Software ecosystem: Understanding an indispensable technology and industry. MIT Press, 2003.
• "a collection of software projects that are developed and evolve together in the same environment."
  • M. Lungu: Towards reverse engineering software ecosystems. Int'l Conf. Software Maintenance, 2008, pp. 428–431.
Examples
• Gnome, KDE
• Debian, Ubuntu
• R's CRAN
• Apache
6. Main Research Questions
• Which control mechanisms driving natural ecosystems can be used to explain the dynamics of software ecosystems?
• Which mechanisms and measures can we borrow from ecology to explain and predict how software projects evolve?
7. Terminology: biological ecosystem
Definitions
• Ecology: the scientific study of the interactions that determine the distribution and abundance of organisms
• Ecosystem: the physical and biological components of an environment considered in relation to each other as a unit
  – combines all living organisms (plants, animals, micro-organisms) and physical components (light, water, soil, rocks, minerals)
Example: coral reefs
• High biodiversity: polyps, sea anemones, fish, mollusks, sponges, algae
9. Ecological theories of evolution of species
• Jean-Baptiste Lamarck (1744–1829)
  • animal organs and behaviour can change according to the way they are used
  • those characteristics can be transmitted from one generation to the next to reach a greater level of perfection
  • Example: giraffes' necks have become longer while trying to reach the upper leaves of a tree
• Charles Darwin (1809–1882)
  • all species of life have descended over time from common ancestors
  • this branching pattern resulted from natural selection
  • evolution history is represented by a phylogenetic tree
  • Example: 13 types of Galapagos finches, with the same habits and characteristics but different beaks
10. Ecological theories of evolution of species
Hologenome theory
• The unit of natural selection is the holobiont: the organism together with its associated microbial communities, which live together in symbiosis.
• The holobiont can adapt to changing environmental conditions far more rapidly than would be possible through genetic mutation and selection alone.
• Darwinism emphasises competition (survival of the fittest); hologenome theory also includes cooperation (through symbiosis).
In software evolution: hologenome theory may be closer to what one observes in open source projects, where cooperation plays a more important role.
11. Ecological theories of evolution of species
Reticulate evolution
• Evolution history is represented as a graph structure: two or more evolutionary lineages can be recombined at some level
  • hybrid speciation (2 lineages recombine to create a new one)
  • horizontal gene transfer (genes are transferred across species)
In software evolution: distributed version control systems (VCS) such as Git promote reticulate evolution through fork and merge (but few projects actually merge)
See Robles et al. A Comprehensive Study of Software Forks: Dates, Reasons and Outcomes. OSS Conference 2012, Best Paper Award.
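The contrast between a phylogenetic tree and a reticulate graph can be made concrete on a version history. The sketch below assumes a hypothetical history given as a mapping from each commit to its list of parent commits (the kind of data `git log --format='%H %P'` prints: a commit hash followed by its parent hashes). A commit with two or more parents is a point where lineages recombine, i.e., a merge; a history without any such commit is tree-shaped, like Darwin's branching pattern. The commit names and function names are illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical history format: commit id -> list of parent commit ids.
History = Dict[str, List[str]]


def reticulation_points(history: History) -> Set[str]:
    """Commits where two or more lineages recombine (merge commits)."""
    return {c for c, parents in history.items() if len(parents) >= 2}


def is_tree(history: History) -> bool:
    """Tree-shaped (phylogenetic) iff no commit has more than one parent."""
    return not reticulation_points(history)


def lineage(history: History, commit: str) -> Set[str]:
    """All ancestors of `commit`, including itself (iterative depth-first search)."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(history[c])
    return seen


# Invented example: a fork (f1, f2) diverges from b and is later merged back in m.
history: History = {
    "a": [], "b": ["a"],
    "c": ["b"],          # main line continues
    "f1": ["b"],         # fork diverges from b
    "f2": ["f1"],
    "m": ["c", "f2"],    # merge: two lineages recombine
}
print(sorted(reticulation_points(history)), is_tree(history))
```

Dropping the merge commit `m` leaves a forked but still tree-shaped history, which matches the observation that few forks are ever merged back.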
13. Trophic web (food chain) in natural ecosystems
14. Trophic web in software ecosystems
• Producer-consumer relation
  – TOP-DOWN: change requests & bug reports
  – BOTTOM-UP: changes in core projects and architecture
• Onion model: Users / Peripheral developers / Core developers
15. Core Architecture, or why developers are polyps
Coral reef ecosystem
• Scleractinian coral polyps are responsible for creating the coral reef structure
• This coral reef is required for the other species of the ecosystem to thrive.
Software ecosystem
• Core developers are responsible for creating the core software architecture
• Based on this core architecture, other developers and third parties can create other projects, services, and so on.
16. Software Ecosystem Dynamics
Predator-prey relationship (Lotka-Volterra 1925/1926)
• Predators (hunting animals) feed upon their prey (attacked animals)
• Can be described by a dynamic model with mutually dependent parametric differential equations
Analogies in software maintenance
• Debuggers are predators, software defects are prey
  Calzolari et al. "Maintenance and testing effort modeled by linear and nonlinear dynamic systems," Information and Software Technology, 2001
• Developers are predators, the information they seek is prey
  Lawrance et al. Scents in programs: Does information foraging theory apply to program maintenance? VL/HCC 2007
• Dual (socio-technical) view:
  • Developers are predators, the projects they work on are prey
  • Projects are predators that feed upon the cognitive resources of their developers
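The Lotka-Volterra model is usually written as two coupled ordinary differential equations, dx/dt = αx − βxy for the prey population x and dy/dt = δxy − γy for the predator population y, with positive parameters α (prey growth), β (predation rate), δ (predator growth per prey eaten) and γ (predator death rate). The sketch below integrates them with a hand-written fixed-step fourth-order Runge-Kutta scheme using only the Python standard library. The parameter values and function names are illustrative assumptions. Reading x as open defects and y as debugging effort would be one hypothetical software analogy; it is not the model of Calzolari et al.

```python
import math
from typing import List, Tuple

State = Tuple[float, float]  # (prey x, predators y)


def lv_rhs(x: float, y: float, alpha: float, beta: float,
           delta: float, gamma: float) -> State:
    """Right-hand side of the classic Lotka-Volterra equations."""
    dx = alpha * x - beta * x * y   # prey reproduce and are eaten by predators
    dy = delta * x * y - gamma * y  # predators grow by eating prey and die off
    return dx, dy


def simulate(x0: float, y0: float, alpha: float, beta: float, delta: float,
             gamma: float, dt: float, steps: int) -> List[State]:
    """Integrate the model with fixed-step fourth-order Runge-Kutta (RK4)."""
    params = (alpha, beta, delta, gamma)
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        k1 = lv_rhs(x, y, *params)
        k2 = lv_rhs(x + dt / 2 * k1[0], y + dt / 2 * k1[1], *params)
        k3 = lv_rhs(x + dt / 2 * k2[0], y + dt / 2 * k2[1], *params)
        k4 = lv_rhs(x + dt * k3[0], y + dt * k3[1], *params)
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        trajectory.append((x, y))
    return trajectory


def invariant(x: float, y: float, alpha: float, beta: float,
              delta: float, gamma: float) -> float:
    """Quantity conserved along exact solutions; handy to check the integrator."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


if __name__ == "__main__":
    traj = simulate(4.0, 1.0, alpha=1.0, beta=0.5, delta=0.25, gamma=0.5,
                    dt=0.01, steps=2000)
    for x, y in traj[::250]:
        print(f"prey={x:7.3f}  predators={y:7.3f}")
```

With these parameters the equilibrium is x* = γ/δ = 2 and y* = α/β = 2; starting elsewhere, the two populations oscillate out of phase around it, which is the mutual dependence the slide refers to.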
17. Desirable ecosystem characteristics
Biodiversity measures the degree of variation of species within a given ecosystem
• Maximum diversity if all species have the same number of individuals
• Low diversity if a particular species dominates the others
• Many different metrics: Shannon entropy, Simpson index, evenness, …
• Posnett et al. used a similar notion to measure developer activity focus and module activity focus
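To make the diversity metrics listed above concrete, the sketch below computes them from abundance counts. It assumes common textbook variants: Shannon entropy with natural logarithms, the Gini-Simpson form 1 − Σp² (other variants include Simpson's original Σp² and its inverse), and Pielou's evenness H / ln S. Treating developers as "species" and their commits as "individuals" is a hypothetical mapping for illustration, not the measure defined by Posnett et al.

```python
import math
from collections import Counter
from typing import List, Sequence


def abundances(observations: Sequence[str]) -> List[int]:
    """Count individuals per species, e.g. commits per author (hypothetical mapping)."""
    return list(Counter(observations).values())


def proportions(counts: Sequence[int]) -> List[float]:
    """Relative abundances p_i = n_i / N, skipping species with zero individuals."""
    present = [n for n in counts if n > 0]
    total = sum(present)
    return [n / total for n in present]


def shannon(counts: Sequence[int]) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i), in nats."""
    return -sum(p * math.log(p) for p in proportions(counts))


def gini_simpson(counts: Sequence[int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): probability that two individuals drawn
    at random (with replacement) belong to different species."""
    return 1.0 - sum(p * p for p in proportions(counts))


def pielou_evenness(counts: Sequence[int]) -> float:
    """Pielou's evenness J = H / ln(S), with S the number of species present.
    J is undefined for a single species; this sketch returns 0.0 in that case."""
    s = len(proportions(counts))
    return shannon(counts) / math.log(s) if s > 1 else 0.0


even = [25, 25, 25, 25]      # four "species" with equal abundance
dominated = [97, 1, 1, 1]    # one "species" dominates the others
for name, counts in [("even", even), ("dominated", dominated)]:
    print(name, round(shannon(counts), 3), round(gini_simpson(counts), 3),
          round(pielou_evenness(counts), 3))
```

Equal abundances give the maximum evenness of 1, while a single dominant species drives all three metrics down, matching the two bullets above.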
Dual Ecological Measures of Focus in Software Development
Daryl Posnett†, Raissa D'Souza∗, Premkumar Devanbu† and Vladimir Filkov†
†∗University of California Davis, USA
†{dpposnett,ptdevanbu,vfilkov}@ucdavis.edu, ∗raissa@cse.ucdavis.edu
Abstract: Work practices vary among software developers. Some are highly focused on a few artifacts; others make wide-ranging contributions. Similarly, some artifacts are mostly authored, or "owned", by one or few developers; others have very wide ownership. Focus and ownership are related but different phenomena, both with strong effect on software quality. Prior studies have mostly targeted ownership; the measures of ownership used have generally been based on either simple counts, information-theoretic views of ownership, or social-network views of contribution patterns. We argue for a more general conceptual view that unifies developer focus and artifact ownership. We analogize the developer-artifact contribution network to a predator-prey food web, and draw upon ideas from ecology to produce a novel and conceptually unified view of measuring focus and ownership. These measures relate to both cross-entropy and Kullback-Leibler divergence, and simultaneously provide two normalized measures of focus from both the developer and artifact perspectives. We argue that these measures are theoretically well-founded, and yield novel predictive, conceptual, and actionable value in software projects. We find that more focused developers introduce fewer defects than defocused developers. In contrast, files that receive narrowly focused activity are more likely to contain defects than other files.
I. INTRODUCTION
Developers are the lifeblood of open source software (OSS), and their contributions are vital for OSS to thrive. Rather than being assigned tasks by management, OSS developers are generally free to choose the style, focus, and breadth of their contributions. Some might be quite focused, working on one specific subsystem; others may contribute to many different subsystems. A device driver expert, for example, may contribute very specialized knowledge to an open source project, focusing on only a few files or packages. His contributions to a small subset of modules¹ may be his only contribution during his tenure with the project. In contrast, a project leader may work on a variety of different tasks touching many modules within a project. While OSS developers are free to choose their contribution styles, such choices are not inconsequential, especially with respect to the central issue of software quality.
A dominant theme emerging from previous work in this area is module ownership [1], [2], [3]. Low ownership of a module, i.e., too many contributors, can adversely impact code quality. There is, however, an entirely different perspective, the developer's attention focus, which is relatively unexplored. Human attention and cognition are finite resources [4]. When different tasks are simultaneously engaged, they can compete for mental resources and task performance can suffer [5]. A developer engaged in many different tasks carries a greater cognitive burden than a more focused developer. Interestingly, the developer and module perspectives are conceptually symmetric, dualistic views of focus. From a module's perspective, strong ownership indicates a strongly focused contribution. We refer to this as module activity focus, or MAF, a measure of how focused the activities on a module are. Symmetrically, we refer to the developer's attention focus, or DAF, a measure of how focused the activities of a particular developer are.
A surprising but natural analogy for MAF and DAF is the predator-prey food web from ecology. In a sense, modules are predators that "feed upon" the cognitive resources of developers. As the number of developers contributing to a module increases, the diversity of cognitive resources upon which the module "feeds" also increases; likewise, a developer is a "prey" whose limited cognitive resources are spread over the modules that "prey" upon her.
Ecosystem diversity is of great interest to ecologists. Williams and Martinez call the roles complexity and diversity play "[o]ne of the most important and least settled questions in ecology." [6] This diversity can be viewed from two symmetric perspectives: that of the prey and that of the predator. Ecologists have developed sophisticated symmetric measures of predator-prey relationships, drawing upon ideas such as entropy and Kullback-Leibler divergence, that simultaneously capture both perspectives. We adapt these measures for software engineering projects into the metrics MAF and DAF.
In this work, we employ the methodology presented by El Emam to validate our measures [7]. In particular, we show that the DAF and MAF measures succeed in distinguishing important cases that extant measures don't capture. We make the following contributions:
• We adapt terminology and motivation from ecology, based on bipartite graphs;
• We incorporate and generalize previous results on developer and artifact diversity;
• We provide easy-to-compute measures of focus, MAF and DAF, normalized to facilitate comparison within and across projects;
• We show these measures more precisely capture outcomes relevant to software researchers and practitioners.
This novel analysis simultaneously considers focus both from the artifact perspective and the author perspective. Researchers can use our MAF and DAF metrics to more […]
¹ We use modules to mean either packages or files, depending on the context.
ICSE 2013, San Francisco, CA, USA
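The abstract relates MAF and DAF to cross-entropy and Kullback-Leibler divergence but does not give their formulas here. As a hedged illustration of those two information-theoretic quantities only (not a reconstruction of Posnett et al.'s MAF or DAF), the sketch computes entropy, cross-entropy and KL divergence for discrete distributions, which satisfy the identity H(P, Q) = H(P) + D_KL(P ‖ Q). The developer and project distributions are invented for the example.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Cross-entropy H(P, Q) in bits; requires q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D_KL(P || Q) in bits; same support requirement."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical example: one developer's share of commits over four modules (p)
# versus the share of all project commits those modules receive (q).
developer = [0.5, 0.5, 0.0, 0.0]
project = [0.25, 0.25, 0.25, 0.25]
print(entropy(developer), cross_entropy(developer, project),
      kl_divergence(developer, project))
```

Here a larger divergence means that the developer's activity distribution departs more from the project-wide one; how the paper normalizes such quantities into MAF and DAF is not shown on this slide.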
18. Desirable ecosystem characteristics
• Stability
  • the capacity to maintain an equilibrium over longer periods of time
• Resistance
  • the ability to withstand environmental changes without too much disturbance of its biological communities
• Resilience
  • the ability to return to an equilibrium after a disturbance
Goal: Use these and related measures to study maintainability and survivability of software projects within their ecosystem
19. Ongoing Research: 2 case studies
• CRAN (Comprehensive R Archive Network)
  – Characteristics: 15 years, > 5000 packages, > 2500 contributors, different OS flavours (Linux, Windows, MacOS, Solaris), superlinear package growth
  – Goal
    • Study package dependencies and maintainability (number of errors and time to fix) and their effect on package survivability
    • See our CSMR-WCRE 2014 ERA paper "On the maintainability of CRAN packages"
20. Ongoing Research: 2 case studies
• GNOME
  – Characteristics: 16 years, > 1400 projects, > 5800 contributors, > 1.3M commits, > 12M file touches
  – Goals
    1. Combine different ecosystem measures into a predictive model of project survivability
    2. Study migration patterns of contributors and their effect on project survivability
21. Ongoing Research: GNOME case study 1
Combine different ecosystem measures into a predictive model of project survivability
  – Replicate and generalise the empirical study by Uzma Raja
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 38, NO. 1, JANUARY/FEBRUARY 2012, p. 163
Defining and Evaluating a Measure of Open Source Project Survivability
Uzma Raja, Member, IEEE Computer Society, and Marietta J. Tretter
Abstract: In this paper, we define and validate a new multidimensional measure of Open Source Software (OSS) project survivability, called Project Viability. Project viability has three dimensions: vigor, resilience, and organization. We define each of these dimensions and formulate an index called the Viability Index (VI) to combine all three dimensions. Archival data of projects hosted at SourceForge.net are used for the empirical validation of the measure. An Analysis Sample (n = 136) is used to assign weights to each dimension of project viability and to determine a suitable cut-off point for VI. Cross-validation of the measure is performed on a hold-out Validation Sample (n = 96). We demonstrate that project viability is a robust and valid measure of OSS project survivability that can be used to predict the failure or survival of an OSS project accurately. It is a tangible measure that can be used by organizations to compare various OSS projects and to make informed decisions regarding investment in the OSS domain.
Index Terms: Evaluation framework, external validity, open source software, project evaluation, software measurement, software survivability.
1 INTRODUCTION
Open Source Software (OSS) projects are developed and distributed for free, with full access to the project source code. Recently there has been a significant increase in the use of these projects. Some OSS projects have earned themselves a high reputation and corporate sponsorships. Large corporations (e.g., IBM, Sun Microsystems) are becoming involved with the OSS movement in various capacities. Projections indicate that the corporate interest in OSS projects will grow stronger in the future [1] and these projects will see integration in enterprise architecture [2]. This increased use of OSS projects creates the need for better project evaluation measures.
Traditionally, software projects are evaluated by conformance to budget, schedule, and user requirements [3], [4], [5], [6], [7], [8]. These measures, however, are difficult to map to OSS projects, which are developed through a network of volunteer participants, with no defined budget, schedule, or customer. Although there is a surge in the investment in OSS projects [1], research indicates that a large number of OSS projects fail [9], [10]. Some have questioned the operational reliability and quality of OSS projects [11]. Since there are no contractual or legal bindings for providing OSS updates or maintenance services, businesses investing human or financial capital in the adoption of OSS projects need the ability to evaluate whether the project will continue to exist or not [12]. Development teams need to measure project survivability to control and improve performance. Individual and corporate users need a measure of project survivability to compare the available OSS projects before making decisions regarding project adoption.
In this paper, we define and validate a new multidimensional measure of OSS project survivability, called Project Viability. OSS projects provide access to their development archives, thereby providing a unique opportunity to conduct empirical research [13] and develop reliable measures [14], [15]. In the following sections, we define, formulate, and validate project viability. Section 2 provides a brief overview of the existing empirical research in OSS and the background of project survivability. Section 3 defines the dimensions of project viability and formulates an index to measure it. Section 4 discusses the empirical evaluation framework and validates the new measure using OSS project data. Discussion of the results is presented in Section 5 and conclusions are given in Section 6.
2 BACKGROUND
A large number of OSS projects are available for use. However, the failure rate of these projects is high [9]. The evaluation of OSS projects is different from that of Commercial Software Systems (CSS) [16]. The adopters of OSS projects need a mechanism to compare the chances of failure or survival of the available projects. This would allow better decisions regarding corporate resource investment.
A range of measures has been used in prior research to evaluate OSS projects. Godfrey and Tu [17] examined the evolution of the Linux kernel and its growth pattern in one […]
U. Raja is with the Department of Information Systems, Statistics and Management Science, The University of Alabama, Box #870226, 300 Campus Drive, Tuscaloosa, AL 35487. E-mail: uraja@cba.ua.edu.
M.J. Tretter is with the Department of Information and Operations Management, Texas A&M University, Mail Stop #310D, Wehner […]
22. Ongoing Research: GNOME case study 2
Study migration patterns of contributors and their effect on project survivability
23. Ongoing Research: GNOME case study 2
Timeline (6-month intervals) of joiners to Gnome projects
Global joiners are incoming coders in the considered project that were not active in any of the GNOME projects during the preceding period. A similar definition holds for the local and global leavers. Formally, the metrics are defined as follows. Let p be a GNOME project, t a 6-month activity period (and t−1 the previous period), c a coder, Gnome the set of GNOME's code projects, and isDev(c,t,p) a predicate which is true if and only if c made a code commit in p during t:
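$$\mathit{localLeavers}(p,t) = \{\, c \mid \mathit{isDev}(c,t-1,p) \land \lnot \mathit{isDev}(c,t,p) \land \exists p_2\, (p_2 \in \mathit{Gnome} \land \mathit{isDev}(c,t,p_2)) \,\}$$
$$\mathit{globalLeavers}(p,t) = \{\, c \mid \mathit{isDev}(c,t-1,p) \land \forall p_2\, (p_2 \in \mathit{Gnome} \Rightarrow \lnot \mathit{isDev}(c,t,p_2)) \,\}$$
$$\mathit{localJoiners}(p,t) = \{\, c \mid \mathit{isDev}(c,t,p) \land \lnot \mathit{isDev}(c,t-1,p) \land \exists p_2\, (p_2 \in \mathit{Gnome} \land \mathit{isDev}(c,t-1,p_2)) \,\}$$
$$\mathit{globalJoiners}(p,t) = \{\, c \mid \mathit{isDev}(c,t,p) \land \forall p_2\, (p_2 \in \mathit{Gnome} \Rightarrow \lnot \mathit{isDev}(c,t-1,p_2)) \,\}$$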
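The four set definitions translate almost literally into code. The sketch below assumes the activity data is available as a set of hypothetical (coder, period, project) triples, one for each coder who committed to a project in a given 6-month period, so that isDev(c, t, p) becomes a set-membership test. The function names, project set and coders in the example are invented for illustration and are not GNOME data.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical input format: one (coder, period, project) triple for every
# coder who made at least one code commit in that project during that period.
Activity = Set[Tuple[str, int, str]]


def is_dev(activity: Activity, c: str, t: int, p: str) -> bool:
    """isDev(c, t, p): c made a code commit in project p during period t."""
    return (c, t, p) in activity


def active_in(activity: Activity, c: str, t: int, gnome: FrozenSet[str]) -> bool:
    """True iff there is some p2 in gnome with isDev(c, t, p2)."""
    return any(is_dev(activity, c, t, p2) for p2 in gnome)


def coders(activity: Activity) -> Set[str]:
    return {c for (c, _, _) in activity}


def local_leavers(activity, gnome, p, t):
    # in p at t-1, not in p at t, but active in some ecosystem project at t
    return {c for c in coders(activity)
            if is_dev(activity, c, t - 1, p) and not is_dev(activity, c, t, p)
            and active_in(activity, c, t, gnome)}


def global_leavers(activity, gnome, p, t):
    # in p at t-1 and in no ecosystem project at t
    return {c for c in coders(activity)
            if is_dev(activity, c, t - 1, p) and not active_in(activity, c, t, gnome)}


def local_joiners(activity, gnome, p, t):
    # in p at t, not in p at t-1, but active in some ecosystem project at t-1
    return {c for c in coders(activity)
            if is_dev(activity, c, t, p) and not is_dev(activity, c, t - 1, p)
            and active_in(activity, c, t - 1, gnome)}


def global_joiners(activity, gnome, p, t):
    # in p at t and in no ecosystem project at t-1
    return {c for c in coders(activity)
            if is_dev(activity, c, t, p) and not active_in(activity, c, t - 1, gnome)}


# Invented example: three projects, two 6-month periods (0 and 1).
gnome = frozenset({"gtk+", "gimp", "evolution"})
activity = {
    ("alice", 0, "gtk+"), ("alice", 1, "gimp"),   # alice moves from gtk+ to gimp
    ("bob", 0, "gimp"),                           # bob leaves the ecosystem
    ("carol", 1, "gimp"),                         # carol is new to the ecosystem
    ("dave", 0, "gimp"), ("dave", 1, "gimp"),     # dave stays in gimp
}
print(local_joiners(activity, gnome, "gimp", 1))   # {'alice'}
print(global_joiners(activity, gnome, "gimp", 1))  # {'carol'}
print(global_leavers(activity, gnome, "gimp", 1))  # {'bob'}
print(local_leavers(activity, gnome, "gtk+", 1))   # {'alice'}
```

Because the global variants quantify over all of Gnome (including p itself), a global leaver is automatically inactive in p at t, so no separate ¬isDev(c, t, p) test is needed.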
[Figure: Fig. 1.11 Historical evolution (timeline) of the number of local (black solid) and global (red dashed) joiners (y-axis) for three GNOME projects; panels: Evolution, Gimp, GTK+; x-axis: Time, 1997–2013]
- Black = local joiners from other Gnome projects
- Red = global joiners from outside of Gnome
- Blue = stayers
We did not find any general trend: the patterns of intake and loss of coders are highly project-specific. Figure 1.11 illustrates the evolution of the number of local and global joiners for some of the more important GNOME projects (the figures for leavers are very similar). For some projects (e.g., evolution) we do not observe a big difference between the number of local and global joiners. These projects seem to attract new developers both from within and outside of GNOME. Other projects, like gimp, attract most of their incoming developers from outside GNOME. A third category of projects attracts most of its incoming developers from other GNOME projects. This is the case for gtk+, glib and libgnome, which can be considered as belonging to the core of GNOME. This observation seems to […]
24. Migration in software ecosystems: Gnome case study
Timeline (6-month intervals) of leavers from Gnome projects
Book excerpt (p. 28; Tom Mens, Maëlick Claes, Philippe Grosjean and Alexander Serebrenik): […] project that were not active in this project during the preceding 6-month period, but that were involved in some activity in other GNOME projects instead.
[Figure: timeline of the number of leavers (y-axis: Leavers, x-axis: Time, 1997–2013) for three GNOME projects; panels: Evolution, Gimp, GTK+]
- Black = local joiners from other Gnome projects
- Red = global joiners from outside of Gnome
- Blue = stayers
25. Some references
• Mathieu Goeminne. Understanding the Evolution of Socio-technical Aspects in Open Source Ecosystems: An Empirical Analysis of GNOME. A dissertation submitted in fulfillment of the requirements of the degree of Docteur en Sciences, UMONS, Faculté des Sciences, Département d'Informatique, June 2013.
  – Advisor: Dr. Tom Mens (Université de Mons, Belgium)
  – Jury: Dr. Xavier Blanc (Université de Bordeaux 1, France), Dr. Véronique Bruyère (Université de Mons, Belgium), Dr. Jesus M. Gonzalez-Barahona (Universidad Rey Juan Carlos, Spain), Dr. Tom Mens (Université de Mons, Belgium), Dr. Alexander Serebrenik (Technische Universiteit Eindhoven, The Netherlands), Dr. Jef Wijsen (Université de Mons, Belgium)
• @ MSR 2013: Mathieu Goeminne, Maëlick Claes and Tom Mens (Software Engineering Lab, COMPLEXYS research institute, UMONS, Belgium). A historical dataset for GNOME contributors.
  Abstract: We present a dataset of the open source software ecosystem GNOME from a social point of view. We have collected historical data about the contributors to all GNOME projects stored on git.gnome.org, taking into account the problem of identity matching, and associating different activity types to the contributors. This type of information is very useful to complement the traditional, source-code related information one can obtain by mining and analyzing the actual source code. The dataset can be obtained at https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge.
  I. INTRODUCTION: In this paper, we present the process we have used to create a dataset containing the historical information related to contributors to the GNOME ecosystem. Our database and the tools and scripts used to create it can be found on a dedicated Bitbucket repository. In contrast to many other datasets, we do not focus on source code, since a significant amount of files committed to GNOME's project repositories do not even contain code (e.g., image files, web pages, documentation, localization and many more). Such type of information is often ignored in MSR research while it is very relevant to understand which types of activities contributors are […]
• Bogdan Vasilescu, Alexander Serebrenik, Mathieu Goeminne and Tom Mens. On the variation and specialisation of workload – A case study of the Gnome ecosystem community. To appear in 2013 in Springer's Empirical Software Engineering journal. DOI: 10.1007/s10664-013-9244-1.
  Abstract: Most empirical studies of open source software repositories focus on the analysis of isolated projects, or restrict themselves to the study of the relationships between technical artifacts. In contrast, we have carried out a case study that focuses on the actual contributors to software ecosystems, being collections of software projects that are maintained by the same community. To this aim, we defined a new series of workload and involvement metrics, as well as a novel approach (T-graphs) for reporting the results of comparing multiple distributions. We used these techniques to statistically study how workload and involvement of ecosystem contributors vary across projects and across activity types, and we explored to which extent projects and contributors specialise in particular activity types. Using Gnome as a case study we observed that, next to coding, the activities of localization, development documentation and building are prevalent throughout the ecosystem. We also observed notable differences between frequent and occasional contributors in terms of the activity types they are involved in and the number of projects they contribute to. Occasional contributors and contributors that are involved in many different projects tend to be more involved in the localization activity, while frequent contributors tend to be more involved in the coding activity in a limited number of projects.
  Keywords: open source · software ecosystem · metrics · developer community · case study
  B. Vasilescu and A. Serebrenik: MDSE, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, The Netherlands
26. References
Mens, Tom; Serebrenik, Alexander; Cleve, Anthony (Eds.), 2014, XXIII, 404 p., Springer, ISBN 978-3-642-45398-4
Chapter 10: Studying Evolving Software Ecosystems based on Ecological Models
Tom Mens, Maëlick Claes, Philippe Grosjean and Alexander Serebrenik
Research on software evolution is very active, but evolutionary principles, models and theories that properly explain why and how software systems evolve over time are still lacking. Similarly, more empirical research is needed to understand how different software projects co-exist and co-evolve, and how contributors collaborate within their encompassing software ecosystem.
In this chapter, we explore the differences and analogies between natural ecosystems and biological evolution on the one hand, and software ecosystems and software evolution on the other hand. The aim is to learn from research in ecology to advance the understanding of evolving software ecosystems. Ultimately, we wish to use such knowledge to derive diagnostic tools aiming to analyse and optimise the fitness of software projects in their environment, and to help software project communities in managing their projects better.
27. Interested in joining?
• Open PhD position available
• 6 to 12 month postdoc visits welcomed