This presentation reports an empirical study of roles in open source software development projects. It finds that, beyond developers who contribute code, open source projects also rely on non-coding roles such as commenters, reporters, and reactors. The study analyzed the top 100 NPM package projects to determine the role-based activity distribution and how specialized the community is around each role. Commenters show high activity levels (higher than developers), and reviewers' and reactors' actions grow as the community does. The study also analyzes how role configurations vary across projects and communities.
1. On the Analysis of Non-Coding Roles in Open Source Development
Javier L. Cánovas Izquierdo, Jordi Cabot
Paper accepted at
EMPIRICAL SOFTWARE ENGINEERING 27, 18 (2022)
Published: November 2nd, 2021
An Empirical Study of NPM Package projects
2. OSS Sustainability
Open Source projects suffer from grave sustainability issues as many people use the software but very few contribute to it
How can we optimize the collaboration?
How can we improve the onboarding process?
Can we “capture” new contributors?
OSS is not only code…
…it’s community
How to enforce the development process?
How to sustain the community?
…
unsplash/bekir-donmez
4. Role characterization in GitHub
CODING: DEVELOPER, REVIEWER, MERGER
NON-CODING: REPORTER, COMMENTER, REACTOR
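As an illustration of the role characterization, the sketch below assigns roles to a contributor from the kinds of actions they perform. The action names (`commit`, `pull_request_review`, and so on) and the one-action-per-role mapping are assumptions made for this example, not definitions taken from the paper; the non-coding group (reporter, commenter, reactor) follows the results summary later in the deck.

```python
from collections import Counter

# Assumed mapping from GitHub action types to the six roles on the slide.
# The action names are illustrative and not taken from the paper.
ACTION_TO_ROLE = {
    "commit": "developer",
    "pull_request_review": "reviewer",
    "pull_request_merge": "merger",
    "issue_opened": "reporter",
    "comment": "commenter",
    "reaction": "reactor",
}

CODING_ROLES = {"developer", "reviewer", "merger"}
NON_CODING_ROLES = {"reporter", "commenter", "reactor"}


def roles_of(actions):
    """Count how many actions of each role a contributor performed."""
    return Counter(ACTION_TO_ROLE[a] for a in actions if a in ACTION_TO_ROLE)


def is_non_coding_only(actions):
    """True if the contributor played at least one role and none of them is a coding role."""
    played = set(roles_of(actions))
    return bool(played) and played <= NON_CODING_ROLES


# Hypothetical contributor activity
alice = ["comment", "comment", "reaction", "issue_opened"]
bob = ["commit", "comment", "pull_request_merge"]

print(roles_of(alice))
print(is_non_coding_only(alice), is_non_coding_only(bob))
```

A contributor can hold several roles at once under this scheme, which is what makes the later question about role specialization meaningful.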
5. Methodology
RESEARCH QUESTIONS
RQ1: What is the role-based activity distribution in OSS?
RQ2: How specialized is the community around each role?
unsplash/rawpixel
6. Methodology
APPROACH
General (full set of projects) vs. specific (groups of projects by project type and by community size)
7. Methodology
DATASET CONSTRUCTION
RETRIEVAL & CLONING: NPM ecosystem, top 100 repos
REPOSITORY ANALYSIS: SourceCred analysis tool
GRAPH GENERATION: collaboration graphs
Dataset: 28,468 users / 38,502 commits / 13,941 issues / 12,312 pull requests / 89,484 comments
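To give a feel for the scale of each kind of activity, the following sketch computes the share of each artifact type in the reported dataset totals. It is only a rough proxy: the study derives role activity from SourceCred collaboration graphs, and the totals above do not list reviews, merges, or reactions, so these shares are not the paper's role-based distribution.

```python
# Dataset totals as reported on the slide
totals = {
    "commits": 38_502,
    "issues": 13_941,
    "pull requests": 12_312,
    "comments": 89_484,
}

# Share of each artifact type among the four counted types
all_actions = sum(totals.values())
shares = {kind: count / all_actions for kind, count in totals.items()}

# Print the shares from largest to smallest
for kind, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kind:>13}: {share:6.1%}")
```

Under this simple count, comments make up roughly 58% of the four artifact types and commits roughly 25%.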
10. RQ1. Role-based Activity Distribution
Figure panels: Activity Distribution Analysis; Prototypical Contributor Profile
15. Results Summary
RQ1: What is the role-based activity distribution in OSS?
High presence of commenters’ actions (higher than developers’)
Reviewers’ and reactors’ actions grow as the community does
All roles have their importance, highlighting the complexity of OSS
High collaboration rate
Increasing structure on the development side
Broader participation of non-coding contributors
24. Results Summary
RQ2: How specialized is the community around each role?
Projects are diverse, with a high presence of reactors, commenters and reporters → Presence of non-coding roles
Reactors, commenters and reporters often appear in a one-role configuration → Entry point for people joining the project
One-role configurations either persist or move to other non-coding roles → Potentially low onboarding rate
Lack of cross-role configurations combining coding and non-coding roles → Specialization
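The notions of one-role and cross-role configurations can be made concrete with a small sketch. It assumes a role configuration is simply the set of roles a contributor plays in a project; the contributor data is hypothetical and the helper names are introduced only for this example.

```python
from collections import Counter

CODING_ROLES = {"developer", "reviewer", "merger"}

# Hypothetical contributors and the roles each one played in a project
contributors = {
    "u1": {"reactor"},
    "u2": {"commenter"},
    "u3": {"commenter", "reporter"},
    "u4": {"developer", "reviewer", "merger"},
    "u5": {"developer", "commenter"},
    "u6": {"reactor"},
}


def configuration(roles):
    """Represent a role configuration as the sorted tuple of roles played."""
    return tuple(sorted(roles))


def is_cross(roles):
    """True when a configuration mixes coding and non-coding roles."""
    return bool(roles & CODING_ROLES) and bool(roles - CODING_ROLES)


# How often each configuration occurs in the project
config_counts = Counter(configuration(r) for r in contributors.values())

# Contributors with exactly one role, and contributors mixing both sides
one_role = sum(n for cfg, n in config_counts.items() if len(cfg) == 1)
cross = sum(1 for r in contributors.values() if is_cross(r))

print(config_counts.most_common())
print(f"one-role contributors: {one_role}, cross-role contributors: {cross}")
```

In this toy project, half of the contributors sit in a one-role configuration and only one combines coding and non-coding roles, which mirrors the kind of pattern the summary describes.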
26. Discussion
Photos from Unsplash by Jamie Street, Alvaro Reves, Iyan Kurnia, Chuttersnap, M.B.M. (top to bottom)
IMPROVE ONBOARDING
Situation: Efforts to attract and onboard new contributors clearly target developers
Why not focus on non-coding contributors and then perhaps incentivize them to participate in coding tasks?
GOVERNANCE OF NON-CODING CONTRIBUTORS
Situation: Governance rules (e.g., contributing.md) focus mainly on coding contributors
How to make non-coding contributions more visible in code hosting platforms?
PROMOTION OF MIGRATION PATHS
Situation: Lack of information about the roles of the project and how (and where) they are welcome
Would it be possible to identify “careers” within the project?
METHODS TO VISUALIZE CONTRIBUTIONS
Situation: It is hard to know the roles played by contributors in OSS projects
Could graphical representations (e.g., our radar graphs) help profile contributors (beyond coding tasks)?
TEMPORAL ANALYSIS
Situation: Most empirical analyses focus on a project snapshot
How could we leverage the temporal dimension of OSS project activities?
27. Thanks!
Javier L. Cánovas Izquierdo
jcanovasi@uoc.edu
@jlcanovas
Jordi Cabot
jordi.cabot@icrea.cat
@jordiCabot
Except where otherwise noted, content on this presentation is licensed under a Creative Commons Attribution 4.0 International license.