This document summarizes a study of license violations in the npm and RubyGems package dependency networks. The researchers analyzed over 750,000 npm packages and 95,000 RubyGems packages to determine: 1) the most prevalent licenses in each ecosystem, 2) the extent to which packages rely on direct dependencies with incompatible licenses, and 3) how license incompatibility spreads across indirect dependencies. They found that MIT and Apache licenses are most common, that direct dependencies rarely have incompatible licenses, and that GPL dependencies cause most violations, which become less frequent at deeper dependency levels.
Prevalence and Evolution of License Violations in npm and RubyGems Dependency Networks
Ilyas Saïd Makari Ilyas.Said.Makari@vub.be
Ahmed Zerouali Ahmed.Zerouali@vub.be
Coen De Roover Coen.De.Roover@vub.be
The International Conference on Software and Systems Reuse (ICSR)
Virtual (originally Montpellier, France) - June 15-17, 2022
3. Open source software can be distributed with varying degrees of freedom
● 1. Public domain
○ All rights are granted with no conditions whatsoever
○ For example: “The Unlicense”
● 2. Permissive licenses
○ Few restrictions imposed
○ Must include copyright notice from original author
○ For example: MIT, Apache, BSD
● 3. Restrictive licenses (copyleft)
Background
4. ● Strong copyleft licenses
○ All derivatives of the original work should be released under the same license
○ For example: GNU General Public License (GPL)
● Weak copyleft licenses
○ An exception applies when the work is used as an independent building block
○ For example: GNU Lesser General Public License (LGPL)
5. Not all licenses may be legally combined in one software package
- For example: MIT (permissive) is compatible with GPL (restrictive), but not vice versa.
We call a license A “one-way compatible” with a license B if software that contains packages under both licenses may legally be licensed under license B.
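The definition can be sketched as a directed relation between licenses. This is a minimal illustration, not the study's tooling: the set `ONE_WAY_COMPATIBLE`, the function name, and the rule that a license is compatible with itself are assumptions, and only the MIT/GPL example from the slide is encoded.

```python
# Hypothetical, heavily simplified one-way compatibility relation.
# An entry (A, B) means: software that combines A- and B-licensed
# packages may be licensed under B.
ONE_WAY_COMPATIBLE = {
    ("MIT", "GPL"),
}


def is_one_way_compatible(license_a: str, license_b: str) -> bool:
    """Return True if license_a is one-way compatible with license_b."""
    if license_a == license_b:
        return True  # assumption: a license is compatible with itself
    return (license_a, license_b) in ONE_WAY_COMPATIBLE


print(is_one_way_compatible("MIT", "GPL"))  # True
print(is_one_way_compatible("GPL", "MIT"))  # False
```

Because the relation is directional, the pair has to be stored in one order only; looking it up in the opposite order correctly yields "not compatible".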
10. Method
A new license compatibility matrix, based on:
1. Kapitsaki et al. [*]
[*] Georgia M. Kapitsaki, Frederik Kramer, and Nikolaos D. Tselikas. Automating the license compatibility process in open source software with SPDX. Journal of Systems and Software, 131:386–401, 2017.
Compatibility graph from Kapitsaki et al. [*]
Then, we manually included information from:
1. Free Software Foundation
2. The European Commission
=> The matrix gives an answer for 1,681 pairs of licenses, of which 205 (12.2%) are labeled “Unknown”
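A matrix of this kind can be pictured as a lookup table over ordered license pairs with three possible answers. The sketch below is illustrative only: the three licenses, the entries in `MATRIX`, and the names `Answer`, `lookup`, and `unknown_share` are assumptions, not the study's data. It also recomputes the reported share of “Unknown” pairs. As a side note, 1,681 equals 41 × 41, which would fit a square matrix over 41 licenses, although the slides do not state the number of licenses.

```python
from enum import Enum
from itertools import product


class Answer(Enum):
    YES = "Yes"
    NO = "No"
    UNKNOWN = "Unknown"


# Illustrative subset of licenses (SPDX-style identifiers).
LICENSES = ["MIT", "Apache-2.0", "GPL-3.0-only"]

# Hypothetical entries keyed by (license A, license B):
# "is A one-way compatible with B?"
MATRIX = {
    ("MIT", "Apache-2.0"): Answer.YES,
    ("MIT", "GPL-3.0-only"): Answer.YES,
    ("Apache-2.0", "GPL-3.0-only"): Answer.YES,
    ("GPL-3.0-only", "MIT"): Answer.NO,
    ("GPL-3.0-only", "Apache-2.0"): Answer.NO,
    # ("Apache-2.0", "MIT") is deliberately left out to exercise the fallback.
}


def lookup(license_a: str, license_b: str) -> Answer:
    """Answer for an ordered pair; pairs without an entry are UNKNOWN."""
    if license_a == license_b:
        return Answer.YES  # assumption: identical licenses are compatible
    return MATRIX.get((license_a, license_b), Answer.UNKNOWN)


def unknown_share(licenses: list[str]) -> float:
    """Fraction of ordered pairs (including self-pairs) labeled UNKNOWN."""
    pairs = list(product(licenses, repeat=2))
    unknown = sum(1 for a, b in pairs if lookup(a, b) is Answer.UNKNOWN)
    return unknown / len(pairs)


print(f"{unknown_share(LICENSES):.1%}")  # 1 of 9 pairs in this toy matrix
print(f"{205 / 1681:.1%}")               # share reported on the slide
```

Keeping “Unknown” as an explicit third value, rather than folding it into “No”, lets a checker report unresolved pairs separately from confirmed violations.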
11. Research questions
RQ1: What are the most prevalent licenses in package repositories?
○ What is the licensing climate of each ecosystem?
○ Permissive or restrictive climate?
○ Does it influence the number of incompatibilities?
RQ2: To what extent do packages rely on direct dependencies with incompatible licenses?
○ How prevalent are license violations on the first dependency tree level?
RQ3: How does license incompatibility spread across package dependency
networks?
○ How prevalent are license violations on each dependency tree level?
12. Case studies
npm:
~750k packages (latest release)
~3.5M direct runtime dependencies
RubyGems:
~95k packages (latest release)
~211k direct runtime dependencies
13. Open Data:
- Libraries.io gathers data from 32 package managers and 3 source code
repositories.
- They monitor over 5.4M unique open source packages, and more than 500M
interdependencies between them.
Dataset
15. Case studies
npm (on January 12th, 2020):
~750k packages
~3.5M direct dependencies
~66.4M (all) runtime dependencies
~7.3% of the packages have dependencies with incompatible licenses
RubyGems (on January 12th, 2020):
~95k packages
~211k direct dependencies
~1.2M (all) runtime dependencies
~13.9% of the packages have dependencies with incompatible licenses
16. Research questions
RQ1: What are the most prevalent licenses in package repositories?
- MIT is the most popular license in npm and RubyGems.
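Prevalence of this kind boils down to tallying the declared license of each package's latest release. A minimal sketch, assuming the licenses are available as a plain list; the sample values and the `"(no license)"` label are made up for illustration.

```python
from collections import Counter


def license_prevalence(declared_licenses):
    """Return (license, count, share) tuples, most common first."""
    counts = Counter(lic if lic else "(no license)" for lic in declared_licenses)
    total = sum(counts.values())
    return [(lic, n, n / total) for lic, n in counts.most_common()]


# Made-up sample: one entry per package, None when no license is declared.
sample = ["MIT", "ISC", "MIT", None, "Apache-2.0", "MIT", "ISC"]

for lic, n, share in license_prevalence(sample):
    print(f"{lic}: {n} ({share:.0%})")
```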
17. Research questions
RQ1: What are the most prevalent licenses in package repositories?
- MIT has been popular in npm since its beginning.
- ISC grew in popularity after it became the default license for new npm packages.
18. Research questions
RQ1: What are the most prevalent licenses in package repositories?
- In RubyGems, MIT gradually became the most popular license.
- Over the last few years, Apache has become the second most popular
license choice within the RubyGems ecosystem.
19. Research questions
RQ2: To what extent do packages rely on direct dependencies with incompatible licenses?
- Only 0.9% of npm dependencies and 4.3% of RubyGems dependencies have licenses that are incompatible with those of their dependents.
- The most common pair of incompatible licenses is MIT with GPL.
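A first-level check compares the license of each direct dependency with the license of the package that depends on it. The sketch below assumes a tiny hypothetical compatibility set and made-up package names; it is not the study's tool.

```python
from typing import NamedTuple


class Package(NamedTuple):
    name: str
    license: str
    dependencies: tuple  # names of direct runtime dependencies


# Hypothetical compatibility pairs (dependency license, dependent license).
COMPATIBLE = {("MIT", "GPL"), ("MIT", "Apache"), ("Apache", "GPL")}


def compatible(dep_license: str, dependent_license: str) -> bool:
    # Assumption: identical licenses are always compatible.
    return dep_license == dependent_license or (dep_license, dependent_license) in COMPATIBLE


def direct_violations(packages):
    """Return (dependent, dependency) name pairs with incompatible licenses."""
    by_name = {pkg.name: pkg for pkg in packages}
    found = []
    for pkg in packages:
        for dep_name in pkg.dependencies:
            dep = by_name.get(dep_name)
            if dep is None:
                continue  # dependency not in the dataset: skipped here
            if not compatible(dep.license, pkg.license):
                found.append((pkg.name, dep.name))
    return found


# Made-up packages for illustration.
packages = [
    Package("app-a", "MIT", ("lib-gpl", "lib-mit")),
    Package("app-b", "GPL", ("lib-mit",)),
    Package("lib-gpl", "GPL", ()),
    Package("lib-mit", "MIT", ()),
]

print(direct_violations(packages))  # [('app-a', 'lib-gpl')]
```

Here the MIT-licensed `app-a` depending on the GPL-licensed `lib-gpl` is flagged, while the GPL-licensed `app-b` depending on an MIT-licensed package is not, which mirrors the one-way nature of the relation.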
20. Research questions
RQ3: How does license incompatibility spread across package dependency networks?
- In absolute terms, npm packages have more indirect dependencies with incompatible licenses than RubyGems packages, because npm packages include a high number of dependencies.
- Proportionally, however, RubyGems has more incompatible indirect dependencies than npm.
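Counting incompatibilities per dependency tree level can be sketched as a breadth-first traversal from a root package. The sketch makes two assumptions that the slides do not confirm: each transitive dependency is compared against the root package's license, and a dependency reachable at several depths is counted once, at its shallowest level. The graph, licenses, and names are made up.

```python
from collections import deque


def violations_per_level(root, deps, licenses, compatible):
    """Map each depth (1 = direct) to (incompatible, total) dependency counts."""
    root_license = licenses[root]
    seen = {root}
    queue = deque([(root, 0)])
    stats = {}
    while queue:
        name, depth = queue.popleft()
        for dep in deps.get(name, []):
            if dep in seen:
                continue  # already counted at a shallower or equal level
            seen.add(dep)
            level = depth + 1
            bad, total = stats.get(level, (0, 0))
            is_bad = not compatible(licenses[dep], root_license)
            stats[level] = (bad + int(is_bad), total + 1)
            queue.append((dep, level))
    return stats


def compatible(dep_license, dependent_license):
    # Hypothetical rule set: only MIT -> GPL and identical licenses pass.
    return dep_license == dependent_license or (dep_license, dependent_license) == ("MIT", "GPL")


# Made-up dependency graph and licenses.
deps = {"app": ["a", "b"], "a": ["c"], "b": ["c", "d"]}
licenses = {"app": "MIT", "a": "MIT", "b": "GPL", "c": "MIT", "d": "GPL"}

for level, (bad, total) in sorted(violations_per_level("app", deps, licenses, compatible).items()):
    print(f"level {level}: {bad}/{total} incompatible ({bad / total:.0%})")
```

Reporting both the count and the proportion per level matters for comparing ecosystems, since one ecosystem can lead in absolute numbers while the other leads proportionally.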
21. Research questions
RQ3: How does license incompatibility spread across package dependency networks?
- In both package repositories, the number of dependencies without a license decreases with each deeper level.
22. Tooling
Screenshot of the license compatibility checking tool.
https://doi.org/10.5281/zenodo.5913761
23. Conclusion
● Deeper-level dependencies cause fewer incompatibilities than those at shallower levels.
● GPL dependencies are the major cause of incompatibilities, and they are most common at the first level of dependency trees.
● We found that a set of packages created by a single organization can influence an ecosystem when it
consistently releases useful packages under a particular license.
● Our results help in understanding the state of license incompatibilities in software package
ecosystems.
24. Threats to validity
● Libraries.io dataset.
● Various sources of information to construct our license compatibility matrix.
● We only considered the license of the latest release of each package.
● Many packages do not have any license.