
Is my software ecosystem healthy? It depends!

QUATIC 2020 keynote presentation by Tom Mens (University of Mons) on dependency-related health issues in software ecosystems and research advances to address such health issues. Part of the presented research has been conducted as part of the Belgian SECO-ASSIST Excellence of Science Research Project.



  1. 1. Software Engineering Lab Faculty of Sciences Tom Mens tom.mens@umons.ac.be @tom_mens Is my software ecosystem healthy?
  2. 2. Directed by Tom Mens Department of Computer Science Faculty of Sciences tom.mens@umons.ac.be http://informatique.umons.ac.be Software Engineering Lab
  3. 3. SECO-ASSIST “Excellence of Science” Research Project 2018-2021 secoassist.github.io @secoassist
  4. 4. What is a software packaging ecosystem? A collection of interdependent software packages that are developed and distributed by a large community of software developers • Distributed development, e.g., through git • Social coding (e.g., GitHub, GitLab, BitBucket) • Package distribution through dedicated package managers • Ecosystem-specific versioning and release policies © 2019 Théo Zimmermann. Challenges in the collaborative evolution of a proof language and its ecosystem. PhD dissertation, Université de Paris
  5. 5. Packaging ecosystems can be for a specific operating system. OS and package managers: • macOS: MacPorts, Homebrew • Linux: dpkg, apt, RPM, pacman • Windows: winget, Windows Store, Chocolatey • Android: Play Store • iOS: App Store • ROS: rospkg
  6. 6. Packaging ecosystems can be for a specific (open source) project / community. Project and #packages: • Eclipse: >40M • WordPress: >67K • Atom: >13K • Emacs: >5K • …
  7. 7. Packaging ecosystems can be for a specific programming language. Language, package manager and #packages: • JavaScript: npm, >1.4M • PHP: Packagist, >0.33M • Python: PyPI, >0.26M • .NET: NuGet, >0.22M • Java: Maven, >0.19M • Ruby: RubyGems, >0.16M • Others: Cargo (Rust), CPAN (Perl), CRAN (R), Hackage (Haskell), …
  8. 8. Libraries.io monitors 7,387,590 open source packages across 37 different package managers https://libraries.io (20 May 2020)
  9. 9. Packaging ecosystems can be unhealthy • Bugs • Security vulnerabilities • Backward incompatibilities • Single maintainer packages • Abandoned packages • Deprecated packages • Outdated packages • Bloated packages • Micro-packages • Non-compliance to versioning policies • Incompatible or prohibited licenses • Suboptimal release and update policies • …
  10. 10. Dependency Hell • Too many direct and transitive dependencies • Broken dependencies due to backward incompatibilities • Co-installability problems • Deprecated dependencies • Outdated dependencies • Bloated dependencies
  11. 11. Case studies of dependency-related health issues in software ecosystems
  12. 12. Case studies < short description of the reported health issue > Symptom < How was the health issue observed? How did it impact the community or ecosystem? > Diagnosis < What was the cause of the health issue? > Cure < How was the health issue resolved? > Prevention < What could be done to prevent such health issues from (re-)occurring in the future? >
  13. 13. Equifax data breach (May 2017) “attackers entered its system in mid-May through a web-application vulnerability (CVE-2017-5638) that had a patch available in March. In other words, the credit-reporting giant had more than two months to take precautions that would have defended the personal data of 143 million people from being exposed. It didn’t.” Wired Magazine, “Equifax Has No Excuse”, September 2017
  14. 14. Equifax data breach (May 2017) Symptom From May to July 2017, a security vulnerability was exploited to illegally obtain the personal data of 143 million people. Diagnosis The vulnerability came from an out-of-date dependency (Apache Struts). Cure Apply the available security patch for the problematic dependency. Prevention • Use vulnerability monitoring tools. • Update dependencies as soon as vulnerability fixes are available.
  15. 15. Typosquatting package names (December 2018) Two libraries were created by the same developer and mimicked other more popular libraries using a technique called typosquatting to register similarly-looking names. The first is "python3-dateutil," which imitated the popular "dateutil" library. The second is "jeIlyfish" (the first L is an I), which mimicked the "jellyfish" library. The jeIlyfish library had been available for nearly a year, since December 11, 2018. This is the sixth time the PyPI team intervenes to remove typo-squatted malicious Python libraries from the official repository. https://www.zdnet.com/article/two-malicious-python-libraries-removed-from-pypi/
  16. 16. Typosquatting package names (December 2018) Symptom Packages stealing SSH and GPG keys from the projects of infected developers. Diagnosis The packages use names that mimic other popular packages in the registry. Cure The Python security team discovered and removed the packages. (For one of them, only after a year.) Prevention • Make sure you depend on the right package. • Defenses against typosquatting: https://incolumitas.com/2016/06/08/typosquatting-package-managers/
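As a rough illustration of the "depend on the right package" advice, the sketch below flags a candidate name that looks like a known popular name. It is a heuristic under stated assumptions, not the PyPI team's method: the confusable-character table, the 0.85 similarity threshold and the function names are all hypothetical, and the "embedded name" rule will also flag legitimate packages.

```python
import difflib

# Hypothetical table of characters that are easily mistaken for each other.
CONFUSABLES = str.maketrans({"I": "l", "1": "l", "0": "o"})


def normalize(name):
    """Map look-alike characters to one form before comparing names."""
    # Translate first: lowercasing "I" would hide the I/l confusion.
    return name.translate(CONFUSABLES).lower().replace("_", "-")


def possible_typosquats(candidate, popular, threshold=0.85):
    """Return the popular names that `candidate` might be imitating."""
    cand = normalize(candidate)
    hits = []
    for known in popular:
        if candidate == known:
            continue  # the genuine package itself
        target = normalize(known)
        similar = difflib.SequenceMatcher(None, cand, target).ratio() >= threshold
        embedded = target in cand.split("-")  # e.g. "python3-<name>"
        if cand == target or similar or embedded:
            hits.append(known)
    return hits


popular_names = ["jellyfish", "dateutil"]
print(possible_typosquats("jeIlyfish", popular_names))
print(possible_typosquats("python3-dateutil", popular_names))
```

A check like this is only a first filter: a flagged name still needs a manual look at the package's author, history and contents.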
  17. 17. “left-pad” incident (March 2016)
  18. 18. “left-pad” incident (March 2016) Symptom Significant downtime on major websites like Facebook, Instagram, LinkedIn, Netflix. Diagnosis • Unexpected removal of left-pad package caused 5400 npm packages to become uninstallable. • Most of these packages only depended transitively on left-pad, and were not even aware of its existence. Cure • Reintroduced older release of left-pad in the ecosystem. • Changed ecosystem package removal policy. Prevention • Ecosystem-level: Prevent package releases from being removed from the ecosystem. • Package-level: Avoid having too many transitive dependencies.
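To make the diagnosis concrete, here is a minimal sketch of how one could compute which packages become uninstallable when a single package is removed. The toy dependency network and all package names except left-pad are hypothetical; a real analysis would load the network from registry metadata.

```python
from collections import defaultdict, deque


def transitive_dependents(dependencies, removed):
    """Return every package that directly or transitively requires `removed`."""
    # Invert the "package -> required packages" map.
    reverse = defaultdict(set)
    for package, required in dependencies.items():
        for dep in required:
            reverse[dep].add(package)

    # Breadth-first walk over the reversed edges.
    affected, queue = set(), deque([removed])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical toy network: only "cli" and "string-utils" depend on left-pad directly.
deps = {
    "app": ["framework"],
    "framework": ["string-utils"],
    "string-utils": ["left-pad"],
    "cli": ["left-pad"],
    "docs": [],
}
print(sorted(transitive_dependents(deps, "left-pad")))
```

In this toy network, "app" and "framework" are affected even though neither lists left-pad among its own dependencies.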
  19. 19. “is-promise” incident (April 2020) Symptom Update 2.2.0 impacted millions of JavaScript projects (failed builds). Update 2.2.1 failed to fix the problem. Diagnosis is-promise was directly used by 766 other npm packages. Its update 2.2.0 did not adhere to proper ES module standards, causing failed builds in its (transitive) dependents. Cure Update 2.2.2 rolled back the changes after a couple of hours. Prevention • Benefit from automatic “same-day” fixes by using dependency constraints that automatically accept patches. • Avoid having too many direct (and transitive) dependencies.
  20. 20. “left-pad” and “is-promise” revisited
  21. 21. “left-pad” and “is-promise” revisited Diagnosis Too much dependence on very small “trivial” packages. • left-pad contained only 11 lines of code • is-promise contained only 2 lines of code Cure Inline such overly simple code in your own package? Prevention Avoid depending on trivial micro-packages.
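To give a sense of how little code is involved, the sketch below inlines a left-pad equivalent in Python. It is an illustrative reimplementation under our own assumptions (single-character fill, input converted with str), not the original npm package's code.

```python
def left_pad(text, length, fill=" "):
    """Pad `text` on the left with `fill` until it is `length` characters long."""
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    return fill * max(0, length - len(text)) + text


print(left_pad("5", 3, "0"))
print(left_pad("abc", 2))
```

In Python the built-in `str.rjust` already covers this case, which is a reminder to check the standard library before adding a dependency or inlining code.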
  22. 22. “event-stream” vulnerability (November 2018) https://github.com/dominictarr/event-stream/issues/115 https://medium.com/@cnorthwood/todays-javascript-trash-fire-and-pile-on-f3efcf8ac8c7
  23. 23. “event-stream” vulnerability (November 2018) Symptom A very popular npm package included code for stealing crypto-coins. Diagnosis • A developer volunteered to take over maintenance from the original package developer. • A dependency was added to flatmap-stream 0.1.0, which was then modified in 0.1.1 to include Bitcoin-siphoning malware. • All original dependents of event-stream were potentially at risk. Cure • Ecosystem took over ownership and removed vulnerable versions. • Clients: • Downgrade dependency on event-stream to older non-vulnerable version, or upgrade to newer fixed version. • Replace dependency on event-stream by another package. Prevention • Be aware of packages being taken over by new “untrusted” maintainers. https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident
  24. 24. “request” deprecated (February 2020)
  25. 25. “request” deprecated (February 2020)
  26. 26. “request” deprecated (February 2020) Symptom In March 2019, request was announced to “go into maintenance mode and stop considering new features or major releases.” https://github.com/request/request/issues/3142 In February 2020, request was fully deprecated. Diagnosis As of August 2020, >55 thousand packages and >5.5 million GitHub repositories still depend on request! Cure Replace dependencies on request by alternative packages. Prevention Provide better tool support for: • warning dependents of deprecated packages; • supporting migration strategies to alternatives.
  27. 27. “request” revisited Increasing complexity of transitive dependency graph https://npm.anvaka.com/
  28. 28. https://npm.anvaka.com/ “request” revisited Increasing complexity of transitive dependency graph
  29. 29. https://npm.anvaka.com/ “request” revisited Increasing complexity of transitive dependency graph
  30. 30. https://npm.anvaka.com/ “request” revisited Increasing complexity of transitive dependency graph
  31. 31. https://npm.anvaka.com/ “request” revisited Increasing complexity of transitive dependency graph
  32. 32. Empirical Research on dependency-related health issues in packaging ecosystems
  33. 33. Empirical research on packaging ecosystems Why do developers use trivial packages? An empirical case study on npm • R Abdalkareem et al., ESEC/FSE 2017 Do developers update their library dependencies? An empirical study on the impact of security advisories on library migration • R G Kula et al. Empirical Software Engineering Journal, 2017 On the impact of security vulnerabilities in the npm package dependency network • A Decan et al., MSR 2018 On the evolution of technical lag in the npm package dependency network • A Decan et al., ICSME 2018 An empirical study of dependency downgrades in the npm ecosystem • F R Cogo et al., IEEE Transactions on Software Engineering, 2019
  34. 34. Empirical research on packaging ecosystems What do package dependencies tell us about semantic versioning? • A Decan et al., IEEE Transactions on Software Engineering, 2019 An empirical comparison of dependency network evolution in seven software packaging ecosystems • A Decan et al., Empirical Software Engineering Journal, 2019 An empirical study of same-day releases of popular packages in the npm ecosystem • F R Cogo et al. Empirical Software Engineering Journal, 2020 (under review) A comprehensive study of bloated dependencies in the Maven ecosystem • C Soto-Valero et al. Empirical Software Engineering Journal, 2020 (under review) Deprecation of packages and releases in software ecosystems: A case study on npm • F Cogo et al., IEEE Transactions on Software Engineering, 2020 (under review)
  35. 35. Characterising the evolution of software packaging ecosystems Fast package dependency network growth in two years (2018-01 to 2020-01) • npm: #packages 630K → 1,218K (+93%); #deps 19.0M → 48.7M (+156%) • RubyGems: #packages 141K → 180K (+28%); #deps 1.92M → 2.40M (+25%) • Packagist: #packages 121K → 155K (+28%); #deps 2.17M → 4.73M (+118%) • Cargo: #packages 13K → 35K (+169%); #deps 257K → 796K (+210%)
  36. 36. Characterising the evolution of software packaging ecosystems 830K packages – 5.8M package versions – 20.5M dependencies (April 2017) An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems A Decan, T. Mens, Ph. Grosjean (2019) Empirical Software Engineering 24(1)
  37. 37. Characterising the evolution of software packaging ecosystems Package dependency networks grow exponentially in terms of number of packages and/or dependencies Fastest growth for npm Slowest growth for CRAN
  38. 38. Characterising the evolution of software packaging ecosystems Continuing change • Number of package updates grows over time • >50% of package releases are updated within 2 months • Required and young packages are updated more frequently Fastest growth for npm Slowest growth for CRAN
  39. 39. Impact of transitive dependencies Incidents with left-pad, is-promise and request affected thousands of transitively dependent packages! Number of packages that are transitively required by at least 5% of all packages. Unhealthy packages may have a very high transitive impact.
  40. 40. Impact of transitive dependencies Transitive dependency depth of top-level packages Over 50% of top-level packages have a deep dependency graph Health issues may be difficult to detect due to transitive dependencies • Most dependency monitoring tools do not support transitive dependencies
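The notion of transitive dependency depth can be sketched as follows. The code assumes that depth means the length of the longest dependency chain below a top-level package, which may differ from the exact definition used in the cited study; the toy network is hypothetical.

```python
def dependency_depth(dependencies, package, _path=()):
    """Length of the longest dependency chain starting at `package`."""
    if package in _path:  # cycle guard: do not follow a package twice on one chain
        return 0
    required = dependencies.get(package, [])
    if not required:
        return 0
    return 1 + max(
        dependency_depth(dependencies, dep, _path + (package,)) for dep in required
    )


# Hypothetical network: app -> a -> c -> d, and app -> b.
network = {"app": ["a", "b"], "a": ["c"], "c": ["d"], "b": []}
print(dependency_depth(network, "app"))
```

A monitoring tool that only inspects direct dependencies would see "a" and "b" here and miss any health issue in "c" or "d".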
  41. 41. Trivial Packages • Trivial packages implement simple and trivial tasks (cf. the “left-pad” and “is-promise” case studies) • Trivial packages are prominent: they make up 16.8% of 230K studied packages • Developers perceive trivial packages as well implemented and well tested • In reality, less than half of all trivial packages have tests! Why do developers use trivial packages? An empirical case study on npm. R Abdalkareem, O. Nourry, et al. (2017) ESEC/FSE conference
  42. 42. Deprecated packages • 54% of all packages transitively depend on at least one deprecated package release. • In more than half of the cases, dependency depth is 4 or higher. • The npm install command warns about deprecated packages, but does not signal where they can be found in the dependency tree. • When a transitive package is deprecated, the client package has no control over the migration to a replacement release. Deprecation of packages and releases in software ecosystems: A case study on npm. F Cogo, G Oliva, A Hassan (2020) IEEE Transactions on Software Engineering (under review)
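The missing piece mentioned above, locating a deprecated package inside the dependency tree, can be sketched as a simple tree walk. This is an illustration only, not a feature of npm; the dependency data and every package name except request are hypothetical.

```python
def deprecated_paths(dependencies, root, deprecated):
    """Yield every dependency chain from `root` that ends in a deprecated package."""

    def walk(package, path):
        for dep in dependencies.get(package, []):
            if dep in path:  # skip cycles
                continue
            chain = path + [dep]
            if dep in deprecated:
                yield chain
            yield from walk(dep, chain)

    yield from walk(root, [root])


# Hypothetical tree in which the deprecated package sits two levels down.
tree = {
    "app": ["http-client", "logger"],
    "http-client": ["request"],
    "request": ["old-lib"],
    "logger": [],
}
for chain in deprecated_paths(tree, "app", {"request"}):
    print(" > ".join(chain))
```

The printed chain shows which direct dependency pulls in the deprecated package, which is the information a maintainer needs to decide where to intervene.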
  43. 43. Outdated Dependencies Should package maintainers upgrade their dependencies to more recent versions? • Upgrades benefit from bug and security fixes • Upgrading allows using new features • Upgrading requires effort • Upgrading may introduce breaking changes
  44. 44. Outdated Dependencies • npm outdated checks which dependencies can be updated • It can also check transitive dependencies, up to a preset maximum depth level (with the --depth argument)
  45. 45. Outdated Dependencies How do dependency constraints play a role? • Implicit updates will install newer versions of dependencies without needing to change the version constraint • Explicit updates require replacing the version constraint to be able to update the dependency [Figure: version 3.9.2 split into major (3), minor (9) and patch (2), with dependency constraints ranging from most permissive to most restrictive]
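The difference between implicit and explicit updates can be illustrated with a simplified constraint check. The sketch assumes npm-style caret (^), tilde (~) and exact (=) constraints on plain x.y.z versions with major >= 1; real npm ranges have more forms and special rules for 0.x versions, and the function names are our own.

```python
def parse(version):
    """Split 'x.y.z' into a (major, minor, patch) tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies(version, constraint):
    """Simplified check for '^x.y.z', '~x.y.z' and '=x.y.z' (major >= 1 only)."""
    operator, base = constraint[0], parse(constraint[1:])
    candidate = parse(version)
    if operator == "=":  # strict: only this exact release
        return candidate == base
    if operator == "~":  # same major.minor, newer patches accepted
        return candidate[:2] == base[:2] and candidate >= base
    if operator == "^":  # same major, newer minors and patches accepted
        return candidate[0] == base[0] and candidate >= base
    raise ValueError(f"unsupported constraint: {constraint}")


for constraint in ("^3.9.2", "~3.9.2", "=3.9.2"):
    accepted = [v for v in ("3.9.2", "3.9.5", "3.10.0", "4.0.0") if satisfies(v, constraint)]
    print(constraint, "accepts", accepted)
```

A release accepted by the existing constraint arrives as an implicit update; any other release requires an explicit update, that is, editing the constraint itself.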
  46. 46. Outdated Dependencies • 1 out of 3 packages never update their dependencies • Outdatedness is related to the type of dependency constraint being used: strict constraints represent about 20% of all dependencies, but about 33% of all outdated dependencies [Figure: all runtime dependencies vs. outdated runtime dependencies] On the evolution of technical lag in the npm package dependency network. A Decan, T Mens, E Constantinou (2018) Int’l Conf. Software Maintenance and Evolution
  47. 47. Semantic Versioning https://semver.org Version 3.9.2 = major (3), minor (9), patch (2) • Major: breaking changes • Minor: backwards compatible changes • Patch: bug fixes It is recommended to respect semantic versioning
  48. 48. Semantic Versioning “What if” analysis: By making dependency constraints “semver-compliant”, the proportion of outdated releases could be reduced by >17% What do package dependencies tell us about semantic versioning? A Decan, T Mens (2019) IEEE Transactions on Software Engineering
  49. 49. Semantic versioning Different packaging ecosystems interpret version constraints in different ways [Figure: constraint notations marked as more restrictive or more permissive than semver]
  50. 50. Semantic versioning To what extent do software packaging ecosystems enable/adhere to semantic versioning? • All considered ecosystems become more semver-compliant over time [figure annotation: mostly semver-compliant] • More than 16% of the constraints are restrictive, preventing automatic adoption of backward compatible upgrades
  51. 51. Semantic Versioning Wisdom of the Crowds • Maintainers of required packages should leverage test suites of dependent packages to discover whether a new release is likely to be backward-incompatible. Using Others' Tests to Identify Breaking Updates. S Mujahid, R Abdalkareem, et al. (2020) IEEE Int’l Conf. Mining Software Repositories • Maintainers of dependent packages should look at how other packages depend on a required package to decide which version constraint to use. What do package dependencies tell us about semantic versioning? A Decan, T Mens (2019) IEEE Transactions on Software Engineering
  52. 52. Semantic Versioning Semantic versioning reduces outdatedness Ecosystem compliance to semver increases over time Trust is important: not all provider packages respect semver → risk of unexpected breaking changes • Avoid dependency constraints that are too permissive or too restrictive • When using implicit dependencies on semver-compliant packages, write automated CI tests for the functionalities you use from those packages • Use “wisdom of the crowds” to verify semver-compliance and to discover breaking releases
  53. 53. Technical Lag Technical lag measures how outdated a package or dependency is w.r.t. the “ideal” situation where “ideal” = “most recent”; “most secure”; “least bugs”; … A formal framework for measuring technical lag in component repositories – and its application to npm A Zerouali, T Mens, J Gonzalez-Barahona, et al. (2019) J. Software Evolution and Process On the evolution of technical lag in the npm package dependency network. A Decan, T Mens, E Constantinou (2018) IEEE Int’l Conf. Software Maintenance and Evolution
  54. 54. Technical Lag - Example tech-lag(p) = delta(p, ideal(p)) for any required package p [Figure: a dependent package has a dependency on release 1.1.0 of required package p, while release 1.1.3 is available]
  55. 55. Technical Lag - Example Time-based measurement (ideal = most recent release; delta = time difference) Time lag = date(1.1.3) - date(1.1.0) [Figure: release timeline of required package p with releases 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 2.0.0 and 2.0.1, and the dependent package]
  56. 56. Technical Lag - Example Version-based measurement (ideal = highest release; delta = version difference) Version lag = 1 major + 1 patch [Figure: same release timeline of required package p, with the dependent package]
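The time-based and version-based measurements from the examples above can be sketched in a few lines. The release dates are hypothetical (the slides give none), and `version_lag` uses a simplified rule of our own: once a higher component differs, the lower components of the ideal release are counted from zero. That rule reproduces the "1 major + 1 patch" of the example but is not necessarily the definition used in the cited framework.

```python
from datetime import date


def version_lag(used, ideal):
    """Return the lag between two 'x.y.z' releases as (major, minor, patch)."""
    u = tuple(int(part) for part in used.split("."))
    i = tuple(int(part) for part in ideal.split("."))
    if i[0] != u[0]:
        return (i[0] - u[0], i[1], i[2])
    if i[1] != u[1]:
        return (0, i[1] - u[1], i[2])
    return (0, 0, i[2] - u[2])


def time_lag(release_dates, used, ideal):
    """Return the time difference between the ideal and the used release."""
    return release_dates[ideal] - release_dates[used]


# Hypothetical release dates for required package p.
release_dates = {
    "1.1.0": date(2019, 1, 10),
    "2.0.1": date(2019, 6, 1),
    "1.1.3": date(2019, 7, 15),
}

# Version-based: ideal = highest release (2.0.1).
print(version_lag("1.1.0", "2.0.1"))
# Time-based: ideal = most recent release (1.1.3).
print(time_lag(release_dates, "1.1.0", "1.1.3").days, "days")
```

The two measurements pick different ideal releases for the same dependency, which is why technical lag is reported along several dimensions rather than as a single number.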
  57. 57. Technical Lag - Example Vulnerability-based measurement (ideal = least vulnerable release; delta = #vulnerabilities) Security lag (ideal is less vulnerable and more recent) [Figure: release timeline of required package p with releases 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 2.0.0 and 2.0.1, and the dependent package]
  58. 58. Technical Lag - Example Bug-based measurement (ideal = least known bugs; delta = #known bugs) Bug lag (ideal has fewer bugs) Dependency (constraint) needs to be downgraded to reduce bug lag … ~1.1.0 [Figure: same release timeline of required package p, with the dependent package]
  59. 59. Technical Lag - Example Bug lag “Fixing” dependency constraint downgrades too far … =1.1.0 This happens in 13% of all downgraded releases… [Figure: same release timeline of required package p, with the dependent package] An empirical study of dependency downgrades in the npm ecosystem. F Roseiro Côgo, G Ansaldi Oliva, A E Hassan (2019) IEEE Transactions on Software Engineering
  60. 60. Technical Lag Quantitatively assesses the risk/benefit of package updates Along multiple dimensions (time, version, bugs, vulnerabilities, …) • Be aware of your outdatedness • Avoid depending on packages that are too outdated (perhaps deprecated or unmaintained?) • Use (and improve) tools to measure and reduce lag, e.g. through automated updates
  61. 61. Unmaintained Packages Bus Factor: The risk of a package becoming unmaintained if (some of) its core developers leave development On the abandonment and survival of open source projects: An empirical investigation. G Avelino, E Constantinou, MT Valente, A Serebrenik (2019) ESEM conference
  62. 62. Unmaintained Packages Bus Factor • Slows down development • Increases risk of bugs and vulnerabilities • That propagate (transitively) to dependent packages • Leads to unmaintained packages if no replacement is found • Increases risk of “hostile takeovers” by malevolent developers • Cf. npm event-stream vulnerability case study
  63. 63. Unmaintained Packages Two out of three non-trivial packages on GitHub are likely to be single-maintainer packages Challenges in the collaborative evolution of a proof language and its ecosystem. Théo Zimmermann (2019) PhD dissertation, Université de Paris
  64. 64. Unmaintained Packages How to predict future inactivity of package developers? GAP: Forecasting commit activity in git projects. A Decan, E Constantinou, T Mens, H Rocha (2020) Journal of Systems and Software pip install git+https://github.com/AlexandreDecan/gap Based on a probabilistic model of future days of activity
  65. 65. Unmaintained Packages Unmaintained packages increase risk of becoming outdated and vulnerable This risk is ecosystem-wide, due to the strongly connected transitive network of package dependencies • Avoid depending on unmaintained packages • Use (and improve) tools to detect and avoid developer abandonment • Rely on “community organizations” of volunteers willing to “steward” and maintain important abandoned packages A first look at an emerging model of community organizations for the long-term maintenance of ecosystems' packages. Théo Zimmermann (2020) ICSE Workshop on Software Health
  66. 66. How to increase health? Recommendations for package maintainers • Inform your dependents about • Incompatible upgrades (cf. semantic versioning) • Planned updates • Deprecated features or releases • Known bugs and security issues • Limit the number of direct and transitive dependencies • Remove any unused dependencies • Write automated tests for the functionality you use from your dependencies • Monitor and improve your dependencies for • outdated packages • unmaintained packages • security issues and bugs • micro-packages • deprecated packages
  67. 67. Conclusion Is my software ecosystem healthy? It probably isn’t, and it probably never will be. • Depending on unhealthy packages impacts the whole ecosystem due to its dense package dependency network. • Communities and package maintainers should be aware and vigilant. • Trust is important, but so is being cautious. • Monitoring tools and policies can go a long way but are no silver bullet. • More research is needed.
  68. 68. Related Research More precise package dependency analysis at code level (e.g. function calls) to improve accuracy
