This document discusses managing software dependencies and supply chains. It provides perspectives from commercial and open-source development. While dependencies can lower costs by reusing code, they also introduce risks around licensing, quality, security, build times, packaging, and keeping dependencies up to date. Best practices for managing dependencies include thinking critically about new dependencies, continuous integration and testing, using package managers, keeping dependencies up to date, investing in dependent libraries, and containerization for packaging.
2. Goal
Give both a commercial and an open-source perspective on the benefits, costs,
and risks of taking on dependencies.
3. About me
MIT Course VI-2 ’02, MEng ’03
17 years of professional development 🤔
15 years in commercial enterprise software (startups at various stages)
● Oracle, DataPower/IBM, Vertica/HP, Nutonian, DataRobot
Last 2 years in open source commercial software development
● InfluxData, contributor to influxdb_iox
● Maintainer of arrow-rs, arrow-datafusion, and sqlparser-rs projects
● PMC member of Apache Arrow
4. Software “Supply Chain” ?
[Diagram: Code Contributors; Project Management (e.g. PRs); CI / CD system; Software Distribution (e.g. Dockerhub, App Store); AWS Marketplace; Apple Pay; User (😊)]
5. Software Supply Chain Complexity
2005: Andrew’s First Startup (DataPower)
● C/C++, < 5 dependencies (OpenSSL)
● Single binary, distributed to customers, on CD or via FTP
2022: Andrew’s Current Startup (InfluxDB)
● IOx has …. 606 dependencies (Rust alone)
● Distributed as a Docker image on GCR
6. Dependencies?
● Software Engineering 101 (6.001 / 6.037)
● “Don’t Reinvent the Wheel”: Use a pre-existing library of code
● The number and quality of pre-existing libraries have grown massively
● Example:
○ 2004: DataPower had a custom-written HTTP/S implementation, URL parser, and more!
○ 2022: Most languages have a library to do it (requests for Python, node, reqwest in Rust, etc.)
7. (Dramatically) Lowers Cost of Building Software
● Low Barrier to Entry: Someone else designed the API, implemented
and (hopefully) tested it
○ E.g. you can get a cross-platform, secure webserver up and running almost instantly
● Maintenance: You benefit from bugs fixed by others
● Debuggability: Source code is available, you can often even step
through it
9. Managing Dependencies: Licensing
● Software Patent licensing is still a (huge) thing
○ IBM makes $1Bn a year on software licensing
● You need to ensure you have the legal right to use the software.
● Good news: Most organizations have figured out licensing and maintain a known-good “approved” set of licenses.
○ This works as long as you stick to the known-good ones
● Example “Auto Approve” (permissive): MIT, BSD, Apache 2
● Example “Special Dispensation”: MongoDB server side license
● Example “Do not use”: GPL / LGPL
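The three buckets above can be encoded as a small policy table that a build step consults. This is only a sketch: the SPDX-style license identifiers, the `POLICY` table, and the `review` helper are assumptions made for illustration, not any organization's actual policy. Unknown licenses fall into the "special dispensation" bucket so that a human decides.

```python
from enum import Enum


class Verdict(Enum):
    AUTO_APPROVE = "auto approve"
    SPECIAL_DISPENSATION = "special dispensation"
    DO_NOT_USE = "do not use"


# Hypothetical policy table mirroring the three example buckets on the slide.
# The SPDX-style identifiers are an assumption for illustration.
POLICY = {
    "MIT": Verdict.AUTO_APPROVE,
    "BSD-3-Clause": Verdict.AUTO_APPROVE,
    "Apache-2.0": Verdict.AUTO_APPROVE,
    "SSPL-1.0": Verdict.SPECIAL_DISPENSATION,
    "GPL-3.0-only": Verdict.DO_NOT_USE,
    "LGPL-3.0-only": Verdict.DO_NOT_USE,
}


def review(dependencies):
    """Map {package: license id} to {package: Verdict}.

    Licenses missing from POLICY are routed to SPECIAL_DISPENSATION so
    that a person, not the script, makes the call.
    """
    return {
        name: POLICY.get(license_id, Verdict.SPECIAL_DISPENSATION)
        for name, license_id in dependencies.items()
    }


if __name__ == "__main__":
    # Package names here are made up.
    deps = {"fast_json": "MIT", "doc_store": "SSPL-1.0", "odd_lib": "Unknown-1.0"}
    for package, verdict in review(deps).items():
        print(f"{package}: {verdict.value}")
```

Failing closed on unknown licenses is a design choice: it costs a manual review but avoids silently approving something nobody has looked at.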
10. Managing Dependencies: Quality
Quality of many Open Source dependencies is outstanding
● Crowdsourcing means more investment into bug reporting and fixing
● In theory you can look at the code to assess the quality
● You have many options to choose from
11. Managing Dependencies: Quality
● The time spent reviewing / assessing open source dependencies is minimal (both commercially and in open source): think of reviewing 606 packages
● No one to cry to: Maintainers have limited time to respond to your issue
● Open source maintainers are typically stretched (very) thin
● Parable: “broke my old version, sorry”: dtolnay/quote/#204
12. Managing Dependencies: Security
● Somewhat terrifying to read the “Backstabber's toolkit” paper
● Open source maintainers do not have loads of time
○ Open source is fundamentally based on trust but verify (in the maintainers + community)
○ Possible to abuse that trust and insert malicious code
● Surface Area: dependencies of dependencies
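To make the "dependencies of dependencies" point concrete, here is a minimal sketch that walks a dependency graph and counts everything reachable from an application. The graph and package names are made up for illustration; a real tool would read this from the package manager's metadata.

```python
def transitive_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # already visited (also protects against cycles)
        seen.add(pkg)
        stack.extend(graph.get(pkg, []))
    seen.discard(root)  # a cycle could lead back to the root
    return seen


# Hypothetical graph: package -> packages it depends on directly.
GRAPH = {
    "my_app": ["http_client", "json_parser"],
    "http_client": ["url_parser", "tls"],
    "tls": ["crypto"],
    "json_parser": [],
}

if __name__ == "__main__":
    direct = GRAPH["my_app"]
    everything = transitive_dependencies(GRAPH, "my_app")
    print(f"direct: {len(direct)}, total: {len(everything)}")
```

In this toy graph the application declares two dependencies but ends up trusting five packages, which is the surface area an attacker can target.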
13. Managing Dependencies: Build times / package bloat
● Dependencies add build time to compiled languages (C/C++, Rust)
● Add significant bloat to binary / distribution size (MBs!)
○ Parable: The (Python) dependency stack at one startup was a > 1.5 GB package.
● “DLL Hell”: Version matching dependencies (of dependencies)
14. Managing Dependencies: Keeping up to date
● Dependencies get upgraded with unpredictable regularity
● Upgrades bring things you want/need, like security fixes, and also features you probably don’t
Challenges
● Open source projects invest relatively little time in maintaining past releases.
○ p.s. Microsoft Windows: programs written 20+ years ago still run fine
● ⇒ bump dependencies a lot (daily)
● “Semantic versioning” - helps auto update dependencies 🤗
○ Sometimes releases do introduce incompatibilities and break builds 😖
○ Can get different binaries depending on *when* you run your build 😱
○ “Backstabbers Toolkit” 😓
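How semantic versioning enables automatic updates, and why the result can depend on when you build, can be sketched with the caret-style compatibility rule used by tools such as Cargo and npm. This is a simplified illustration, not any package manager's resolver: it assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release or build tags, and the function names are made up.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_compatible(requirement, candidate):
    """True if candidate satisfies a caret requirement such as ^1.2.3.

    The leftmost non-zero component must stay the same, and the
    candidate must not be older than the requirement.
    """
    req, cand = parse(requirement), parse(candidate)
    if cand < req:
        return False
    if req[0] > 0:
        return cand[0] == req[0]      # ^1.2.3 allows 1.x.y
    if req[1] > 0:
        return cand[:2] == req[:2]    # ^0.2.3 allows 0.2.y
    return cand == req                # ^0.0.3 allows only 0.0.3


def best_match(requirement, available):
    """Pick the newest available version the requirement allows."""
    matches = [v for v in available if caret_compatible(requirement, v)]
    return max(matches, key=parse) if matches else None


if __name__ == "__main__":
    # Same requirement, different day: a new release changes the result.
    print(best_match("1.2.3", ["1.2.3"]))
    print(best_match("1.2.3", ["1.2.3", "1.3.0", "2.0.0"]))
```

The two calls at the bottom resolve the same requirement to different versions once `1.3.0` has been published, which is exactly the "depends on when you run your build" problem. A lock file records the first answer so later builds reuse it.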
15. Managing Dependencies: Packaging
Packaging: Gathering your code and dependencies into an executable “package” that users can run on their system
As the number of dependencies grows, so do the challenges in packaging / DLL Hell
● Language Runtime
● Your direct dependencies (e.g. http library)
● Indirect dependencies (e.g. URL parser)
● System dependencies (libssl, libqt, etc)
17. Think Twice about Adding New Dependencies
“A little copying is better than a little dependency.”
- Rob Pike via https://go-proverbs.github.io/
E.g. copy one data structure rather than depend on a whole library of data structures
Anti-example: HTTP clients / crypto libraries (do not copy these)
18. Best Practice: CI/CD (test, test, and test some more)
[Diagram: Source Code (in git) → Propose change via Pull Request → CI: Run Tests on change branch → approve + merge to main branch → CI: Run Tests (on main branch) → Build “Artifacts” → CD: release / deploy. Two stages are annotated “Likely more tests here”.]
CI == Continuous Integration
CD == Continuous Deployment
19. Best Practice: Package Manager
❏ Use package manager built into your ecosystem:
❏ Java: Maven
❏ Python: pip
❏ Node.js: npm
❏ Ruby: RubyGems
❏ Rust: cargo
❏ …
❏ C/C++: CMake (not quite a package manager, but closer than Makefiles)
❏ Use the “freeze”, “shrinkwrap”, or “version lock” feature to control updates
❏ Ensure you use widely used packages (wisdom of crowds)
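As one way to picture what a “freeze” file buys you, the sketch below compares pinned versions against what is actually installed and reports any drift. It assumes pins in the `name==version` form that `pip freeze` prints and skips any other line shape; `parse_pins`, `installed_version`, and `drift` are hypothetical helper names, and the package names in the usage example are made up.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines; blank lines, comments, and other forms are skipped."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def drift(pins, lookup=installed_version):
    """Return {name: (pinned, actual)} for every package that differs."""
    report = {}
    for name, pinned in pins.items():
        actual = lookup(name)
        if actual != pinned:
            report[name] = (pinned, actual)
    return report


if __name__ == "__main__":
    # Made-up pins; in practice these lines would come from the freeze file.
    pins = parse_pins(["alpha==1.0.0", "# a comment", "beta==2.1.0"])
    for name, (pinned, actual) in drift(pins).items():
        print(f"{name}: pinned {pinned}, installed {actual}")
```

Passing `lookup` as a parameter keeps the comparison testable without touching the real environment. A CI step could fail the build whenever `drift` returns a non-empty report.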
20. Managing Dependencies: Best Practices
❏ Invest heavily in automated testing
❏ Especially end to end tests, and key features that rely on behavior of dependencies
❏ Invest in keeping dependencies up to date
❏ Update direct dependencies (tools like Dependabot can help)
❏ Help debug and fix the libraries you depend on
❏ Submit patches back upstream
❏ May need to fork / apply a fix while you wait for the maintainer to release a new version
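One way to test “key features that rely on behavior of dependencies” is a small contract test that pins down the exact behavior your code assumes, so a dependency upgrade that changes it fails loudly in CI. The sketch below uses the standard library's `urllib.parse.urlsplit` as a stand-in for a third-party dependency; `host_and_port` is a hypothetical application helper.

```python
import unittest
from urllib.parse import urlsplit


def host_and_port(url):
    """Hypothetical application helper that relies on urlsplit's behavior."""
    parts = urlsplit(url)
    default = {"http": 80, "https": 443}.get(parts.scheme)
    return parts.hostname, parts.port or default


class UrlParsingContract(unittest.TestCase):
    """Each test records one assumption the application makes."""

    def test_explicit_port(self):
        self.assertEqual(
            host_and_port("https://example.com:8443/x"), ("example.com", 8443)
        )

    def test_default_port(self):
        self.assertEqual(host_and_port("http://example.com/x"), ("example.com", 80))

    def test_hostname_is_lowercased(self):
        self.assertEqual(host_and_port("https://EXAMPLE.com/")[0], "example.com")


if __name__ == "__main__":
    unittest.main()
```

Tests like these are cheap to write and make daily dependency bumps much less scary, because a behavior change shows up as a named failing assumption rather than a production bug.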
21. Managing Dependencies: Packaging
Technology to the rescue (enabler)
● Static Linking
● yum + .rpm ; apt + .deb
● JavaFX; Electron (for Java; Node.js desktop apps)
● Containerization (docker, et al)
● VMs (“Virtual Appliances”)