Managing Software Dependencies and the Supply Chain
Wrangling Software Engineering Projects
MIT EM.S20
Andrew Lamb
April 6, 2022
Goal
Give both a commercial and an open-source perspective on the benefits, costs,
and risks of taking on dependencies.
About me
MIT Course VI-2 ‘02, MEng ‘03
17 years of professional development 🤔
15 in commercial enterprise software (startups at various stages)
● Oracle, DataPower/IBM, Vertica/HP, Nutonian, DataRobot
Last 2 years in open source commercial software development
● InfluxData, contributor to influxdb_iox
● Maintainer of arrow-rs, arrow-datafusion, and sqlparser-rs projects
● PMC member of Apache Arrow
Software “Supply Chain” ?
[Diagram. Labels: Code Contributors; Project Management (e.g. PRs); CI / CD system; Software Distribution (e.g. Dockerhub, App Store); AWS Marketplace; Apple Pay; User (😊)]
Software Supply Chain Complexity
2005: Andrew’s First Startup (DataPower)
● C/C++, < 5 dependencies (e.g. OpenSSL)
● Single binary, distributed to customers, on CD or via FTP
2022: Andrew’s Current Startup (InfluxDB)
● IOx has … 606 dependencies (Rust alone)
● Distributed as a Docker image on GCR
Dependencies?
● Software Engineering 101 (6.001 / 6.037)
● “Don’t Reinvent the Wheel”: Use a pre-existing library of code
● The number and quality of pre-existing libraries has grown massively
● Example:
○ 2004: DataPower had a custom-written HTTP/S implementation, URL parser, and more!
○ 2022: Most languages have a library to do it (requests for Python, node, reqwest in Rust, etc.)
(Dramatically) Lowers Cost of Building Software
● Low Barrier to Entry: Someone else designed the API, implemented and (hopefully) tested it
○ E.g. you can get a cross-platform, secure web server up and running almost instantly
● Maintenance: You benefit from bugs fixed by others
● Debuggability: Source code is available; you can often even step through it
Managing Dependencies: Licensing
● Software Patent licensing is still a (huge) thing
○ IBM makes $1Bn a year on software licensing
● You need to ensure you have the legal right to use the software.
● Good news: Most organizations have figured out licensing and have a known-good “approved” set of licenses.
○ As long as you stick to the known-good ones
● Example “Auto Approve” (permissive): MIT, BSD, Apache 2
● Example “Special Dispensation”: MongoDB Server Side Public License (SSPL)
● Example “Do not use”: GPL / LGPL
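The three example buckets above can be written down as a small policy check. This is a minimal sketch, assuming licenses are identified by SPDX-style strings. The bucket contents, the `classify_license` and `audit` helpers, and the choice to send unrecognized licenses to manual review are illustrative assumptions, not any particular organization's policy.

```python
# Hypothetical license policy mirroring the three example buckets.
# The specific SPDX identifiers chosen here are illustrative.
AUTO_APPROVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}
SPECIAL_DISPENSATION = {"SSPL-1.0"}
DO_NOT_USE = {"GPL-3.0-only", "LGPL-3.0-only"}


def classify_license(license_id: str) -> str:
    """Return the policy bucket for a license identifier."""
    if license_id in AUTO_APPROVE:
        return "auto-approve"
    if license_id in SPECIAL_DISPENSATION:
        return "special-dispensation"
    if license_id in DO_NOT_USE:
        return "do-not-use"
    # Assumption: anything unrecognized goes to a human for review.
    return "needs-review"


def audit(dependencies: dict[str, str]) -> dict[str, list[str]]:
    """Group dependency names (name -> license id) by policy bucket."""
    report: dict[str, list[str]] = {}
    for name, license_id in sorted(dependencies.items()):
        report.setdefault(classify_license(license_id), []).append(name)
    return report
```

Running `audit` over the full dependency list in CI is one way such a policy could be enforced automatically, with only the non-approved buckets needing human attention.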
Managing Dependencies: Quality
Quality of many Open Source dependencies is outstanding
● Crowdsourcing means more investment into bug reporting and fixing
● In theory you can look at the code to assess the quality
● You have many options to choose from
Managing Dependencies: Quality
● Amount of time spent on reviewing / assessing open source is minimal (both commercially and in open source): think reviewing 606 packages
● No one to cry to: Maintainers have limited time to respond to your issue
● Open source maintainers are typically stretched (very) thin
● Parable: “broke my old version, sorry”: dtolnay/quote/#204
Managing Dependencies: Security
● Somewhat terrifying to read the “Backstabber's Knife Collection” paper (arXiv:2005.09535)
● Open source maintainers do not have loads of time
○ Open source is fundamentally based on trust but verify (in the maintainers + community)
○ Possible to abuse that trust and insert malicious code
● Surface Area: dependencies of dependencies
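To make the "dependencies of dependencies" point concrete, the sketch below walks a dependency graph and compares the direct dependency count with the full transitive set. The graph, the package names, and the `transitive_dependencies` helper are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical dependency graph: package name -> direct dependencies.
GRAPH = {
    "app": ["http-client"],
    "http-client": ["url-parser", "tls"],
    "tls": ["crypto"],
    "url-parser": [],
    "crypto": [],
}


def transitive_dependencies(graph: dict[str, list[str]], root: str) -> set[str]:
    """Return every package reachable from root, excluding root itself."""
    seen: set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        # Skip packages already visited (and the root, in case of cycles).
        if package in seen or package == root:
            continue
        seen.add(package)
        queue.extend(graph.get(package, []))
    return seen
```

In this toy graph the application declares one direct dependency but ends up trusting four packages, each of which is a place where malicious code could be inserted.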
Managing Dependencies: Build times / package bloat
● Dependencies add build time to compiled languages (C/C++, Rust)
● Add significant bloat to binary / distribution size (MBs!)
○ Parable: The (Python) dependency stack at one startup was a > 1.5 GB package.
● “DLL Hell”: Version matching dependencies (of dependencies)
Managing Dependencies: Keeping up to date
● Dependencies get upgraded with unpredictable regularity
● Things like security fixes you want/need, also features you probably don’t
Challenges
● Open source projects invest relatively less time on maintaining past releases.
○ p.s. Microsoft Windows: programs written 20+ years ago still run fine
● ⇒ bump dependencies a lot (daily)
● “Semantic versioning” - helps auto update dependencies 🤗
○ Releases sometimes do introduce incompatibilities and break builds 😖
○ Can get different binaries depending on *when* you run your build 😱
○ “Backstabber's Knife Collection” 😓
Managing Dependencies: Packaging
Packaging: Gathering your code and dependencies into an executable “package” that users can run on their system
As the number of dependencies grows, so do the challenges in packaging / DLL Hell
● Language Runtime
● Your direct dependencies (e.g. http library)
● Indirect dependencies (e.g. URL parser)
● System dependencies (libssl, libqt, etc)
How to Manage
Think Twice about Adding New Dependencies
“A little copying is better than a little dependency.”
- Rob Pike via https://go-proverbs.github.io/
E.g. One data structure from a library of data structures
Anti-example: http clients / crypto library
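As an illustration of the "little copying" case, a helper this small can reasonably live in your own code base instead of pulling in a whole utility library for one function. The `chunked` helper below is a hypothetical example written for this purpose, not code from the talk. An HTTP client or a crypto library, by contrast, is far too large and too security-sensitive to copy this way.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

The trade-off is that you now own the tests and bug fixes for this code, which is acceptable for a dozen lines and not for a TLS implementation.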
Best Practice: CI/CD (test, test, and test some more)
[Diagram: Source Code (in git) → Propose change via Pull Request → CI: Run Tests on change branch → approve + merge to main branch → CI: Run Tests (on main branch) → Build “Artifacts” → CD: release / deploy. Two stages are annotated “Likely more tests here”.]
CI == Continuous Integration
CD == Continuous Deployment
Best Practice: Package Manager
❏ Use the package manager built into your ecosystem:
    ❏ Java: Maven
    ❏ Python: pip
    ❏ Node.js: npm
    ❏ Ruby: RubyGems
    ❏ Rust: cargo
    ❏ …
    ❏ C/C++: CMake (not quite a package manager, but closer than Makefiles)
❏ Use the “freeze”, “shrinkwrap”, or “version lock” feature to control updates
❏ Ensure you use widely used packages (wisdom of crowds)
Managing Dependencies: Best Practices
❏ Invest heavily in automated testing
    ❏ Especially end to end tests, and key features that rely on behavior of dependencies
❏ Invest in keeping dependencies up to date
    ❏ Update direct dependencies (tools like Dependabot can help)
❏ Help debug and fix your dependent libraries
    ❏ Submit patches back upstream
    ❏ May need to fork / apply a fix while you wait for maintainer to release new version
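One way to test "key features that rely on behavior of dependencies" is to write down, as assertions, the exact dependency behavior your code assumes, so that a version bump that changes it fails in CI rather than in production. In this sketch Python's standard `urllib.parse` stands in for a third-party URL parser. The `endpoint_host_and_port` function and the check function are hypothetical.

```python
from urllib.parse import urlsplit


def endpoint_host_and_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def check_parser_assumptions() -> None:
    """Pin the parser behavior this code relies on."""
    # Relies on the parser lower-casing host names.
    assert endpoint_host_and_port("https://Example.COM/path") == ("example.com", 443)
    # Relies on the parser returning an explicit port as an int.
    assert endpoint_host_and_port("http://example.com:8080/") == ("example.com", 8080)
```

Tests like this are cheap to write and tend to pay for themselves the first time an automated dependency update changes behavior you depended on without realizing it.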
Managing Dependencies: Packaging
Technology to the rescue (enabler)
● Static Linking
● yum + .rpm ; apt + .deb
● JavaFX; Electron (for Java; Node.js / desktop apps)
● Containerization (docker, et al)
● VMs (“Virtual Appliances”)
Thank you
Questions?
Readings (tentative):
https://ieeexplore-ieee-org.libproxy.mit.edu/stamp/stamp.jsp?tp=&arnumber=242525 – software maturity
https://www.oreilly.com/library/view/understanding-open-source/0596005814/ch06.html – reasonably thorough overview of software licensing
https://arxiv.org/pdf/2005.09535.pdf – supply-chain attacks
https://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm.html – specific example of how easy/common broad supply-chain breaks are today
[optional] https://blogs.sap.com/2020/06/26/attacks-on-open-source-supply-chains-how-hackers-poison-the-well/
[optional] https://www.gnu.org/licenses/license-compatibility.en.html
[optional] https://www.tandfonline.com/doi/pdf/10.1080/14783360500235819?needAccess=true – software maturity

More Related Content

What's hot

MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability SolutionsMydbops
 
Enterprise Security API (ESAPI) Java - Java User Group San Antonio
Enterprise Security API (ESAPI) Java - Java User Group San AntonioEnterprise Security API (ESAPI) Java - Java User Group San Antonio
Enterprise Security API (ESAPI) Java - Java User Group San AntonioDenim Group
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloudconfluent
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelinesSumant Tambe
 
How to Design Resilient Odoo Crons
How to Design Resilient Odoo CronsHow to Design Resilient Odoo Crons
How to Design Resilient Odoo CronsOdoo
 
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)Abhishek Thakur
 
Design patterns through refactoring
Design patterns through refactoringDesign patterns through refactoring
Design patterns through refactoringGanesh Samarthyam
 
Java Foundations: Basic Syntax, Conditions, Loops
Java Foundations: Basic Syntax, Conditions, LoopsJava Foundations: Basic Syntax, Conditions, Loops
Java Foundations: Basic Syntax, Conditions, LoopsSvetlin Nakov
 
Ceph Month 2021: RADOS Update
Ceph Month 2021: RADOS UpdateCeph Month 2021: RADOS Update
Ceph Month 2021: RADOS UpdateCeph Community
 
The InfluxDB 2.0 Storage Engine | Jacob Marble | InfluxData
The InfluxDB 2.0 Storage Engine | Jacob Marble | InfluxDataThe InfluxDB 2.0 Storage Engine | Jacob Marble | InfluxData
The InfluxDB 2.0 Storage Engine | Jacob Marble | InfluxDataInfluxData
 
Logical Replication in PostgreSQL
Logical Replication in PostgreSQLLogical Replication in PostgreSQL
Logical Replication in PostgreSQLEDB
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)DataWorks Summit
 
Data Structure in C (Lab Programs)
Data Structure in C (Lab Programs)Data Structure in C (Lab Programs)
Data Structure in C (Lab Programs)Saket Pathak
 
Aerospike: Key Value Data Access
Aerospike: Key Value Data AccessAerospike: Key Value Data Access
Aerospike: Key Value Data AccessAerospike, Inc.
 

What's hot (20)

MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability Solutions
 
Rxjs ngvikings
Rxjs ngvikingsRxjs ngvikings
Rxjs ngvikings
 
Theory of computation Lec2
Theory of computation Lec2Theory of computation Lec2
Theory of computation Lec2
 
Enterprise Security API (ESAPI) Java - Java User Group San Antonio
Enterprise Security API (ESAPI) Java - Java User Group San AntonioEnterprise Security API (ESAPI) Java - Java User Group San Antonio
Enterprise Security API (ESAPI) Java - Java User Group San Antonio
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
WebAssembly Overview
WebAssembly OverviewWebAssembly Overview
WebAssembly Overview
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelines
 
How to Design Resilient Odoo Crons
How to Design Resilient Odoo CronsHow to Design Resilient Odoo Crons
How to Design Resilient Odoo Crons
 
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
 
Design patterns through refactoring
Design patterns through refactoringDesign patterns through refactoring
Design patterns through refactoring
 
광고 CTR 예측
광고 CTR 예측광고 CTR 예측
광고 CTR 예측
 
Java Foundations: Basic Syntax, Conditions, Loops
Java Foundations: Basic Syntax, Conditions, LoopsJava Foundations: Basic Syntax, Conditions, Loops
Java Foundations: Basic Syntax, Conditions, Loops
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Ceph Month 2021: RADOS Update
Ceph Month 2021: RADOS UpdateCeph Month 2021: RADOS Update
Ceph Month 2021: RADOS Update
 
The InfluxDB 2.0 Storage Engine | Jacob Marble | InfluxData
The InfluxDB 2.0 Storage Engine | Jacob Marble | InfluxDataThe InfluxDB 2.0 Storage Engine | Jacob Marble | InfluxData
The InfluxDB 2.0 Storage Engine | Jacob Marble | InfluxData
 
Logical Replication in PostgreSQL
Logical Replication in PostgreSQLLogical Replication in PostgreSQL
Logical Replication in PostgreSQL
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 
Data Structure in C (Lab Programs)
Data Structure in C (Lab Programs)Data Structure in C (Lab Programs)
Data Structure in C (Lab Programs)
 
Aerospike: Key Value Data Access
Aerospike: Key Value Data AccessAerospike: Key Value Data Access
Aerospike: Key Value Data Access
 

Similar to Managing Software Dependencies and Supply Chain Risks

Enterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up BudgetEnterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up BudgetDevOps.com
 
(DVO311) Containers, Red Hat & AWS For Extreme IT Agility
(DVO311) Containers, Red Hat & AWS For Extreme IT Agility(DVO311) Containers, Red Hat & AWS For Extreme IT Agility
(DVO311) Containers, Red Hat & AWS For Extreme IT AgilityAmazon Web Services
 
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...Demi Ben-Ari
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...sparkfabrik
 
Backstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptxBackstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptxBrandenTimm1
 
Achieving Full Stack DevOps at Colonial Life
Achieving Full Stack DevOps at Colonial Life Achieving Full Stack DevOps at Colonial Life
Achieving Full Stack DevOps at Colonial Life DevOps.com
 
Choisir le bon business model et la bonne licence pour la survie de son proje...
Choisir le bon business model et la bonne licence pour la survie de son proje...Choisir le bon business model et la bonne licence pour la survie de son proje...
Choisir le bon business model et la bonne licence pour la survie de son proje...Open Source Experience
 
The "Holy Grail" of Dev/Ops
The "Holy Grail" of Dev/OpsThe "Holy Grail" of Dev/Ops
The "Holy Grail" of Dev/OpsErik Osterman
 
Leverage the power of Open Source in your company
Leverage the power of Open Source in your company Leverage the power of Open Source in your company
Leverage the power of Open Source in your company Guillaume POTIER
 
What is the Secure Supply Chain and the Current State of the PHP Ecosystem
What is the Secure Supply Chain and the Current State of the PHP EcosystemWhat is the Secure Supply Chain and the Current State of the PHP Ecosystem
What is the Secure Supply Chain and the Current State of the PHP Ecosystemsparkfabrik
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...Srijan Technologies
 
Introduction to Go
Introduction to GoIntroduction to Go
Introduction to GoSimon Hewitt
 
Analysis of-quality-of-pkgs-in-packagist-univ-20171024
Analysis of-quality-of-pkgs-in-packagist-univ-20171024Analysis of-quality-of-pkgs-in-packagist-univ-20171024
Analysis of-quality-of-pkgs-in-packagist-univ-20171024Clark Everetts
 
Selecting an Open Source License and Business Model for Your Project to Have ...
Selecting an Open Source License and Business Model for Your Project to Have ...Selecting an Open Source License and Business Model for Your Project to Have ...
Selecting an Open Source License and Business Model for Your Project to Have ...All Things Open
 
Source Control with Domino Designer 8.5.3 and Git (DanNotes, November 28, 2012)
Source Control with Domino Designer 8.5.3 and Git (DanNotes, November 28, 2012)Source Control with Domino Designer 8.5.3 and Git (DanNotes, November 28, 2012)
Source Control with Domino Designer 8.5.3 and Git (DanNotes, November 28, 2012)Per Henrik Lausten
 
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with ConcourseContinuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with ConcourseVMware Tanzu
 
Creating and Maintaining an Open Source Library
Creating and Maintaining an Open Source LibraryCreating and Maintaining an Open Source Library
Creating and Maintaining an Open Source LibraryNicholas Schweitzer
 
Aleksandr Kutsan "Managing Dependencies in C++"
Aleksandr Kutsan "Managing Dependencies in C++"Aleksandr Kutsan "Managing Dependencies in C++"
Aleksandr Kutsan "Managing Dependencies in C++"LogeekNightUkraine
 
System design for Web Application
System design for Web ApplicationSystem design for Web Application
System design for Web ApplicationMichael Choi
 

Similar to Managing Software Dependencies and Supply Chain Risks (20)

Enterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up BudgetEnterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up Budget
 
(DVO311) Containers, Red Hat & AWS For Extreme IT Agility
(DVO311) Containers, Red Hat & AWS For Extreme IT Agility(DVO311) Containers, Red Hat & AWS For Extreme IT Agility
(DVO311) Containers, Red Hat & AWS For Extreme IT Agility
 
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
 
Backstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptxBackstage at CNCF Madison.pptx
Backstage at CNCF Madison.pptx
 
Achieving Full Stack DevOps at Colonial Life
Achieving Full Stack DevOps at Colonial Life Achieving Full Stack DevOps at Colonial Life
Achieving Full Stack DevOps at Colonial Life
 
Choisir le bon business model et la bonne licence pour la survie de son proje...
Choisir le bon business model et la bonne licence pour la survie de son proje...Choisir le bon business model et la bonne licence pour la survie de son proje...
Choisir le bon business model et la bonne licence pour la survie de son proje...
 
The "Holy Grail" of Dev/Ops
The "Holy Grail" of Dev/OpsThe "Holy Grail" of Dev/Ops
The "Holy Grail" of Dev/Ops
 
Leverage the power of Open Source in your company
Leverage the power of Open Source in your company Leverage the power of Open Source in your company
Leverage the power of Open Source in your company
 
What is the Secure Supply Chain and the Current State of the PHP Ecosystem
What is the Secure Supply Chain and the Current State of the PHP EcosystemWhat is the Secure Supply Chain and the Current State of the PHP Ecosystem
What is the Secure Supply Chain and the Current State of the PHP Ecosystem
 
Case study
Case studyCase study
Case study
 
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri...
 
Introduction to Go
Introduction to GoIntroduction to Go
Introduction to Go
 
Analysis of-quality-of-pkgs-in-packagist-univ-20171024
Analysis of-quality-of-pkgs-in-packagist-univ-20171024Analysis of-quality-of-pkgs-in-packagist-univ-20171024
Analysis of-quality-of-pkgs-in-packagist-univ-20171024
 
Selecting an Open Source License and Business Model for Your Project to Have ...
Selecting an Open Source License and Business Model for Your Project to Have ...Selecting an Open Source License and Business Model for Your Project to Have ...
Selecting an Open Source License and Business Model for Your Project to Have ...
 
Source Control with Domino Designer 8.5.3 and Git (DanNotes, November 28, 2012)
Source Control with Domino Designer 8.5.3 and Git (DanNotes, November 28, 2012)Source Control with Domino Designer 8.5.3 and Git (DanNotes, November 28, 2012)
Source Control with Domino Designer 8.5.3 and Git (DanNotes, November 28, 2012)
 
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with ConcourseContinuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
 
Creating and Maintaining an Open Source Library
Creating and Maintaining an Open Source LibraryCreating and Maintaining an Open Source Library
Creating and Maintaining an Open Source Library
 
Aleksandr Kutsan "Managing Dependencies in C++"
Aleksandr Kutsan "Managing Dependencies in C++"Aleksandr Kutsan "Managing Dependencies in C++"
Aleksandr Kutsan "Managing Dependencies in C++"
 
System design for Web Application
System design for Web ApplicationSystem design for Web Application
System design for Web Application
 

Recently uploaded

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutionsmonugehlot87
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 

Recently uploaded (20)

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutions
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 

Managing Software Dependencies and Supply Chain Risks

  • 1. Managing Software Dependencies and the Supply Chain Wrangling Software Engineering Projects MIT EM.S20 Andrew Lamb April 6, 2022
  • 2. Goal Give both a commercial and an open-source perspective on the benefits, costs, and risks of taking on dependencies.
  • 3. About me MIT Course VI-2 ‘02, MEng ‘03 17 years professional development 🤔 15 commercial enterprise software (startups at various stages) ● Oracle, DataPower/IBM, Vertica/HP, Nutonian, DataRobot Last 2 years in open source commercial software development ● InfluxData, contributor to influxdb_iox ● Maintainer of arrow-rs, arrow-datafusion, and sqlparser-rs projects ● PMC member of Apache Arrow
  • 4. Software “Supply Chain” ? Code Contributors Project Management (e.g PRs) User (😊) AWS Marketplace Apple Pay CI / CD system Software Distribution E.g. Dockerhub, App Store
  • 5. Software Supply Chain Complexity 2005: Andrew’s First Startup (DataPower) ● C/C++, < 5 dependencies (OpenSSL) ● Single binary, distributed to customers on CD or via FTP 2022: Andrew’s Current Startup (InfluxDB) ● IOx has … 606 dependencies (Rust alone) ● Distributed as a Docker image on GCR
  • 6. Dependencies? ● Software Engineering 101 (6.001 / 6.037) ● “Don’t Reinvent the Wheel”: Use a pre-existing library of code ● The number and quality of pre-existing libraries have grown massively ● Example: ○ 2004: DataPower had a custom-written HTTP/S implementation, URL parser, and more! ○ 2022: Most languages have a library to do it (requests for Python, Node, reqwest in Rust, etc.)
  • 7. (Dramatically) Lowers Cost of Building Software ● Low Barrier to Entry: Someone else designed the API, implemented it, and (hopefully) tested it ○ E.g. you can get a cross-platform, secure webserver up and running almost instantly ● Maintenance: You benefit from bugs fixed by others ● Debuggability: Source code is available; you can often even step through it
  • 9. Managing Dependencies: Licensing ● Software patent licensing is still a (huge) thing ○ IBM makes $1Bn a year on software licensing ● You need to ensure you have the legal right to use the software. ● Good news: Most organizations have figured out licensing and keep a known-good “approved” set of licenses. ○ As long as you stick to the known-good ones, you are fine ● Example “Auto Approve” (permissive): MIT, BSD, Apache 2 ● Example “Special Dispensation”: MongoDB server side license ● Example “Do not use”: GPL / LGPL
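The three license buckets on this slide can be turned into an automated check. The following is a minimal sketch, not any organization's actual policy: the bucket contents, the SPDX identifiers chosen to stand in for "BSD", "GPL / LGPL", and the MongoDB license, and the function names `classify_license` and `audit` are all assumptions made for illustration.

```python
# Hypothetical policy buckets modeled on the slide's three examples.
# The exact SPDX identifiers are illustrative assumptions.
AUTO_APPROVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}
SPECIAL_DISPENSATION = {"SSPL-1.0"}
DO_NOT_USE = {"GPL-3.0-only", "LGPL-3.0-only"}


def classify_license(spdx_id: str) -> str:
    """Map a license identifier to a policy bucket."""
    if spdx_id in AUTO_APPROVE:
        return "auto-approve"
    if spdx_id in SPECIAL_DISPENSATION:
        return "special-dispensation"
    if spdx_id in DO_NOT_USE:
        return "do-not-use"
    # Anything the policy does not know about goes to a human.
    return "needs-review"


def audit(dependencies: dict[str, str]) -> dict[str, list[str]]:
    """Group dependency names (sorted) by the bucket of their license."""
    report: dict[str, list[str]] = {}
    for name, spdx_id in sorted(dependencies.items()):
        report.setdefault(classify_license(spdx_id), []).append(name)
    return report
```

Running `audit` in CI on the declared licenses of every dependency would flag anything outside the auto-approve bucket before it is merged; unknown licenses deliberately fall through to manual review rather than being approved by default.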
  • 10. Managing Dependencies: Quality Quality of many Open Source dependencies is outstanding ● Crowdsourcing means more investment into bug reporting and fixing ● In theory you can look at the code to assess the quality ● You have many options to choose from
  • 11. Managing Dependencies: Quality ● Amount of time spent on reviewing / assessing open source is minimal (both commercially and in open source) – think reviewing 606 packages ● No one to cry to: Maintainers have limited time to respond to your issue ● Open source maintainers are typically stretched (very) thin ● Parable: “broke my old version, sorry”: dtolnay/quote/#204
  • 12. Managing Dependencies: Security ● Somewhat terrifying to read the “Backstabber's Knife Collection” paper ● Open source maintainers do not have loads of time ○ Open source is fundamentally based on “trust but verify” (trust in the maintainers + community) ○ It is possible to abuse that trust and insert malicious code ● Surface Area: dependencies of dependencies
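To make the "dependencies of dependencies" surface area concrete, here is a small sketch that walks a dependency graph and collects everything a project pulls in transitively. The graph format (a dict mapping a package name to its direct dependencies), the package names, and the function name `transitive_dependencies` are hypothetical; a real project would obtain this graph from its package manager.

```python
from collections import deque


def transitive_dependencies(graph: dict[str, list[str]], root: str) -> set[str]:
    """Return every package reachable from `root`, excluding `root` itself."""
    seen: set[str] = set()
    queue = deque(graph.get(root, ()))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # already visited (also protects against cycles)
        seen.add(package)
        queue.extend(graph.get(package, ()))
    seen.discard(root)  # a cycle may lead back to the root
    return seen


# Hypothetical graph: two direct dependencies, three more pulled in indirectly.
example_graph = {
    "app": ["http-client", "json"],
    "http-client": ["url-parser", "tls"],
    "tls": ["crypto"],
}
direct = example_graph["app"]
everything = transitive_dependencies(example_graph, "app")
```

In this toy graph the project declares 2 dependencies but trusts 5 packages in total; every one of them is code that could be compromised.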
  • 13. Managing Dependencies: Build times / package bloat ● Dependencies add build time in compiled languages (C/C++, Rust) ● They add significant bloat to binary / distribution size (MBs!) ○ Parable: The Python dependency stack at one startup was a > 1.5 GB package. ● “DLL Hell”: Version matching dependencies (of dependencies)
  • 14. Managing Dependencies: Keeping up to date ● Dependencies get upgraded with unpredictable regularity ● Upgrades bring things you want/need, like security fixes, but also features you probably don’t. Challenges: ● Open source projects invest relatively less time in maintaining past releases. ○ p.s. Microsoft Windows: programs written 20+ years ago still run fine ● ⇒ bump dependencies a lot (daily) ● “Semantic versioning” helps auto-update dependencies 🤗 ○ Releases sometimes introduce incompatibilities and break builds 😖 ○ You can get different binaries depending on *when* you run your build 😱 ○ “Backstabber's Knife Collection” 😓
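The idea behind semantic-versioning-based auto-updates can be sketched as a compatibility check. This is a simplified illustration that assumes plain `MAJOR.MINOR.PATCH` strings (no pre-release or build metadata) and follows the caret-style rule used by Cargo, where `0.x` releases are only compatible within the same minor version. The function names are hypothetical.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible_update(current: str, candidate: str) -> bool:
    """True if `candidate` is a caret-compatible upgrade from `current`."""
    cur, new = parse(current), parse(candidate)
    if new < cur:
        return False  # never auto-downgrade
    if cur[0] > 0:
        return new[0] == cur[0]  # 1.x.y: same major version
    if cur[1] > 0:
        return new[:2] == cur[:2]  # 0.x.y: same minor version
    return new == cur  # 0.0.x: exact version only
```

For example, `is_compatible_update("1.2.3", "1.9.0")` is true while `is_compatible_update("1.2.3", "2.0.0")` is false. The check only encodes what the version number promises; a release that breaks the promise still passes it, which is why the slide pairs semantic versioning with broken builds.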
  • 15. Managing Dependencies: Packaging Packaging: Gathering your code and dependencies into an executable “package” that a user can run on their system. As the number of dependencies grows, so do the challenges in packaging / DLL Hell ● Language Runtime ● Your direct dependencies (e.g. http library) ● Indirect dependencies (e.g. url parser) ● System dependencies (libssl, libqt, etc)
  • 17. Think Twice about Adding New Dependencies “A little copying is better than a little dependency.” - Rob Pike via https://go-proverbs.github.io/ E.g. One data structure from a library of data structures Anti-example: http clients / crypto library
  • 18. Best Practice: CI/CD (test, test, and test some more) [Diagram] Source Code (in git) → Propose change via Pull Request → CI: Run Tests on change branch → approve + merge to main branch → CI: Run Tests (on main branch) → Build “Artifacts” → CD: release / deploy. CI == Continuous Integration; CD == Continuous Deployment. Diagram annotation (appears twice): “Likely more tests here”
  • 19. Best Practice: Package Manager ❏ Use the package manager built into your ecosystem: ❏ Java: Maven ❏ Python: pip ❏ Node.js: npm ❏ Ruby: RubyGems ❏ Rust: cargo ❏ … ❏ C/C++: CMake (not quite a package manager, but closer than Makefiles) ❏ Use the “freeze”, “shrinkwrap”, or “version lock” feature to control updates ❏ Ensure you use widely used packages (wisdom of crowds)
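As an illustration of the "version lock" advice, the sketch below scans the text of a pip-style requirements file and reports entries that are not pinned to an exact version with `==`. It is deliberately simplified and is an assumption-laden example rather than a full parser: it ignores option lines that start with `-`, treats everything after `#` as a comment, and does not handle URLs, environment markers, or hashes. The function name `unpinned_requirements` is hypothetical.

```python
def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and options such as '-r other.txt'
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Illustrative input; the package names and versions are only examples.
sample = """\
requests==2.31.0
flask>=2.0  # web framework
# a full-line comment

numpy
-r other.txt
"""
loose = unpinned_requirements(sample)
```

Here `loose` contains the `flask>=2.0` and `numpy` entries, which could resolve to different versions depending on when the build runs. A check like this is only a guard rail; the package manager's own lock file remains the real source of reproducibility.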
  • 20. Managing Dependencies: Best Practices ❏ Invest heavily in automated testing ❏ Especially end to end tests, and key features that rely on behavior of dependencies ❏ Invest in keeping dependencies up to date ❏ Update direct dependencies (tools like Dependabot can help) ❏ Help debug and fix your dependent libraries ❏ Submit patches back upstream ❏ May need to fork / apply a fix while you wait for maintainer to release new version
  • 21. Managing Dependencies: Packaging Technology to the rescue (enabler) ● Static Linking ● yum + .rpm ; apt + .deb ● FX; Electron (for Java; nodejs / desktop apps) ● Containerization (docker, et al) ● VMs (“Virtual Appliances”)
  • 23. Readings (tentative):
      ● https://ieeexplore-ieee-org.libproxy.mit.edu/stamp/stamp.jsp?tp=&arnumber=242525 – software maturity
      ● https://www.oreilly.com/library/view/understanding-open-source/0596005814/ch06.html – reasonably thorough overview of software licensing
      ● https://arxiv.org/pdf/2005.09535.pdf – supply-chain attacks
      ● https://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm.html – specific example of how easy/common broad supply-chain breaks are today
      ● [optional] https://blogs.sap.com/2020/06/26/attacks-on-open-source-supply-chains-how-hackers-poison-the-well/
      ● [optional] https://www.gnu.org/licenses/license-compatibility.en.html
      ● [optional] https://www.tandfonline.com/doi/pdf/10.1080/14783360500235819?needAccess=true – software maturity