Back to the future: Empirical Revolution(s) in Software Engineering
Audris Mockus
University of Tennessee
audris@utk.edu
MSR’22 [2022-05-19 Thu]
How to use history
“... my history will ... be judged useful ... as an aid to the interpretation of the future, which in the course of human things must resemble if it does not reflect it.” The History of the Peloponnesian War, Thucydides
- Fit model on past data and predict future
- This talk is about best features/predictors to use
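As a minimal sketch of "fit on the past, predict the future", the snippet below fits a straight line by ordinary least squares to earlier observations and extrapolates to a later time point. The data values and the function name `fit_line` are hypothetical and chosen only for illustration; the talk does not prescribe a particular model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical history: time period -> value of some measured outcome
past_periods = [0, 1, 2, 3]
past_values = [1, 3, 5, 7]

# Fit only on the past, then predict a period that was not used for fitting
slope, intercept = fit_line(past_periods, past_values)
prediction = slope * 5 + intercept
```

The quality of such a prediction depends on which predictors are measured, which is the question the rest of the talk addresses.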
Modes of Production and Capability to Measure
- Modes of software production
  - Computers: direct/assembly
  - Programming languages, tools, styles
  - Development process/coordination
  - Widespread collaboration
- Science also needs:
  - Increased ability to measure
  - Reduction of measurement costs
  - Improved measurement accuracy
Examples from the past
- First languages: McCabe, Halstead
- OO programming: OO complexity/smells
- Large teams: coordination
  - What do developers do?
  - Defects, development effort, delays
  - CMM/Software Factories as the apotheosis of process
- Open Source Software: collaboration
  - Increased ability to measure
  - No cost: operational data in VCS/ITS/mailing lists
  - Accuracy: problematic
Present mode of production
- Massive collaboration, reuse, knowledge transfer
  - 60M contributors (43M aliased)
  - 173M projects (107M deforked)
  - 22 percent copied (to 14 projects on average)
- Largest communities
  - Collaboration: 6M projects+contributors (75M clique)
  - Code reuse: 4M projects (46M clique), 5M contributors (17M clique)
Conceptual models
- Software supply chains (SSCs)
- Key objectives: SSC security
  - Vulnerabilities
    - Upstream, orphan (copied), bad practice
  - Sustainability
    - Basic maintenance
    - Response to user needs
Why think of supply chains
- Traditional SC
  - Product, service, information flow
- SSC
  - Dependencies, copying, knowledge
- The same character
  - Complex interdependencies
  - Decentralized decision making
  - Need to manage risks
SSC of the 1st kind
- Technical dependencies among projects with change effort as product flow
- Primary risks: unknown vulnerabilities, breaking changes, lack of maintenance, lack of popularity
Examples of SSC of the first kind
- Python: import re
- Java: import java.util.Collection;
- JavaScript: package.json
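A dependency edge of the first kind can be recovered from source code itself. The sketch below uses Python's standard `ast` module to list the top-level modules a file imports and turns them into (downstream project, upstream module) pairs. The project names, file contents, and the function name `imported_modules` are hypothetical; this is one possible way to extract such edges, not a description of how any particular infrastructure does it.

```python
import ast

def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" depends on the top-level package "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" also depends on "a"; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return modules

# Hypothetical corpus: project name -> contents of one source file
projects = {
    "proj_a": "import re\nimport os.path\n",
    "proj_b": "from collections import OrderedDict\nimport re\n",
}

# Dependency edges as (downstream project, upstream module) pairs
edges = sorted((p, m) for p, src in projects.items() for m in imported_modules(src))
```

Here both hypothetical projects end up downstream of `re`, so a breaking change or vulnerability in that module would reach both.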
SSC of the 2nd kind
- Copying of the source code from project to project as product flow
- Primary risks: license compliance, unfixed vulnerabilities/bugs, missing updated functionality
Examples of SSC of the second kind
- Implementation of a complex algorithm
- Useful template
- Build configuration
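One simple way to spot copies of the second kind is to hash file contents and look for the same hash in more than one project. The sketch below computes the hash Git assigns to a blob (SHA-1 over a `blob <size>\0` header plus the content) and groups projects by it. The file names and contents are hypothetical, and exact-content matching is an assumption made for illustration: it finds only byte-identical copies and says nothing about which project was the original.

```python
import hashlib
from collections import defaultdict

def git_blob_sha1(content: bytes) -> str:
    """Hash file content the way Git hashes a blob object."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical corpus: (project, path) -> file content
files = {
    ("proj_a", "lib/sort.c"): b"int cmp(int a, int b) { return a - b; }\n",
    ("proj_b", "vendor/sort.c"): b"int cmp(int a, int b) { return a - b; }\n",
    ("proj_b", "main.c"): b"int main(void) { return 0; }\n",
}

# Group projects by the hash of the content they contain
projects_by_blob = defaultdict(set)
for (project, _path), content in files.items():
    projects_by_blob[git_blob_sha1(content)].add(project)

# Blobs present in more than one project are candidate copies
copied = {h: sorted(ps) for h, ps in projects_by_blob.items() if len(ps) > 1}
```

A fix applied to the blob in one project does not propagate to the other, which is the "unfixed vulnerabilities/bugs" risk listed above.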
SSC of the 3rd kind
- Knowledge (product) flow through code changes as developers learn from and impart their knowledge to the source code
- Primary risks: developers may leave, companies may discontinue support
Examples of SSC of the third kind
- Developers gaining skills with tools/packages/practices
- Developers spreading practices, e.g., testing frameworks
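Knowledge flow of the third kind can be approximated from commit history. The sketch below flags cases where the developer who first uses a module in a project had already used it in another project, treating that as a possible transfer of practice. The commit records, the tuple layout, and the function name `knowledge_transfers` are all hypothetical, and this heuristic is an assumption for illustration rather than a method taken from the talk.

```python
from collections import defaultdict

# Hypothetical commit records: (timestamp, author, project, module used)
commits = [
    (1, "alice", "proj_a", "pytest"),
    (2, "bob", "proj_a", "pytest"),
    (3, "bob", "proj_b", "pytest"),
    (4, "carol", "proj_b", "pytest"),
]

def knowledge_transfers(commits, module):
    """Return (author, source project, target project) for each first use of
    `module` in a project by an author who used it elsewhere earlier."""
    projects_seen = set()                 # projects where the module already appears
    author_projects = defaultdict(list)   # author -> projects where they used the module
    transfers = []
    for _ts, author, project, mod in sorted(commits):
        if mod != module:
            continue
        if project not in projects_seen:
            projects_seen.add(project)
            # The introducer's earlier projects are candidate knowledge sources
            for prev in author_projects[author]:
                transfers.append((author, prev, project))
        if project not in author_projects[author]:
            author_projects[author].append(project)
    return transfers

transfers = knowledge_transfers(commits, "pytest")
```

In this toy history, `bob` picks up the testing framework in `proj_a` and is the first to use it in `proj_b`.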
SSCs Measurement: Key Needs
- Completeness: the entirety of OSS
- Autocuration: address data quality at scale
- Cross-referencing: to make analysis run in minutes not months
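The cross-referencing need can be illustrated with precomputed inverse maps: once relations such as author-to-projects are indexed in both directions, a question becomes a dictionary lookup instead of a scan over all commits. The records and the helper name `build_index` below are hypothetical, and the in-memory dictionaries are only a toy stand-in for a real cross-referenced store.

```python
from collections import defaultdict

# Hypothetical forward relation: commit id -> (project, author)
commit_info = {
    "c1": ("proj_a", "alice"),
    "c2": ("proj_a", "bob"),
    "c3": ("proj_b", "bob"),
}

def build_index(pairs):
    """Build a key -> set-of-values map from an iterable of (key, value) pairs."""
    index = defaultdict(set)
    for key, value in pairs:
        index[key].add(value)
    return dict(index)

# Precompute both directions once; later queries are direct lookups
author_to_projects = build_index((a, p) for p, a in commit_info.values())
project_to_authors = build_index((p, a) for p, a in commit_info.values())
```

With these maps, "which projects has `bob` contributed to?" is answered by `author_to_projects["bob"]` without revisiting the commits.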
What does SSC-based production mean?
- A software project is a small part of its supply chains
  - e.g., activity, practices, quality, sustainability, and popularity
- Reconstruct global dynamic properties of the entire public source code
- Proper project/author/code sampling
How to conduct NextGen SW research?
- World of Code Infrastructure [2, 1]:
  - Complete, Current, Curated, Cross-referenced
  - “Research Ready” for Supply Chain Research
  - How to use: github.com/woc-hack/tutorial, worldofcode.org
  - May 21-22: https://github.com/woc-hack/hackathon-pittsburgh-2022
  - 170+M projects; 60+M Authors; 3.1B Commits; 12B Blobs and Trees
  - Specialized database without redundancies: 300TB
Summary
- Revolutions are predicted by
  - New modes of production (SSC)
  - Suitable measurement for SSCs (e.g., WoC), which needs:
    - Completeness
    - Low cost
    - High accuracy
Bio
Audris Mockus worked at AT&T, then Lucent Bell Labs and Avaya Labs for 21 years.
Now he is the Ericsson-Harlan D. Mills Chair professor in the Department of Electrical
Engineering and Computer Science of the University of Tennessee.
He specializes in the recovery, documentation, and analysis of digital remains left as
traces of collective and individual activity. He would like to reconstruct and improve
the reality from these projections via methods that contextualize, correct, and
augment these digital traces, modeling techniques that present and affect the behavior
of teams and individuals, and statistical models and optimization techniques that help
understand the nature of individual and collective behavior. His work has improved the
understanding of how teams of software engineers interact and how to measure their
productivity.
Dr. Mockus received a B.S. and an M.S. in Applied Mathematics from Moscow
Institute of Physics and Technology in 1988. In 1991 he received an M.S. and in 1994
he received a Ph.D. in Statistics from Carnegie Mellon University.
Abstract
The desire to better understand software development led to numerous attempts to
quantify it. Easy-to-measure artifacts, such as source code, could provide only the
most basic understanding of the entire development process and the attempts to
directly measure quality and effort were cost-prohibitive, error-prone, and rarely shared
with researchers or made public. The rise of open source provided not only a reliable
software infrastructure but also a rich data source for the software engineering
community to finally measure aspects of software development never seen before.
Over time and with increased use of open source software, actual developers
increasingly need to deal not just with their own project but with projects upstream,
downstream, or sideways in the huge open source software supply chain. Measures
derived from the entire software supply chain are now likely to bring about the next
software engineering revolution.
References
[1] Yuxing Ma, Chris Bogart, Sadika Amreen, Russell Zaretzki, and Audris Mockus. World of Code: An infrastructure for mining the universe of open source VCS data. In IEEE Working Conference on Mining Software Repositories, May 26 2019.
[2] Audris Mockus. Amassing and indexing a large sample of version control systems: Towards the census of public source code history. In 6th IEEE Working Conference on Mining Software Repositories, May 16–17 2009.

