This document discusses challenges and opportunities for developing sustainable software for science. It notes that software is increasingly important for science, but current practices and incentives do not support long-term sustainability. The document summarizes discussions from the first Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1), which identified key issues around developing sustainable software, best practices, policies on credit and careers, and building supportive communities. It proposes that better measurement of contributions to software could help address incentives, career paths, and the long-term sustainability of software.
Sustainable Software for Science
1. Working towards Sustainable
Software for Science
(an NSF and community view)
Daniel S. Katz
dkatz@nsf.gov & d.katz@ieee.org
Program Director, Division of
Advanced Cyberinfrastructure
ESIP Summer Meeting 2014
Copper Mountain - July 9
2. Big Science and Infrastructure
• Hurricanes affect humans
• Multi-physics: atmosphere, ocean, coast, vegetation, soil
– Sensors and data as inputs
• Humans: what have they built, where are they, what will they do
– Data and models as inputs
• Infrastructure:
– Urgent/scheduled processing, workflows
– Software applications, workflows
– Networks
– Decision-support systems,
visualization
– Data storage,
interoperability
3. Long-tail Science and Infrastructure
• Exploding data volumes &
powerful simulation methods
mean that more researchers
need advanced infrastructure
• Such “long-tail” researchers
cannot afford expensive
expertise and unique
infrastructure
• Challenge: Outsource and/or
automate time-consuming
common processes
– Tools, e.g., Globus Online
for data management
– Gateways, e.g., nanoHUB,
CIPRES, access to scientific
simulation software
NSF grant size, 2007.
(“Dark data in the long tail
of science”, B. Heidorn)
4. Science Infrastructure Challenges
• Science
– Larger teams, more disciplines, more countries
• Data
– Size, complexity, rates all increasing rapidly
– Need for interoperability (systems and policies)
• Systems
– More cores, more architectures (GPUs), more memory hierarchy
– Changing balances (latency vs bandwidth)
– Changing limits (power, funds)
– System architecture and business models changing (clouds)
– Network capacity growing; increased networking -> increased security needs
• Software
– Multiphysics algorithms, frameworks
– Programming models and abstractions for science, data, and hardware
– V&V, reproducibility, fault tolerance
• People
– Education and training
– Career paths
– Credit and attribution
5. Cyberinfrastructure
“Cyberinfrastructure consists of
computing systems,
data storage systems,
advanced instruments and
data repositories,
visualization environments, and
people,
all linked together by
software and
high performance networks,
to improve research productivity and
enable breakthroughs not otherwise possible.”
-- Craig Stewart
6. Cyberinfrastructure Framework for 21st Century
Science and Engineering (CIF21)
• Cross-NSF portfolio of activities to provide integrated cyber resources
that will enable new multidisciplinary research opportunities in all
science and engineering fields by leveraging ongoing investments and
using common approaches and components (http://www.nsf.gov/cif21)
• ACCI task force reports (http://www.nsf.gov/od/oci/taskforces/index.jsp)
– Campus Bridging, Cyberlearning & Workforce Development, Data
& Visualization, Grand Challenges, HPC, Software for Science &
Engineering
• Vision and Strategy Reports
– ACI - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12051
– Software - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
– Data - http://www.nsf.gov/od/oci/cif21/DataVision2012.pdf
• Implementation
– Implementation of Software Vision
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
7. Software as Infrastructure
(Figure labels: Science; Software; Computing Infrastructure)
• Software (including services) essential for
the bulk of science
- About half the papers in recent issues of
Science were software-intensive projects
- Research becoming dependent upon
advances in software
- Significant software development being
conducted across NSF: NEON, OOI,
NEES, NCN, iPlant, etc
• Wide range of software types: system, applications, modeling,
gateways, analysis, algorithms, middleware, libraries
• Software is not a one-time effort, it must be sustained
• Development, production, and maintenance are people intensive
• Software life-times are long vs hardware
• Software has under-appreciated value
For software to be sustainable,
it must become infrastructure
8. Software Vision
NSF will take a leadership role in providing
software as enabling infrastructure for
science and engineering research and
education, and in promoting software as a
principal component of its comprehensive
CIF21 vision
• ...
• Reducing the complexity of software will be a
unifying theme across the CIF21 vision,
advancing both the use and development of
new software and promoting the ubiquitous
integration of scientific software across all
disciplines, in education, and in industry
– A Vision and Strategy for Software for Science,
Engineering, and Education – NSF 12-113
9. Create and maintain a software ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
Support the foundational research necessary to continue to efficiently advance scientific software
Enable transformative, interdisciplinary, collaborative, science and engineering research and education through the use of advanced software and services
Develop a next generation diverse workforce of scientists and engineers equipped with essential skills to use and develop software, with software and services used in both the research and education process
Infrastructure Role & Lifecycle
10. ACI Software Cluster Programs
• ACI co-funding (research -> infrastructure)
– Exploiting Parallelism and Scalability (XPS)
• CISE (including ACI) program for foundational research
– Computational and Data-Enabled Science &
Engineering (CDS&E)
• Virtual program for science-specific proofing of
algorithms and tools
– ENG, MPS, ACI now; BIO, GEO, IIS in FY15?
• ACI lead-funding (infrastructure)
– Software Infrastructure for Sustained Innovation (SI2)
• Transform innovations in research and education into
sustained software resources that are an integral part of
the cyberinfrastructure
– Includes all NSF directorates
11. Software Infrastructure Projects
• 5 rounds of funding, 65 SSEs
• 4 rounds of funding, 35 SSIs
• 2 rounds of funding, 14 S2I2 conceptualizations
• See http://bit.ly/sw-ci for current projects
12. SI2 Solicitation and Decision Process
• Cross-NSF software working group with
members from all directorates
• Determined how SI2 fits with other NSF
programs that support software
– See: Implementation of NSF Software Vision -
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
• Discusses solicitations, determines who will
participate in each
• Discusses and participates in review process
• Work together to fund worthy proposals
13. SI2 Solicitation and Decision Process
• Proposal reviews well -> my role becomes
matchmaking
– I want to find program officers with funds, and convince them
that they should spend their funds on the proposal
• Unidisciplinary project (e.g. bioinformatics app)
– Work with single program officer, either likes the proposal or
not
• Multidisciplinary project (e.g., molecular
dynamics)
– Work with multiple program officers, ...
• Omnidisciplinary project (e.g. http, math library)
– Try to work with all program officers, often am told “it’s your
responsibility”
To judge software, need to
understand/forecast impact
14. ACI Software Cluster Programs
• In these programs, ACI works with other NSF
units to support projects that lead to software
as an element of infrastructure
• Issue: amount of software that is
infrastructure grows over time, and grows
faster than NSF funding
Q: How can NSF ensure that software as
infrastructure continues to appear, without
funding all of it?
A: Incentives
• The devil is in the details
• We are exploring this now...
15. Create and maintain a software ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
Support the foundational research necessary to continue to efficiently advance scientific software
Enable transformative, interdisciplinary, collaborative, science and engineering research and education through the use of advanced software and services
Transform practice through new policies for software, addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of software authorship
Develop a next generation diverse workforce of scientists and engineers equipped with essential skills to use and develop software, with software and services used in both the research and education process
Infrastructure Role & Lifecycle
16. Working Towards Sustainable
Software for Science: Practice and
Experiences (WSSSPE)
• http://wssspe.researchcomputing.org.uk
• Mailing list:
http://lists.researchcomputing.org.uk/listinfo.cgi/wssspe-researchcomputing.org.uk
• First Workshop on Sustainable Software for Science:
Practice and Experiences (WSSSPE1), @ SC13, 17
November 2013, Denver
– 2 keynotes, 54 accepted papers
– Discussion sessions: Developing software; Policy; Communities
– Cross-cutting (emergent) topics: Defining sustainability; Career paths
– Post-workshop paper: http://arxiv.org/abs/1404.7414
• Upcoming events:
– WSSSPE1.1, @ SciPy2014, tomorrow!, Austin
– WSSSPE2, @ SC14, 16 November 2014, New Orleans
17. WSSSPE1 Context
• Science is becoming more complex
• Software is becoming more important in science
• Software is becoming part of the scientific
infrastructure
– Should be preserved and reused (for
reproducibility and cost effectiveness)
• But funding, culture, practice, etc. haven’t
caught up
– What changes are needed?
• Bundled under the name “software
sustainability”
18. WSSSPE1 Motivation
• Arfon Smith (GitHub) keynote: Scientific Software and the Open
Collaborative Web
• Example from data reduction in astronomy, where he needed to
remove interfering effects from the device; work needed was
persistent, but there was no practice of sharing this, so many
researchers repeated the same calculations; ~13 person-years
were wasted
• Why don’t we do better?
– Because we are taught to focus on immediate research outcomes and not on
continuously improving and building on tools for research
• When we do know better, why do we not act any differently?
– Due to incentives and their lack: only the immediate products of research, not
the software, are valued
• Open source community has excellent cultures of code reuse,
where there is effectively low-friction collaboration through the
use of repositories
– This has generally not happened in highly numerical, compiled language
scientific software
19. WSSSPE1: Developing Software
• Widespread agreement that developing and maintaining
software is hard, but best practices can help
• Difficulty: Making sustainable software means paying
attention to many facets of software design, like APIs,
security, user experience, testing ...
– A single project that requires one fulltime software engineer
may actually require fractions of different kinds of engineers
– But long-tail projects can’t even fund one FTE, let alone one
that can address all these facets
• Team science (the science of teams) is important
• Lack of career paths for developers is an issue
Many of the issues in developing
sustainable software are social, not
technical
20. WSSSPE1: Best Practices
• Communities have been built around projects, with:
– Public source code hosting, mailing lists, documentation, wikis, bug
trackers, software downloads, continuous integration, software
quality dashboards, and a general web presence to tie all of these
things together
• Dominant common themes
– Open development and community support
– Tighter interactions among domain scientists and code developers,
and developers and their users (which are often domain scientists)
– Users prefer robust and simple to use codes and platforms
– Continuous integration is good
– Sustainability is challenging for many reasons, including: changing
science, changing platforms and technology, and funding
• General recommendations
– Use existing code/tools where possible
– Code well, be simple and transparent
– Use your code, nurture your community, promote, find support
– Be satisfied with less than perfection
– Keep the scientific goal in focus
21. WSSSPE1: Policy
• Credit, citation, impact
Software work is inadequately visible in ways that “count”
within the reputation system underlying science
– Recognition of work on scientific software, linked to
questions of reward and thus motivation for particular
kinds of work on scientific software
– How to fix: Software papers? DOIs for software?
Altmetrics?
• Implementing Policy
– Need strategies to implement the many facets of
sustainable software for scientific software, in a
spectrum, between 2 extremes:
• “Co-funding” - large, multi-year collaborations, with equal
emphasis on both the science and the software
development
• “Software carpentry” - the scientists themselves will write
and maintain their code, but need good tools to do this well
22. WSSSPE1: Communities
• Communities can form around science areas (developers
work together to maximize science) or technologies
(technical catalysts take developments in one area and
apply them to another)
• Companies or organizations can use well-tested practices
to help communities to form
– E.g. Hackathons, communication channels, conferences,
orientations, and documentation
• Need to educate developers to produce complex, agile,
and sustainable domain-specific software, through:
– Focused community workshops, summer schools, boot
camps; general software development training (and science
training for computer scientists)
– Widely available prototype codes
– Increased credit for software developments
– Career paths in science
23. WSSSPE1: Defining Sustainability
• Sustainability for software does not just mean more
money from the government
– Though it does need a committed effort over time
• Software Sustainability Maturity Model (OSS Watch,
similar to discussion in Katz & Proctor)
– “When choosing software for procurement or development
reuse ... you need to consider the future. While a software
product may satisfy today’s needs, will it satisfy tomorrow’s
needs?... In other words, is the software sustainable?”
• Sustainability not integrated into software engineering due
to lack of accepted definition (Venters et al.)
• What is the goal of sustainability?
– reproducible science, or persistence, or quality, or ?
• How is success in sustainability measured?
– How does a group of developers know when they have
actually achieved sustainable software?
Sustainability: the effort that happens to make the
essential things continue (@pebourne)
24. WSSSPE1: Career Paths
• People are essential elements of research infrastructure -
they need:
– Education and training to be productive
– Career paths to remain motivated
– Incentives to move along their career paths
• It's difficult to motivate researchers to create sustainable
software - why?
– Few research career paths available for supporting software
– No incentives for researchers to develop broad skill sets
outside of domains
– Substantial competition from private companies
• Is there a role (career path) for non-tenure-track
researchers who produce software, data, etc. in
universities?
– Assuming yes, do universities recognize and support this?
• If no, how to get them to?
25. WSSSPE1: Career Paths
• Educate software developers and researchers to produce
sustainable software by teaching them to collaborate more
effectively; diminish distinctions
– Caveat: academic communities aren’t taught how to
evaluate cross-disciplinary work
– Lots of skills needed: domain science, software
development, applied mathematics, computer science, etc.
– Teach general software development to researchers and
general domain knowledge to software developers
– Provide focused community workshops, summer schools,
and boot camps to both, and give them a chance to interact
and work together on problems
• Recognize and reward software development and the
creation of other digital products
• Experiment, e.g. Moore & Sloan data science centers
26. Moving Forward
• WSSSPE1: Multiple linked social issues – Sustainability,
Incentives, Career Paths, Communities
• Computational scientists “have a responsibility to convince their
institutions, reviewers, and communities that software is
scholarship, frequently more valuable than a research article”
(Bourne)
• Hypothesis: better measurement of contributions can lead to
rewards (incentives), leading to career paths, willingness to join
communities, leading to more sustainable software
• Recent CISE/ACI & SBE/SES Dear Colleague Letter: Supporting
Scientific Discovery through Norms and Practices for Software
and Data Citation and Attribution (NSF 14-059,
http://www.nsf.gov/pubs/2014/nsf14059/nsf14059.jsp)
– There is a lack of well-developed metrics with which to assess the
impact and quality of scientific software and data
– NSF seeks to explore new norms and practices for software and data
citation and attribution, so that data producers, software and tool
developers, and data curators are credited
• 6 EAGERs and 3 collaborative workshops to be funded
• Other ideas welcome – in WSSSPE2? Or discussion? Or email?
27. Resources
• These slides, on: http://slideshare.net/danielskatz
• NSF Software as Infrastructure Vision:
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
• Implementation of Software Vision:
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
• Software Infrastructure for Sustained Innovation (SI2) Program
– Scientific Software Elements (SSE) & Scientific Software Integration (SSI) solicitation:
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf14520
– 2013 PI meeting: https://sites.google.com/site/si2pimeeting/
– 2014 PI meeting: https://sites.google.com/site/si2pimeeting2014/
– Awards: http://bit.ly/sw-ci
• Working towards Sustainable Software for Science: Practice and Experiences
(WSSSPE)
– Home: http://wssspe.researchcomputing.org.uk (includes links to all slides & papers)
– 1st workshop paper: http://arxiv.org/abs/1404.7414
– 2nd workshop site: http://wssspe.researchcomputing.org.uk/wssspe2/
• NSF 14-059: “Dear Colleague Letter - Supporting Scientific Discovery through
Norms and Practices for Software and Data Citation and Attribution”
– http://www.nsf.gov/pubs/2014/nsf14059/nsf14059.jsp
28. Credits:
• SI2 Program:
– Current program officers: Daniel S. Katz, Rudolf Eigenmann,
Sumanta Acharya, William Y. B. Chang, John C. Cherniavsky,
Almadena Y. Chtchelkanova, Cheryl L. Eavey, Evelyn Goldfield, Sol
Greenspan, Daryl W. Hess, Peter H. McCartney, Bogdan Mihaila,
Dimitrios V. Papavassiliou, Andrew D. Pollington, Barbara Ransom,
Thomas Russell, Nigel A. Sharp, Thomas Siegmund, Paul Werbos,
Eva Zanzerkia
– Formerly-involved program officers: Manish Parashar, Gabrielle
Allen, Eduardo Misawa, Jean Cottam-Allen
• WSSSPE:
– Organizers: Daniel S. Katz, Gabrielle Allen, Neil Chue Hong, Karen
Cranston, Manish Parashar, David Proctor, Matthew Turk, Colin C.
Venters, Nancy Wilkins-Diehr
– Summary paper authors: Daniel S. Katz, Sou-Cheng T. Choi, Hilmar
Lapp, Ketan Maheshwari, Frank Löffler, Matthew Turk, Marcus D.
Hanwell, Nancy Wilkins-Diehr, James Hetherington, James
Howison, Shel Swenson, Gabrielle D. Allen, Anne C. Elster, Bruce
Berriman, Colin Venters
– Keynote speakers: Phil Bourne, Arfon Smith