Software as Infrastructure
      at NSF/OCI

             Daniel S. Katz
          Program Director,
  Office of Cyberinfrastructure (OCI)
         Division of Advanced
       Cyberinfrastructure (ACI)
Big Science and Infrastructure
                    •  Higgs* boson discovery announced at CERN July 4, 2012
                    •  Instrument: Large Hadron Collider (LHC)
                    •  Infrastructure
                           –  Computing Hardware: Worldwide LHC Computing Grid (WLCG):
                              235,000 cores across 36 countries, including Open Science Grid
                              (OSG, US), European Grid Infrastructure (EGI, Europe), ...
                           –  Data: ~20 PB of data created in 2011-2012
                           –  Software: grid middleware, physics analysis applications, ...
                           –  Networks
                           –  Education & Training
                    •  Data generated centrally, moved (~3 PB/week) across
                       multi-tiered infrastructure to be computed upon

See: http://www.isgtw.org/feature/how-grid-computing-helped-cern-hunt-higgs
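To put the ~3 PB/week figure in network terms, the short sketch below converts it to an average sustained rate. It assumes decimal petabytes (10^15 bytes) and a perfectly even transfer over the week; neither assumption comes from the slides, and the function name is illustrative.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60   # 604,800 seconds
BYTES_PER_PB = 10**15                 # assumption: decimal petabytes


def sustained_rate_gbit_per_s(petabytes_per_week):
    """Average rate in Gbit/s needed to move the given volume in one week."""
    bytes_per_second = petabytes_per_week * BYTES_PER_PB / SECONDS_PER_WEEK
    return bytes_per_second * 8 / 10**9


rate = sustained_rate_gbit_per_s(3)
print(f"{rate:.1f} Gbit/s")  # about 39.7 Gbit/s, averaged over the week
```

Real transfers are bursty, so peak rates would be higher than this average.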
Big Science and Infrastructure
•  Hurricanes affect humans
•  Multi-physics: atmosphere, ocean, coast, vegetation, soil
    –  Sensors and data as inputs
•  Humans: what have they built, where are they, what will they do
    –  Data and models as inputs
•  Infrastructure:
    –  Urgent/scheduled processing, workflows
    –  Software applications, workflows
    –  Networks
    –  Decision-support systems,
       visualization
    –  Data storage,
       interoperability
Long-tail Science and Infrastructure
•  Exploding data volumes &
   powerful simulation methods
   mean that more researchers
   need advanced infrastructure
   [Figure: NSF grant size, 2007 (“Dark data in the long tail of science”, B. Heidorn)]
•  Such “long-tail” researchers
   cannot afford expensive
   expertise and unique
   infrastructure
•  Challenge: Outsource and/or
   automate time-consuming
   common processes
   –  Tools, e.g., Globus Online
      and data management
       •  Note: much LHC data is moved
          by Globus GridFTP, e.g., May/June
          2012, >20 PB, >20M files
   –  Gateways, e.g., nanoHUB,
      CIPRES, access to scientific
      simulation software
Long-tail Science and Infrastructure
                    •  CIPRES Science Gateway for Phylogenetics
                            –  Study of diversification of life and relationships among
                               living things through time
                    •  Highly used
                            –    Cited in at least 400 publications, e.g., Nature, PNAS, Cell
                            –    More than 5000 unique users in 3 years
                            –    Used routinely in at least 68 undergraduate classes
                            –    45% US (including most states), 55% from 70 other countries
                    •  Infrastructure
                            –  Flexible web application
                                  •  A science gateway, uses software and lessons from XSEDE
                                     gateways team, e.g., identity management, HPC job control
                            –  Science software: tree inference and sequence alignment
                                  •  Parallel versions of MrBayes, RAxML, GARLI, BEAST, MAFFT
                                  •  PAUP*, Poy, ClustalW, Contralign, FSA, MUSCLE, ...
                            –  Data
                                  •  Personal user space for storing
                                     results
                                  •  Tools to transfer and view data

Credit: Mark Miller, SDSC
Infrastructure Challenges
•  Science
   –  Larger teams, more disciplines, more countries
•  Data
   –  Size, complexity, rates all increasing rapidly
   –  Need for interoperability (systems and policies)
•  Systems
   –    More cores, more architectures (GPUs), more memory hierarchy
   –    Changing balances (latency vs bandwidth)
   –    Changing limits (power, funds)
   –    System architecture and business models changing (clouds)
   –    Network capacity growing; increased networking -> increased security needs
•  Software
   –  Multiphysics algorithms, frameworks
   –  Programming models and abstractions for science, data, and hardware
   –  V&V, reproducibility, fault tolerance
•  People
   –  Education and training
   –  Career paths
   –  Credit and attribution
Cyberinfrastructure (e-Research)
•  “Cyberinfrastructure consists of computing systems,
   data storage systems, advanced instruments and
   data repositories, visualization environments, and
   people, all linked together by software and high
   performance networks to improve research
   productivity and enable breakthroughs not otherwise
   possible.”
                                    -- Craig Stewart

•  Infrastructure elements:
    –  parts of an infrastructure,
    –  developed by individuals and groups,
    –  international,
    –  developed for a purpose,
    –  used by a community
Software is Infrastructure
•  Software essential for the bulk of science
    -  About half the papers in recent issues of
       Science were software-intensive projects
    -  Research becoming dependent upon
       advances in software
    -  Significant software development being
       conducted across NSF: NEON, OOI,
       NEES, NCN, iPlant, etc.
•  Wide range of software types: system,
   applications, modeling, gateways,
   analysis, algorithms, middleware, libraries
•  Development, production and
   maintenance are people intensive
•  Software life-times are long compared to
   hardware
•  Under-appreciated value
[Figure: diagram with labels Science, Software, Computing Infrastructure, Scientific Discovery, Technological Innovation, Education]
Cyberinfrastructure Framework for 21st Century
Science and Engineering (CIF21)
•    Cross-NSF portfolio of activities to provide integrated cyber resources
     that will enable new multidisciplinary research opportunities in all
     science and engineering fields by leveraging ongoing investments and
     using common approaches and components (http://www.nsf.gov/cif21)

•    ACCI task force reports (http://www.nsf.gov/od/oci/taskforces/index.jsp)
      –  Campus Bridging, Cyberlearning & Workforce Development, Data
         & Visualization, Grand Challenges, HPC, Software for Science &
         Engineering
      –  Included recommendation for NSF-wide CDS&E program
•    Vision and Strategy Reports
      –  ACI - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12051
      –  Software - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
      –  Data - http://www.nsf.gov/od/oci/cif21/DataVision2012.pdf
•    Implementation
      –  Implementation of Software Vision
         http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
Software Vision
      NSF will take a leadership role in providing
      software as enabling infrastructure for
      science and engineering research and
      education, and in promoting software as a
      principal component of its comprehensive
      CIF21 vision
   •  ...
   •  Reducing the complexity of software will be a
      unifying theme across the CIF21 vision,
      advancing both the use and development of
      new software and promoting the ubiquitous
      integration of scientific software across all
      disciplines, in education, and in industry
          –  A Vision and Strategy for Software for Science,
             Engineering, and Education – NSF 12-113
Infrastructure Role & Lifecycle
•  Support the foundational research necessary to continue to
   efficiently advance scientific software
•  Create and maintain a software ecosystem providing new
   capabilities that advance and accelerate scientific inquiry at
   unprecedented complexity and scale
•  Enable transformative, interdisciplinary, collaborative science
   and engineering research and education through the use of
   advanced software and services
•  Transform practice through new policies for software addressing
   challenges of academic culture, open dissemination and use,
   reproducibility and trust, curation, sustainability, governance,
   citation, stewardship, and attribution of software authorship
•  Develop a next generation diverse workforce of scientists and
   engineers equipped with essential skills to use and develop
   software, with software and services used in both the research
   and education process
ACI Software Cluster Programs
•  Exploiting Parallelism and Scalability (XPS)
    –  New CISE & OCI program for foundational groundbreaking
       research leading to a new era of parallel (and distributed)
       computing
    –  Issued in Oct., proposals submitted in Feb.
•  Computational and Data-Enabled Science & Engineering
   (CDS&E)
    –  Virtual program (ENG, MPS, OCI) for science-specific proofing of
       algorithms and codes
    –  Identify and capitalize on opportunities for major scientific and
       engineering breakthroughs through new computational and data
       analysis approaches
•  Software Infrastructure for Sustained Innovation (SI2)
    –  Transform innovations in research and education into sustained
       software resources that are an integral part of the
       cyberinfrastructure
    –  Develop and maintain sustainable software infrastructure that can
       enhance productivity and accelerate innovation in science and
       engineering
Software Infrastructure Projects
SI2 Software Activities
•  Elements (SSE) & Frameworks (SSI)
    –  Past general solicitations, with most of NSF (BIO, CISE, EHR,
       ENG, MPS, SBE): NSF 10-551 (2011), NSF 11-539 (2012)
        •  About 27 SSE and 20 SSI projects (19 SSE & 13 SSI in FY12)
    –  Current focused solicitation, with MPS/CHE and EPSRC: US/UK
       collaborations in computational chemistry, NSF 12-576 (2012)
        •  Will fund 4 awards from 18 proposals
    –  Solicitation open (NSF 13-525)
•  Institutes (S2I2)
    –  Solicitation for conceptualization awards, NSF 11-589 (2012)
        •  13 projects (co-funded with BIO, CISE, ENG, MPS)
    –  Second solicitation for 3-5 more S2I2s (NSF 13-511)
    –  Full institute solicitation in late FY14
•  US/China DCL (with CISE/CNS, loosely with NSFC)
    –  NSF 12-096: will make decisions soon on small set of initial
       projects
    –  Included in future SSE & SSI solicitation
•  See http://bit.ly/sw-ci for current projects
SI2 Solicitation and Decision Process

•  Cross-NSF software working group with
   members from all directorates
•  Determined how SI2 fits with other NSF
   programs that support software
   –  See: Implementation of NSF Software Vision -
      http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
•  Discusses solicitations, determines who will
   participate in each
•  Discusses and participates in review process
•  Works together to fund worthy proposals
SI2 Solicitation and Decision Process

•  Proposal reviews well -> my role becomes
   matchmaking
   –  I want to find program officers with funds, and convince them
      that they should spend their funds on the proposal
•  Unidisciplinary project (e.g. bioinformatics app)
   –  Work with a single program officer, who either likes the
      proposal or not
•  Multidisciplinary project (e.g., molecular
   dynamics)
   –  Work with multiple program officers, ...
•  Omnidisciplinary project (e.g., http, math library)
   –  Try to work with all program officers, often am told “it’s your
      responsibility”
•  In all cases, need to forecast impact
   –  Past performance does predict future results
Measuring Impact – Scenarios
1.  Developer of open source physics simulation
   –  Possible metrics
       •    How many downloads? (easiest to measure, least value)
       •    How many contributors?
       •    How many uses?
       •    How many papers cite it?
       •    How many papers that cite it are cited? (hardest to measure,
            most value)

2.  Developer of open source math library
   –  Possible metrics are similar, but citations are less
      likely
   –  What if users don’t download it?
       •    It’s part of a distro
       •    It’s pre-installed (and optimized) on an HPC system
       •    It’s part of a cloud image
       •    It’s a service
Vision for Metrics & Citation, part 1
•  Products (software, paper, data set) are
   registered
   –  Credit map (weighted list of contributors—people,
      products, etc.) is an input
   –  DOI is an output
   –  Leads to transitive credit
       •  E.g., paper 1 provides 25% credit to software A, and software A
          provides 10% credit to library X -> library X gets 2.5% credit for
          paper 1
       •  Helps developer – “my tools are widely used, give me tenure” or
          “NSF should fund my tool maintenance”
   –  Issues:
       •  Social: Trust in person who registers a product
            –  This seems to work for papers today (without weights) for both
               author lists and for citations
            –  Do weights require more than human memory?
       •  Technological: Registration system
            –  Where is it/them, what are interfaces, how do they work together?
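The transitive-credit arithmetic above can be sketched in a few lines. This is only an illustration, not a proposed registration system: the credit maps are plain dictionaries, contributor names such as "paper 1 authors" are made up to make each map sum to 100%, and the sketch assumes credit simply multiplies along each path (software A is still reported with its full 25%).

```python
from collections import defaultdict


def transitive_credit(product, credit_maps):
    """Return the fraction of credit for `product` that flows to each contributor.

    credit_maps maps a product name to its credit map: {contributor: weight}.
    Contributors that are themselves registered products pass credit onward.
    """
    totals = defaultdict(float)

    def walk(node, weight, path):
        for contributor, share in credit_maps.get(node, {}).items():
            if contributor in path:
                raise ValueError(f"cycle in credit maps at {contributor!r}")
            flowed = weight * share
            totals[contributor] += flowed
            walk(contributor, flowed, path | {contributor})

    walk(product, 1.0, frozenset({product}))
    return dict(totals)


# Hypothetical credit maps matching the example on the slide.
credit_maps = {
    "paper 1": {"paper 1 authors": 0.75, "software A": 0.25},
    "software A": {"software A developers": 0.90, "library X": 0.10},
}

credit = transitive_credit("paper 1", credit_maps)
print(f"library X: {credit['library X']:.1%}")  # library X: 2.5%
```

If a product is reached by more than one path, the contributions are summed; a cycle in the maps raises an error rather than looping forever.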
Vision for Metrics & Citation, part 2
•  Product usage is recorded
   –  Where?
       •  Both the developer and user want to track usage
       •  Privacy issues? (legal, competitive, ...)
       •  Via a phone home mechanism?
   –  What does “using” a data set mean? And how could
      it trigger a usage record?
   –  Can general code be developed for this, to be
      incorporated in software packages?
•  Ties to provenance
•  With user input, tie later products to usage
   –  User may not know science outcome when using tool
   –  After science outcome is known, may be hard to
      determine which product usages were involved
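As one possible answer to whether general usage-recording code could be written, here is a minimal sketch of an opt-in recorder a package might call at start-up. Everything in it is an assumption for illustration: the record fields, the function name, the product identifier, and the choice to append to a local file rather than phone home over the network.

```python
import json
import tempfile
import time
from pathlib import Path


def record_usage(product_id, version, log_path, opted_in=False):
    """Append one usage record to a local log; do nothing unless the user opted in."""
    if not opted_in:
        return None
    record = {
        "product": product_id,      # e.g. the identifier issued at registration
        "version": version,
        "timestamp": time.time(),   # seconds since the epoch
    }
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record


# Usage: nothing is written without consent; one line is written with it.
demo_log = Path(tempfile.mkdtemp()) / "usage.jsonl"
skipped = record_usage("hypothetical-product-id", "1.0", demo_log)
saved = record_usage("hypothetical-product-id", "1.0", demo_log, opted_in=True)
records = [json.loads(line)
           for line in demo_log.read_text(encoding="utf-8").splitlines()]
print(len(records))  # 1
```

Keeping the log local and opt-in sidesteps the privacy questions above, at the cost of leaving the developer without data unless users choose to share it.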
Vision for Metrics & Citation, thoughts
•  Can this be done incrementally?
•  Lack of credit is a larger problem than often
   perceived
   –  Lack of credit is a disincentive for sharing software
      and data
   –  Providing credit would both remove a disincentive
      and add an incentive
   –  See Lewin’s principle of force field analysis (1943)
•  For commercial tools, credit is tracked by $
   –  But this doesn’t help understand what tools were used
      for what outcomes
   –  Does this encourage collaboration?
•  Could a more economic model be used?
   –  NSF gives tokens as part of science grants, users
      distribute tokens while/after using tools
Software Questions: Sustainability
•  My definition as a program officer:
   –  How will you support your software without me
      continuing to pay for it?
•  What does support mean?
   –  Can I build and run it on my current system?
       •  Adapt to changing underlying hardware/software
   –  Do I understand what it does?
       •  Documentation, training
   –  Does it do what it does correctly?
       •  Bug tracking and updates
       •  Verification and validation
   –  Does it do what I want?
       •  Requirement tracking and updates
   –  Is it changing?
       •  Heritage (legacy) vs. developing software

•  How can Apache help?
Software Questions: Governance
•  Why do we care?
   –  Governance tells users and contributors how the
      project makes decisions, how they can be involved
•  What are the issues?
   –  Community: Users? Developers? Both?
   –  Models: dictatorship (Linux kernel), meritocracy
      (Apache), other?
   –  Tie to development models: cathedral, bazaar
•  How can Apache help?
   –  Study how these work in smaller specialized projects?
Other Questions for Apache

•  Does the Apache Way work for science?
   –  Or just for underlying tools that are useful both for
      science and other applications?
•  How many users/developers are needed for
   success?
•  Incubator model
   –  Can it be used as is for general science software?
   –  Or forked and modified?
•  Open Source for understanding (available) vs
   Open Source for reuse/development
   (changeable)?
General Software Questions
•  Software that is intended to be infrastructure has
   challenges
   –  Unlike in business, more users means more work
   –  The last 20% takes 80% of the effort
   –  What can NSF do to make these things easier?
•  What fraction of funds should be spent on
   support of existing infrastructure vs.
   development of new infrastructure?
•  How do we decide when to stop supporting a
   software element?
•  How do we encourage reuse and discourage
   duplication?
•  How do we more effectively support career
   paths for software developers (with universities,
   labs, etc.)?
What Can You Do?
•  Look at the current set of SI2 software
   and institutes, and get involved with one
   –  http://bit.ly/sw-ci
•  Tell me what we should be doing
   differently
   –  Here or email: dkatz@nsf.gov

More Related Content

What's hot

Taming the Big Data Beast - Together
Taming the Big Data Beast - TogetherTaming the Big Data Beast - Together
Taming the Big Data Beast - TogetherKennisalliantie
 
BeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN sessionBeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN sessionNick Jones
 
The Developing Needs for e-infrastructures
The Developing Needs for e-infrastructuresThe Developing Needs for e-infrastructures
The Developing Needs for e-infrastructuresguest0dc425
 
Advancing Science through Coordinated Cyberinfrastructure
Advancing Science through Coordinated CyberinfrastructureAdvancing Science through Coordinated Cyberinfrastructure
Advancing Science through Coordinated CyberinfrastructureDaniel S. Katz
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Dan Taylor
 
An Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive ResearchAn Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive ResearchLarry Smarr
 
Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Ian Foster
 
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and RealityA VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality Paul Courtney
 
big_data_casestudies_2.ppt
big_data_casestudies_2.pptbig_data_casestudies_2.ppt
big_data_casestudies_2.pptvishal choudhary
 
UC-Wide Cyberinfrastructure for Data-Intensive Research
UC-Wide Cyberinfrastructure for Data-Intensive ResearchUC-Wide Cyberinfrastructure for Data-Intensive Research
UC-Wide Cyberinfrastructure for Data-Intensive ResearchLarry Smarr
 
HPC lab projects
HPC lab projectsHPC lab projects
HPC lab projectsJason Riedy
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research PlatformLarry Smarr
 
dkNET Webinar "Pancreatlas™: Mapping the Human Pancreas in Health and Disease...
dkNET Webinar "Pancreatlas™: Mapping the Human Pancreas in Health and Disease...dkNET Webinar "Pancreatlas™: Mapping the Human Pancreas in Health and Disease...
dkNET Webinar "Pancreatlas™: Mapping the Human Pancreas in Health and Disease...dkNET
 
Whitepaper : CHI: Hadoop's Rise in Life Sciences
Whitepaper : CHI: Hadoop's Rise in Life Sciences Whitepaper : CHI: Hadoop's Rise in Life Sciences
Whitepaper : CHI: Hadoop's Rise in Life Sciences EMC
 
Montana State, Research Networking and the Outcomes from the First National R...
Montana State, Research Networking and the Outcomes from the First National R...Montana State, Research Networking and the Outcomes from the First National R...
Montana State, Research Networking and the Outcomes from the First National R...Jerry Sheehan
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsCarole Goble
 

What's hot (20)

Cyberistructure
CyberistructureCyberistructure
Cyberistructure
 
Taming the Big Data Beast - Together
Taming the Big Data Beast - TogetherTaming the Big Data Beast - Together
Taming the Big Data Beast - Together
 
BeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN sessionBeSTGRID OpenGridForum 29 GIN session
BeSTGRID OpenGridForum 29 GIN session
 
The Developing Needs for e-infrastructures
The Developing Needs for e-infrastructuresThe Developing Needs for e-infrastructures
The Developing Needs for e-infrastructures
 
Advancing Science through Coordinated Cyberinfrastructure
Advancing Science through Coordinated CyberinfrastructureAdvancing Science through Coordinated Cyberinfrastructure
Advancing Science through Coordinated Cyberinfrastructure
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
An Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive ResearchAn Integrated West Coast Science DMZ for Data-Intensive Research
An Integrated West Coast Science DMZ for Data-Intensive Research
 
Ci days notre_dame_april2010
Ci days notre_dame_april2010Ci days notre_dame_april2010
Ci days notre_dame_april2010
 
Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...
 
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and RealityA VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
A VIVO VIEW OF CANCER RESEARCH: Dream, Vision and Reality
 
big_data_casestudies_2.ppt
big_data_casestudies_2.pptbig_data_casestudies_2.ppt
big_data_casestudies_2.ppt
 
UC-Wide Cyberinfrastructure for Data-Intensive Research
UC-Wide Cyberinfrastructure for Data-Intensive ResearchUC-Wide Cyberinfrastructure for Data-Intensive Research
UC-Wide Cyberinfrastructure for Data-Intensive Research
 
HPC lab projects
HPC lab projectsHPC lab projects
HPC lab projects
 
Sgci nsf-2-22-17
Sgci nsf-2-22-17Sgci nsf-2-22-17
Sgci nsf-2-22-17
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
 
dkNET Webinar "Pancreatlas™: Mapping the Human Pancreas in Health and Disease...
dkNET Webinar "Pancreatlas™: Mapping the Human Pancreas in Health and Disease...dkNET Webinar "Pancreatlas™: Mapping the Human Pancreas in Health and Disease...
dkNET Webinar "Pancreatlas™: Mapping the Human Pancreas in Health and Disease...
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
Whitepaper : CHI: Hadoop's Rise in Life Sciences
Whitepaper : CHI: Hadoop's Rise in Life Sciences Whitepaper : CHI: Hadoop's Rise in Life Sciences
Whitepaper : CHI: Hadoop's Rise in Life Sciences
 
Montana State, Research Networking and the Outcomes from the First National R...
Montana State, Research Networking and the Outcomes from the First National R...Montana State, Research Networking and the Outcomes from the First National R...
Montana State, Research Networking and the Outcomes from the First National R...
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of Scientists
 

Similar to NSF Software @ ApacheConNA

Working towards Sustainable Software for Science (an NSF and community view)
Working towards Sustainable Software for Science (an NSF and community view)Working towards Sustainable Software for Science (an NSF and community view)
Working towards Sustainable Software for Science (an NSF and community view)Daniel S. Katz
 
Software and Education at NSF/ACI
Software and Education at NSF/ACISoftware and Education at NSF/ACI
Software and Education at NSF/ACIDaniel S. Katz
 
Open Source and Science at the National Science Foundation (NSF)
Open Source and Science at the National Science Foundation (NSF)Open Source and Science at the National Science Foundation (NSF)
Open Source and Science at the National Science Foundation (NSF)Daniel S. Katz
 
XSEDE: an ecosystem of advanced digital services accelerating scientific disc...
XSEDE: an ecosystem of advanced digital services accelerating scientific disc...XSEDE: an ecosystem of advanced digital services accelerating scientific disc...
XSEDE: an ecosystem of advanced digital services accelerating scientific disc...John Towns
 
Overview of XSEDE and Introduction to XSEDE 2.0 and Beyond
Overview of XSEDE and Introduction to XSEDE 2.0 and BeyondOverview of XSEDE and Introduction to XSEDE 2.0 and Beyond
Overview of XSEDE and Introduction to XSEDE 2.0 and BeyondJohn Towns
 
Amy Walton - NSF’s Computational Ecosystem for 21st Century Science & Enginee...
Amy Walton - NSF’s Computational Ecosystem for 21st Century Science & Enginee...Amy Walton - NSF’s Computational Ecosystem for 21st Century Science & Enginee...
Amy Walton - NSF’s Computational Ecosystem for 21st Century Science & Enginee...Larry Smarr
 
Xsede for-nlhpc
Xsede for-nlhpcXsede for-nlhpc
Xsede for-nlhpcJohn Towns
 
Scientific Software Challenges and Community Responses
Scientific Software Challenges and Community ResponsesScientific Software Challenges and Community Responses
Scientific Software Challenges and Community ResponsesDaniel S. Katz
 
Scientific Software Innovation Institutes (S2I2s) as part of NSF’s SI2 program
Scientific Software Innovation Institutes (S2I2s) as part of NSF’s SI2 programScientific Software Innovation Institutes (S2I2s) as part of NSF’s SI2 program
Scientific Software Innovation Institutes (S2I2s) as part of NSF’s SI2 programDaniel S. Katz
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudOla Spjuth
 
XSEDE Overview (March 2014)
XSEDE Overview (March 2014)XSEDE Overview (March 2014)
XSEDE Overview (March 2014)John Towns
 
100503 bioinfo instsymp
100503 bioinfo instsymp100503 bioinfo instsymp
100503 bioinfo instsympNick Jones
 
Democratizing Science through Cyberinfrastructure - Manish Parashar
Democratizing Science through Cyberinfrastructure - Manish ParasharDemocratizing Science through Cyberinfrastructure - Manish Parashar
Democratizing Science through Cyberinfrastructure - Manish ParasharLarry Smarr
 
Panel: The Global Research Platform: An Overview
Panel: The Global Research Platform: An OverviewPanel: The Global Research Platform: An Overview
Panel: The Global Research Platform: An OverviewLarry Smarr
 
Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...Ola Spjuth
 
g-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionalityg-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking FunctionalityNicholas Loulloudes
 

Similar to NSF Software @ ApacheConNA (20)

Working towards Sustainable Software for Science (an NSF and community view)
Working towards Sustainable Software for Science (an NSF and community view)Working towards Sustainable Software for Science (an NSF and community view)
Working towards Sustainable Software for Science (an NSF and community view)
 
Software and Education at NSF/ACI
Software and Education at NSF/ACISoftware and Education at NSF/ACI
Software and Education at NSF/ACI
 
Open Source and Science at the National Science Foundation (NSF)
Open Source and Science at the National Science Foundation (NSF)Open Source and Science at the National Science Foundation (NSF)
Open Source and Science at the National Science Foundation (NSF)
 
XSEDE: an ecosystem of advanced digital services accelerating scientific disc...
NSF Software @ ApacheConNA

  • 1. Software as Infrastructure at NSF/OCI Daniel S. Katz Program Director, Office of Cyberinfrastructure (OCI)
  • 2. Software as Infrastructure at NSF/OCI Daniel S. Katz Program Director, Office of Cyberinfrastructure (OCI) Division of Advanced Cyberinfrastructure (ACI)
  • 3. Big Science and Infrastructure •  Higgs* boson discovery announced at CERN July 4, 2012 •  Instrument: Large Hadron Collider (LHC) •  Infrastructure –  Computing Hardware: Worldwide LHC Computing Grid (WLCG): 235,000 cores across 36 countries, including Open Science Grid (OSG, US), European Grid Infrastructure (EGI, Europe), ... –  Data: ~20 PB of data created in 2011-2012 –  Software: grid middleware, physics analysis applications, ... –  Networks –  Education & Training •  Data generated centrally, moved (~3 PB/week) across multi-tiered infrastructure to be computed upon See: http://www.isgtw.org/feature/how-grid-computing-helped-cern-hunt-higgs
  • 4. Big Science and Infrastructure •  Hurricanes affect humans •  Multi-physics: atmosphere, ocean, coast, vegetation, soil –  Sensors and data as inputs •  Humans: what have they built, where are they, what will they do –  Data and models as inputs •  Infrastructure: –  Urgent/scheduled processing, workflows –  Software applications, workflows –  Networks –  Decision-support systems, visualization –  Data storage, interoperability
  • 5. Long-tail Science and Infrastructure •  Exploding data volumes & powerful simulation methods mean that more researchers need advanced infrastructure •  Such “long-tail” researchers cannot afford expensive expertise and unique infrastructure •  Challenge: Outsource and/or automate time-consuming common processes –  Tools, e.g., Globus Online and data management •  Note: much LHC data is moved by Globus GridFTP, e.g., May/June 2012, >20 PB, >20M files –  Gateways, e.g., nanoHUB, CIPRES, access to scientific simulation software [Figure: NSF grant size, 2007 (“Dark data in the long tail of science”, B. Heidorn)]
  • 6. Long-tail Science and Infrastructure •  CIPRES Science Gateway for Phylogenetics –  Study of diversification of life and relationships among living things through time •  Highly used –  Cited in at least 400 publications, e.g., Nature, PNAS, Cell –  More than 5000 unique users in 3 years –  Used routinely in at least 68 undergraduate classes –  45% US (including most states), 55% from 70 other countries •  Infrastructure –  Flexible web application •  A science gateway, uses software and lessons from XSEDE gateways team, e.g., identity management, HPC job control –  Science software: tree inference and sequence alignment •  Parallel versions of MrBayes, RAxML, GARLI, BEAST, MAFFT •  PAUP*, Poy, ClustalW, Contralign, FSA, MUSCLE, ... –  Data •  Personal user space for storing results •  Tools to transfer and view data Credit: Mark Miller, SDSC
  • 7. Infrastructure Challenges •  Science –  Larger teams, more disciplines, more countries •  Data –  Size, complexity, rates all increasing rapidly –  Need for interoperability (systems and policies) •  Systems –  More cores, more architectures (GPUs), more memory hierarchy –  Changing balances (latency vs bandwidth) –  Changing limits (power, funds) –  System architecture and business models changing (clouds) –  Network capacity growing; increase networks -> increased security •  Software –  Multiphysics algorithms, frameworks –  Programming models and abstractions for science, data, and hardware –  V&V, reproducibility, fault tolerance •  People –  Education and training –  Career paths –  Credit and attribution
  • 8. Cyberinfrastructure (e-Research) •  “Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.” -- Craig Stewart •  Infrastructure elements: –  parts of an infrastructure, –  developed by individuals and groups, –  international, –  developed for a purpose, –  used by a community
  • 9. Software is Infrastructure •  Software essential for the bulk of science -  About half the papers in recent issues of Science were software-intensive projects -  Research becoming dependent upon advances in software -  Significant software development being conducted across NSF: NEON, OOI, NEES, NCN, iPlant, etc •  Wide range of software types: system, applications, modeling, gateways, analysis, algorithms, middleware, libraries •  Development, production and maintenance are people intensive •  Software life-times are long compared to hardware •  Under-appreciated value [Figure labels: Science, Software, Computing Infrastructure, Scientific Discovery, Technological Innovation, Education]
  • 10. Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21) •  Cross-NSF portfolio of activities to provide integrated cyber resources that will enable new multidisciplinary research opportunities in all science and engineering fields by leveraging ongoing investments and using common approaches and components (http://www.nsf.gov/cif21) •  ACCI task force reports (http://www.nsf.gov/od/oci/taskforces/index.jsp) –  Campus Bridging, Cyberlearning & Workforce Development, Data & Visualization, Grand Challenges, HPC, Software for Science & Engineering –  Included recommendation for NSF-wide CDS&E program •  Vision and Strategy Reports –  ACI - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12051 –  Software - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113 –  Data - http://www.nsf.gov/od/oci/cif21/DataVision2012.pdf •  Implementation –  Implementation of Software Vision http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
  • 11. Software Vision NSF will take a leadership role in providing software as enabling infrastructure for science and engineering research and education, and in promoting software as a principal component of its comprehensive CIF21 vision •  ... •  Reducing the complexity of software will be a unifying theme across the CIF21 vision, advancing both the use and development of new software and promoting the ubiquitous integration of scientific software across all disciplines, in education, and in industry –  A Vision and Strategy for Software for Science, Engineering, and Education – NSF 12-113
  • 12. Infrastructure Role & Lifecycle •  Create and maintain a software ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale •  Support the foundational research necessary to continue to efficiently advance scientific software •  Enable transformative, interdisciplinary, collaborative, science and engineering research and education through the use of advanced software and services •  Transform practice through new policies for software addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of software authorship •  Develop a next generation diverse workforce of scientists and engineers equipped with essential skills to use and develop software, with software and services used in both the research and education process
  • 13. ACI Software Cluster Programs •  Exploiting Parallelism and Scalability (XPS) –  New CISE & OCI program for foundational groundbreaking research leading to a new era of parallel (and distributed) computing –  Issued in Oct., proposals submitted in Feb. •  Computational and Data-Enabled Science & Engineering (CDS&E) –  Virtual program (ENG, MPS, OCI) for science-specific proofing of algorithms and codes –  Identify and capitalize on opportunities for major scientific and engineering breakthroughs through new computational and data analysis approaches •  Software Infrastructure for Sustained Innovation (SI2) –  Transform innovations in research and education into sustained software resources that are an integral part of the cyberinfrastructure –  Develop and maintain sustainable software infrastructure that can enhance productivity and accelerate innovation in science and engineering
  • 15. SI2 Software Activities •  Elements (SSE) & Frameworks (SSI) –  Past general solicitations, with most of NSF (BIO, CISE, EHR, ENG, MPS, SBE): NSF 10-551 (2011), NSF 11-539 (2012) •  About 27 SSE and 20 SSI projects (19 SSE & 13 SSI in FY12) –  Current focused solicitation, with MPS/CHE and EPSRC: US/UK collaborations in computational chemistry, NSF 12-576 (2012) •  Will fund 4 awards from 18 proposals –  Solicitation open (NSF 13-525) •  Institutes (S2I2) –  Solicitation for conceptualization awards, NSF 11-589 (2012) •  13 projects (co-funded with BIO, CISE, ENG, MPS) –  Second solicitation for 3-5 more S2I2s (NSF 13-511) –  Full institute solicitation in late FY14 •  US/China DCL (with CISE/CNS, loosely with NSFC) –  NSF 12-096: will make decisions soon on small set of initial projects –  Included in future SSE & SSI solicitation •  See http://bit.ly/sw-ci for current projects
  • 16. SI2 Solicitation and Decision Process •  Cross-NSF software working group with members from all directorates •  Determined how SI2 fits with other NSF programs that support software –  See: Implementation of NSF Software Vision - http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817 •  Discusses solicitations, determines who will participate in each •  Discusses and participates in review process •  Work together to fund worthy proposals
  • 17. SI2 Solicitation and Decision Process •  Proposal reviews well -> my role becomes matchmaking –  I want to find program officers with funds, and convince them that they should spend their funds on the proposal •  Unidisciplinary project (e.g. bioinformatics app) –  Work with single program officer, either likes the proposal or not •  Multidisciplinary project (e.g., molecular dynamics) –  Work with multiple program officers, ... •  Omnidisciplinary project (e.g. http, math library) –  Try to work with all program officers, often am told “it’s your responsibility” •  In all cases, need to forecast impact –  Past performance does predict future results
  • 18. Measuring Impact – Scenarios 1.  Developer of open source physics simulation –  Possible metrics •  How many downloads? (easiest to measure, least value) •  How many contributors? •  How many uses? •  How many papers cite it? •  How many papers that cite it are cited? (hardest to measure, most value) 2.  Developer of open source math library –  Possible metrics are similar, but citations are less likely –  What if users don’t download it? •  It’s part of a distro •  It’s pre-installed (and optimized) on an HPC system •  It’s part of a cloud image •  It’s a service
  • 19. Vision for Metrics & Citation, part 1 •  Products (software, paper, data set) are registered –  Credit map (weighted list of contributors—people, products, etc.) is an input –  DOI is an output –  Leads to transitive credit •  E.g., paper 1 provides 25% credit to software A, and software A provides 10% credit to library X -> library X gets 2.5% credit for paper 1 •  Helps developer – “my tools are widely used, give me tenure” or “NSF should fund my tool maintenance” –  Issues: •  Social: Trust in person who registers a product –  This seems to work for papers today (without weights) for both author lists and for citations –  Do weights require more than human memory? •  Technological: Registration system –  Where is it/them, what are interfaces, how do they work together?
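The transitive-credit arithmetic on slide 19 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CREDIT_MAPS` structure and the `transitive_credit` function are hypothetical, not part of any registration system described in the talk, and the only weights used are the ones from the slide's own example (25% and 10%).

```python
from collections import defaultdict

# Hypothetical credit maps: each product lists the fraction of its credit
# that it passes on to other products. Weights come from the slide's example.
CREDIT_MAPS = {
    "paper 1": {"software A": 0.25},
    "software A": {"library X": 0.10},
    "library X": {},
}


def transitive_credit(product, credit_maps):
    """Return the share of `product`'s credit that reaches each upstream product."""
    totals = defaultdict(float)

    def walk(node, weight, path):
        for dep, share in credit_maps.get(node, {}).items():
            if dep in path:
                # Skip cycles so mutually crediting products cannot loop forever.
                continue
            passed = weight * share
            totals[dep] += passed
            walk(dep, passed, path | {dep})

    walk(product, 1.0, {product})
    return dict(totals)


credit = transitive_credit("paper 1", CREDIT_MAPS)
for name, share in credit.items():
    print(f"{name}: {share:.1%}")
```

With the slide's weights, library X ends up with 0.25 × 0.10 = 2.5% of the credit for paper 1. How the weights themselves are chosen and who is trusted to register them are the open social questions the slide raises; the sketch does not address them.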
  • 20. Vision for Metrics & Citation, part 2 •  Product usage is recorded –  Where? •  Both the developer and user want to track usage •  Privacy issues? (legal, competitive, ...) •  Via a phone home mechanism? –  What does “using” a data set mean? And how could it trigger a usage record? –  Can general code be developed for this, to be incorporated in software packages? •  Ties to provenance •  With user input, tie later products to usage –  User may not know science outcome when using tool –  After science outcome is known, may be hard to determine which product usages were involved
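As one way to picture the "general code" slide 20 asks about, here is a minimal sketch of a helper that builds a usage record only when the user has opted in. Everything in it is an assumption for illustration: the `make_usage_record` function, the field names, and the placeholder DOI are hypothetical, and nothing is sent over the network. Where such records would be stored or reported is exactly the open question on the slide.

```python
import json
from datetime import datetime, timezone


def make_usage_record(product_doi, version, user_opt_in):
    """Build a usage record, or return None when the user has not opted in."""
    if not user_opt_in:
        # Respect the privacy concern raised on the slide: record nothing.
        return None
    return {
        "product": product_doi,  # identifier issued when the product was registered
        "version": version,
        "used_at": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder DOI and version, for illustration only.
record = make_usage_record("10.0000/example-doi", "1.2.0", user_opt_in=True)
serialized = json.dumps(record, sort_keys=True)
skipped = make_usage_record("10.0000/example-doi", "1.2.0", user_opt_in=False)
```

Serializing to JSON keeps the record usable whether it is written to a local log or handed to some future registry. Tying a record to a later science outcome would need an extra identifier supplied by the user, which this sketch leaves out.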
  • 21. Vision for Metrics & Citation, thoughts •  Can this be done incrementally? •  Lack of credit is a larger problem than often perceived –  Lack of credit is a disincentive for sharing software and data –  Providing credit would both remove the disincentive and add an incentive –  See Lewin’s principle of force field analysis (1943) •  For commercial tools, credit is tracked by $ –  But this doesn’t help understand what tools were used for what outcomes –  Does this encourage collaboration? •  Could a more economic model be used? –  NSF gives tokens as part of science grants, users distribute tokens while/after using tools
  • 22. Software Questions: Sustainability •  My definition as a program officer: –  How will you support your software without me continuing to pay for it? •  What does support mean? –  Can I build and run it on my current system? •  Adapt to changing underlying hardware/software –  Do I understand what it does? •  Documentation, training –  Does it do what it does correctly? •  Bug tracking and updates •  Verification and validation –  Does it do what I want? •  Requirement tracking and updates –  Is it changing? •  Heritage (legacy) vs. developing software •  How can Apache help?
  • 23. Software Questions: Governance •  Why do we care? –  Governance tells users and contributors how the project makes decisions, how they can be involved •  What are the issues? –  Community: Users? Developers? Both? –  Models: dictatorship (Linux kernel), meritocracy (Apache), other? –  Tie to development models: cathedral, bazaar •  How can Apache help? –  Study how these work in smaller specialized projects?
  • 24. Other Questions for Apache •  Does the Apache Way work for science? –  Or just for underlying tools that are useful both for science and other applications? •  How many users/developers are needed for success? •  Incubator model –  Can it be used as is for general science software? –  Or forked and modified? •  Open Source for understanding (available) vs Open Source for reuse/development (changeable)?
  • 25. General Software Questions •  Software that is intended to be infrastructure has challenges –  Unlike in business, more users means more work –  The last 20% takes 80% of the effort –  What can NSF do to make these things easier? •  What fraction of funds should be spent on support of existing infrastructure vs. development of new infrastructure? •  How do we decide when to stop supporting a software element? •  How do we encourage reuse and discourage duplication? •  How do we more effectively support career paths for software developers (with universities, labs, etc.)?
  • 26. What Can You Do? •  Look at the current set of SI2 software and institutes, and get involved with one –  http://bit.ly/sw-ci •  Tell me what we should be doing differently –  Here or email: dkatz@nsf.gov