Research computing is in an exciting era and has never evolved as fast as in the last 20 years. We can now answer research questions that we could not even ask two decades ago. This has led to discoveries such as those enabled by analyses of DNA from Next-Generation Sequencing technologies. The increased complexity of software, data, hardware and lab instruments demands more openness and sharing of data and methods. Researchers and educators are not necessarily IT specialists, though. Thus, a further trend in research computing is the shift from system-centric design to user-centric design and interdisciplinary teams: complex solutions are offered through self-explanatory user interfaces, so-called science gateways or virtual research environments. I will present solutions and projects that let users focus on their research questions without having to become acquainted with the nitty-gritty details of the complex research computing infrastructure. Key aspects of the presented projects are usability and interoperability of computational methods, reproducibility of research results, and sustainability of research software. Sustainability of research software has many facets. I advocate for improving diversity in workforce development, for career paths for research software engineers, and for incentivizing their work via means beyond the traditional academic reward system.
Bridging Gaps and Broadening Participation in Today's and Future Research Computing Ecosystem
1. Bridging Gaps and
Broadening Participation in
Today's and Future
Research Computing
Ecosystem
Sandra Gesing
sandra.gesing@nd.edu
http://sandra-gesing.com/
May 20, 2021
3. Increased
complexity of
Today’s research
questions
Hardware and
software
Skills required
Greater need for
openness and
reproducibility
Driving policy
questions
Interdisciplinary
distributed projects
Opportunity to
integrate
research with
teaching
Better workforce
preparation
We need end-to-end solutions that provide
broad access to advanced resources and allow
all to tackle
today’s challenging science questions
→ Science Gateways
Research
Computing
4. Lab Instruments
High-speed networks
Research Software
Web-based
agile frameworks
Distributed data and
computing infrastructures
Data and Compute-intensive
Problems
Research
Computing
6. Lab Instruments
High-speed networks
Research Software
Web-based
agile frameworks
Distributed data and
computing infrastructures
Data and Compute-intensive
Problems
Need for science gateways
Research
Computing
12. Usability
“After all, usability really
just means making
sure that something
works well: that a
person … can use the
thing - whether it's a
Web site, a fighter jet, or
a revolving door - for its
intended purpose
without getting
hopelessly frustrated.”
Steve Krug in “Don't make me think!:
A Common Sense Approach to Web Usability”, 2005
13. Reproducibility
Replicability
“Reproducibility means
obtaining consistent
computational results
using the same input
data, computational
steps, methods, code,
and conditions of
analysis. Replicability
means obtaining
consistent results across
studies aimed at
answering the same
scientific question, each
of which has obtained
its own data.”
https://www.slideshare.net/carolegoble/what-is-reproducibility-gobleclean
Re-usability
“The key to productivity
is reusability.
The easiest way to
produce code is
obviously to
have it already!"
John R. Bourne in “Object-oriented
Engineering: Building Engineering Systems
Using Smalltalk-80”, 1992
https://phys.org/news/2019-05-replicability-science.html
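To make the computational side of the reproducibility definition quoted above more concrete, here is a minimal sketch: the same input data, the same code, and the same conditions of analysis (here, a fixed random seed) should give the same result. The function `bootstrap_mean` and the measurements are invented for illustration only.

```python
import random


def bootstrap_mean(data, n_resamples, seed):
    """Mean of bootstrap resample means, using a private seeded generator."""
    rng = random.Random(seed)  # seeded generator: part of the "conditions of analysis"
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    return sum(means) / len(means)


data = [2.1, 3.4, 1.9, 5.0, 4.2]  # invented measurements, not real results
first = bootstrap_mean(data, n_resamples=200, seed=42)
second = bootstrap_mean(data, n_resamples=200, seed=42)
print(first == second)  # identical inputs, code, and seed give identical output
```

Replicability, in contrast, would mean running the analysis on independently collected data and checking whether the conclusions agree, which no single script can demonstrate.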
14. Interoperability
“capability to communicate, execute programs, or
transfer data among various functional units in a
manner that requires the user to have little or no
knowledge of the unique characteristics of those
units”
ISO/IEC 2382 Information Technology Vocabulary
“the ability of two or more systems or
components to exchange information and to use
the information that has been exchanged”
IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries
15. Sustainability
Sustainability means that
the software you use today will be
available - and continue to be
improved and supported –
in the future.
Software Sustainability Institute
17. YOU ARE NOT
ALONE!
https://sciencegateways.org/
Longer term
support engagements
Diverse expertise
on demand
Software and visibility
for gateways
Information exchange
in a community
environment
Student opportunities and
more stable career paths
Gesing, S., Wilkins-Diehr, N., Dahan, M., Lawrence, K., Zentner, M., Pierce, M., Hayden, L.B., and Marru, S.
"Science Gateways: The Long Road to the Birth of an Institute" Proc. of HICSS-50 (50th Hawaii International
Conference on System Sciences), 4-7 January 2017, Hilton Waikoloa, HI, USA, http://hdl.handle.net/10125/41919
18. • Hands-on Development Support
• Usability Design / Analysis
• Operations & Sustainability Advice from Consultants
• Cybersecurity Consulting
• Catalog of Gateways & Software
• Website Resources & Materials
• Webinar Series
• Gateway Focus Week
• Student-focused Internship and
Hackathon Programs
• Certificate / Educational
Programs
• Partner Program
• Affiliates Program
• Ambassador Programs
• Blog, News & Job Postings
• Community Forum
• Annual Conference
• Outreach
19. HUBzero instances worldwide
HUBzero users worldwide
Lessons learned:
Approaches should be
technology agnostic, using APIs
and standard web technologies
OR deliver a complete solution
Community Engagement is key
• Widely used complete frameworks (HUBzero, Open Science Framework, Galaxy,
Globus Data Portal, etc.)
• RESTful APIs and support of multiple programming languages in widely used
frameworks (Apache Airavata, TAPIS, etc.)
• Reused interface implementations (CIPRES, etc.)
• Science gateways as a service with provision of hardware in the background
(SciGap, etc.)
Science Gateway Technologies
20. Ambassador
Programs
A program to support those who would like to
promote gateways for research and education—
bridging the gaps for communities to accelerate
their research and teaching
Science Ambassadors
Gateway Ambassadors
Gateway Ambassadors serve as community
builders, making connections between
people, experts, and resources.
Gesing, S., Lawrence, K., Dahan, M., Pierce, M.E., Wilkins-Diehr, N. and Zentner, M. "Science gateways:
Sustainability via on-campus teams", Future Generation Computer Systems, volume 94, pages 97-102, May 2019.
21. Collaborations on
Science Gateways Make
Challenges Less Steep
• Great visibility for the institution’s research
activities
• Synergy effects between projects
• Shared resources, costs and expertise across
departments
• Lower learning curves
• Expertise that is otherwise difficult for individual
projects to obtain
22. Community
Building
Do you want to share advanced
software or digital products for
research or teaching?
You probably benefit from a science
gateway!
23. Community
Building
Do you need people with
knowledge about data
preservation, data lifecycle, and
with programming skills?
Digital librarians are probably a
great fit for this skill set.
24. Community
Building
Do you need people with
knowledge about machine
learning, meta-data,
ontologies, statistics?
Data scientists are probably a
great fit for this skill set.
25. Community
Building
Do you need people with
knowledge about how to access
HPC resources, VMs,
containerization, distributed data
management?
HPC specialists are probably
a great fit for this skill set.
26. Community
Building
Do you need people with
knowledge about project
management, marketing,
outreach?
Business specialists are
probably a great fit for this
skill set.
27. Gateway
Ambassadors
Members of a professional
community
• Community activities
• learning from peers
• sharing information
• Ambassador activities
• meeting with individuals
• hosting awareness sessions
28. What are the Gaps in the
Current Community
Building Landscape?
Gateway Ambassadors
can fill gaps as
community builders!
SGCI can fill gaps on
expertise and services!
29. SGCI in numbers after 4.5 years
2834 Attendees at SGCI Events
196 Letters of Collaboration
289 Publicly Available Research Products
623 Gateway Catalog Entries
681 (469) Faculty and Students Receiving Support (Underrepresented)
121 Consultations
30. Impact Our clients collectively have
over 17,000 publications,
over 2,000 since our
engagement.
Our clients’ publications
have over 680,000 citations,
with over 190,000 since
engagement.
Our clients’ gateways have
been cited more than 47,100
times.
31. HUBzero
• Open source platform
• Development
• Hosting services
• 60+ Hubs
• 2 million visitors worldwide
Gesing, S. et al. "HUBzero®: Novel Concepts Applied to Established Computing Infrastructures to Address
Communities’ Needs" PEARC '19: Proceedings of the Practice and Experience in Advanced Research Computing on
Rise of the Machines (learning), Chicago, IL, 2019, July 28 - August 1, 2019
34. HUBzero
• Open source platform
• Development
• Hosting services
• 60+ Hubs
• 2 million visitors worldwide
Worldwide nanoHUB activity over a 24-hour period
38. HUBzero
• Open source platform
• Development
• Hosting services
• 60+ Hubs
• 2 million visitors worldwide
MyGeoHub
This hub supports the geospatial modeling, data analysis and visualization
needs of the broad research and education communities.
39. HUBzero
• Open source platform
• Development
• Hosting services
• 60+ Hubs
• 2 million visitors worldwide
MyGeoHub
Simple-G-US and SWATShare via GABBs
41. HUBzero
• Build communities
• Use & Publish tools
• Jupyter Notebooks
• Linux Apps
• RStudio
• Web Apps
NOT reinventing the wheel
• Publish Research
42. HUBzero
• Build communities
• Use & Publish tools
• Jupyter Notebooks
• Linux Apps
• RStudio
• Web Apps
NOT reinventing the wheel
Getting new features more
easily
• Publish Research
43. HUBzero in 2021 • 20 full time software professionals
specializing in:
• Cybersecurity
• Web programming
• User experience design
• Scientific application development
• Analytics
• Middleware
• High performance computing
• System administration
• Customer service
• Entirely self funded
Even with success we
must continue to innovate
new business models,
partnerships, delivery
capabilities, and service
architectures and
continue to address
additional research
domains.
45. URSSI
Conceptualizing a US
Research Software Sustainability
Institute
NSF made 18,592 awards totaling $9.6 billion
(1995 – 2016) that topically reference “software” in
their abstracts
46. URSSI
Conceptualizing a US
Research Software Sustainability
Institute
Conceptualize (plan) a US
Research Software Sustainability
Institute
Cut across existing activities
funded by NSF and beyond
Directly and indirectly positively
impact all software development
and maintenance projects
Focus on the entire research
software ecosystem, including the
people who create, maintain, and
use research software
Outputs:
• Eager supportive & inclusive community
• Concrete institute plan configured to offer
valued services
47. URSSI
Conceptualizing a US
Research Software Sustainability
Institute
• Workshops
• Survey
• Ethnographic studies
• Winter school
• Communication and outreach
Through all activities, iteratively build on existing, extensive
understanding of the challenges for sustainable software and its
developers
Carver, J., Gesing, S., Katz, D.S., Ram, K. and Weber, N. "Conceptualization of a US Research Software Sustainability Institute (URSSI)" IEEE Computing in Science
& Engineering, Volume: 20, Issue: 3, pp. 4-9, 2018, DOI: 10.1109/MCSE.2018.03221924
48. URSSI
Conceptualizing a US
Research Software Sustainability
Institute
Columns: Supporting Software | Supporting People | Supporting the community | Science & research impact
Development Support (consulting & short term small project support): X X
Incubator (technology advice, business planning, usability advice, etc.): X X
Training (courses & guides): X X X
Policy (research & campaigns): X X X
Community (fellowships, workshops, blogs, website): X X X X
51. US-RSE
US Research Software Engineer
Association
Movement and Term: Born in the UK
• 2012 SSI’s Collaborations Workshop -
“Research Software Engineer”
• Late 2013 UKRSE Association forms
• Since 2019 Society of Research Software
Engineering
52. US-RSE
US Research Software Engineer
Association
More countries followed the concept
• Australia/New Zealand
• Belgium
• Germany
• The Netherlands
• Nordic
53. US-RSE
US Research Software Engineer
Association
• US: Winter 2017-2018
• US Survey of RSEs: ~175 responses
• 12 responded with interest to build a
national community
• UK Sponsored 1st International RSE Leaders
Meeting
• 5 US delegates
• Jan-Feb 2018: US-RSE Slack workspace and
first website go live
• Building of steering committee
54. US-RSE
US Research Software Engineer
Association
Mission
Community
Create a professional community to share
knowledge, connections, and resources
Advocacy
Promote RSEs’ impact on research,
highlighting the critical and valuable role
RSEs serve
Resources
Access to information and material to
support individuals and RSE groups
Carver, J., Cosden, I., Hill, C., Gesing, S., Katz, D.S. "Sustaining Research Software via Research Software Engineers
and Professional Associations", ICSE 2021 BokSS Workshop, June 2021, arXiv:2103.01880 [cs.SE]
55. US-RSE
US Research Software Engineer
Association
• Newsletters
• Community calls
• Working groups
• Diversity, Equity and Inclusion
• Speaker series
• Book club
• Website
• Education and Training
• Workshops
58. PresQT
Preservation
Quality Tool
Bridging the Gap to Data and
Software Sharing
Researchers
“the local academic community struggles to effectively manage
its assets which manifested itself in a number of challenges, and
as for researchers, they lacked storage capacity and data
curation processes, and the institution lacked standard metadata
and indexing technologies, as well as tools that would support
the whole research workflow” - Digital Asset Strategy
Committee, DigitalND, 2011
Libraries
Typically, data curation happens retroactively, and as a result
data is either not captured at all or available metadata is
incomplete.
59. Lifecycle of Research Projects
Selection/
development of
tools
Data
assembling/
creating
Reports
Preservation of
Data
Funding ends
New
project
Work-intensive and
too late in the lifecycle
60. Target Lifecycle of Research Projects
Selection/
development of
tools
Data
assembling/
creating
Reports
Preservation of
Data
Funding ends
New
project
EASY
STEP!!!
(ideally)
Assure quality
of data
Assure
quality
of data
61. PresQT
Preservation
Quality Tool
Bridging the Gap to Data and
Software Sharing
Pressures from the Outside
“...digitally formatted scientific data resulting from unclassified
research supported wholly or in part should be stored and
publicly accessible to search, retrieve, and analyze.” - White
House OSTP Public Access Memo, Feb. 2013
65. Collaborative Effort
An effort, funded by an implementation grant and a
previous planning grant, to address needs for
preserving data and software. The goal is to
collaboratively design, develop, and connect
interoperable and repository-agnostic Data and
Software Preservation Quality Tools.
Gesing, S., Meyers, N., Johnson, R. and Wang, J. PresQT – Services to Improve Re-use and FAIRness of Research
Data and Software. Collaborations Workshop 21, March 30 - April 1, 2021
66. PresQT
Acknowledgements
•Funded Subawardees
• Sheridan Libraries, Johns Hopkins University
• NDS
• UC San Diego Library
• HUBzero team
• Yale University Library
•Collaborators and Testing Partners
• Libraries at Amherst College, Fontbonne University,
Tuskegee University, Confederation of Open Access
Repositories (COAR)
• ReproZip, Jupyter, CERN, RDA groups
• Midwest Big Data Hub, Science Gateways
Community Institute, URSSI, Center for Open
Science, Data Curation Network, Software
Preservation Network
• Workshops
• Data Futures: Preserving Annotation with Peter
Cornwell
• SDSC: David Valentine and Ilya Zaslavsky
• Mark Wilkinson: FAIR Evaluation Services
• Daniel Clarke and Avi Ma’ayan: FAIRShake,
Assessment Rubrics
67. Collaborative Effort
• Will it help my job
performance?
• Is it USEFUL?
• Is it better than the old
way?
• Easy to use?
• Easy to learn?
• Time-consuming?
• Efficient?
• Can I do this? Do I have
knowledge? Support?
Resources?
• Does it fit in with my
work style?
Unified Theory of Acceptance and Use of Technology (UTAUT), Venkatesh et al., 2003
68. Collaborative Effort
Concept
• not standalone solutions
• partner systems and services easily integrable via RESTful
APIs and services
• user-centered open design and collaborative development
70. PresQT
Connects and
Enhances
• Configure additional partner systems via JSON and Python
• Check fixity via hash algorithms
• Add metadata via JSON
• Transfer in BagIt format
• Enhance keywords
• Check available tools via EaaSI
• Check FAIRness of research objects
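The fixity check listed above can be illustrated with a small sketch. This is not PresQT's code: the helper names `file_digest` and `verify_fixity`, the file name, and the choice of SHA-256 are assumptions made for illustration. The idea is simply to record a hash before a transfer and compare it with the hash computed afterwards.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_digest, algorithm="sha256"):
    """Hypothetical helper: True if the file still matches the recorded digest."""
    return file_digest(path, algorithm) == expected_digest


def demo():
    """Write a small file, record its digest, then alter the file and re-check."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "data.csv")
        with open(path, "wb") as handle:
            handle.write(b"sample,value\na,1\n")
        recorded = file_digest(path)
        unchanged = verify_fixity(path, recorded)
        with open(path, "ab") as handle:
            handle.write(b"b,2\n")  # simulate corruption or an unintended edit
        changed = verify_fixity(path, recorded)
    return unchanged, changed


if __name__ == "__main__":
    print(demo())
```

Reading in chunks keeps memory use flat for large research data files; the same digest could be stored alongside the file so that a later check can detect changes introduced during transfer.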
75. Methods
• Science gateway based on HUBzero
• Web scraping toolkit based on the Scrapy Python
framework
• Second toolkit based on Python's PDFMiner framework
searches for sets of pedagogical keywords in syllabi
• Torrance test for creativity
• Descriptive statistics
• Pearson’s chi-squared test to determine
homogeneity of responses by demographic
• A Kruskal-Wallis H test was used in lieu of ANOVA
to examine averages between groups preferring
different means of representing information
Example for preliminary results:
Female students expressed a stronger preference
for some sort of image than did male students,
and male participants were more likely to
express a preference for an equation.
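As a rough illustration of the Pearson's chi-squared test mentioned in the methods above, the sketch below computes the test statistic for a contingency table by hand. The counts are invented purely for illustration and are not the study's data; the function name `chi_squared_statistic` is hypothetical, and no continuity correction is applied. In practice a statistics library would also supply the p-value.

```python
def chi_squared_statistic(table):
    """Pearson's chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity: row total * column total / N
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts: rows are two respondent groups,
# columns are the preferred representation (image, equation).
example = [[30, 10],
           [20, 20]]
stat, dof = chi_squared_statistic(example)
print(f"chi2 = {stat:.3f}, dof = {dof}")
```

A large statistic relative to the chi-squared distribution with `dof` degrees of freedom would suggest that the groups do not respond homogeneously.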
76. VisDict
Visual Dictionaries for Enhancing
the Communication between
Domain Scientists and
Scientific Workflow Providers
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
Workflows
A sequence of connected steps in a defined order
based on their control and data dependencies
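The workflow definition above can be made concrete with a small sketch: steps are nodes, dependencies are edges, and a valid execution order is a topological order of that graph. The step names are made up for illustration, and `execution_order` is a hypothetical helper implementing Kahn's algorithm; real workflow systems add scheduling, data staging, and error handling on top of this idea.

```python
from collections import deque


def execution_order(dependencies):
    """Return steps in an order that respects dependencies (Kahn's algorithm).

    `dependencies` maps each step to the list of steps it depends on.
    """
    steps = set(dependencies)
    for prerequisites in dependencies.values():
        steps.update(prerequisites)
    # Number of unmet prerequisites per step
    remaining = {step: len(set(dependencies.get(step, ()))) for step in steps}
    dependents = {step: [] for step in steps}
    for step, prerequisites in dependencies.items():
        for prerequisite in set(prerequisites):
            dependents[prerequisite].append(step)
    # Steps without prerequisites can start immediately
    ready = deque(sorted(step for step in steps if remaining[step] == 0))
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for dependent in sorted(dependents[step]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(steps):
        raise ValueError("workflow contains a cyclic dependency")
    return order


# Made-up example steps for a sequence-analysis workflow.
workflow = {
    "quality_control": ["read_sequences"],
    "align": ["quality_control", "load_reference"],
    "report": ["align"],
}
print(execution_order(workflow))
```

Steps that become ready at the same time (here, reading the sequences and loading the reference) have no dependency on each other, which is where a workflow system could run them in parallel.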
77. VisDict
Visual Dictionaries for Enhancing
the Communication between
Domain Scientists and
Scientific Workflow Providers
Workflows
A sequence of connected steps in a defined order
based on their control and data dependencies
Computational workflow systems serve
research domains
80. VisDict
Visual Dictionaries for Enhancing
the Communication between
Domain Scientists and
Scientific Workflow Providers
Why VisDict?
Creation of workflows requires an
understanding of the targeted problem and
can be a labor-intensive and error-prone
process.
Communication is one source of error
• scientific background
• native language
• culture of the researchers
For natural languages, translators are often
involved; translation between domains
lacks tools
81. VisDict
Visual Dictionaries for Enhancing
the Communication between
Domain Scientists and
Scientific Workflow Providers
“A picture is worth a thousand words.”
Activities
• Surveys
• Dictionary as a
service
• Science gateway
83. Chicago
Hopes for Kids
4 Foundations
Division
H4F
Raj Chetty, The Equality of Opportunity Project, Stanford University
“Lost Einsteins”, Innovation and Opportunity in America
• Innovation fuels economic growth
• It has slowed down in the US since the 1970s
• Because of inequality the US may have
missed millions of innovators
Lost Einsteins…
84. Chicago
Hopes for Kids
4 Foundations
Division
H4F
National Model for Long-Term Support of High-
Potential Kids in Unstable Housing
• 7 Continents in 7 Weeks
• The World of Computing
• Science gateway
Lost Einsteins…
87. Outlook
Science Gateways
• Broader engagement
• Education with K12
• More citizen science
• Accessibility
• More openly available methods
• Novel computing architectures
• Quantum computing
• Social sciences
• Machine learning
• Virtual reality
Gesing, S., et al. "A Vision for Science Gateways: Bridging the Gap and Broadening the Outreach" In
Practice and Experience in Advanced Research Computing (PEARC ’21), July 18–22, 2021, Virtual
Conference, ACM, New York, NY, USA, 8 pages, to appear.
88. Outlook
Changing
Academic Culture
• Centers of Excellence
• Community of Communities
• RSEs
• HPC Facilitators
• Research Computing Professionals
• Research Data Professionals
Katz, D. S., McInnes, L.C., Bernholdt, D.E., Mayes, A.C., Hong, N.P.C, Duckles, J., Gesing, S., Heroux, M.A., Hettrick,
S., Jimenez, R.C., Pierce, M., Weaver, B. and Wilkins-Diehr, N. Community Organizations: Changing the Culture in
Which Research Software Is Developed and Sustained. Computing in Science & Engineering,