SlideShare una empresa de Scribd logo
1 de 9
Team Argon
“A Commons Platform for Promoting Continuous FAIRness”
NIH Data Commons Pilot
Globus, University of Chicago
University of Southern California
Contact: Ian Foster, foster@uchicago.edu
PIs: Kyle Chard, Ian Foster, Carl Kesselman, Ravi Madduri
Three big picture themes
• Continuous FAIRness: Make all data findable, accessible,
interoperable, reusable at every stage, via pervasive use of
simple identifier and exchange format conventions
• Build on proven security, data, and computation building
blocks that have large user communities inside and outside
biomedicine (see subsequent slides for details)
• Solutions leverage industry best practices and professional
services team to meet scalability, interoperability,
sustainability, and reliability needs
App
(Client)
Service
(Resource Server)
Service
(Resource Server)
Globus Auth: A foundational service for an
authentication and authorization ecosystem
• A flexible security infrastructure that can be used across the Commons
• Enables federation across services using arbitrary linked identities (e.g., @gmail
@xsede @uchicago)
• Facilitates secure/authorized communication between users, services, clients
• Supports arbitrary clients including REST, web, command line, software
• Flexible token management
• Secure sharing between services
• Fine-grain user consents and revocation
Service
(Resource Server)
Resource
Owner
Resource
server operator
App
(Client)
3
https://docs.globus.org/api/auth/
Standards-based, reliable, performant data management
• Globus Connect Server: S3-compatible
HTTP/OAuth interface for secure
client-server transfer
• Endpoints have DNS names
• Globus Transfer: Managed, high-
performance, secure, reliable bulk
asynchronous transfer
• In-place data sharing with flexible and
secure ACLs
• Standards compliant
• S3, OAuth, OIDC, HTTP, GridFTP
4
https://docs.globus.org/api/transfer/
Interoperability: naming and exchange
Minid
• Lightweight identifiers for any product
at any stage
• Easily created, dereferenced, validated
• Global integrity – validate content
across the commons
BDBag
• Self-describing and flexible format
for exchange
• Extended BagIt Specification
• Standard manifest representation
that supports different protocols
Data
Metadata
File1 2AG230..
File2 A31FDC.. FTP
File3 D0F142.. HTTP
…
Minid 001
Minid 007
Minid 719
http://minid.bd2k.org http://bd2k.ini.usc.edu/tools/bdbag/
Infrastructure
My Workspace
• Workspaces bring together data and tools
• Infrastructure designed for scalability and portability
• Leverages
• Federated identities & access control
• Secure access to distributed data
• Data interoperability, exchange
• Provenance
• Tracking activity around data
• By whom? With what?
• Publication & sharing of tools
and workflows
• Cost aware resource allocation for
both compute and data movement
Workspaces: Scalable compute for distributed data
Data Tools
6
Search, navigation, and virtual cohorts
• DERIVA: Digital asset management for heterogeneous data
• Organize, navigate, discover interrelated objects (e.g., assays from a sample over time)
• REST interface
• Entity/Relation model for organizing data
• Supports various DCPPC metadata models
• Fine grain access control to support diverse
collaboration models
• Model evolution to enable continuous
publication, diverse, heterogeneous use cases
• Model driven user interface that
self-configures to current data model
• Integration with Globus Auth, Minids, BDBags, and other components
• Complements Globus Search: Access-controlled search of derived data products
7
Workspace Manager
Bags Workspaces Pipelines
minid_1 Galaxy GTExRNA
minid_2 Jupyter GATKVar
minid_3 RStudio
UCSC
GTEx
TOPMed
MOD
User catalogs
User catalogs
User catalogs
User catalogs
Search
Analyze Visualize
Publish & Reproduce
Discover
Uniform,
secure,
reliable
access to
storage
Virtual cohorts
in standard
manifest with
lightweight ID
Uniform search
across multiple
data sources
All results tracked via standard
manifest and lightweight IDs
Workspaces support Jupyter and
Galaxy on different clouds
Publication
assigns DOIs
and indexes
datasets
Integrating scenario
Summary: Reusable components include...
• Globus Connect Server for data access, transfer, and sharing
• HTTP/S3 access to many storage systems (Posix, object store, etc.)
• GridFTP for managed, reliable, secure, efficient transfers
• Integration with Globus Auth for authentication and authorization
• Offers: A universal storage API
• Globus Auth for securing all REST API interactions
• OAuth2 and OIDC + fine-grained consents and revocation
• Offers: A universal authentication and authorization API
• BDBag (“big data bags”: profiles on BagIt) tools
• BagIt specification with profiles for “holey bags”, etc.
• Offers: Common manifest for exchange of query results, virtual cohorts
• Identifier service for creating lightweight identifiers
• ARKs, created on demand, associated checksum, simple metadata
• Offers: Common mechanism for naming and tracking derived data products9

Más contenido relacionado

La actualidad más candente

An Approach for RDF-based Semantic Access to NoSQL Repositories
An Approach for RDF-based Semantic Access to NoSQL RepositoriesAn Approach for RDF-based Semantic Access to NoSQL Repositories
An Approach for RDF-based Semantic Access to NoSQL RepositoriesLuiz Henrique Zambom Santana
 
Linked Open Data and DANS
Linked Open Data and DANSLinked Open Data and DANS
Linked Open Data and DANSvty
 
Globus and Dataverse: Towards big Data Publication
Globus and Dataverse: Towards big Data PublicationGlobus and Dataverse: Towards big Data Publication
Globus and Dataverse: Towards big Data PublicationGlobus
 
DataverseNL as structured data hub
DataverseNL as structured data hubDataverseNL as structured data hub
DataverseNL as structured data hubvty
 
Or2019 DSpace 7 Enhanced submission & workflow
Or2019 DSpace 7 Enhanced submission & workflowOr2019 DSpace 7 Enhanced submission & workflow
Or2019 DSpace 7 Enhanced submission & workflow4Science
 
6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation SlidesDuraSpace
 
iRODS UGM 2018 Fair data management and DISQOVERability
iRODS UGM 2018 Fair data management and DISQOVERabilityiRODS UGM 2018 Fair data management and DISQOVERability
iRODS UGM 2018 Fair data management and DISQOVERabilityMaarten Coonen
 
Architecting An Enterprise Storage Platform Using Object Stores
Architecting An Enterprise Storage Platform Using Object StoresArchitecting An Enterprise Storage Platform Using Object Stores
Architecting An Enterprise Storage Platform Using Object StoresNiraj Tolia
 
Leverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platformLeverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platformAndrea Bollini
 
DSpace-CRIS & OpenAIRE
DSpace-CRIS & OpenAIREDSpace-CRIS & OpenAIRE
DSpace-CRIS & OpenAIRE4Science
 
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...Continuent
 
API economy
API economyAPI economy
API economyvty
 
CPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data ManagementCPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data ManagementStephan Haller
 
BlogMyData at AllHands 2010
BlogMyData at AllHands 2010BlogMyData at AllHands 2010
BlogMyData at AllHands 2010Andrew Milsted
 
Webinar: Utilisations courantes de MongoDB
Webinar: Utilisations courantes de MongoDBWebinar: Utilisations courantes de MongoDB
Webinar: Utilisations courantes de MongoDBMongoDB
 
Giovane Moura - Cybersecurity voor .nl
Giovane Moura - Cybersecurity voor .nlGiovane Moura - Cybersecurity voor .nl
Giovane Moura - Cybersecurity voor .nlMichiel Cazemier
 

La actualidad más candente (20)

A4 r overview deck_1.7
A4 r overview deck_1.7A4 r overview deck_1.7
A4 r overview deck_1.7
 
An Approach for RDF-based Semantic Access to NoSQL Repositories
An Approach for RDF-based Semantic Access to NoSQL RepositoriesAn Approach for RDF-based Semantic Access to NoSQL Repositories
An Approach for RDF-based Semantic Access to NoSQL Repositories
 
Linked Open Data and DANS
Linked Open Data and DANSLinked Open Data and DANS
Linked Open Data and DANS
 
Globus and Dataverse: Towards big Data Publication
Globus and Dataverse: Towards big Data PublicationGlobus and Dataverse: Towards big Data Publication
Globus and Dataverse: Towards big Data Publication
 
DataverseNL as structured data hub
DataverseNL as structured data hubDataverseNL as structured data hub
DataverseNL as structured data hub
 
Or2019 DSpace 7 Enhanced submission & workflow
Or2019 DSpace 7 Enhanced submission & workflowOr2019 DSpace 7 Enhanced submission & workflow
Or2019 DSpace 7 Enhanced submission & workflow
 
6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides
 
iRODS UGM 2018 Fair data management and DISQOVERability
iRODS UGM 2018 Fair data management and DISQOVERabilityiRODS UGM 2018 Fair data management and DISQOVERability
iRODS UGM 2018 Fair data management and DISQOVERability
 
Architecting An Enterprise Storage Platform Using Object Stores
Architecting An Enterprise Storage Platform Using Object StoresArchitecting An Enterprise Storage Platform Using Object Stores
Architecting An Enterprise Storage Platform Using Object Stores
 
Leverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platformLeverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platform
 
Psicquic applications
Psicquic applicationsPsicquic applications
Psicquic applications
 
DSpace-CRIS & OpenAIRE
DSpace-CRIS & OpenAIREDSpace-CRIS & OpenAIRE
DSpace-CRIS & OpenAIRE
 
Survey on NoSQL integration
Survey on NoSQL integrationSurvey on NoSQL integration
Survey on NoSQL integration
 
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
 
API economy
API economyAPI economy
API economy
 
CPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data ManagementCPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data Management
 
BlogMyData at AllHands 2010
BlogMyData at AllHands 2010BlogMyData at AllHands 2010
BlogMyData at AllHands 2010
 
Webinar: Utilisations courantes de MongoDB
Webinar: Utilisations courantes de MongoDBWebinar: Utilisations courantes de MongoDB
Webinar: Utilisations courantes de MongoDB
 
II-SDV 2016 RightsDirect
II-SDV 2016 RightsDirectII-SDV 2016 RightsDirect
II-SDV 2016 RightsDirect
 
Giovane Moura - Cybersecurity voor .nl
Giovane Moura - Cybersecurity voor .nlGiovane Moura - Cybersecurity voor .nl
Giovane Moura - Cybersecurity voor .nl
 

Similar a NIH Data Commons Architecture Ideas

Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)
Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)
Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)Globus
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Introduction to Globus for New Users (GlobusWorld Tour - UCSD)
Introduction to Globus for New Users (GlobusWorld Tour - UCSD)Introduction to Globus for New Users (GlobusWorld Tour - UCSD)
Introduction to Globus for New Users (GlobusWorld Tour - UCSD)Globus
 
Scalable Data Management: Automation and the Modern Research Data Portal
Scalable Data Management: Automation and the Modern Research Data PortalScalable Data Management: Automation and the Modern Research Data Portal
Scalable Data Management: Automation and the Modern Research Data PortalGlobus
 
What's New With Globus
What's New With GlobusWhat's New With Globus
What's New With GlobusGlobus
 
Managing Protected and Controlled Data with Globus
Managing Protected and Controlled Data with Globus Managing Protected and Controlled Data with Globus
Managing Protected and Controlled Data with Globus Globus
 
Globus: Beyond File Transfer
Globus: Beyond File TransferGlobus: Beyond File Transfer
Globus: Beyond File TransferGlobus
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to GlobusGlobus
 
Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So...
Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So...Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So...
Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So...Globus
 
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)Globus
 
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)Globus
 
Introduction to Globus (GlobusWorld Tour West)
Introduction to Globus (GlobusWorld Tour West)Introduction to Globus (GlobusWorld Tour West)
Introduction to Globus (GlobusWorld Tour West)Globus
 
Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)Globus
 
Globus High Assurance for Protected Data (GlobusWorld Tour - UCSD)
Globus High Assurance for Protected Data (GlobusWorld Tour - UCSD)Globus High Assurance for Protected Data (GlobusWorld Tour - UCSD)
Globus High Assurance for Protected Data (GlobusWorld Tour - UCSD)Globus
 
Introduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 TutorialIntroduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 TutorialGlobus
 
Introduction to Globus (GlobusWorld Tour - UMich)
Introduction to Globus (GlobusWorld Tour - UMich)Introduction to Globus (GlobusWorld Tour - UMich)
Introduction to Globus (GlobusWorld Tour - UMich)Globus
 
GlobusWorld 2021 Tutorial: Introduction to Globus
GlobusWorld 2021 Tutorial: Introduction to GlobusGlobusWorld 2021 Tutorial: Introduction to Globus
GlobusWorld 2021 Tutorial: Introduction to GlobusGlobus
 
KoprowskiT_session1_SDNEvent_WASDforBeginners
KoprowskiT_session1_SDNEvent_WASDforBeginnersKoprowskiT_session1_SDNEvent_WASDforBeginners
KoprowskiT_session1_SDNEvent_WASDforBeginnersTobias Koprowski
 

Similar a NIH Data Commons Architecture Ideas (20)

Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)
Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)
Introduction to Globus for New Users (GlobusWorld Tour - Columbia University)
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus Platform
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Introduction to Globus for New Users (GlobusWorld Tour - UCSD)
Introduction to Globus for New Users (GlobusWorld Tour - UCSD)Introduction to Globus for New Users (GlobusWorld Tour - UCSD)
Introduction to Globus for New Users (GlobusWorld Tour - UCSD)
 
Scalable Data Management: Automation and the Modern Research Data Portal
Scalable Data Management: Automation and the Modern Research Data PortalScalable Data Management: Automation and the Modern Research Data Portal
Scalable Data Management: Automation and the Modern Research Data Portal
 
What's New With Globus
What's New With GlobusWhat's New With Globus
What's New With Globus
 
Managing Protected and Controlled Data with Globus
Managing Protected and Controlled Data with Globus Managing Protected and Controlled Data with Globus
Managing Protected and Controlled Data with Globus
 
Globus: Beyond File Transfer
Globus: Beyond File TransferGlobus: Beyond File Transfer
Globus: Beyond File Transfer
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
 
Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So...
Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So...Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So...
Globus: A Data Management Platform for Collaborative Research (CHPC 2019 - So...
 
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
 
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
 
Introduction to Globus (GlobusWorld Tour West)
Introduction to Globus (GlobusWorld Tour West)Introduction to Globus (GlobusWorld Tour West)
Introduction to Globus (GlobusWorld Tour West)
 
Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)
 
Document Management System
Document Management System Document Management System
Document Management System
 
Globus High Assurance for Protected Data (GlobusWorld Tour - UCSD)
Globus High Assurance for Protected Data (GlobusWorld Tour - UCSD)Globus High Assurance for Protected Data (GlobusWorld Tour - UCSD)
Globus High Assurance for Protected Data (GlobusWorld Tour - UCSD)
 
Introduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 TutorialIntroduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 Tutorial
 
Introduction to Globus (GlobusWorld Tour - UMich)
Introduction to Globus (GlobusWorld Tour - UMich)Introduction to Globus (GlobusWorld Tour - UMich)
Introduction to Globus (GlobusWorld Tour - UMich)
 
GlobusWorld 2021 Tutorial: Introduction to Globus
GlobusWorld 2021 Tutorial: Introduction to GlobusGlobusWorld 2021 Tutorial: Introduction to Globus
GlobusWorld 2021 Tutorial: Introduction to Globus
 
KoprowskiT_session1_SDNEvent_WASDforBeginners
KoprowskiT_session1_SDNEvent_WASDforBeginnersKoprowskiT_session1_SDNEvent_WASDforBeginners
KoprowskiT_session1_SDNEvent_WASDforBeginners
 

Más de Ian Foster

Global Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxGlobal Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxIan Foster
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
Better Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumIan Foster
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsIan Foster
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationIan Foster
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryIan Foster
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptxIan Foster
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceIan Foster
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryIan Foster
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the ContinuumIan Foster
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationIan Foster
 
Research Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryResearch Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryIan Foster
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterIan Foster
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light SourcesIan Foster
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon SummaryIan Foster
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFIan Foster
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
Software Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformSoftware Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformIan Foster
 

Más de Ian Foster (20)

Global Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxGlobal Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptx
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
Better Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the Continuum
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart Instruments
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the Continuum
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud Automation
 
Research Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryResearch Automation for Data-Driven Discovery
Research Automation for Data-Driven Discovery
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon Summary
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Software Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformSoftware Infrastructure for a National Research Platform
Software Infrastructure for a National Research Platform
 

Último

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Último (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

NIH Data Commons Architecture Ideas

  • 1. Team Argon “A Commons Platform for Promoting Continuous FAIRness” NIH Data Commons Pilot Globus, University of Chicago University of Southern California Contact: Ian Foster, foster@uchicago.edu PIs: Kyle Chard, Ian Foster, Carl Kesselman, Ravi Madduri
  • 2. Three big picture themes • Continuous FAIRness: Make all data findable, accessible, interoperable, reusable at every stage, via pervasive use of simple identifier and exchange format conventions • Build on proven security, data, and computation building blocks that have large user communities inside and outside biomedicine (see subsequent slides for details) • Solutions leverage industry best practices and professional services team to meet scalability, interoperability, sustainability, and reliability needs
  • 3. App (Client) Service (Resource Server) Service (Resource Server) Globus Auth: A foundational service for an authentication and authorization ecosystem • A flexible security infrastructure that can be used across the Commons • Enables federation across services using arbitrary linked identities (e.g., @gmail @xsede @uchicago) • Facilitates secure/authorized communication between users, services, clients • Supports arbitrary clients including REST, web, command line, software • Flexible token management • Secure sharing between services • Fine-grain user consents and revocation Service (Resource Server) Resource Owner Resource server operator App (Client) 3 https://docs.globus.org/api/auth/
  • 4. Standards-based, reliable, performant data management • Globus Connect Server: S3-compatible HTTP/OAuth interface for secure client-server transfer • Endpoints have DNS names • Globus Transfer: Managed, high- performance, secure, reliable bulk asynchronous transfer • In-place data sharing with flexible and secure ACLs • Standards compliant • S3, OAuth, OIDC, HTTP, GridFTP 4 https://docs.globus.org/api/transfer/
  • 5. Interoperability: naming and exchange Minid • Lightweight identifiers for any product at any stage • Easily created, dereferenced, validated • Global integrity – validate content across the commons BDBag • Self-describing and flexible format for exchange • Extended BagIt Specification • Standard manifest representation that supports different protocols Data Metadata File1 2AG230.. File2 A31FDC.. FTP File3 D0F142.. HTTP … Minid 001 Minid 007 Minid 719 http://minid.bd2k.org http://bd2k.ini.usc.edu/tools/bdbag/
  • 6. Infrastructure My Workspace • Workspaces bring together data and tools • Infrastructure designed for scalability and portability • Leverages • Federated identities & access control • Secure access to distributed data • Data interoperability, exchange • Provenance • Tracking activity around data • By whom? With what? • Publication & sharing of tools and workflows • Cost aware resource allocation for both compute and data movement Workspaces: Scalable compute for distributed data Data Tools 6
  • 7. Search, navigation, and virtual cohorts • DERIVA: Digital asset management for heterogeneous data • Organize, navigate, discover interrelated objects (e.g., assays from a sample over time) • REST interface • Entity/Relation model for organizing data • Supports various DCPPC metadata models • Fine grain access control to support diverse collaboration models • Model evolution to enable continuous publication, diverse, heterogeneous use cases • Model driven user interface that self-configures to current data model • Integration with Globus Auth, Minids, BDBags, and other components • Complements Globus Search: Access-controlled search of derived data products 7
  • 8. Workspace Manager Bags Workspaces Pipelines minid_1 Galaxy GTExRNA minid_2 Jupyter GATKVar minid_3 RStudio UCSC GTEx TOPMed MOD User catalogs User catalogs User catalogs User catalogs Search Analyze Visualize Publish & Reproduce Discover Uniform, secure, reliable access to storage Virtual cohorts in standard manifest with lightweight ID Uniform search across multiple data sources All results tracked via standard manifest and lightweight IDs Workspaces support Jupyter and Galaxy on different clouds Publication assigns DOIs and indexes datasets Integrating scenario
  • 9. Summary: Reusable components include... • Globus Connect Server for data access, transfer, and sharing • HTTP/S3 access to many storage systems (Posix, object store, etc.) • GridFTP for managed, reliable, secure, efficient transfers • Integration with Globus Auth for authentication and authorization • Offers: A universal storage API • Globus Auth for securing all REST API interactions • OAuth2 and OIDC + fine-grained consents and revocation • Offers: A universal authentication and authorization API • BDBag (“big data bags”: profiles on BagIt) tools • BagIt specification with profiles for “holey bags”, etc. • Offers: Common manifest for exchange of query results, virtual cohorts • Identifier service for creating lightweight identifiers • ARKs, created on demand, associated checksum, simple metadata • Offers: Common mechanism for naming and tracking derived data products9