Scalable Data Management:
Automation and the Modern
Research Data Portal
Vas Vasiliadis
vas@uchicago.edu
University of Chicago
March 22, 2021
Warning: We may have unregistered attendees…
2
Globus is …
a non-profit service
developed and operated by
3
Our mission is to…
increase the efficiency and
effectiveness of researchers
engaged in data-driven
science and scholarship
through sustainable software
4
Thank you, funders...
U.S. Department of Energy
5
Thank you, subscribers!
Globus SaaS / PaaS: Research data lifecycle
Researcher initiates
transfer request; or
requested automatically
by script, science
gateway
1
Instrument
Compute Facility
Globus transfers files
reliably, securely
2
Globus controls
access to shared
files on existing
storage; no need
to move files to
cloud storage!
4
Researcher
selects files to
share, selects
user or group,
and sets access
permissions
3
Collaborator logs in to
Globus and accesses
shared files; no local
account required;
download via Globus
5
Automating research
workflows and
ensuring those who
need access to the
data have it.
8
Personal Computer
Transfer
Share
Build
The Globus
Command Line
Interface, API sets,
and Python SDK
provide a platform…
6
… for building
science gateways,
portals and
publication services.
7
Use(r)-appropriate interfaces
GET /endpoint/go%23ep1
PUT /endpoint/demodoc%23my_endpt
200 OK
X-Transfer-API-Version: 0.10
Content-Type: application/json
…
Globus
service
Web
CLI
Platform
(RESTful APIs)
8
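The request path on this slide writes the legacy endpoint name `go#ep1` as `go%23ep1`, because a raw `#` in a URL would start a fragment and never reach the service. A minimal sketch of that encoding step, using only the Python standard library (the helper name `endpoint_path` is hypothetical, not part of any Globus client):

```python
from urllib.parse import quote, unquote


def endpoint_path(endpoint_name: str) -> str:
    """Build a request path for a legacy 'user#name' style endpoint name."""
    # safe="" escapes every reserved character, including '#' and '/'.
    return "/endpoint/" + quote(endpoint_name, safe="")


path = endpoint_path("go#ep1")
print(path)
# Decoding the last path segment recovers the original name.
print(unquote(path.rsplit("/", 1)[1]))
```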
Platform services enable development of…
• Science gateways
• Web portals
• Data commons
• Data-centric
research apps
9
The instruments are coming!
10
Automated instrument data egress
Cryo EM
Lightsheet
Sequencer
ALS/APS
….
Local system
download
Remote analysis,
visualization
• Reliable, near-real-time
data access
• Automatically set
policy-based permissions
• Self-service access
control, management
• Federated login for
frictionless data access
Local
policy
store
--/cohort045
--/cohort096
--/cohort127
Local or
cloud storage
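One way to picture "automatically set policy-based permissions" is a small local policy store that maps each cohort folder to the group allowed to read it, from which access-rule documents are generated. The sketch below is only an illustration: the `PolicyEntry` class and the group IDs are hypothetical, and the rule field names (`DATA_TYPE`, `principal_type`, `principal`, `path`, `permissions`) are assumed to follow the Transfer API access-rule document and should be checked against the current documentation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PolicyEntry:
    folder: str    # folder on the shared storage, e.g. "/cohort045/"
    group_id: str  # hypothetical ID of the group allowed to read it


def access_rules(policy: List[PolicyEntry]) -> List[Dict[str, str]]:
    """Turn local policy entries into read-only access-rule documents."""
    rules = []
    for entry in policy:
        rules.append({
            "DATA_TYPE": "access",      # assumed document type name
            "principal_type": "group",
            "principal": entry.group_id,
            "path": entry.folder,
            "permissions": "r",         # read-only
        })
    return rules


policy_store = [
    PolicyEntry("/cohort045/", "group-045"),
    PolicyEntry("/cohort096/", "group-096"),
    PolicyEntry("/cohort127/", "group-127"),
]
for rule in access_rules(policy_store):
    print(rule["path"], "->", rule["principal"])
```

Each generated document would then be sent to the service by whatever client the portal uses; that step is left out here.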
11
Today’s exemplar: Advanced Photon Source
• 2-D, 3-D imaging
• 2016: ~112TB/month
• 100x – 1,000x growth
12
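To put the growth figures in network terms, the monthly volumes can be converted to a sustained transfer rate. This is back-of-the-envelope arithmetic only, assuming decimal terabytes, a 30-day month, and perfectly even data production (none of which the slide states):

```python
TB = 10**12                          # decimal terabyte (assumption)
SECONDS_PER_MONTH = 30 * 24 * 3600   # 30-day month (assumption)


def sustained_gbps(tb_per_month: float) -> float:
    """Average rate in gigabits per second needed to move a monthly volume."""
    bits = tb_per_month * TB * 8
    return bits / SECONDS_PER_MONTH / 1e9


baseline_tb = 112  # 2016 figure from the slide
for factor in (1, 100, 1000):
    volume = baseline_tb * factor
    print(f"{factor:>5}x: {volume:>8,} TB/month ~ {sustained_gbps(volume):8.2f} Gb/s")
```

Real instruments produce data in bursts, so peak rates would be higher than these averages.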
Petrel Data Portal: Data collaboration at scale
13
petreldata.alcf.anl.gov
How do I get one of those?
14
A good starting reference is the MRDP
15
MRDP: Key elements
Science DMZ
Fast, clean data path
Data Transfer Node
Purpose-built data mover
Globus Platform
Secure, reliable data
orchestration
Globus Connect
Storage system enabler
16
What’s wrong with my LRDP?
17
LRDP architecture
18
Source: ESnet Science Engagement team
MRDP network architecture
19
Source: ESnet Science Engagement team
The Data Transfer Node
20
Source: ESnet Science Engagement team
fasterdata.es.net/science-dmz/DTN/reference-implementation
…makes diverse
storage systems
accessible via
Globus
Globus Connectors support diverse systems
ActiveScale
Object
Storage
Instrument data orchestration
• Authentication and Authorization
• Data description and discovery
• Data access and transfer
• Data and compute orchestration
23
Auth Search Transfer Groups Automate
Globus Auth: Foundational IAM service
Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)
Auth
24
25
Diverse use cases
Support a range of assurance
levels, authentication policy
and access control
Auth
Step 0: Application registration
• Set desired scopes
• Set callback URL
• Get client ID and secret
• Consents implement the
principle of least privilege
26
Auth
developers.globus.org
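The values collected at registration (client ID, callback URL, scopes) are what a web app needs to start a standard OAuth 2.0 authorization code flow. A minimal sketch of building the authorization request URL with the standard library follows; the authorize endpoint URL, the client ID, the callback URL and the scope strings are all assumptions for illustration and should be replaced with the values from your own registration.

```python
import secrets
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed location of the authorization endpoint; confirm in the Auth docs.
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"


def build_authorize_url(client_id: str, redirect_uri: str, scopes: List[str],
                        state: Optional[str] = None) -> Tuple[str, str]:
    """Return (url, state) for an OAuth 2.0 authorization code request."""
    # 'state' ties the callback to this request and guards against CSRF.
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),   # request only the scopes the app needs
        "response_type": "code",
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


url, state = build_authorize_url(
    "hypothetical-client-id",
    "https://portal.example.org/callback",
    ["openid", "email"],
)
print(url)
print(parse_qs(urlsplit(url).query)["scope"])
```

Requesting a short scope list is how an app applies least privilege: the user is asked to consent only to what is listed.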
Data description and discovery
• (Meta)data store with fine-
grained visibility controls
• Schema agnostic → dynamic schemas
• Simple search using URL
query parameters
• Complex search using
search request document
27
docs.globus.org/api/search
Search
Index
Search
Data ingest with Globus Search
28
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "metadata-schema/file#type": "file"
        }
      },
      ...
    ]
  }
}
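An ingest document like the one on this slide can be assembled programmatically before it is posted to the index. The sketch below builds the same structure with plain dictionaries; the helper names `gmeta_entry` and `ingest_document` are hypothetical, and no request is actually sent.

```python
import json
from typing import Any, Dict, Iterable, Sequence


def gmeta_entry(entry_id: str, subject: str, content: Dict[str, Any],
                visible_to: Sequence[str] = ("public",)) -> Dict[str, Any]:
    """One metadata entry about a subject, with its visibility list."""
    return {
        "id": entry_id,
        "subject": subject,
        "visible_to": list(visible_to),
        "content": content,
    }


def ingest_document(entries: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap entries in a GMetaList ingest document."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


doc = ingest_document([
    gmeta_entry(
        "filetype",
        "https://search.api.globus.org/abc.txt",
        {"metadata-schema/file#type": "file"},
    ),
])
print(json.dumps(doc, indent=2))
```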
Data ingest with Globus Search
29
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "size",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "metadata-schema/file#size": "1000000",
          "metadata-schema/file#size_human": "1MB"
        }
      },
      ...
    ]
  }
}
Visibility limited to Globus Auth identity
- Single user
- Globus Group
- Registered client application
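The `visible_to` list holds principal URNs. A small sketch of building them safely from IDs is shown here: the identity URN prefix is the one that appears on the slide, while the group URN prefix is an assumption that should be confirmed against the Search documentation. Registered client applications are assumed to be addressed by identity URN like users.

```python
import uuid

IDENTITY_PREFIX = "urn:globus:auth:identity:"  # as shown on the slide
GROUP_PREFIX = "urn:globus:groups:id:"         # assumed format


def identity_urn(identity_id: str) -> str:
    """URN for a user (or client application) identity."""
    # uuid.UUID raises ValueError on malformed input and normalizes the text.
    return IDENTITY_PREFIX + str(uuid.UUID(identity_id))


def group_urn(group_id: str) -> str:
    """URN for a group; every member of the group gains visibility."""
    return GROUP_PREFIX + str(uuid.UUID(group_id))


visible_to = [identity_urn("46bd0f56-e24f-11e5-a510-131bef46955c")]
print(visible_to)
```

Validating the ID before ingest catches truncated or line-wrapped UUIDs early, which would otherwise silently hide the record from its intended audience.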
Data discovery with Globus Search
30
Simple query
GET /index/{index_id}/search?q=type%3Ahdf5
{
  "@datatype": "GSearchResult",
  "@version": "2017-09-01",
  "count": 1,
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "@version": "2019-08-27",
      "entries": [
        { ... }
      ],
      "subject": "https://..."
    }
  ],
  "offset": 0,
  "total": 1
}
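The simple query is just a URL with a percent-encoded `q` parameter, and the result is a JSON document whose `gmeta` list carries one item per matching subject. A minimal sketch of both halves follows, with no network call; the base URL, the index ID and the sample result values are assumptions for illustration.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

SEARCH_BASE = "https://search.api.globus.org/v1"  # assumed base URL


def simple_search_url(index_id: str, q: str) -> str:
    """Build the GET URL for a simple query; urlencode escapes ':' as %3A."""
    return f"{SEARCH_BASE}/index/{index_id}/search?{urlencode({'q': q})}"


def subjects(result: Dict[str, Any]) -> List[str]:
    """Pull the subject of every hit out of a GSearchResult document."""
    return [hit["subject"] for hit in result.get("gmeta", [])]


print(simple_search_url("hypothetical-index-id", "type:hdf5"))

# A result shaped like the one on the slide, with made-up values.
sample = {
    "@datatype": "GSearchResult",
    "count": 1,
    "gmeta": [{"@datatype": "GMetaResult", "entries": [{}],
               "subject": "https://example.org/data/scan.h5"}],
    "offset": 0,
    "total": 1,
}
print(subjects(sample))
```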
Data discovery with Globus Search
31
Complex query
POST /index/{index_id}/search
{
  "filters": [
    {
      "type": "range",
      "field_name": "pubdate",
      "values": [
        {
          "from": "*",
          "to": "2020-12-31"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "pubdate",
      ...
    }
  ]
}
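The filter part of a complex query can be generated from a couple of arguments rather than written by hand. The sketch below reproduces only the range filter from the slide; the facet definition is left out because its remaining fields are elided in the source and are not guessed here. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def range_filter(field_name: str, start: str = "*", end: str = "*") -> Dict[str, Any]:
    """A range filter; '*' leaves that side of the range open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


def search_request(filters: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Body for POST /index/{index_id}/search, filters only."""
    return {"filters": list(filters)}


# Everything published up to the end of 2020, as on the slide.
body = search_request([range_filter("pubdate", end="2020-12-31")])
print(json.dumps(body, indent=2))
```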
Data Access and Sharing
• Set guest collection access rule
• Check authenticated user’s Group membership
• Submit Transfer task
32
Groups
service
Transfer
service
GET /groups/my_groups
POST /endpoint/{endpoint_id}/access
POST /transfer
Groups
Transfer
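A portal typically chains these calls: look up the user's groups, confirm membership in the group that guards the data, then submit the transfer. The sketch below shows that decision logic on plain dictionaries, with no HTTP calls. It assumes the `my_groups` response is a list of objects with an `id` field and that the transfer document uses the Transfer API field names shown; all IDs and paths are hypothetical.

```python
from typing import Any, Dict, List


def is_member(my_groups: List[Dict[str, Any]], group_id: str) -> bool:
    """True if the authenticated user's group list contains group_id."""
    return any(group.get("id") == group_id for group in my_groups)


def transfer_document(submission_id: str, source_endpoint: str,
                      destination_endpoint: str, source_path: str,
                      destination_path: str, recursive: bool = False) -> Dict[str, Any]:
    """Body for POST /transfer (field names assumed from the Transfer API)."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,  # assumed to be obtained from the service first
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [{
            "DATA_TYPE": "transfer_item",
            "source_path": source_path,
            "destination_path": destination_path,
            "recursive": recursive,
        }],
    }


def plan_transfer(my_groups: List[Dict[str, Any]], required_group: str,
                  **transfer_args: Any) -> Dict[str, Any]:
    """Only build the transfer request if the user is in the required group."""
    if not is_member(my_groups, required_group):
        raise PermissionError(f"user is not a member of group {required_group}")
    return transfer_document(**transfer_args)


groups_response = [{"id": "group-045", "name": "cohort045"}]
doc = plan_transfer(
    groups_response, "group-045",
    submission_id="hypothetical-submission-id",
    source_endpoint="instrument-collection-id",
    destination_endpoint="lab-collection-id",
    source_path="/cohort045/", destination_path="/incoming/cohort045/",
    recursive=True,
)
print(doc["DATA"][0])
```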
Data and compute orchestration
33
Automate
Compute platform service
funcX: FaaS platform for HPC
• funcX endpoints deployed at resources
• Service routes requests to endpoints
• Parsl acquires resources
• Singularity containers run functions
• Globus Auth secures communication
funcX
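The function-as-a-service pattern described here (register a function once, then ask the service to run it at a named endpoint) can be illustrated with a toy, in-process stand-in. This is not the funcX client or its API: `ToyFaaS` and its methods are hypothetical, a thread pool plays the role of each endpoint, and authentication, containers and resource acquisition are omitted entirely.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class ToyFaaS:
    """In-process toy: routes function invocations to named endpoints."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}
        self._endpoints: Dict[str, ThreadPoolExecutor] = {}

    def add_endpoint(self, endpoint_id: str, max_workers: int = 2) -> None:
        # A thread pool stands in for workers at a compute resource.
        self._endpoints[endpoint_id] = ThreadPoolExecutor(max_workers=max_workers)

    def register_function(self, fn: Callable[..., Any]) -> str:
        function_id = str(uuid.uuid4())
        self._functions[function_id] = fn
        return function_id

    def run(self, function_id: str, endpoint_id: str, *args: Any, **kwargs: Any) -> Future:
        # Routing step: pick the endpoint, hand it the registered function.
        executor = self._endpoints[endpoint_id]
        return executor.submit(self._functions[function_id], *args, **kwargs)

    def shutdown(self) -> None:
        for executor in self._endpoints.values():
            executor.shutdown(wait=True)


def count_pixels(width: int, height: int) -> int:
    return width * height


service = ToyFaaS()
service.add_endpoint("hpc-cluster")
fid = service.register_function(count_pixels)
future = service.run(fid, "hpc-cluster", 2048, 2048)
print(future.result(timeout=5))
service.shutdown()
```

The caller only ever holds a function ID, an endpoint ID and a future, which is the property that lets the same portal code target different compute resources.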
mrdp.globus.org
docs.globus.org
outreach@globus.org
support@globus.org
