Globus is an established service from the University of Chicago that is widely used for managing research data in national laboratories, campus computing centers, and HPC facilities. While its interactive web browser interface addresses simple file transfer and sharing scenarios, large-scale automation typically requires integrating the research data management platform that Globus provides into bespoke applications.
We will describe one such example, the Petrel data portal (https://petreldata.net), used by researchers to manage data in diverse fields including materials science, cosmology, machine learning, and serial crystallography. The portal facilitates automated ingest of data, extraction and addition of metadata for creating search indexes, assignment of persistent identifiers, faceted search for rapid data discovery, and point-and-click downloading of datasets by authorized users. As security and privacy are often critical requirements, the portal employs fine-grained permissions that control both visibility of metadata and access to the datasets themselves. It is based on the Modern Research Data Portal design pattern, jointly developed by the ESnet and Globus teams, and leverages capabilities such as the Science DMZ to enhance performance and streamline the user experience.
3. Globus is …
a non-profit service developed and operated by the University of Chicago
4. Our mission is to…
increase the efficiency and effectiveness of researchers engaged in data-driven science and scholarship through sustainable software
7. Globus SaaS / PaaS: Research data lifecycle
Diagram labels: Instrument, Compute Facility, Personal Computer; Transfer, Share, Build
(1) Researcher initiates transfer request; or requested automatically by script, science gateway
(2) Globus transfers files reliably, securely
(3) Researcher selects files to share, selects user or group, and sets access permissions
(4) Globus controls access to shared files on existing storage; no need to move files to cloud storage!
(5) Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
(6) The Globus Command Line Interface, API sets, and Python SDK provide a platform…
(7) … for building science gateways, portals and publication services.
(8) Automating research workflows and ensuring those that need access to the data have it.
11. Automated instrument data egress
Cryo EM, Lightsheet, Sequencer, ALS/APS, ….
Local system download
Remote analysis, visualization
• Reliable, near-real-time data access
• Automatically set policy-based permissions
• Self-service access control, management
• Federated login for frictionless data access
Local policy store
--/cohort045
--/cohort096
--/cohort127
Local or cloud storage
16. MRDP: Key elements
• Science DMZ: Fast, clean data path
• Data Transfer Node: Purpose-built data mover
• Globus Platform: Secure, reliable data orchestration
• Globus Connect: Storage system enabler
23. Instrument data orchestration
• Authentication and Authorization
• Data description and discovery
• Data access and transfer
• Data and compute orchestration
Auth Search Transfer Groups Automate
24. Globus Auth: Foundational IAM service
Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)
26. Step 0: Application registration
• Set desired scopes
• Set callback URL
• Get client ID and secret
• Consents implement the principle of least privilege
developers.globus.org
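Once an application is registered, the scopes, callback URL, and client ID listed above come together in the OAuth 2.0 authorization request. The following is a minimal sketch using only the Python standard library, assuming the authorization code flow against the Globus Auth authorize endpoint. The client ID, callback URL, and choice of scopes are hypothetical placeholders, not values from the talk.

```python
import secrets
from urllib.parse import urlencode

# Globus Auth authorization endpoint (OAuth 2.0 authorization code flow).
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"

# Hypothetical values; real ones come from the application registration.
CLIENT_ID = "00000000-0000-0000-0000-000000000000"
REDIRECT_URI = "https://portal.example.org/callback"

# Request only what the portal needs, so the consent shown to the user
# reflects the principle of least privilege.
SCOPES = [
    "openid",
    "profile",
    "email",
    "urn:globus:auth:scope:transfer.api.globus.org:all",
]


def build_authorize_url(client_id, redirect_uri, scopes, state):
    """Return the URL the user's browser is sent to for login and consent."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # OAuth2 scopes are space-delimited
        "response_type": "code",
        "state": state,  # opaque value checked again on the callback
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


state = secrets.token_urlsafe(16)
url = build_authorize_url(CLIENT_ID, REDIRECT_URI, SCOPES, state)
print(url)
```

After the user consents, the service redirects to the callback URL with a `code` parameter, which the application exchanges for tokens using its client ID and secret. That exchange is not shown here.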
27. Data description and discovery
• (Meta)data store with fine-grained visibility controls
• Schema agnostic, dynamic schemas
• Simple search using URL query parameters
• Complex search using search request document
docs.globus.org/api/search
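The two search styles named on this slide can be sketched as follows. This is an illustration under assumptions, not the portal's code: the index ID, the query text, and the `filetype` field are hypothetical, and the exact layout of the search request document (in particular the filter object) should be checked against docs.globus.org/api/search.

```python
import json
from urllib.parse import urlencode

SEARCH_BASE = "https://search.api.globus.org/v1"


def simple_search_url(index_id, query, limit=10):
    """Simple search: everything is carried in URL query parameters (GET)."""
    params = urlencode({"q": query, "limit": limit})
    return f"{SEARCH_BASE}/index/{index_id}/search?{params}"


def complex_search_request(index_id, query, field_name, values, limit=10):
    """Complex search: a JSON search request document sent with POST.

    Returns the URL and the request body; sending it is left to the caller.
    """
    url = f"{SEARCH_BASE}/index/{index_id}/search"
    body = {
        "q": query,
        "limit": limit,
        # Assumed filter layout: keep results whose field matches any value.
        "filters": [
            {"type": "match_any", "field_name": field_name, "values": values}
        ],
    }
    return url, body


# Hypothetical index ID and field name, for illustration only.
INDEX_ID = "11111111-1111-1111-1111-111111111111"
print(simple_search_url(INDEX_ID, "crystallography"))
post_url, post_body = complex_search_request(
    INDEX_ID, "crystallography", "filetype", ["file"]
)
print(post_url)
print(json.dumps(post_body, indent=2))
```

Results returned by either form are limited to entries whose `visible_to` list admits the caller, so unauthenticated requests see only public entries.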
28. Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "metadata-schema/file#type": "file"
        }
      },
      ...
    ]
  }
}
29. Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "size",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "metadata-schema/file#size": "1000000",
          "metadata-schema/file#size_human": "1MB"
        }
      },
      ...
    ]
  }
}
Visibility limited to Globus Auth identity
- Single user
- Globus Group
- Registered client application
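The two ingest documents shown on these slides can be assembled programmatically. The sketch below uses only the standard library and reproduces the slide's two entries. The helper names (`gmeta_entry`, `gmeta_list`, `identity_urn`, `group_urn`) are hypothetical, and the group URN form is an assumption to verify against the Search documentation.

```python
import json


def identity_urn(identity_id):
    """Principal URN for a single Globus Auth identity (form used on the slide)."""
    return f"urn:globus:auth:identity:{identity_id}"


def group_urn(group_id):
    """Principal URN for a Globus Group (assumed form; check the docs)."""
    return f"urn:globus:groups:id:{group_id}"


def gmeta_entry(entry_id, subject, content, visible_to):
    """Build one entry: metadata about `subject`, visible to `visible_to`."""
    return {
        "id": entry_id,
        "subject": subject,
        "visible_to": list(visible_to),
        "content": dict(content),
    }


def gmeta_list(entries):
    """Wrap entries in the GMetaList ingest document sent to .../ingest."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


SUBJECT = "https://search.api.globus.org/abc.txt"

document = gmeta_list([
    # Public entry: anyone searching the index can see the file type.
    gmeta_entry("filetype", SUBJECT,
                {"metadata-schema/file#type": "file"},
                ["public"]),
    # Restricted entry: only one identity can see the size metadata.
    gmeta_entry("size", SUBJECT,
                {"metadata-schema/file#size": "1000000",
                 "metadata-schema/file#size_human": "1MB"},
                [identity_urn("46bd0f56-e24f-11e5-a510-131bef46955c")]),
])

print(json.dumps(document, indent=2))
```

Because both entries share one subject but carry different `visible_to` lists, a public search returns only the file type, while the named identity also sees the size fields.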
32. Data Access and Sharing
• Set guest collection access rule
• Check authenticated user’s Group membership
• Submit Transfer task
Groups service: GET /groups/my_groups
Transfer service: POST /endpoint/{endpoint_id}/access
Transfer service: POST /transfer
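The three calls on this slide might be combined as follows: check the user's group membership, then prepare an access rule and a transfer task. This sketch only builds the request bodies and performs no network calls. All UUIDs, paths, and the sample `my_groups` response are hypothetical, and the field names follow the Transfer API documents as best understood here, so they should be checked against the API reference.

```python
def is_member(my_groups, group_id):
    """True if `group_id` appears in a GET /groups/my_groups response."""
    return any(group.get("id") == group_id for group in my_groups)


def access_rule(identity_id, path, permissions="r"):
    """Body for POST /endpoint/{endpoint_id}/access on a guest collection."""
    return {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": identity_id,
        "path": path,  # directory paths end with "/"
        "permissions": permissions,  # "r" or "rw"
    }


def transfer_document(submission_id, source, destination, items):
    """Body for POST /transfer; `items` are (source_path, dest_path) pairs."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source,
        "destination_endpoint": destination,
        "DATA": [
            {"DATA_TYPE": "transfer_item",
             "source_path": src, "destination_path": dst,
             "recursive": src.endswith("/")}
            for src, dst in items
        ],
    }


# Hypothetical values for illustration.
COHORT_GROUP = "22222222-2222-2222-2222-222222222222"
USER_ID = "33333333-3333-3333-3333-333333333333"
my_groups = [{"id": COHORT_GROUP, "name": "cohort045"}]  # sample response

rule = task = None
if is_member(my_groups, COHORT_GROUP):
    rule = access_rule(USER_ID, "/cohort045/")
    task = transfer_document(
        "44444444-4444-4444-4444-444444444444",  # from GET /submission_id
        "55555555-5555-5555-5555-555555555555",
        "66666666-6666-6666-6666-666666666666",
        [("/cohort045/", "/~/cohort045/")],
    )
print(rule)
print(task)
```

Each request would be sent with a bearer token whose scope covers the service being called, which is where the consents from application registration come into play.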