Vas Vasiliadis
vas@uchicago.edu
February 27, 2024
Introduction to Research Automation
What do we mean by research “automation”?
Executing research tasks* reliably, at scale, whenever required, with minimal (or no) human intervention.
*Data management and computation
Stepping into automation using Globus
• Level 1: Use the web app; it’s manual, but it may be more automated than your current process
• Level 2: Semi-automated, recurring tasks
• Level 3: Automation using the Globus CLI
• Level 4: Automation using Globus Flows
• Level 5: “Lights-out” automation using Globus Flows with event triggers
A simple, and very common, use case
• Transfer: transfer data to a system for sharing
• Share: set access controls for sharing the data
We’ll use this use case to demonstrate five levels of automation.
Level 1: Point-and-click using the web app
Example: Ad hoc sharing of results with a collaborator (Transfer → Share)
Level 2: Timer + manual setting of permissions
Example: Scheduled backup, shared with read-only access (Transfer → Share)
Timers
• Scheduled and/or recurring file transfers
• Support all transfer and sync options
• hpc.nih.gov/docs/globus/globus_cron.php#cron
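To make "recurring" concrete, here is a minimal Python sketch of how a recurring schedule expands into individual run times. It assumes a schedule is just a start time, a fixed interval, and an end condition (a run count, an end time, or both). The helper `expand_schedule` is hypothetical. This is not the Timers service's own schedule format or API.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def expand_schedule(start: datetime, interval: timedelta,
                    max_runs: Optional[int] = None,
                    end: Optional[datetime] = None) -> List[datetime]:
    """List the run times of a recurring schedule (hypothetical helper).

    Runs begin at `start` and repeat every `interval` until `max_runs`
    runs have been listed or the next run would fall after `end`.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if max_runs is None and end is None:
        # Without an end condition the schedule would never terminate.
        raise ValueError("give max_runs, end, or both")
    runs: List[datetime] = []
    t = start
    while (max_runs is None or len(runs) < max_runs) and (end is None or t <= end):
        runs.append(t)
        t += interval
    return runs


# Example: a nightly backup at 02:00 UTC for one week.
nightly = expand_schedule(datetime(2024, 3, 1, 2, 0, tzinfo=timezone.utc),
                          timedelta(days=1), max_runs=7)
for run in nightly:
    print(run.isoformat())
```

Requiring an explicit end condition mirrors the fact that a timer, once created, keeps firing until something stops it.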
Level 3: Parameterized CLI script
Example: On-demand data distribution from analysis (Transfer → Share)
Globus Command Line Interface
• Automation of simple data management tasks
• Integration with existing scripts (job submission …)
• Open source; built on the Python SDK
$ globus
Usage: globus [OPTIONS] COMMAND [ARGS]...

  Interact with Globus from the command line

  All `globus` subcommands support `--help` documentation.
  Use `globus login` to get started!
  The documentation is also online at https://docs.globus.org/cli/

Options:
  -v, --verbose                   Control level of output
  -h, --help                      Show this message and exit.
  -F, --format [unix|json|text]   Output format for stdout. Defaults to text
  --jmespath, --jq TEXT           A JMESPath expression to apply to json
                                  output. Forces the format to be json
                                  processed by this expression
  --map-http-status TEXT          Map HTTP statuses to any of these exit
                                  codes: 0,1,50-99. e.g. "404=50,403=51"

Commands:
  api               Make API calls to Globus services
  bookmark          Manage endpoint bookmarks
  cli-profile-list  List all CLI profiles which have been used
  collection        Manage your Collections
  delete            Submit a delete task (asynchronous)
  endpoint          Manage Globus endpoint definitions
  flows             Interact with the Globus Flows service
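The `--map-http-status` option lets a shell script tell, for example, a missing path apart from a permission error by looking only at the exit code. The Python sketch below mimics that mapping. It assumes the `"404=50,403=51"` syntax and the allowed exit codes (0, 1, 50-99) from the help text above. It also assumes that an error status not listed in the map exits with 1. `parse_status_map` and `exit_code_for` are hypothetical helpers, not part of the CLI.

```python
from typing import Dict

# Exit codes the help text says a mapping may use.
ALLOWED_EXIT_CODES = {0, 1, *range(50, 100)}


def parse_status_map(spec: str) -> Dict[int, int]:
    """Parse a spec like "404=50,403=51" into {404: 50, 403: 51}."""
    mapping: Dict[int, int] = {}
    for pair in spec.split(","):
        status_text, code_text = pair.split("=")
        status, code = int(status_text), int(code_text)
        if code not in ALLOWED_EXIT_CODES:
            raise ValueError(f"exit code {code} is not 0, 1, or 50-99")
        mapping[status] = code
    return mapping


def exit_code_for(http_status: int, mapping: Dict[int, int],
                  default_error: int = 1) -> int:
    """Exit code for an API error with this HTTP status (assumed: unmapped -> 1)."""
    return mapping.get(http_status, default_error)


mapping = parse_status_map("404=50,403=51")
print(exit_code_for(404, mapping), exit_code_for(403, mapping), exit_code_for(500, mapping))
# prints: 50 51 1
```

In a shell script, the same idea becomes a `case $?` on the exit status of a `globus` command run with `--map-http-status "404=50,403=51"`.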
Transfer and share CLI commands
$ globus transfer \
> --recursive \
> source_collection_uuid:source_path \
> guest_collection_uuid:destination_path
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: f5eb855c-4098-11ee-8ba2-2197ca2bfedc
$ globus endpoint permission create \
> --group $group_uuid \
> --permissions $permissions \
> guest_collection_uuid:destination_path
Granting group, ............., read access to the destination directory
Message: Access rule created successfully.
Rule ID: 7fe723a4-413b-11ee-88f9-03dc0e0dcc45
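Scripts typically assemble these two commands from parameters. Here is a minimal Python sketch that does so, using only the flags shown above (`--recursive`, `--group`, `--permissions`). By default it only prints the commands. With `dry_run=False` it assumes the `globus` CLI is installed and logged in. The helper names, the placeholder IDs (`SRC_UUID`, `GUEST_UUID`, `GROUP_UUID`), and the paths are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def transfer_cmd(src_collection: str, src_path: str,
                 dst_collection: str, dst_path: str,
                 recursive: bool = True) -> List[str]:
    """Build `globus transfer` as an argument list."""
    cmd = ["globus", "transfer"]
    if recursive:
        cmd.append("--recursive")
    cmd += [f"{src_collection}:{src_path}", f"{dst_collection}:{dst_path}"]
    return cmd


def permission_cmd(guest_collection: str, path: str,
                   group_uuid: str, permissions: str = "r") -> List[str]:
    """Build `globus endpoint permission create` granting a group access to a path."""
    return ["globus", "endpoint", "permission", "create",
            "--group", group_uuid, "--permissions", permissions,
            f"{guest_collection}:{path}"]


def run(cmd: List[str], dry_run: bool = True) -> Optional[str]:
    """Print the command (dry run) or execute it and return its stdout."""
    if dry_run:
        print(shlex.join(cmd))
        return None
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


run(transfer_cmd("SRC_UUID", "/data/run42/", "GUEST_UUID", "/shared/run42/"))
run(permission_cmd("GUEST_UUID", "/shared/run42/", "GROUP_UUID"))
```

Passing a list to `subprocess.run` avoids shell-quoting problems with paths that contain spaces. A production script might also wait for the transfer task to finish before granting access; that step is omitted here.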
Exercise: Run script using the Globus CLI
• Log into your instance
• Go to the ~/globus-tutorials directory
• Run the transfer_share.sh script
$ ./transfer_share.sh \
> --source-collection a6f165fa-aee2-4fe5-95f3-97429c28bf82 \
> --source-path /cli \
> --guest-collection fe2feb64-4ac0-4a40-ba90-94b99d06dd2c \
> --sharing-path /rpi/YOUR_NAME \
> --group-id 50b6a29c-63ac-11e4-8062-22000ab68755
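The exercise script takes five named options. The Python argparse sketch below shows how such a parameterized script can validate its inputs before it calls the CLI. It mirrors those option names but is not the tutorial's `transfer_share.sh`. The absolute-path check is an assumption added for illustration.

```python
import argparse
import uuid
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Transfer a directory and share it with a group (sketch)")
    # type=uuid.UUID makes argparse reject malformed collection/group IDs.
    parser.add_argument("--source-collection", type=uuid.UUID, required=True)
    parser.add_argument("--source-path", required=True)
    parser.add_argument("--guest-collection", type=uuid.UUID, required=True)
    parser.add_argument("--sharing-path", required=True)
    parser.add_argument("--group-id", type=uuid.UUID, required=True)
    args = parser.parse_args(argv)
    for option in ("source_path", "sharing_path"):
        if not getattr(args, option).startswith("/"):
            # parser.error prints usage and exits with status 2.
            parser.error(f"--{option.replace('_', '-')} must be an absolute path")
    return args


# The arguments from the exercise above.
example = parse_args([
    "--source-collection", "a6f165fa-aee2-4fe5-95f3-97429c28bf82",
    "--source-path", "/cli",
    "--guest-collection", "fe2feb64-4ac0-4a40-ba90-94b99d06dd2c",
    "--sharing-path", "/rpi/YOUR_NAME",
    "--group-id", "50b6a29c-63ac-11e4-8062-22000ab68755",
])
print(example.guest_collection, example.sharing_path)
```

Rejecting a mistyped UUID up front gives a clearer error than a failed API call halfway through the script.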
Level 4: Using a Globus Flow
Example: Moving data from instrument to campus cluster for analysis (Transfer → Share)
Level 4: Automation with Globus Flows
• Flows service: a platform for managed, secure, reliable task orchestration
• Flows comprise Actions that invoke Globus services; extensible to support your own services
• Run via web app, CLI, API, event-based triggers*
Common tasks in most instrument scenarios
• Step 1, Transfer: transfer raw images to the HPC cluster (:transfer Action Provider)
• Step 2, Share: set access controls to allow analysis (:set_permission Action Provider)
The Flows service runs both steps as Actions.
Flow lifecycle: Write once, run many
• Define using JSON/YAML
• Deploy to Flows service
• Set access policy for
visibility and execution
• Run (debug) and monitor
• …and run again!
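Catching structural mistakes locally is cheaper than debugging a deployed flow. Here is a minimal Python sketch of such a pre-deployment check. It assumes only the keys used in this deck's flow definition (`StartAt`, `States`, `Next`, `End`) and follows only linear `Next` chains. It is not the Flows service's own validation, which checks far more. `check_definition` is a hypothetical name.

```python
from typing import Any, Dict, List


def check_definition(definition: Dict[str, Any]) -> List[str]:
    """Return structural problems in a linear flow definition (empty list = none found)."""
    problems: List[str] = []
    states = definition.get("States", {})
    start = definition.get("StartAt")
    if start not in states:
        problems.append(f"StartAt {start!r} is not a defined state")
    for name, state in states.items():
        nxt, end = state.get("Next"), state.get("End", False)
        if nxt is None and end is not True:
            problems.append(f"{name}: needs either Next or End: true")
        if nxt is not None and end is True:
            problems.append(f"{name}: has both Next and End")
        if nxt is not None and nxt not in states:
            problems.append(f"{name}: Next {nxt!r} is not a defined state")
    # Follow the Next chain from StartAt to find states that can never run.
    reachable, current = set(), start
    while current in states and current not in reachable:
        reachable.add(current)
        current = states[current].get("Next")
    for name in states:
        if name not in reachable:
            problems.append(f"{name}: unreachable from StartAt")
    return problems


draft = {
    "StartAt": "TransferFiles",
    "States": {
        "TransferFiles": {"Type": "Action", "Next": "SetPermision"},  # typo
        "SetPermission": {"Type": "Action", "End": True},
    },
}
for problem in check_definition(draft):
    print(problem)
```

The misspelled `Next` in the draft produces two findings: a dangling reference and an unreachable `SetPermission` state.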
Flow definition
{
  "StartAt": "TransferFiles",
  "States": {
    "TransferFiles": {
      "Comment": "Transfer to a guest collection",
      "Type": "Action",
      "ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
      "Parameters": {
        "source_endpoint_id.$": "$.input.source.id",
        "destination_endpoint_id.$": "$.input.destination.id",
        "transfer_items": [
          {
            "source_path.$": "$.input.source.path",
            "destination_path.$": "$.input.destination.path",
            "recursive.$": "$.input.recursive_tx"
          }
        ]
      },
      "ResultPath": "$.TransferFiles",
      "WaitTime": 60,
      "Next": "SetPermission"
    },
    "SetPermission": {
      .....
      "End": true
    }
  }
}
Callouts: "Type": "Action" marks an action state; "ActionUrl" is the Action Provider URL; "Parameters" holds the action inputs; "WaitTime" is the timeout (seconds); "Next" names the next state.
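Keys ending in `.$` take their value from the flow's state rather than literally. For example, `"source_endpoint_id.$": "$.input.source.id"` reads the collection ID from the run input. The Python sketch below resolves such parameters. It assumes only simple dotted paths like the ones in this definition, whereas the service itself accepts richer JSONPath expressions. `lookup` and `resolve_parameters` are hypothetical names, and the collection IDs in the example are placeholders.

```python
from typing import Any


def lookup(state: Any, path: str) -> Any:
    """Follow a simple dotted path such as "$.input.source.id" (no wildcards or filters)."""
    if path == "$":
        return state
    if not path.startswith("$."):
        raise ValueError(f"not a supported path: {path!r}")
    value = state
    for key in path[2:].split("."):
        value = value[key]  # KeyError if the input lacks this field
    return value


def resolve_parameters(params: Any, state: Any) -> Any:
    """Copy params, replacing each "key.$": path with "key": <value at path>."""
    if isinstance(params, dict):
        resolved = {}
        for key, value in params.items():
            if key.endswith(".$"):
                resolved[key[:-2]] = lookup(state, value)
            else:
                resolved[key] = resolve_parameters(value, state)
        return resolved
    if isinstance(params, list):
        return [resolve_parameters(item, state) for item in params]
    return params


parameters = {
    "source_endpoint_id.$": "$.input.source.id",
    "destination_endpoint_id.$": "$.input.destination.id",
    "transfer_items": [{
        "source_path.$": "$.input.source.path",
        "destination_path.$": "$.input.destination.path",
        "recursive.$": "$.input.recursive_tx",
    }],
}
run_state = {"input": {
    "source": {"id": "SOURCE_COLLECTION_ID", "path": "/cli/"},
    "destination": {"id": "GUEST_COLLECTION_ID", "path": "/rpi/YOUR_NAME/"},
    "recursive_tx": True,
}}
resolved = resolve_parameters(parameters, run_state)
print(resolved)
```

After an action completes, its result is stored at the state's `ResultPath` (`$.TransferFiles` here), so later states such as `SetPermission` can reference it the same way.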
Flow input schema
{
  ....
  "properties": {
    "input": {
      "type": "object",
      "required": [
        "source",
        "destination",
        "recursive_tx"
      ],
      "properties": {
        "source": {
          "type": "object",
          "title": "Select source collection and path",
          "description": "Source collection/path (MUST end with '/')",
          "format": "globus-collection",
          "required": [
            "id",
            "path"
          ]
        },
        "transfer_label": {
          "type": "string",
          "title": "Label for Transfer Task",
          "pattern": "^[a-zA-Z0-9-_, ]+$",
          "maxLength": 128
        }
        ....
Callouts: "required" lists the required inputs; "format": "globus-collection" is a custom schema format; "type" sets each input's type.
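The input schema lets the service reject bad input before a run starts. The hand-rolled Python check below illustrates what the `required`, `pattern`, and `maxLength` keywords above enforce, plus the trailing-slash rule stated in the source description. It is not a full JSON Schema validator; a library such as `jsonschema` would be the usual choice for that. `check_run_input` and the sample IDs are hypothetical.

```python
import re
from typing import Any, Dict, List

LABEL_PATTERN = r"^[a-zA-Z0-9-_, ]+$"  # copied from the schema's "pattern"
LABEL_MAX_LENGTH = 128                  # copied from the schema's "maxLength"


def check_run_input(body: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with a run input (empty list = OK)."""
    problems: List[str] = []
    run_input = body.get("input", {})
    for key in ("source", "destination", "recursive_tx"):
        if key not in run_input:
            problems.append(f"input.{key} is required")
    source = run_input.get("source")
    if isinstance(source, dict):
        for key in ("id", "path"):
            if key not in source:
                problems.append(f"input.source.{key} is required")
        path = source.get("path", "")
        if path and not path.endswith("/"):
            problems.append("input.source.path must end with '/'")
    label = run_input.get("transfer_label")
    if label is not None:
        if len(label) > LABEL_MAX_LENGTH:
            problems.append(f"transfer_label is longer than {LABEL_MAX_LENGTH} characters")
        if not re.fullmatch(LABEL_PATTERN, label):
            problems.append("transfer_label has characters outside a-z, A-Z, 0-9, '-', '_', ',', ' '")
    return problems


good = {"input": {"source": {"id": "SRC", "path": "/cli/"},
                  "destination": {"id": "DST", "path": "/rpi/YOUR_NAME/"},
                  "recursive_tx": True, "transfer_label": "Tutorial run 1"}}
bad = {"input": {"source": {"id": "SRC", "path": "/cli"},
                 "recursive_tx": True, "transfer_label": "run/1"}}
print(check_run_input(good))
print(check_run_input(bad))
```

The `bad` input fails three ways: `destination` is missing, the source path lacks a trailing slash, and the label contains `/`.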
We give you a head start
Run a flow (Transfer → Share): app.globus.org/flows
(make sure you’re a member of the “Tutorial Users” group)
Exercise: Run Globus Flow using the web app
• Find “Tutorial - Transfer and Share” in flows library
• Click “Start”
• Confirm the source and destination collections
• Change the target path to /rpi/YOUR_NAME
• Enter a label for the flow run
• Click “Start Run”
• Monitor flow progress on the “Event log” tab
Level 5: Triggering flows automatically
Simulating an instrument flow (Transfer → Share)
• An EC2 instance plays the “instrument”; it and ALCF Eagle each run Globus Connect Server
• Step 0: a monitor script on the instrument triggers a flow run
• Step 1: the flow transfers files from the instrument to ALCF Eagle
• Step 2: the flow sets permissions so users can access the data and run analysis
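The monitor script's job is simple: notice new data and start a flow run for it. Here is a minimal Python polling sketch. It assumes a local directory on the instrument host and a caller-supplied `start_run` function. In practice, that function would start the flow through the Globus CLI, SDK, or API, which is not shown here. All names and the `source_path` input key are hypothetical.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional, Set


def new_files(directory: Path, seen: Set[str]) -> List[Path]:
    """Return files in `directory` not seen before, and remember them."""
    found = sorted(p for p in directory.iterdir()
                   if p.is_file() and p.name not in seen)
    seen.update(p.name for p in found)
    return found


def monitor(directory: Path, start_run: Callable[[Dict[str, str]], None],
            interval: float = 10.0, max_polls: Optional[int] = None) -> None:
    """Poll `directory` and call start_run once per new file.

    max_polls=None polls forever; a number is handy for testing.
    """
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_files(directory, seen):
            # Hypothetical flow input: point the flow at the new file.
            start_run({"source_path": str(path)})
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)


# Demo: two "instrument" files trigger two (recorded, not real) flow runs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("image_001.tif", "image_002.tif"):
        (Path(tmp) / name).write_bytes(b"fake image data")
    runs: List[Dict[str, str]] = []
    monitor(Path(tmp), runs.append, interval=0, max_polls=1)
    print(len(runs), "flow runs requested")
```

A real monitor should avoid triggering on partially written files, for example by waiting until a file's size stops changing or by watching for a completion marker. It should also persist `seen` so a restart doesn't re-trigger old data.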
Illustrating the possible…
A more interesting scenario: cryoEM
Globus Flows orchestrates:
• Transfer: transfer raw files
• Compute: launch analysis job on Carbon (correct, classify, …)
• Share: set access controls
• Transfer: move final files to repo
End-to-end Automation: Serial Crystallography
Stages: data capture, image processing (on Carbon), and data publication, orchestrated by Globus Flows
Steps include:
• Transfer: transfer raw files
• Compute: analyze images
• Check threshold
• Compute: visualize
• Compute: gather metadata
• Compute: launch QA job
• Transfer: move results to repo
• Share: set access controls
• Search: ingest to index
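Multi-step pipelines like this are built from the same Action states shown earlier, chained with `Next`. The minimal Python sketch below generates such a chain from a list of steps. It assumes a purely linear pipeline; the "Check threshold" branch would need a conditional state, which this sketch does not generate. The action URLs and `build_linear_flow` are hypothetical placeholders, not real Action Providers.

```python
import json
from typing import Any, Dict, List, Tuple

Step = Tuple[str, str, Dict[str, Any]]  # (state name, ActionUrl, Parameters)


def build_linear_flow(steps: List[Step]) -> Dict[str, Any]:
    """Chain Action states in order: each state's Next is the following step."""
    if not steps:
        raise ValueError("a flow needs at least one step")
    names = [name for name, _, _ in steps]
    if len(set(names)) != len(names):
        raise ValueError("state names must be unique")
    states: Dict[str, Any] = {}
    for i, (name, action_url, parameters) in enumerate(steps):
        state: Dict[str, Any] = {
            "Type": "Action",
            "ActionUrl": action_url,
            "Parameters": parameters,
            "ResultPath": f"$.{name}",  # keep each step's result for later steps
        }
        if i + 1 < len(steps):
            state["Next"] = names[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": names[0], "States": states}


# Hypothetical placeholder URLs; substitute real Action Provider URLs.
flow = build_linear_flow([
    ("TransferRawFiles", "https://example.org/actions/transfer", {}),
    ("AnalyzeImages", "https://example.org/actions/compute", {}),
    ("SetAccessControls", "https://example.org/actions/set_permission", {}),
    ("IngestToIndex", "https://example.org/actions/search_ingest", {}),
])
print(json.dumps(flow, indent=2))
```

Serializing with `json.dumps` turns Python's `True` into JSON's `true`, the literal a flow definition requires.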
Extending the Flows ecosystem
Extending the ecosystem: Action providers
• An Action Provider is a service endpoint that supports:
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io
• Existing Action Providers include compute, ACLs, delete, identifier, transfer, notify, ingest, mkdir, search, ls, Xtract, describe, and web form, alongside custom-developed ones
• Hosted Action Providers: docs.globus.org/api/flows/hosted-action-providers
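The in-memory Python sketch below makes the five operations concrete by modeling an Action Provider's lifecycle. It uses the status values ACTIVE, INACTIVE, SUCCEEDED, and FAILED. The class and its `pause`/`finish` helpers are hypothetical; the helpers stand in for the backend doing the work. Two behaviors are assumptions of this sketch: cancel moves an action to FAILED, and release is allowed only for completed actions. A real provider is an authenticated web service, and the Action Provider Toolkit above is the starting point for building one.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict

COMPLETED = {"SUCCEEDED", "FAILED"}


@dataclass
class Action:
    action_id: str
    body: Dict[str, Any]
    status: str = "ACTIVE"
    details: Dict[str, Any] = field(default_factory=dict)


class InMemoryActionProvider:
    def __init__(self) -> None:
        self._actions: Dict[str, Action] = {}

    def _get(self, action_id: str) -> Action:
        if action_id not in self._actions:
            raise KeyError(f"unknown action {action_id}")
        return self._actions[action_id]

    # The five operations a flow calls.
    def run(self, body: Dict[str, Any]) -> Action:
        action = Action(action_id=str(uuid.uuid4()), body=body)
        self._actions[action.action_id] = action
        return action

    def status(self, action_id: str) -> Action:
        return self._get(action_id)

    def cancel(self, action_id: str) -> Action:
        action = self._get(action_id)
        if action.status not in COMPLETED:
            action.status, action.details = "FAILED", {"reason": "cancelled"}
        return action

    def release(self, action_id: str) -> Action:
        action = self._get(action_id)
        if action.status not in COMPLETED:
            raise RuntimeError("only completed actions can be released")
        return self._actions.pop(action_id)

    def resume(self, action_id: str) -> Action:
        action = self._get(action_id)
        if action.status == "INACTIVE":
            action.status = "ACTIVE"
        return action

    # Stand-ins for the backend pausing or finishing the work.
    def pause(self, action_id: str, reason: str) -> None:
        action = self._get(action_id)
        action.status, action.details = "INACTIVE", {"reason": reason}

    def finish(self, action_id: str, result: Dict[str, Any]) -> None:
        action = self._get(action_id)
        action.status, action.details = "SUCCEEDED", result


provider = InMemoryActionProvider()
action = provider.run({"message": "hello"})
provider.pause(action.action_id, "waiting for approval")
provider.resume(action.action_id)
provider.finish(action.action_id, {"echo": "hello"})
print(provider.status(action.action_id).status)  # SUCCEEDED
provider.release(action.action_id)
```

Release exists so the provider can drop its record of a finished action once the flow has read the result.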
Support resources
• Flows service in web app: app.globus.org/flows
• Flows documentation: docs.globus.org/api/flows
• Helpdesk: support@globus.org
• Customer engagement team can advise on flows
• Professional services team can help build flows
