We discuss the differences between Globus Connect Server versions 4 and 5, migrating to version 5, managing multi-DTN endpoints, options for tweaking file transfer performance and other advanced topics.
Presented at a workshop at Oak Ridge National Laboratory on June 23, 2022.
4. Out with the old, in with the new
• Host endpoints → Mapped collections
– Need local account to access data
• Shared endpoints → Guest collections
– No local account needed for data access, permissions set in Globus
• Create shared endpoint(s) on a host endpoint → Create guest collection(s) on a mapped collection
5. GCSv4 vs GCSv5
Feature: Credentials for creating endpoints
• GCSv4
– Globus ID was required to create endpoint
• GCSv5
– Use any identity with which you can log into Globus
– Endpoint registered as a resource server with Globus Auth → client ID/secret
Feature: Policy and data access interfaces
• GCSv4
– Static endpoint definition supporting one collection
– Configuration changes required re-running setup on every DTN
– Data access via GridFTP only
• GCSv5
– Endpoint supports multiple storage systems/collections
– Automated management across DTNs
– Data access via GridFTP and HTTPS
6. GCSv4 vs GCSv5
Feature: Server credentials
• GCSv4
– Default certificate issued by Globus CA
– Custom certificate supported
• GCSv5
– Default certificate issued by Let’s Encrypt (in order to enable browser-based downloads)
– Custom certificate supported
Feature: Management interface
• GCSv4
– Configuration files and tools on the DTN
– Some attributes editable via the web app
• GCSv5
– GCS Manager service on the DTNs
– GCS CLI used to interact with the service
7. GCSv4 vs GCSv5
Feature: Endpoint management roles
• GCSv4
– Administrator, Activity Manager, Activity Monitor supported
• GCSv5
– Unchanged
Feature: High Assurance data access
• GCSv4
– Not supported
• GCSv5
– Storage gateway can be configured for high assurance data handling, enforcing additional security constraints
Feature: User authentication for data access
• GCSv4
– InCommon OAuth, MyProxy or MyProxy OAuth
• GCSv5
– Any identity used to log into Globus
– Custom identity provider via OIDC
8. GCSv4 vs GCSv5
Feature: Mapping to local account
• GCSv4
– Data access identity independent from Globus account
– User certificate DN to local account
– ePPN based, GridMap files, custom callout
• GCSv5
– Storage gateway policy maps Globus authentication information to local account
– Expression based matching for values in the authentication context
– Custom external program
Feature: Custom domain name
• GCSv4
– Not applicable
• GCSv5
– Custom domain name for collections
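To make "expression based matching for values in the authentication context" more concrete, here is a minimal Python sketch. It is not the GCS implementation: the rule fields (`source`, `match`, `output`), the `map_identity` helper, and the `example.org` domains are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical rules: each names a field of the authentication context,
# a regular expression that must match the whole value, and an output
# template whose {0}, {1}, ... refer to the captured groups.
RULES = [
    {"source": "username", "match": r"(.*)@example\.org", "output": "{0}"},
    {"source": "username", "match": r"(.*)@partner\.example\.edu", "output": "ext_{0}"},
]


def map_identity(context: dict, rules: list = RULES) -> Optional[str]:
    """Return the local account from the first matching rule, else None."""
    for rule in rules:
        value = context.get(rule["source"])
        if value is None:
            continue  # this rule's field is absent from the context
        matched = re.fullmatch(rule["match"], value)
        if matched:
            return rule["output"].format(*matched.groups())
    return None  # no rule matched: no local account, so no access
```

With these assumed rules, `map_identity({"username": "alice@example.org"})` yields the local account `alice`, while an identity from an unlisted domain maps to nothing.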
10. Goals
• No user intervention should be required
• Recreate all host and guest endpoints
• Preserve all relevant configuration
• Preserve the UUIDs of the resources
• Minimize downtime
11. Migration tools: Approach
• Read v4 configuration to create migration plan
• Allow edits and changes by administrator to plan
• Apply migration plan to a vanilla install of v5
• Test v5 endpoint/collections and validate
• Finalize by assigning v4 UUID to the new endpoint
12. Impact
• Downtime after final migration step (preserving UUID)
• Active transfers cancelled on final step; users notified
• Pause rules NOT preserved; must be recreated
• Custom applications using v4 host endpoint (with activation) must move to v5 collection (with consent)
docs.globus.org/globus-connect-server/migrating-to-v5.4/application-migration
16. Adding a node requires just two commands
$ globus-connect-server node setup $CLIENT_ID --deployment-key THE_KEY
$ systemctl restart apache2
Copy the deployment key from the first node (DTN) to every other node
Node setup pulls configuration from Globus service
Check your DTN cluster status:
globus-connect-server node list
17. Multi-node DTN behavior
• Active nodes can receive transfer tasks
• Tasks on inactive node will pause until active again
• GCS manager assistant
– Synchronizes configuration among nodes in the endpoint
– Stores encrypted configuration values in Globus service
18. Migrating an endpoint to a new host (DTN)
• An endpoint is a logical construct → replace host system without disrupting the endpoint
– Avoid replicating configuration data (esp. for guest collections!)
– Maintain continuity for custom apps, automation scripts, etc., that use the endpoint UUID
• Using GCS’s multi-node configuration, add new node(s) to endpoint and then remove original node(s)
• Again, deployment key is required
– Export node configuration with node setup --export-node
– Import on new DTN using node setup --import-node
19. Supporting non-POSIX systems
• Update your GCS packages
• Add the appropriate storage gateway
– Non-POSIX systems require add-on connector subscription(s)
• Gateway configuration options vary by connector
– e.g., specify bucket name(s) for AWS S3
• Collection authentication options vary by connector
– e.g., provide user access key and secret key for AWS S3
– Credentials must grant appropriate permissions
– Mapped collection may not actually “map” to local user account
22. Globus transfer is fast …but it depends on…
• Data Transfer Node (CPU, RAM, bus, NIC, …)
• Network (devices, path quality, latency, …)
• Storage (hardware, attach mode, …)
• Dataset make-up (file#, size, tree depth, …)
– Remember: LoSF (lots of small files) == Great sadness
• Things people do (one transfer per file …1M files)
• …?
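Before blaming the network, it can help to measure the dataset make-up itself. The following is a rough Python sketch, assuming a locally mounted POSIX path; `summarize_tree` is an illustrative helper, not part of any Globus tool.

```python
import os
from typing import Dict


def summarize_tree(root: str) -> Dict[str, float]:
    """Walk root and report file count, total bytes, mean size, max depth."""
    root = os.path.abspath(root)
    file_count = 0
    total_bytes = 0
    max_depth = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Depth of this directory below root (root itself is depth 0).
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        max_depth = max(max_depth, depth)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skips broken symlinks
                file_count += 1
                total_bytes += os.path.getsize(path)
    mean_bytes = total_bytes / file_count if file_count else 0.0
    return {
        "file_count": file_count,
        "total_bytes": total_bytes,
        "mean_bytes": mean_bytes,
        "max_depth": max_depth,
    }
```

A very large `file_count` combined with a small `mean_bytes` is the LoSF signature; what counts as "large" or "small" depends on your storage and is left to the reader.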
23. You should have Great Expectations
ESnet EPOC target for all DOE labs
Requires at least a 10G connection
25. Legacy Architecture (don’t do this)
[Diagram. Labeled components: WAN, border router, firewall, enterprise network, perfSONAR hosts, portal server (10GE), filesystem (data store). Labeled paths: browsing path, query path, data path. Portal server applications: web server, search, database, authentication, data service.]
26. Best practice: ScienceDMZ – you have one!
[Diagram. Labeled components: WAN, border router, Science DMZ switch/router, firewall, enterprise network, perfSONAR hosts, DTNs and API DTNs (data access governed by portal) on 10GE links, filesystem (data store), portal server. Labeled paths: data transfer path, portal query/browse path (browsing path, query path). Portal server applications: web server, search, database, authentication.]
27. Science DMZ configuration
[Diagram. Labeled components: user organization, source and destination Data Transfer Nodes (DTNs), source and destination Science DMZs, source and destination border routers, source and destination routers, source and destination security filters. Labeled channels: CONTROL (physical and logical control paths) and DATA (physical and logical data paths). Port annotations: * Port 443, * Ports 50000-51000.]
* Please see TCP ports reference: https://docs.globus.org/resource-provider-guide/#open-tcp-ports_section
29. Performance is a pairs sport
• Network use parameters: concurrency, parallelism
• Maximum, Preferred values for each
• Transfer considers source and destination endpoint settings
min(
max(preferred src, preferred dest),
max src,
max dest
)
• Service limits, e.g. concurrent requests
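The min/max rule above can be checked with a few lines of Python. This is only a sketch of the formula as written on the slide; the function name is illustrative, and the service limits mentioned in the last bullet are not modeled.

```python
def effective_setting(pref_src: int, pref_dst: int,
                      max_src: int, max_dst: int) -> int:
    """Apply min(max(preferred src, preferred dest), max src, max dest)."""
    return min(max(pref_src, pref_dst), max_src, max_dst)


# The destination prefers 8, but the source caps the value at 4.
print(effective_setting(pref_src=2, pref_dst=8, max_src=4, max_dst=16))  # 4
```

The same function applies to either parameter (concurrency or parallelism): the higher preference wins unless either endpoint's maximum is lower.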
30. Globus network use parameters
• May only be changed on managed endpoints
• Modify via the web app: Console → Endpoints tab
• Modify via Globus Connect Server CLI
– Run globus-connect-server endpoint modify
• Strong recommendation: Do not change network use
parameters before establishing baseline performance
32. Configuring a “private” data channel
• Default: data interface is set to the DTN’s public IP address (see data_interface in /etc/gridftp.d/globus-connect-server)
• Create /etc/gridftp.d/STORAGE_GATEWAY_ID
• Set data_interface PRIVATE_INTERFACE_IP_ADDRESS
• Replicate on every DTN (files in /etc/gridftp.d/ are
not sync'd between nodes by Globus)
34. Before asking for help…
• globus-connect-server self-diagnostic can identify many issues
– Are services running? GCS manager/assistant, GridFTP server
• Connectivity is a common cause
– Is the DTN control channel reachable?
– Can the DTN establish data channel connection?
docs.globus.org/globus-connect-server/v5.4/troubleshooting-guide
…and we’re always here for you: support@globus.org
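As a first connectivity check from a remote machine, a plain TCP probe is often enough. The sketch below is a generic Python helper, not a Globus tool; `port_reachable` and `check_dtn` are illustrative names, and the ports probed (443 plus a couple from the 50000-51000 range shown on the Science DMZ configuration slide) are assumptions about your deployment.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def check_dtn(host: str, data_ports=(50000, 50001)) -> dict:
    """Probe port 443 and a sample of data ports; map port -> reachable."""
    results = {443: port_reachable(host, 443)}
    for port in data_ports:
        results[port] = port_reachable(host, port)
    return results
```

For example, `check_dtn("dtn.example.org")` (a placeholder hostname) returns a dictionary of port numbers and booleans. Treat a failed data-port probe as a hint rather than a verdict: a data port may only accept connections while a transfer is actively listening on it.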
35. When you really need a clean slate…
• Proper clean-up—both on your system and in the
Globus service—is important!
• Execute these commands in the specified order:
o globus-connect-server node cleanup (on every DTN)
o globus-connect-server endpoint cleanup (on last DTN)
• Delete the GCS registration at developers.globus.org
• Don’t use the same Client ID for another endpoint!