UCSD HEP Group Trainings

            Wedding
     convenience and control
              with
         RemoteCondor
                        by Igor Sfiligoi
               RemoteCondor co-developed with J. Dost
                         UC San Diego


Apr 2012                     Remote Condor              1
The Condor Batch System
 ●   Condor is a Workload Management System
      ●    i.e. a batch system
 ●   Strong points
      ●    Fault tolerant
      ●    Robust feature set
      ●    Flexible
 ●   Large community base
      ●    Both commercial and scientific
                            http://research.cs.wisc.edu/condor/


Condor Architecture
 ●   Clearly separates
      ●    Resource providers: Machines (aka worker nodes)
           CPUs, Memory, IO, ...
           from
      ●    Resource consumers: Job queues (aka submit nodes)
           Jobs submitted by users
 ●   Each has a daemon process to represent it
      ●    Startd for resource providers
      ●    Schedd for resource consumers
 ●   A central service connects them all
      ●    Managed by a Collector/Negotiator pair


Condor Architecture in a picture

 [Figure: several Schedds on one side and several Startds on the other,
  all connected through the central Collector/Negotiator pair]




The truth about submit nodes
 ●   Corollary
      ●    The submit node is a server!
 ●   There is no real “Condor client”
      ●    The cmdline tools are just a convenience
           to talk to the daemon process
 [Figure: a submit node running condor_submit, condor_q and the Schedd,
  shown together with the Collector/Negotiator and a Startd]

Implications
 ●   Being a server has several implications
 ●   Security implications: high exploit risk
      ●    Will have incoming connectivity
      ●    All security configuration on the submit node
      ●    Submit node controls user authentication and authorization
           (requires high trust between all nodes in the cluster)
 ●   Unfriendly to non-dedicated hardware
      ●    Requires always-on operation (impossible to use on a laptop)
      ●    Must be on a public and static IP address
 ●   Conclusion: not suitable for an unmanaged user machine

What are the alternatives?
 ●   Out of the box, Condor provides
      ●    Remote submission
      ●    Condor-C
      So what is wrong with these?
 ●   In the contrib section, you can find
      ●    RemoteCondor
           (this presentation argues that this is the best solution)

Remote submission
 ●   Essentially, connecting to a remote Schedd
      ●    condor_submit -remote … + condor_transfer_data
           and
      ●    condor_q -name ..., condor_rm -name ..., …
 ●   So no daemon processes on the submit node
      ●    A true client solution!
 [Figure: condor_submit, condor_q and condor_transfer_data on the submit node
  connect, through authentication (Auth), to the Schedds on the Schedd node,
  which sits alongside the Collector/Negotiator and the Startd]
                         http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html
                     http://research.cs.wisc.edu/condor/manual/v7.6/condor_transfer_data.html
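
The remote-submission workflow above can be sketched as plain command lines. This is a minimal Python sketch, not part of the original slides: the function names, the Schedd name and the job ID are hypothetical, and the functions only build argument lists without running anything. The -remote and -name options are the ones named on the slide; using -name with condor_transfer_data follows its manual page linked above and should be checked against your Condor version.

```python
from typing import List


def remote_submit_cmd(schedd: str, submit_file: str) -> List[str]:
    """Submit a job description to a Schedd running on another machine."""
    return ["condor_submit", "-remote", schedd, submit_file]


def remote_query_cmd(schedd: str) -> List[str]:
    """Poll the remote queue; there is no local user log to read instead."""
    return ["condor_q", "-name", schedd]


def remote_fetch_output_cmd(schedd: str, job_id: str) -> List[str]:
    """Pull the output of a finished job back to the submit node."""
    return ["condor_transfer_data", "-name", schedd, job_id]


def remote_remove_cmd(schedd: str, job_id: str) -> List[str]:
    """Remove a job from the remote queue."""
    return ["condor_rm", "-name", schedd, job_id]
```

Each list could be handed to subprocess.run() as is. Note that every step needs the Schedd name again, which is part of what makes this mode feel less like a local Condor.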

So, what's the problem?
 ●   No local user log file
      ●    Must use condor_q to monitor progress
           ●    Annoying at best
           ●    High monitoring load
           ●    And it does not work with DAGMan
 ●   Fully Condor-based user authentication
      ●    While rich, not what users expect
           (e.g. no user/password)
      ●    Hard to tie into campus-wide auth
 ●   Staged input data not shared
      ●    Could be a problem with large datasets
Condor-C
 ●   Based on the Grid paradigm
      ●    Submit locally, then delegate to remote Schedd
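 ●   Still running a daemon process
      ●    But requires no incoming connections
           ●    Secure
           ●    Laptop friendly

 [Figure: condor_submit and condor_q talk to a local Schedd on the submit node,
  which delegates, through authentication (Auth), to the Schedds on the Schedd
  node, alongside the Collector/Negotiator and the Startd]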

               http://research.cs.wisc.edu/condor/manual/v7.6/5_3Grid_Universe.html#sec:Condor-C
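
To make "submit locally, then delegate to remote Schedd" concrete, the sketch below renders a Vanilla universe submit description next to a Condor-C one. It is an illustration under assumptions, not taken from the slides: the function names, file names and host names are hypothetical, and the grid_resource and +remote_jobuniverse lines follow the Condor-C section of the manual linked above as best recalled, so verify them against your Condor version.

```python
def vanilla_submit(executable: str) -> str:
    """Render a plain Vanilla universe submit description."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def condor_c_submit(executable: str, remote_schedd: str, remote_collector: str) -> str:
    """Render the Condor-C equivalent: a grid universe job that the local
    Schedd forwards to a remote Schedd."""
    lines = [
        "universe = grid",
        # Assumed syntax: "condor <remote schedd name> <remote central manager>"
        f"grid_resource = condor {remote_schedd} {remote_collector}",
        # Universe the job should run in on the remote side (5 is Vanilla)
        "+remote_jobuniverse = 5",
        f"executable = {executable}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The Condor-C version needs two extra, less obvious lines and the names of two remote services, where the Vanilla version needs none.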


What are the drawbacks?
 ●   Awkward syntax
      ●    At least compared to Vanilla universe
      ●    See the Condor manual for examples
      ●    Can be mitigated with Job Router
           (but adds another layer of complexity)
 ●   Has scalability problems
      ●    Could likely be improved,
           but this is the current state of the art
 ●   Fully Condor-based user authentication
 ●   Staged input data not shared
      (the last two are the same as for remote submission)


Introducing

           RemoteCondor




What's the big idea?
 ●   Let the users log in to a remote machine
      ●    And run the cmdline tools there (true client approach)

      Advantages:
      ● True local Condor experience
           ●    No exceptions
      ● Standard system authentication and authorization
           ●    Minimize security risk
           ●    Central handling
           ●    Familiar to users
      ● No admin privileges for the users
           ●    Trust based on “central” Schedd admin skills
           ●    Can regulate and transform Condor submissions

      Big deal! Where's the news?

What's the big idea?
 ●   Let the users log in to a remote machine
      ●    And run the cmdline tools there
 ●   … while preserving the local look-and-feel
 ●   RemoteCondor provides
      ●    Wrappers around major Condor cmdline tools
      ●    Integration with sshfs
                    https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor




RemoteCondor wrappers
 ●   Provide wrappers that use ssh under the hood
 ●   Users (almost) unaware of the trick
      ●    But may be prompted for a password
      ●    Works best with public key authentication
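 [Figure: condor_submit and condor_q on the submit node connect, through
  authentication (Auth), to sshd on the Schedd node, where the real
  condor_submit and condor_q run next to the Schedd; the Collector/Negotiator
  and the Startd are shown alongside]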
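
The wrapper idea can be sketched in a few lines. This is a hedged illustration of "use ssh under the hood", not the actual RemoteCondor wrapper code: the function name, user name and host name are hypothetical, and the function only builds the ssh command line without running it.

```python
import shlex
from typing import List, Sequence


def wrap_over_ssh(user: str, host: str, tool: str, args: Sequence[str]) -> List[str]:
    """Build an ssh command line that runs a Condor tool on the Schedd node.

    ssh hands the remote command to a shell on the other side, so each
    argument is quoted to survive that extra round of parsing.
    """
    remote_cmd = " ".join(shlex.quote(part) for part in [tool, *args])
    return ["ssh", f"{user}@{host}", remote_cmd]
```

For example, wrap_over_ssh("alice", "schedd.example.org", "condor_q", ["-constraint", "JobStatus == 2"]) yields an ssh invocation whose remote command keeps the constraint expression as a single argument. With public key authentication in place, running it would not prompt for a password.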



RemoteCondor and sshfs
 ●   But being able to talk to Condor is not enough
      ●    Users must be able to create and read data!
 ●   Using sshfs solves the problem
      ●    Schedd-local disk mounted on submit node
           (disk local to Schedd for maximum performance)
      ●    Using ssh as a tunnel
      ●    All in user space (FUSE)
 ●   RemoteCondor will properly convert paths
     (within certain limits)

                               http://fuse.sourceforge.net/sshfs.html
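
Path conversion of the kind mentioned above might look like the following. This is a minimal sketch under assumptions, not RemoteCondor's own logic: the function name and the example directories are hypothetical. A path under the local sshfs mount point is mapped to the matching path on the Schedd node, and anything outside the mount is rejected, which is one plausible reading of "within certain limits".

```python
import posixpath
from pathlib import PurePosixPath
from typing import Optional


def to_remote_path(local_path: str, local_mount: str, remote_root: str) -> Optional[str]:
    """Translate a path under the sshfs mount into the Schedd-side path.

    Returns None when the path does not live under the mount point, since
    such a file is not visible on the Schedd node.
    """
    # Collapse ".." and "." first so a path cannot escape the mount point
    local = PurePosixPath(posixpath.normpath(local_path))
    mount = PurePosixPath(posixpath.normpath(local_mount))
    try:
        relative = local.relative_to(mount)
    except ValueError:
        return None
    return str(PurePosixPath(remote_root) / relative)
```

With a mount point of /home/alice/condor_mnt backed by /data/alice on the Schedd node, /home/alice/condor_mnt/run1/job.sub maps to /data/alice/run1/job.sub, while /tmp/job.sub maps to None.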



 [Figure: an sshfs mount on the submit node maps, through authentication
  (Auth) and sshd, onto the real disk on the Schedd node, next to the Schedd;
  the Collector/Negotiator and the Startd are shown alongside]

Using RemoteCondor
 ●   Distributed in the Condor src tarball
      ●    In the Contrib section
 ●   Requires a “make install”
      ●    To put the proper files in place
 ●   Plus minimal configuration
      ●    Where is the remote Schedd node?
      ●    What username to use?
      ●    Where to mount the sshfs partition?
                    https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
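
The three configuration questions above map naturally onto three settings. The sketch below is an illustration only: the class, its field names and the example values are hypothetical and do not reflect RemoteCondor's actual configuration format, which is described on the wiki page linked above. It only shows how the answers would combine into an sshfs mount command.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RemoteCondorSettings:
    schedd_host: str      # Where is the remote Schedd node?
    username: str         # What username to use?
    mount_point: str      # Where to mount the sshfs partition?
    remote_dir: str = ""  # Empty means the remote home directory

    def sshfs_mount_cmd(self) -> List[str]:
        """Command that mounts the Schedd-side directory locally."""
        source = f"{self.username}@{self.schedd_host}:{self.remote_dir}"
        return ["sshfs", source, self.mount_point]

    def unmount_cmd(self) -> List[str]:
        """Command that releases the FUSE mount again (Linux)."""
        return ["fusermount", "-u", self.mount_point]
```

Both commands run entirely in user space, so no admin privileges are needed on the submit node.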



Summary
●     Traditional Condor is not suitable for user machines
●     Keeping Schedd nodes professionally maintained
      is highly desirable
       ●   To minimize security risks and control job flow
●     RemoteCondor allows this operation mode
      while preserving the local look-and-feel
       ●   Requires minimal local install




Acknowledgements

  This work is partially sponsored by
  ●   the US National Science Foundation under Grants
      No. OCI-0943725 (STCI) and PHY-0612805
      (CMS Maintenance & Operations),
      and
  ●   the US Department of Energy under Grant No.
      DE-FC02-06ER41436 subcontract No. 647F290 (OSG).





Wedding convenience and control with RemoteCondor

  • 1. UCSD HEP Group Trainings Wedding convenience and control with RemoteCondor by Igor Sfiligoi RemoteCondor co-developed with J. Dost UC San Diego Apr 2012 Remote Condor 1
  • 2. The Condor Batch System ● Condor is a Workload Management System ● i.e. a batch system ● Strong points ● Fault tolerant ● Robust feature set ● Flexible ● Large community base ● Both commercial and scientific http://research.cs.wisc.edu/condor/ Apr 2012 Remote Condor 2
  • 3. Condor Architecture ● Clearly separates Machines (aka worker nodes) CPUs, Memory, IO,... ● Resource providers from ● Resource consumers Job queues (aka submit nodes) ● Each has a daemon Jobs submitted by users process to represent it ● Startd for resource provides ● Schedd for resource consumers ● A central service connects them all ● Managed by a Collector/Negotiator pair Apr 2012 Remote Condor 3
  • 4. Condor Architecture in a picture Schedd Startd . . Collector . . . Negotiator . Schedd Startd Apr 2012 Remote Condor 4
  • 5. The truth about submit nodes ● Corollary ● The submit node is a server! ● There is no real “Condor client” ● The cmdline tools are just a convenience to talk to the daemon process Submit node Collector Negotiator Schedd Startd condor_submit condor_q Apr 2012 Remote Condor 5
  • 6-8. Implications
      ● Being a server has several implications
      ● Security implications
          ● Will have incoming connectivity
              → High exploit risk
          ● All security configuration on the submit node
          ● Submit node controls user authentication and authorization
              → Requires high trust between all nodes in the cluster
      ● Unfriendly to non-dedicated hardware
          ● Requires always-on operation
          ● Must be on a public & static IP address
              → Impossible to use on a laptop
      Not suitable for an unmanaged user machine
  • 9-11. What are the alternatives?
      ● Out of the box, Condor provides
          ● Remote submission
          ● Condor-C
          So what is wrong with these?
      ● In the contrib sections, you can find
          ● RemoteCondor
          This presentation argues that this is the best solution
  • 12. Remote submission
      ● Essentially, connecting to a remote Schedd
          ● condor_submit -remote … + condor_transfer_data
            and
          ● condor_q -name ..., condor_rm -name ..., …
      ● So no daemon processes on the submit node
          ● A true client solution!
      [Diagram: condor_submit, condor_q and condor_transfer_data on the submit node authenticate to the Schedd on the Schedd node, which works with the Collector/Negotiator and the Startd]
      http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html
      http://research.cs.wisc.edu/condor/manual/v7.6/condor_transfer_data.html
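
To make the remote-submission workflow above concrete, here is a minimal Python sketch that only assembles the command lines for one submit / monitor / fetch / remove cycle; it does not run anything. The Schedd name, submit file name, and cluster id are hypothetical example values, and the exact argument order should be checked against the manual pages linked on the slide.

```python
"""Sketch: build the command lines for a remote-submission cycle."""


def remote_submit_commands(schedd_name, submit_file, cluster_id):
    """Return argv lists keyed by step; nothing is executed here.

    All three parameters are hypothetical example inputs.
    """
    cluster = str(cluster_id)
    return {
        # Queue the job on the remote Schedd rather than a local one.
        "submit": ["condor_submit", "-remote", schedd_name, submit_file],
        # With no local user log, progress is polled from the remote queue.
        "monitor": ["condor_q", "-name", schedd_name, cluster],
        # Output stays spooled on the Schedd until it is fetched explicitly.
        "fetch": ["condor_transfer_data", "-name", schedd_name, cluster],
        # Removing the job also has to name the remote Schedd.
        "remove": ["condor_rm", "-name", schedd_name, cluster],
    }


if __name__ == "__main__":
    commands = remote_submit_commands("schedd.example.edu", "job.sub", 42)
    for step, argv in commands.items():
        print(step, "->", " ".join(argv))
```

Every step has to name the remote Schedd explicitly, which is part of why this mode feels less like a local Condor installation.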
  • 13. So, what's the problem?
      ● No local user log file
          → Must use condor_q to monitor progress
          ● Annoying at best
          ● High monitoring load
          ● And it does not work with DAGMan
      ● Fully Condor-based user authentication
          ● While rich, not what users expect (e.g. no user/password)
          ● Hard to tie into campus-wide auth
      ● Staged input data not shared
          → Could be a problem with large datasets
  • 14. Condor-C
      ● Based on the Grid paradigm
          ● Submit locally, then delegate to remote Schedd
      ● Still running a daemon process
          ● But requires no incoming connections
          → Secure
          → Laptop friendly
      [Diagram: condor_submit and condor_q on the submit node talk to a local Schedd, which authenticates to the Schedd on the Schedd node; that Schedd works with the Collector/Negotiator and the Startd]
      http://research.cs.wisc.edu/condor/manual/v7.6/5_3Grid_Universe.html#sec:Condor-C
  • 15. What are the drawbacks?
      ● Awkward syntax
          ● At least compared to Vanilla universe
          ● See the Condor manual for examples
          → Can be mitigated with Job Router (but adds another layer of complexity)
      ● Has scalability problems
          ● Could likely be improved, but this is the current state-of-the-art
      ● Fully Condor-based user authentication
      ● Staged input data not shared
          → Both same as remote submission
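
As a rough illustration of the syntax difference mentioned above, the following Python sketch renders a bare-bones vanilla submit description next to a Condor-C one. The host names and executable are hypothetical, and the Condor-C text is deliberately incomplete: a real job usually needs further attributes, so treat the manual as the authority.

```python
"""Sketch: contrast a vanilla and a Condor-C submit description."""


def vanilla_submit(executable):
    """Minimal vanilla-universe submit description."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def condor_c_submit(executable, remote_schedd, remote_collector):
    """Minimal Condor-C (grid universe) submit description.

    The user must know and spell out both the remote Schedd and the
    remote Collector; both values here are hypothetical.
    """
    lines = [
        "universe = grid",
        f"grid_resource = condor {remote_schedd} {remote_collector}",
        f"executable = {executable}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(vanilla_submit("analysis.sh"))
    print(condor_c_submit("analysis.sh", "schedd.example.edu", "cm.example.edu"))
```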
  • 16. Introducing RemoteCondor
  • 17-19. What's the big idea?
      ● Let the users log in to a remote machine
          ● And run the cmdline tools there
      True client approach
      Advantages:
      ● True local Condor experience
          ● No exceptions
      ● Standard system
          ● Familiar to users
      ● Minimize security risk
          ● Central handling of authentication and authorization
          ● No admin privileges for the users
          ● Trust based on “central” Schedd admin skills
      ● Can regulate and transform Condor submissions
      Big deal! Where's the news?
  • 20. What's the big idea?
      ● Let the users log in to a remote machine
          ● And run the cmdline tools there
          ● … while preserving the local look-and-feel
      ● RemoteCondor provides
          ● Wrappers around major Condor cmdline tools
          ● Integration with sshfs
      https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
  • 21. RemoteCondor wrappers
      ● Provide wrappers that use ssh under the hood
          ● Users (almost) unaware of the trick
          ● But may be prompted for a password
          ● Works best with public key authentication
      [Diagram: the condor_submit and condor_q wrappers on the submit node authenticate to sshd on the Schedd node, where the real condor_submit and condor_q talk to the Schedd; the Schedd works with the Collector/Negotiator and the Startd]
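
The wrapper idea above can be sketched in a few lines of Python. This is not RemoteCondor's actual code, which the slides do not show; it only illustrates how a local wrapper might turn a Condor command into an ssh invocation. The function name, user name, and host name are hypothetical.

```python
"""Sketch: turn a local Condor command into an ssh invocation."""
import shlex


def build_ssh_command(user, host, tool, args):
    """Return the ssh argv that would run `tool args...` on the Schedd node.

    ssh hands its trailing argument to the remote shell, so every piece is
    quoted to survive that second round of shell parsing.
    """
    remote_command = " ".join(shlex.quote(part) for part in [tool, *args])
    return ["ssh", f"{user}@{host}", remote_command]


if __name__ == "__main__":
    # A real wrapper would pass this list to subprocess.run(); here we only
    # print it so the sketch has no side effects.
    print(build_ssh_command("alice", "schedd.example.edu", "condor_q", ["alice"]))
```

With public key authentication in place, such a call completes without prompting, which is why the slide recommends it.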
  • 22-23. RemoteCondor and sshfs
      ● But being able to talk to Condor is not enough
          ● Users must be able to create and read data!
      ● Using sshfs solves the problem
          ● Schedd-local disk mounted on submit node
              → Disk local to Schedd for maximum performance
          ● Using ssh as a tunnel
          ● All in user space (FUSE)
      ● RemoteCondor will properly convert paths (within certain limits)
      [Diagram: sshfs on the submit node authenticates to sshd on the Schedd node and exposes the real disk there; the Schedd works with the Collector/Negotiator and the Startd]
      http://fuse.sourceforge.net/sshfs.html
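
The path conversion mentioned for the sshfs integration could look roughly like the sketch below. It assumes a single mount point and a single remote root directory, both hypothetical, and it is not RemoteCondor's actual logic: paths outside the mount are reported as not convertible, which is one plausible reading of "within certain limits".

```python
"""Sketch: map a path under the local sshfs mount to the Schedd-side path."""
from pathlib import PurePosixPath


def to_remote_path(local_path, mount_point, remote_root):
    """Return the Schedd-side path for local_path, or None if it is not
    located under mount_point.

    Note: this does not resolve '..' components or symlinks.
    """
    try:
        relative = PurePosixPath(local_path).relative_to(PurePosixPath(mount_point))
    except ValueError:
        # The file is not on the mounted disk, so the Schedd cannot see it.
        return None
    return str(PurePosixPath(remote_root) / relative)


if __name__ == "__main__":
    print(to_remote_path("/home/alice/condor_mnt/run1/job.sub",
                         "/home/alice/condor_mnt", "/data/alice"))
    print(to_remote_path("/tmp/job.sub", "/home/alice/condor_mnt", "/data/alice"))
```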
  • 24. Using RemoteCondor
      ● Distributed in the Condor src tarball
          ● In the Contrib section
      ● Requires a “make install”
          ● To put the proper files in place
      ● Plus minimal configuration
          ● Where is the remote Schedd node?
          ● What username to use?
          ● Where to mount the sshfs partition?
      https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
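
The three configuration questions above map naturally onto a small settings object. The sketch below is an assumption about how they might be grouped; the field names are hypothetical and are not RemoteCondor's real configuration keys. It also shows the sshfs and fusermount command lines such settings would lead to on a Linux submit node.

```python
"""Sketch: the three RemoteCondor settings and the mount command they imply."""
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteCondorSettings:
    schedd_host: str      # Where is the remote Schedd node?
    username: str         # What username to use?
    mount_point: str      # Where to mount the sshfs partition?
    remote_dir: str = ""  # Empty means the remote home directory for sshfs.

    def sshfs_mount_command(self):
        """argv for mounting the Schedd-local disk on the submit node."""
        source = f"{self.username}@{self.schedd_host}:{self.remote_dir}"
        return ["sshfs", source, self.mount_point]

    def unmount_command(self):
        """argv for releasing the FUSE mount again (Linux)."""
        return ["fusermount", "-u", self.mount_point]


if __name__ == "__main__":
    settings = RemoteCondorSettings("schedd.example.edu", "alice",
                                    "/home/alice/condor_mnt")
    print(" ".join(settings.sshfs_mount_command()))
    print(" ".join(settings.unmount_command()))
```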
  • 25. Summary
      ● Traditional Condor not suitable for user machines
      ● Keeping Schedd nodes professionally maintained highly desirable
          ● To minimize security risks and control job flow
      ● RemoteCondor allows this operation mode while preserving the local look-and-feel
          ● Requires minimal local install
  • 26. Acknowledgements
      This work is partially sponsored by
      ● the US National Science Foundation under Grants No. OCI-0943725 (STCI) and PHY-0612805 (CMS Maintenance & Operations), and
      ● the US Department of Energy under Grant No. DE-FC02-06ER41436 subcontract No. 647F290 (OSG).