SlideShare una empresa de Scribd logo
1 de 39
Descargar para leer sin conexión
Condor Week 2012


          An argument for moving the
         requirements out of user hands
                                             -

                                The CMS
                                experience
                              presented by Igor Sfiligoi
                                      UC San Diego




Condor Week 2012 - May 2012         No user requirements   1
The basics
 ●   Condor architecture clearly separates
      ●   Resource providers                       Machines (aka worker nodes)
                                                        CPUs, Memory, IO,...
          from
      ●   Resource consumers                           Job queues (aka submit nodes)
                                                           Jobs submitted by users
 ●   Each has a daemon
     process to represent it                                    Schedd
                                                                         Negotiator
          Startd for resource provides




                                                                ...
      ●                                                                  Collector
                                                                Schedd
                                                                                                   Startd
      ●   Schedd for resource consumers                                                      ...
                                                                                         Startd
 ●   A central service connects them all                                           ...
                                                                               Startd
      ●
          Managed by a Collector/Negotiator pair

Condor Week 2012 - May 2012     No user requirements                                                 2
Matchmaking
 ●    In order for a job to start running on a resource
       ●   The job requirements
           must evaluate to True
       ●   The machine requirements
           must evaluate to True



     There is also the ranking,
     but that's 2nd level optimization




Condor Week 2012 - May 2012              No user requirements   3
Matchmaking
 ●    In order for a job to start running on a resource
       ●   The job requirements
           must evaluate to True
       ●   The machine requirements                               Most manuals
           must evaluate to True                                focus on job reqs




     There is also the ranking,
     but that's 2nd level optimization




Condor Week 2012 - May 2012              No user requirements                       4
Matchmaking
 ●    In order for a job to start running on a resource
       ●   The job requirements
           must evaluate to True
       ●   The machine requirements                                   Most manuals
           must evaluate to True                                    focus on job reqs




     There is also the ranking,
     but that's 2nd level optimization                    Machine reqs
                                                       deemed for handling
                                                         Owner state only



Condor Week 2012 - May 2012              No user requirements                           5
My argument


                          Get rid of
                      Job Requirements

               Put all logic in the
              Machine Requirements

Condor Week 2012 - May 2012     No user requirements   6
Some background
 ●   I am a big glideinWMS user                                          http://tinyurl.com/glideinWMS



 ●   And glideinWMS has 2 level matchmaking
      ●   One at VO Frontend level – where to send glideins
      ●   One at Negotiator level – which job to start in a glidein
                                 Collector
       Schedd                    Negotiator

                                                               glidein
                                                             Execution node

                                                                Startd

          Frontend                                                   Job
                                Factory

Condor Week 2012 - May 2012           No user requirements                                       7
Matchmaking problem
 ●   Both levels must be in sync, or you either
      ●   Ask for glideins which never match any jobs


                                   Job matched to site
                                      in VO Frontend
                                             but
                                   not to site's machine
                                     in the Negotiator




Condor Week 2012 - May 2012   No user requirements         8
Matchmaking problem
 ●   Both levels must be in sync, or you either
      ●   Ask for glideins which never match any jobs, or
      ●   Have job waiting in the queue when site available


                                        Job doesn't match to site
                                              in VO Frontend
                                                    but
                                         would to site's machine
                                             in the Negotiator
                                       if glideins were requested




Condor Week 2012 - May 2012   No user requirements                  9
Matchmaking problem
 ●   Both levels must be in sync, or you either
      ●   Ask for glideins which never match any jobs, or
      ●   Have job waiting in the queue when site available
 ●   But site and machine adds have
     different attributes
      ●   Not all machines on a site are exactly the same

                                  So I need
                                 2 different
                                requirements



Condor Week 2012 - May 2012   No user requirements            10
The usual dilemma
 ●   Where do I define these requirements?
      ●   In user job ClassAd?
      ●   Or in the Resource ClassAds?
            –   And we have 2 different resource types here




Condor Week 2012 - May 2012      No user requirements         11
The usual dilemma
 ●   Where do I define these requirements?
      ●   In user job ClassAd?
      ●   Or in the Resource ClassAds?
            –   And we have 2 different resource types here




      The 2 resources
         are handled
     by the same admin
      (in VO Frontend)




Condor Week 2012 - May 2012      No user requirements         12
The usual dilemma
 ●   Where do I define these requirements?
      ●   In user job ClassAd?
      ●   Or in the Resource ClassAds?
            –   And we have 2 different resource types here


                                                             Potentially
                                                               O(1k)
      The 2 resources                                          users!
         are handled
     by the same admin
      (in VO Frontend)
                              Typically

                              O(1)
Condor Week 2012 - May 2012               No user requirements             13
The usual dilemma
 ●   Where do I define these requirements?
      ●   In user job ClassAd?
      ●   Or in the Resource ClassAds?
            –   And we have 2 different resource types here


                                                       And do you really
                                                        trust your users
      The 2 resources                                to do the right thing?
         are handled
     by the same admin
      (in VO Frontend)                                                        Potentially
                              Typically                                   O(1k)
                              O(1)
Condor Week 2012 - May 2012               No user requirements                              14
Moving reqs to the resources
 ●   So we went for setting the requirements
     in the resources themselves




Condor Week 2012 - May 2012   No user requirements   15
Moving reqs to the resources
 ●   So we went for setting the requirements
     in the resources themselves
 ●   But users still need a way to select resources!
      ●   How do they do it???




Condor Week 2012 - May 2012   No user requirements     16
Moving reqs to the resources
 ●   So we went for setting the requirements
     in the resources themselves
 ●   But users still need a way to select resources!
      ●   How do they do it???




                              They express their needs
                                 through attributes!



Condor Week 2012 - May 2012      No user requirements    17
Fixed schema
  ●       The resource provider defines the requirements
                              a.k.a startd + VO Frontend
          ●   The VO Frontend admin in our case
  ●       Those requirements look for well-defined
          user-provided attributes             Site attribute
                                               Machine attribute
                                                                     Job attribute


          entry_req = stringListMember(GLIDEIN_Site,DESIRED_Sites)&&
                      ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
Example




          Start   =   stringListMember(GLIDEIN_Site,DESIRED_Sites)&&
                      ((Memory>DESIRED_Mem)=!=False)



Condor Week 2012 - May 2012                   No user requirements                   18
Simple user job submit file
 ●   No complex requirements to write
 ●   Very little user training
 ●   Low error rate
                                    a.submit

                         Executable = a.sh
                         Output = a.out
                         +DESIRED_Sites=”UCSD,Nebraska”
                         Requirements=True
                         queue




Condor Week 2012 - May 2012        No user requirements   19
How well does it work?
 ●   Advantages                                 ●    Disadvantages
      ●   Easy to keep the two                        ●    Rigid schema
          levels in sync
      ●   Easy to define
          reasonable defaults
      ●   Easy on the users
      ●   And more...
          (wait for later slides)




Condor Week 2012 - May 2012         No user requirements                  20
How well does it work?
 ●   Advantages                                 ●    Disadvantages
      ●   Easy to keep the two                        ●    Rigid schema
          levels in sync
      ●   Easy to define
          reasonable defaults
      ●   Easy on the users
      ●   And more...                                         In our experience,
          (wait for later slides)                          does not need to change
                                                                  more than
                                                           a couple of times a year




Condor Week 2012 - May 2012         No user requirements                              21
Is this glideinWMS specific?
 ●   Advantages                           ●    Disadvantages
      ●   Easy to keep the two
              Don't think so!
                                                ●    Rigid schema
          levels in sync


      ●   Easy to define
          reasonable defaults
                                                The ease of use
      ●   Easy on the users                       is there for
                                               any Condor setup



Condor Week 2012 - May 2012   No user requirements                  22
Is this glideinWMS specific?
 ●   Advantages                           ●    Disadvantages
      ●   Easy to keep the two
              Don't think so!
                                                ●    Rigid schema
          levels in sync


      ●  Easy to define
         reasonable defaults
      ● Easy on the users
                                                The ease of use
      I would argue it                            is there for
      should become                            any Condor setup
     standard practice


Condor Week 2012 - May 2012   No user requirements                  23
And there is
                  still more to it


Condor Week 2012 - May 2012   No user requirements   24
Side effect
 ●   Discovered one unexpected nice side effect




Condor Week 2012 - May 2012     No user requirements   25
Side effect
 ●   Discovered one unexpected nice side effect


                              We can outsmart
                                our users!




Condor Week 2012 - May 2012       No user requirements   26
The overflow use case
  ●        Normally, CMS jobs run near the data
           ●   So users provide a whitelist of sites to run on
           ●   And we have the appropriate glideinWMS expression
                        a.submit

          Executable = a.sh
          Output = a.out
          +DESIRED_Sites=”UCSD,Nebraska”
          Requirements=True
           entry_req = stringListMember(GLIDEIN_Site,DESIRED_Sites)&&
          queue
                          ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
Example




           Start   =   stringListMember(GLIDEIN_Site,DESIRED_Sites)&&
                       ((Memory>DESIRED_Mem)=!=False)

Condor Week 2012 - May 2012         No user requirements                27
The overflow use case
 ●   Normally, CMS jobs run near the data
 ●   But some jobs could run over the WAN
      ●   It is just slightly less efficient



                              Xrootd based WAN access
                                 if you are interested




Condor Week 2012 - May 2012      No user requirements    28
The overflow use case
 ●   Normally, CMS jobs run near the data
 ●   Jobs could run over the WAN
      ●   It is just slightly less efficient
 ●   But if some CPUs are idle due to low demand
      ●   A low efficiency job is still better than no job!
      ●   As long as it results
          in users getting                    We call this
          their results sooner               “overflowing”
            –   i.e. only if
                “optimal resources” are not available

Condor Week 2012 - May 2012       No user requirements        29
The overflow use case
 ●   Normally, CMS jobs run near the data
 ●   Jobs could run over the WAN
      ●   It is just slightly less efficient
 ●   But if some CPUs are idle due to low demand
      ●   A low efficiency job is still better than no job!
      ●   As long as it results
          in users getting
          their results sooner              So, how do we
           – i.e. only if                  implement this?
               “optimal resources” are not available

Condor Week 2012 - May 2012      No user requirements         30
Overflow configuration
  ●       Essentially, we change the rules!
          ●   Without involving the users
  ●       We write the requirements based on where the
          data is, not where the CPUs are
          ●   Since not all sites have xrootd installed

          entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&&
                         stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
Example




                         ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
          Start = ((CurrentTime-QDate)>21600)&&
                    stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
                    ((Memory>DESIRED_Mem)=!=False)

Condor Week 2012 - May 2012       No user requirements                   31
Overflow configuration
  ●        Essentially, we change the rules!
            ●   Without involving the users                No change to the
                                                           user submit file!
  ●        We write the requirements based on where the
           data is, not where the CPUs are
                    a.submit

          Executable = a.sh
            ● Since not all sites have xrootd installed
          Output = a.out
          +DESIRED_Sites=”UCSD,Nebraska”
          Requirements=True
           entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&&
          queue            stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
Example




                           ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
           Start = ((CurrentTime-QDate)>21600)&&
                      stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
                      ((Memory>DESIRED_Mem)=!=False)

Condor Week 2012 - May 2012         No user requirements                       32
Overflow configuration
  ●        Essentially,decide
           And we can we change the rules!
  where to overflow from users
    ● Without involving the                                No change to the
        hours after                                        user submit file!
 the jobs werethe requirements
 ● We write submitted                based on where the
           data is, not where the CPUs are
                    a.submit

          Executable = a.sh
            ● Since not all sites have xrootd installed
          Output = a.out
          +DESIRED_Sites=”UCSD,Nebraska”
          Requirements=True
           entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&&
          queue                          ”UCSD,Wisc”
                           stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
Example




                           ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
           Start = ((CurrentTime-QDate)>21600)&&
                      stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
                      ((Memory>DESIRED_Mem)=!=False)

Condor Week 2012 - May 2012         No user requirements                       33
Looking
                      at the future


Condor Week 2012 - May 2012   No user requirements   34
Looking at the future
 ●   The attribute schema now a fixed one
      ●   At least regarding matchmaking
      ●   Opens up new interesting possibilities




Condor Week 2012 - May 2012   No user requirements   35
Looking at the future
 ●   The attribute schema now a fixed one
      ●   At least regarding matchmaking
 ●   Can consider RDBMS techniques
      ●   Example uses
            –   RDMBS driven matchmaking
            –   Tracking of job and/or machine history in a DB




Condor Week 2012 - May 2012       No user requirements           36
Looking at the future
 ●   The attribute schema now a fixed one
      ●   At least regarding matchmaking
 ●   Can consider RDBMS techniques
      ●   Example uses
            –   RDMBS driven matchmaking
            –   Tracking of job and/or machine history in a DB
      ●   Will likely need code changes in Condor
            –   UCSD committed to try it our in the near future
            –   If the results end up as good as hoped for,
                will push for official adoption


Condor Week 2012 - May 2012       No user requirements            37
Conclusions
 ●   Matchmaking is made of requirements
      ●   But there are two places where they can be defined
 ●   Usually, users expected to set requirements
      ●   CMS glideinWMS setup moves it completely
          into the resource domain
 ●   Experience with no-user-req setup very positive
      ●   Advocating that this should be the recommended
          way for all Condor deployments
 ●   Also opens up interesting new venues

Condor Week 2012 - May 2012     No user requirements       38
Acknowledgments
 ●   This work is partially sponsored by
      ●   the US National Science Foundation under Grants
          No. PHY-1104549 (AAA) and PHY-0612805
          (CMS Maintenance & Operations)
          and
      ●   the US Department of Energy under Grant No. DE-
          FC02-06ER41436 subcontract No. 647F290 (OSG).




Condor Week 2012 - May 2012       No user requirements   39

Más contenido relacionado

Similar a An argument for moving the requirements out of user hands - The CMS Experience

Introductiontorails 120804023905-phpapp02
Introductiontorails 120804023905-phpapp02Introductiontorails 120804023905-phpapp02
Introductiontorails 120804023905-phpapp02
Sopheak Sem
 

Similar a An argument for moving the requirements out of user hands - The CMS Experience (20)

glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...
 glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM... glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...
glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...
 
Matchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMSMatchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMS
 
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolMonitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
 
Glidein internals
Glidein internalsGlidein internals
Glidein internals
 
Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMS
 
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User ExperienceNagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
 
Evolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand RaoEvolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand Rao
 
glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012
 
Client Vs. Server Rendering
Client Vs. Server RenderingClient Vs. Server Rendering
Client Vs. Server Rendering
 
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...
 
glideinWMS Architecture - glideinWMS Training Jan 2012
glideinWMS Architecture - glideinWMS Training Jan 2012glideinWMS Architecture - glideinWMS Training Jan 2012
glideinWMS Architecture - glideinWMS Training Jan 2012
 
Backwards Compatibility: Strategies and Tactics
Backwards Compatibility: Strategies and TacticsBackwards Compatibility: Strategies and Tactics
Backwards Compatibility: Strategies and Tactics
 
Introduction to rails
Introduction to railsIntroduction to rails
Introduction to rails
 
Introductiontorails 120804023905-phpapp02
Introductiontorails 120804023905-phpapp02Introductiontorails 120804023905-phpapp02
Introductiontorails 120804023905-phpapp02
 
Pavel Prischepa. Wodby
Pavel Prischepa. WodbyPavel Prischepa. Wodby
Pavel Prischepa. Wodby
 
Pavel Prischepa - Wodby
Pavel Prischepa - WodbyPavel Prischepa - Wodby
Pavel Prischepa - Wodby
 
Best Practices in PHP Application Deployment
Best Practices in PHP Application DeploymentBest Practices in PHP Application Deployment
Best Practices in PHP Application Deployment
 
Drools planner - 2012-10-23 IntelliFest 2012
Drools planner - 2012-10-23 IntelliFest 2012Drools planner - 2012-10-23 IntelliFest 2012
Drools planner - 2012-10-23 IntelliFest 2012
 
Java interfaces design perspective
Java interfaces design perspectiveJava interfaces design perspective
Java interfaces design perspective
 
Phpers day 2019
Phpers day 2019Phpers day 2019
Phpers day 2019
 

Más de Igor Sfiligoi

Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
Igor Sfiligoi
 

Más de Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

An argument for moving the requirements out of user hands - The CMS Experience

  • 1. Condor Week 2012 An argument for moving the requirements out of user hands - The CMS experience presented by Igor Sfiligoi UC San Diego Condor Week 2012 - May 2012 No user requirements 1
  • 2. The basics ● Condor architecture clearly separates ● Resource providers Machines (aka worker nodes) CPUs, Memory, IO,... from ● Resource consumers Job queues (aka submit nodes) Jobs submitted by users ● Each has a daemon process to represent it Schedd Negotiator Startd for resource provides ... ● Collector Schedd Startd ● Schedd for resource consumers ... Startd ● A central service connects them all ... Startd ● Managed by a Collector/Negotiator pair Condor Week 2012 - May 2012 No user requirements 2
  • 3. Matchmaking ● In order for a job to start running on a resource ● The job requirements must evaluate to True ● The machine requirements must evaluate to True There is also the ranking, but that's 2nd level optimization Condor Week 2012 - May 2012 No user requirements 3
  • 4. Matchmaking ● In order for a job to start running on a resource ● The job requirements must evaluate to True ● The machine requirements Most manuals must evaluate to True focus on job reqs There is also the ranking, but that's 2nd level optimization Condor Week 2012 - May 2012 No user requirements 4
  • 5. Matchmaking ● In order for a job to start running on a resource ● The job requirements must evaluate to True ● The machine requirements Most manuals must evaluate to True focus on job reqs There is also the ranking, but that's 2nd level optimization Machine reqs deemed for handling Owner state only Condor Week 2012 - May 2012 No user requirements 5
  • 6. My argument Get rid of Job Requirements Put all logic in the Machine Requirements Condor Week 2012 - May 2012 No user requirements 6
  • 7. Some background ● I am a big glideinWMS user http://tinyurl.com/glideinWMS ● And glideinWMS has 2 level matchmaking ● One at VO Frontend level – where to send glideins ● One at Negotiator level – which job to start in a glidein Collector Schedd Negotiator glidein Execution node Startd Frontend Job Factory Condor Week 2012 - May 2012 No user requirements 7
  • 8. Matchmaking problem ● Both levels must be in sync, or you either ● Ask for glideins which never match any jobs Job matched to site in VO Frontend but not to site's machine in the Negotiator Condor Week 2012 - May 2012 No user requirements 8
  • 9. Matchmaking problem ● Both levels must be in sync, or you either ● Ask for glideins which never match any jobs, or ● Have job waiting in the queue when site available Job doesn't match to site in VO Frontend but would to site's machine in the Negotiator if glideins were requested Condor Week 2012 - May 2012 No user requirements 9
  • 10. Matchmaking problem ● Both levels must be in sync, or you either ● Ask for glideins which never match any jobs, or ● Have job waiting in the queue when site available ● But site and machine adds have different attributes ● Not all machines on a site are exactly the same So I need 2 different requirements Condor Week 2012 - May 2012 No user requirements 10
  • 11. The usual dilemma ● Where do I define these requirements? ● In user job ClassAd? ● Or in the Resource ClassAds? – And we have 2 different resource types here Condor Week 2012 - May 2012 No user requirements 11
  • 12. The usual dilemma ● Where do I define these requirements? ● In user job ClassAd? ● Or in the Resource ClassAds? – And we have 2 different resource types here The 2 resources are handled by the same admin (in VO Frontend) Condor Week 2012 - May 2012 No user requirements 12
  • 13. The usual dilemma ● Where do I define these requirements? ● In user job ClassAd? ● Or in the Resource ClassAds? – And we have 2 different resource types here Potentially O(1k) The 2 resources users! are handled by the same admin (in VO Frontend) Typically O(1) Condor Week 2012 - May 2012 No user requirements 13
  • 14. The usual dilemma ● Where do I define these requirements? ● In user job ClassAd? ● Or in the Resource ClassAds? – And we have 2 different resource types here And do you really trust your users The 2 resources to do the right thing? are handled by the same admin (in VO Frontend) Potentially Typically O(1k) O(1) Condor Week 2012 - May 2012 No user requirements 14
  • 15. Moving reqs to the resources ● So we went for setting the requirements in the resources themselves Condor Week 2012 - May 2012 No user requirements 15
  • 16. Moving reqs to the resources ● So we went for setting the requirements in the resources themselves ● But users still need a way to select resources! ● How do they do it??? Condor Week 2012 - May 2012 No user requirements 16
  • 17. Moving reqs to the resources ● So we went for setting the requirements in the resources themselves ● But users still need a way to select resources! ● How do they do it??? They express their needs through attributes! Condor Week 2012 - May 2012 No user requirements 17
  • 18. Fixed schema ● The resource provider defines the requirements a.k.a startd + VO Frontend ● The VO Frontend admin in our case ● Those requirements look for well-defined user-provided attributes Site attribute Machine attribute Job attribute entry_req = stringListMember(GLIDEIN_Site,DESIRED_Sites)&& ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Example Start = stringListMember(GLIDEIN_Site,DESIRED_Sites)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 18
  • 19. Simple user job submit file ● No complex requirements to write ● Very little user training ● Low error rate a.submit Executable = a.sh Output = a.out +DESIRED_Sites=”UCSD,Nebraska” Requirements=True queue Condor Week 2012 - May 2012 No user requirements 19
  • 20. How well does it work? ● Advantages ● Disadvantages ● Easy to keep the two ● Rigid schema levels in sync ● Easy to define reasonable defaults ● Easy on the users ● And more... (wait for later slides) Condor Week 2012 - May 2012 No user requirements 20
  • 21. How well does it work? ● Advantages ● Disadvantages ● Easy to keep the two ● Rigid schema levels in sync ● Easy to define reasonable defaults ● Easy on the users ● And more... In our experience, (wait for later slides) does not need to change more than a couple of times a year Condor Week 2012 - May 2012 No user requirements 21
  • 22. Is this glideinWMS specific? ● Advantages ● Disadvantages ● Easy to keep the two Don't think so! ● Rigid schema levels in sync ● Easy to define reasonable defaults The ease of use ● Easy on the users is there for any Condor setup Condor Week 2012 - May 2012 No user requirements 22
  • 23. Is this glideinWMS specific? ● Advantages ● Disadvantages ● Easy to keep the two Don't think so! ● Rigid schema levels in sync ● Easy to define reasonable defaults ● Easy on the users The ease of use I would argue it is there for should become any Condor setup standard practice Condor Week 2012 - May 2012 No user requirements 23
  • 24. And there is still more to it Condor Week 2012 - May 2012 No user requirements 24
  • 25. Side effect ● Discovered one unexpected nice side effect Condor Week 2012 - May 2012 No user requirements 25
  • 26. Side effect ● Discovered one unexpected nice side effect We can outsmart our users! Condor Week 2012 - May 2012 No user requirements 26
  • 27. The overflow use case ● Normally, CMS jobs run near the data ● So users provide a whitelist of sites to run on ● And we have the appropriate glideinWMS expression a.submit Executable = a.sh Output = a.out +DESIRED_Sites=”UCSD,Nebraska” Requirements=True entry_req = stringListMember(GLIDEIN_Site,DESIRED_Sites)&& queue ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Example Start = stringListMember(GLIDEIN_Site,DESIRED_Sites)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 27
  • 28. The overflow use case ● Normally, CMS jobs run near the data ● But some jobs could run over the WAN ● It is just slightly less efficient Xrootd based WAN access if you are interested Condor Week 2012 - May 2012 No user requirements 28
  • 29. The overflow use case ● Normally, CMS jobs run near the data ● Jobs could run over the WAN ● It is just slightly less efficient ● But if some CPUs are idle due to low demand ● A low efficiency job is still better than no job! ● As long as it results in users getting We call this their results sooner “overflowing” – i.e. only if “optimal resources” are not available Condor Week 2012 - May 2012 No user requirements 29
  • 30. The overflow use case ● Normally, CMS jobs run near the data ● Jobs could run over the WAN ● It is just slightly less efficient ● But if some CPUs are idle due to low demand ● A low efficiency job is still better than no job! ● As long as it results in users getting their results sooner So, how do we – i.e. only if implement this? “optimal resources” are not available Condor Week 2012 - May 2012 No user requirements 30
  • 31. Overflow configuration ● Essentially, we change the rules! ● Without involving the users ● We write the requirements based on where the data is, not where the CPUs are ● Since not all sites have xrootd installed entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&& stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& Example ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Start = ((CurrentTime-QDate)>21600)&& stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 31
  • 32. Overflow configuration ● Essentially, we change the rules! ● Without involving the users No change to the user submit file! ● We write the requirements based on where the data is, not where the CPUs are a.submit Executable = a.sh ● Since not all sites have xrootd installed Output = a.out +DESIRED_Sites=”UCSD,Nebraska” Requirements=True entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&& queue stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& Example ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Start = ((CurrentTime-QDate)>21600)&& stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 32
  • 33. Overflow configuration ● Essentially,decide And we can we change the rules! where to overflow from users ● Without involving the No change to the hours after user submit file! the jobs werethe requirements ● We write submitted based on where the data is, not where the CPUs are a.submit Executable = a.sh ● Since not all sites have xrootd installed Output = a.out +DESIRED_Sites=”UCSD,Nebraska” Requirements=True entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&& queue ”UCSD,Wisc” stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& Example ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Start = ((CurrentTime-QDate)>21600)&& stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 33
  • 34. Looking at the future Condor Week 2012 - May 2012 No user requirements 34
  • 35. Looking at the future ● The attribute schema now a fixed one ● At least regarding matchmaking ● Opens up new interesting possibilities Condor Week 2012 - May 2012 No user requirements 35
  • 36. Looking at the future ● The attribute schema now a fixed one ● At least regarding matchmaking ● Can consider RDBMS techniques ● Example uses – RDMBS driven matchmaking – Tracking of job and/or machine history in a DB Condor Week 2012 - May 2012 No user requirements 36
  • 37. Looking at the future ● The attribute schema now a fixed one ● At least regarding matchmaking ● Can consider RDBMS techniques ● Example uses – RDMBS driven matchmaking – Tracking of job and/or machine history in a DB ● Will likely need code changes in Condor – UCSD committed to try it our in the near future – If the results end up as good as hoped for, will push for official adoption Condor Week 2012 - May 2012 No user requirements 37
  • 38. Conclusions ● Matchmaking is made of requirements ● But there are two places where they can be defined ● Usually, users expected to set requirements ● CMS glideinWMS setup moves it completely into the resource domain ● Experience with no-user-req setup very positive ● Advocating that this should be the recommended way for all Condor deployments ● Also opens up interesting new venues Condor Week 2012 - May 2012 No user requirements 38
  • 39. Acknowledgments ● This work is partially sponsored by ● the US National Science Foundation under Grants No. PHY-1104549 (AAA) and PHY-0612805 (CMS Maintenance & Operations) and ● the US Department of Energy under Grant No. DE- FC02-06ER41436 subcontract No. 647F290 (OSG). Condor Week 2012 - May 2012 No user requirements 39