SlideShare una empresa de Scribd logo
1 de 36
Descargar para leer sin conexión
glideinWMS Training @ UCSD




                     Condor tunning
                       by Igor Sfiligoi (UCSD)




UCSD Jan 18th 2012           Condor Tunning      1
Regulating
                     User Priorities




UCSD Jan 18th 2012        Condor Tunning   2
User priorities
 ●   By default, the Negotiator treats all users equally
      ●   You get fair-share out of the box
          If you have two users, on average
          each gets ½ of slots
 ●   If you have different needs, Negotiator supports
      ●   Priority factors
           –   PF reduces user priority
           –   If users X and Y have PFX=(N-1)*PFY, on average
               user X gets 1/N of slots (with user Y the rest)
      ●   Accounting groups (see next slide)
                     http://research.cs.wisc.edu/condor/manual/v7.6/3_4User_Priorities.html

UCSD Jan 18th 2012                                  Condor Tunning                            3
Accounting groups
 ●   Users can be joined in accounting groups
      ●   The Negotiator defines the groups,
          but jobs specify which group they belong to
 ●   Each group can be given a quota
      ●   Can be relative to the size of the pool
      ●   Sum of group jobs cannot exceed it
 ●   If quotas >100%, can be used for relative prio
      ●   Here higher is better
                                                    Jobs without any
      ●   Each group will be given,                 group may never
                                                    get anything
          on average, quota /sum(quotas) of slots
                             G




UCSD Jan 18th 2012                Condor Tunning                       4
Configuring AC
 ●   To enable accounting groups, just define
     the quotas in the Negotiator config file
      ●   Which jobs go into which group not defined here

                     #condor_config.local
                      #condor_config.local
                     ## 1.0 is100%
                       1.0 is 100%
                     GROUP_QUOTA_DYNAMIC_group_higgs == 3.2
                      GROUP_QUOTA_DYNAMIC_group_higgs 3.2
                     GROUP_QUOTA_DYNAMIC_group_b ==
                      GROUP_QUOTA_DYNAMIC_group_b      2.2
                                                        2.2
                     GROUP_QUOTA_DYNAMIC_group_k ==
                      GROUP_QUOTA_DYNAMIC_group_k      6.5
                                                        6.5
                     GROUP_QUOTA_DYNAMIC_group_susy == 8.8
                      GROUP_QUOTA_DYNAMIC_group_susy 8.8




UCSD Jan 18th 2012                    Condor Tunning          5
Using the AC
 ●   Users must specify which group they belong to
      ●   No automatic mapping or validation in Condor
      ●   Based on trust
 ●   Jobs must add to their submit file
     +AccountingGroup = "<group>.<user>"
           Universe
            Universe = vanilla
                       = vanilla
           Executable = cosmos
            Executable = cosmos
           Arguments = -k 1543.3
            Arguments = -k 1543.3
           Output
            Output    = cosmos.out
                       = cosmos.out
           Input
            Input     = cosmos.in
                       = cosmos.in
           Log
            Log       = cosmos.log
                       = cosmos.log
           +AccountingGroup = "group_higgs.frieda"
            +AccountingGroup = "group_higgs.frieda"
           Queue 1
            Queue 1
UCSD Jan 18th 2012          Condor Tunning               6
Configuring PF
                                                                                                            Anyone can read
 ●   Priority factors a dynamic property                                                                    Can only be set
                                                                                                            by the Negotiator
      ●   Set by cmdline tool condor_userprio                                                               administrator

       $ condor_userprio -all -allusers |grep user1@node1
        $ condor_userprio -all -allusers |grep user1@node1
       user1@node1           8016.22 8.02          10.00  0 15780.63 11/23/2011 05:59 12/30/2011 20:37
        user1@node1           8016.22 8.02          10.00  0 15780.63 11/23/2011 05:59 12/30/2011 20:37
       $ condor_userprio -setfactor user1@node1 1000
        $ condor_userprio -setfactor user1@node1 1000
       The priority factor of user1@node1 was set to 1000.000000
        The priority factor of user1@node1 was set to 1000.000000
       $ condor_userprio -all -allusers |grep user1@node1
        $ condor_userprio -all -allusers |grep user1@node1
       user1@node1           8016.22 8.02        1000.00  0 15780.63 11/23/2011 05:59 12/30/2011 20:37
        user1@node1           8016.22 8.02        1000.00  0 15780.63 11/23/2011 05:59 12/30/2011 20:37

      ●   Although persistent between Negotiator restarts
      ●   May want to set higher default PF (e.g. 1000)
            –   default of 1, and user PF cannot go below
            –   Attribute regulated by DEFAULT_PRIO_FACTOR
           http://research.cs.wisc.edu/condor/manual/v7.6/2_7Priorities_Preemption.html#sec:user-priority-explained


UCSD Jan 18th 2012                                    Condor Tunning                                                    7
Negotiation cycle
                       optimization




UCSD Jan 18th 2012         Condor Tunning   8
Negotiation internals
 ●   The Negotiator does the matchmaking in loops
      ●   Get all Machine ClassAds (from Collector)
      ●   Get Job ClassAds (from Schedds)
          one by one, prioritized by user
           –   Tell Schedd if a match has been found
      ●   Sleep
      ●   Repeat




UCSD Jan 18th 2012                Condor Tunning       9
Sleep time
 ●   The Negotiator sleeps between cycles
     to save CPU
      ●   But will be interrupted if new jobs are submitted
 ●   Makes sense for static pools                                                  The sleep will not be interrupted
                                                                                   if a new glidein join
      ●   But can be a problem for
          dynamic ones like glideins                                                                  Unmatched
                                                                                                       glideins
 ●   Keep the sleep as low as feasible
     NEGOTIATOR_INTERVAL
      ●   Installer sets it to 60s
                              NEGOTIATOR_INTERVAL ==60
                               NEGOTIATOR_INTERVAL 60
                http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorInterval

UCSD Jan 18th 2012                                 Condor Tunning                                                 10
Preemption
 ●   By default, Condor will preempt a user as soon
     as jobs from a higher priority users are available
      ●   Preemption == wasted CPU                                                  Preemption==killing jobs

 ●   The glideinWMS installer disables preemption
      ●   Once a glidein is given to a user, he can keep it until
          the glidein dies

             ## Prevent preemption
               Prevent preemption                                                                     Glidein lifetime
             PREEMPTION_REQUIREMENTS ==False
              PREEMPTION_REQUIREMENTS False                                                            is limited, so
             ## Speed up Matchmaking
               Speed up Matchmaking                                                                      fair share
             NEGOTIATOR_CONSIDER_PREEMPTION ==False
              NEGOTIATOR_CONSIDER_PREEMPTION False                                                     still enforced
                                                                                                         long term
           http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorConsiderPreemption

UCSD Jan 18th 2012                                  Condor Tunning                                                 11
Protecting against dead glideins
 ●   Glideins can die
      ●   Just a fact of life on the Grid
 ●   But their ClassAds will stay in the Collector
     for about 20 mins
      ●   If they were Idle at the time, will match jobs!
 ●   Make sure the lowest priority user get stuck
      ●   Order Machine ClassAds by freshness
                 NEGOTIATOR_POST_JOB_RANK ==MY.LastHeardFrom
                  NEGOTIATOR_POST_JOB_RANK MY.LastHeardFrom

              http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorPostJobRank


UCSD Jan 18th 2012                                 Condor Tunning                                           12
Protecting against slow schedds
 ●   If one schedd gets stuck, you want to limit the
     damage, and match jobs from other schedds
      ●   Negotiator can limit the time spent on each schedd
                                               Was really important before 7.6.X... less of an issue now
 ●   The glideinWMS installer sets these limits
      ●   See Condor manual for details
                     NEGOTIATOR_MAX_TIME_PER_SUBMITTER=60
                      NEGOTIATOR_MAX_TIME_PER_SUBMITTER=60
                     NEGOTIATOR_MAX_TIME_PER_PIESPIN=20
                      NEGOTIATOR_MAX_TIME_PER_PIESPIN=20

          http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorMaxTimePerSubmitter




UCSD Jan 18th 2012                                 Condor Tunning                                               13
One way matchmaking
 ●   The Negotiator normally sends the match both
     the Schedd and the Startd
      ●   Only Schedd strictly needed
 ●   Given that the glideins can be behind a firewall,
     it is a good idea to disable this behavior

                      NEGOTIATOR_INFORM_STARTD ==False
                       NEGOTIATOR_INFORM_STARTD False




UCSD Jan 18th 2012                 Condor Tunning        14
Collector tunning




UCSD Jan 18th 2012         Condor Tunning   15
Collector tunning
 ●   The major tunning is the Collector tree
      ●   Already discussed in the Installation section
 ●   Three more optional tunnings
      ●   Security session lifetime
      ●   File descriptor limits
      ●   Client handling parameters




UCSD Jan 18th 2012             Condor Tunning             16
Security session lifetime
 ●   All communication to the Collector (leafs)
     is GSI based
 ●   But GSI handshakes are expensive!
      ●   Condor thus uses it just to exchange a shared
          secret (i.e. password) with the remote glidein
      ●   Will use this passwd for all further communications
      ●   But the passwd also has a limited lifetime
                                                                                                      To limit damage
 ●   Default lifetime quite long                                                                      in case it is stolen
      ●   The glideinWMS installer                                                               Was months in old
          limits it to 12h                                                                       versions of Condor!
                           SEC_DAEMON_SESSION_DURATION == 50000
                            SEC_DAEMON_SESSION_DURATION 50000
            http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SecDefaultSessionDuration
                                     http://iopscience.iop.org/1742-6596/219/4/042017

UCSD Jan 18th 2012                                 Condor Tunning                                                  17
File descriptor limit
 ●   The Collector (leafs) also function as CCB
      ●   Will use 5+ FDs for every glidein they serve!
 ●   By default, OS only gives 1024 FDs to each
     process
      ●   But Condor will limit itself to ~80% of that
      ●   So it has a few FDs left for internal needs (e.g. logs)
                                                                            Not done by
 ●   You may want to increase this limit                                    glideinWMS
                                                                            installer
      ●   Must be done as root                      The alternative is to just
          (and Collector usually not root)          run more secondary collectors
                                                    (recommended)
 ●   And don't forget the system-wide limits
     /proc/sys/fs/file-max

UCSD Jan 18th 2012                 Condor Tunning                                   18
Client handling limits
 ●   Every time someone runs
     condor_status                                                      Used by the
                                                                        Frontend processes
     the Collector must act on it
      ●   A query is also done implicitly when you run
          condor_q -name <schedd_name>
 ●   Since the Collector is a busy process, the
     Condor team added the option to fork on query
                                                         It only uses RAM if the parent
      ●   This will use up RAM!                          memory pages change
                                                         But not unusual on a busy Collector
      ●   So there is a limit
          COLLECTOR_QUERY_WORKERS
                                                           Defaults to 16
     You may want to either lower or increase it
UCSD Jan 18th 2012                      Condor Tunning                                   19
Firewall tunning
 ●   If you decided you want to keep the firewall
     you may need to tune it
      ●   Open ports for the Collector tree
      ●   And possibly the Negotiator                    Negotiator port usually
          (if Schedds outside the perimeter)             dynamic... will need to
                                                         fix it with
 ●   You may also need to                          NEGOTIATOR_ARGS = -f -p 9617

     tune the firewall scalability knobs
 Statefull firewalls    ## foriptables
                          for iptables
 will have to track     sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552;
                         sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552;
 5+ TCP connections     sysctl -p;
                         sysctl -p;
 per glidein            echo 48194 >>/sys/module/ip_conntrack/parameters/hashsize
                         echo 48194 /sys/module/ip_conntrack/parameters/hashsize


UCSD Jan 18th 2012                Condor Tunning                              20
Schedd tunning




UCSD Jan 18th 2012        Condor Tunning   21
Schedd tunning
 ●   The following knobs may need to be tuned
      ●   Job limits
      ●   Startup/Termination limits
      ●   Client handling limits
      ●   Match attributes
      ●   Requirements defaults
      ●   Periodic expressions
 ●   You also may need to tune the system settings


UCSD Jan 18th 2012            Condor Tunning         22
Job limits
 ●   As you may remember, each running job uses a
     non-negligible amount of resources
      ●   Thus the Schedd has a limit on them
          MAX_JOB_RUNNING
 ●   Set it to a value                                                             You may waste some
     that is right for your HW                                                     CPU on glideins, but
                                                                                   better than crashing
                                                                                   the submit node
                     ## glideinWMS installerprovided value
                       glideinWMS installer provided value
                     MAX_JOBS_RUNNING == 6000
                      MAX_JOBS_RUNNING 6000


                 http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxJobRunning



UCSD Jan 18th 2012                                Condor Tunning                                          23
Startup limits
 ●   Each time a job is started
      ●   The Schedd must create a shadow
      ●   The Shadow must load the input files from disk
      ●   The Shadow must transfer the files over the network
 ●   If too many start at once, may overload the node
 ●   Two sets of limits:                                                    Limit at the Schedd
      ●   JOB_START_DELAY/JOB_START_COUNT
      ●   MAX_CONCURRENT_UPLOADS
                                                                           Limit on the network
              http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:JobStartCount
              http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxConcurrentUploads


UCSD Jan 18th 2012                                 Condor Tunning                                          24
Termination limits
 ●   Termination can be problematic as well
                                                                                                Can easily overload
      ●   Output files must be transferred back                                                 the submit node.
           –   And you do not know how big will it be!
      ●   If using glexec, an authentication call will be made
          on each glidein           Each calling the site GUMS.
                                                                Imagine O(1k) within a second!

 ●   Similar limits:                                                         Limit at the Schedd

      ●   JOB_STOP_DELAY/JOB_STOP_COUNT
      ●   MAX_CONCURRENT_DOWNLOADS
                                                                            Limit on the network
               http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:JobStopCount
               http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxConcurrentDownloads

UCSD Jan 18th 2012                                 Condor Tunning                                              25
Client handling limits
 ●   Every time someone runs
     condor_q                                                                                 Used by the
                                                                                              Frontend processes
     the Schedd must act on it
 ●   Since the Schedd is a busy process, the
     Condor team added the option to fork on query
      ●   This will use up RAM!                                         It only uses RAM if the parent
                                                                        memory pages change
      ●   So there is a limit                                           But not unusual on a busy Schedd
          SCHEDD_QUERY_WORKERS
                                                                           Defaults to 3
                        You will want to increase it

              http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:ScheddQueryWorkers



UCSD Jan 18th 2012                                 Condor Tunning                                            26
Match attributes for history
 ●   It is often useful to know where were
     the jobs running
      ●   By default, you get just the host name
          LastRemoteHost
      ●   May be useful to know which glidein factory
          it came from, too (etc.)
 ●   But you can get match info easily if you ...
      ●   Add to the job submit files
          job_attribute = $$(glidein attribute)
                     After matching, Schedd adds in the ClassAd
                     MATCH_EXP_job_attribute

UCSD Jan 18th 2012                  Condor Tunning                27
System level addition
  ●   You can add them in the system config
## condor_config.local
  condor_config.local
JOB_GLIDEIN_Factory =="$$(GLIDEIN_Factory:Unknown)"
 JOB_GLIDEIN_Factory "$$(GLIDEIN_Factory:Unknown)"
JOB_GLIDEIN_Name == "$$(GLIDEIN_Name:Unknown)"              Factory identification
 JOB_GLIDEIN_Name "$$(GLIDEIN_Name:Unknown)"
JOB_GLIDEIN_Entry_Name =="$$(GLIDEIN_Entry_Name:Unknown)"
 JOB_GLIDEIN_Entry_Name "$$(GLIDEIN_Entry_Name:Unknown)"
JOB_GLIDEIN_ClusterId == "$$(GLIDEIN_ClusterId:Unknown)"    Glidein identification
 JOB_GLIDEIN_ClusterId "$$(GLIDEIN_ClusterId:Unknown)"      (to give the Factory admins)
JOB_GLIDEIN_ProcId == "$$(GLIDEIN_ProcId:Unknown)"
 JOB_GLIDEIN_ProcId "$$(GLIDEIN_ProcId:Unknown)"
JOB_GLIDECLIENT_Name =="$$(GLIDECLIENT_Name:Unknown)"
 JOB_GLIDECLIENT_Name "$$(GLIDECLIENT_Name:Unknown)"        Frontend identification
JOB_GLIDECLIENT_Group == "$$((GLIDECLIENT_Group:Unknown)"
 JOB_GLIDECLIENT_Group "$$((GLIDECLIENT_Group:Unknown)"
JOB_Site == "$$(GLIDEIN_Site:Unknown)"
 JOB_Site "$$(GLIDEIN_Site:Unknown)"                        Any other attribute

SUBMIT_EXPRS == $(SUBMIT_EXPRS) 
 SUBMIT_EXPRS $(SUBMIT_EXPRS) 
  JOB_GLIDEIN_Factory JOB_GLIDEIN_Name  
   JOB_GLIDEIN_Factory JOB_GLIDEIN_Name
  JOB_GLIDEIN_Entry_Name                                  This is how you push
   JOB_GLIDEIN_Entry_Name                                   them in the user job
  JOB_GLIDEIN_ClusterId JOB_GLIDEIN_ProcId  
   JOB_GLIDEIN_ClusterId JOB_GLIDEIN_ProcId
  JOB_GLIDECLIENT_Name JOB_GLIDECLIENT_Group              ClassAds
   JOB_GLIDECLIENT_Name JOB_GLIDECLIENT_Group
  JOB_Site
   JOB_Site

 UCSD Jan 18th 2012              Condor Tunning                                     28
Additional job defaults
 ●   With glideins, we often want empty
     user job requirements
      ●   But Condor does not allow it
 ●   It will forcibly add something about
      ●   Arch                    e.g. (Arch == "INTEL")
      ●   Disk                    e.g. (Disk>DiskUsage)

      ●   Memory                  e.g. (Memory>ImageSize)

 ●   Unless they are already there... so lets add them
            ## condor_config.local
              condor_config.local
            APPEND_REQ_VANILLA ==(Memory>=1)&&(Disk>=1)&&(Arch=!="fake")
             APPEND_REQ_VANILLA (Memory>=1)&&(Disk>=1)&&(Arch=!="fake")

     Just to make sure it always evaluates to True
UCSD Jan 18th 2012                      Condor Tunning                     29
Periodic expressions
 ●   Unless you have very well behaved users,
     some of the jobs will be pathologically broken
      ●   You want to detect them and                     The will be wasting
                                                          CPU cycles at best
          remove the from the system
 ●   Condor allows for periodic expressions
     SYSTEM_PERIODIC_REMOVE                           Just remove
     SYSTEM_PERIODIC_HOLD                     Leave in the queue, but do not match

      ●   An arbitrary boolean expression
      ●   What to look for comes with experience


UCSD Jan 18th 2012           Condor Tunning                                     30
Example periodic expression

## condor_config.local
  condor_config.local
SYSTEM_PERIODIC_HOLD == 
 SYSTEM_PERIODIC_HOLD
 ((ImageSize>2000000) ||||                                                 Too much memory
   ((ImageSize>2000000)
  (DiskSize>1000000) |||| 
    (DiskSize>1000000)                                                      Too much disk space
  (((( JobStatus == 2 ) &&  
      JobStatus == 2 ) &&
   ((CurrentTime-JobCurrentStartDate ) ) >100000)) ||||                     Took too long
     ((CurrentTime-JobCurrentStartDate > 100000)) 
  (JobRunCount>5))
    (JobRunCount>5))                                                         Tried too many times

SYSTEM_PERIODIC_HOLD_REASON == 
 SYSTEM_PERIODIC_HOLD_REASON
  ifThenElse((ImageSize>2000000),  
    ifThenElse((ImageSize>2000000),
              “Used too much memory”,                                      In Condor 7.7.X
                “Used too much memory”,                                      you can also provide
   … 
     …
   ifThenElse((JobRunCount>5),                                             a human readable reason string
     ifThenElse((JobRunCount>5),
               “Tried too many times”,  
                 “Tried too many times”,
               “Reason unknown”)...)
                 “Reason unknown”)...)

       http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SystemPeriodicHold
       http://www.cs.wisc.edu/condor/manual/v7.7/3_3Configuration.html#param:SystemPeriodicHoldReason


  UCSD Jan 18th 2012                                  Condor Tunning                                      31
File descriptors and process IDs
 ●   Each Shadow keeps open O(10) FDs
      ●   And we have a shadow per running job
      ●   Make sure you have enough system-wide
          file descriptors
 ●   Make also sure the system can support all the
     shadow PIDs
                     ~$ echo 4849118 >> /proc/sys/fs/file-max
                      ~$ echo 4849118    /proc/sys/fs/file-max
                     ~$ cat /proc/sys/fs/file-nr
                      ~$ cat /proc/sys/fs/file-nr
                     59160 00 4849118
                      59160     4849118
                     ~$ echo 128000 >> /proc/sys/kernel/pid_max
                      ~$ echo 128000    /proc/sys/kernel/pid_max

                        http://www.cs.wisc.edu/condor/condorg/linux_scalability.html



UCSD Jan 18th 2012                                  Condor Tunning                     32
Firewall tunning
 ●   If you decided you want to keep the firewall
     you may need to tune it
      ●   At minimum, open the port of the
          Shared Port daemon
 ●   You may also need to
     tune the firewall scalability knobs
 Statefull firewalls
 will have to track
                        ## foriptables
                          for iptables
 2 connections
                        sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552;
                         sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552;
 per running job
                        sysctl -p;
                         sysctl -p;
                        echo 48194 >>/sys/module/ip_conntrack/parameters/hashsize
                         echo 48194 /sys/module/ip_conntrack/parameters/hashsize



UCSD Jan 18th 2012                Condor Tunning                              33
The End




UCSD Jan 18th 2012    Condor Tunning   34
Pointers
 ●   The official project Web page is
     http://tinyurl.com/glideinWMS
 ●   glideinWMS development team is reachable at
     glideinwms-support@fnal.gov
 ●   Condor Home Page
     http://www.cs.wisc.edu/condor/
 ●   Condor support
     condor-user@cs.wisc.edu
     condor-admin@cs.wisc.edu

UCSD Jan 18th 2012      Condor Tunning             35
Acknowledgments
 ●   The glideinWMS is a CMS-led project
     developed mostly at FNAL, with contributions
     from UCSD and ISI
 ●   The glideinWMS factory operations at UCSD is
     sponsored by OSG
 ●   The funding comes from NSF, DOE and the
     UC system




UCSD Jan 18th 2012        Condor Tunning            36

Más contenido relacionado

Destacado (8)

Option strategy presentation
Option strategy presentationOption strategy presentation
Option strategy presentation
 
Class 4 process control loops
Class 4   process control loopsClass 4   process control loops
Class 4 process control loops
 
Class 7 mathematical modeling of liquid-level systems
Class 7   mathematical modeling of liquid-level systemsClass 7   mathematical modeling of liquid-level systems
Class 7 mathematical modeling of liquid-level systems
 
Block Diagram For Control Systems.
Block Diagram For Control Systems.Block Diagram For Control Systems.
Block Diagram For Control Systems.
 
Quantitative Strategy - Option Decoder
Quantitative Strategy  - Option DecoderQuantitative Strategy  - Option Decoder
Quantitative Strategy - Option Decoder
 
block diagram representation of control systems
block diagram representation of  control systemsblock diagram representation of  control systems
block diagram representation of control systems
 
Options strategies
Options strategiesOptions strategies
Options strategies
 
Option Strategies
Option StrategiesOption Strategies
Option Strategies
 

Similar a Condor tuning optimization for dynamic glideinWMS pools

Understanding priorities in HTCondor
Understanding priorities in HTCondorUnderstanding priorities in HTCondor
Understanding priorities in HTCondorIgor Sfiligoi
 
Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMSIgor Sfiligoi
 
glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012Igor Sfiligoi
 
Condor overview - glideinWMS Training Jan 2012
Condor overview - glideinWMS Training Jan 2012Condor overview - glideinWMS Training Jan 2012
Condor overview - glideinWMS Training Jan 2012Igor Sfiligoi
 
Condor from the user point of view - glideinWMS Training Jan 2012
Condor from the user point of view - glideinWMS Training Jan 2012Condor from the user point of view - glideinWMS Training Jan 2012
Condor from the user point of view - glideinWMS Training Jan 2012Igor Sfiligoi
 
MySQL backup and restore performance
MySQL backup and restore performanceMySQL backup and restore performance
MySQL backup and restore performanceVinicius M Grippa
 
Building a continuous delivery platform for the biggest spike in e-commerce -...
Building a continuous delivery platform for the biggest spike in e-commerce -...Building a continuous delivery platform for the biggest spike in e-commerce -...
Building a continuous delivery platform for the biggest spike in e-commerce -...Puppet
 
glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012
glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012
glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012Igor Sfiligoi
 
Monitoring with Ganglia
Monitoring with GangliaMonitoring with Ganglia
Monitoring with GangliaFastly
 
PGConf.ASIA 2019 Bali - Patroni in 2019 - Alexander Kukushkin
PGConf.ASIA 2019 Bali - Patroni in 2019 - Alexander KukushkinPGConf.ASIA 2019 Bali - Patroni in 2019 - Alexander Kukushkin
PGConf.ASIA 2019 Bali - Patroni in 2019 - Alexander KukushkinEqunix Business Solutions
 
Buytaert kris tools
Buytaert kris toolsBuytaert kris tools
Buytaert kris toolskuchinskaya
 
7 tools for your devops stack
7 tools for your devops stack7 tools for your devops stack
7 tools for your devops stackKris Buytaert
 
Automotive Grade Linux and systemd
Automotive Grade Linux and systemdAutomotive Grade Linux and systemd
Automotive Grade Linux and systemdAlison Chaiken
 
Apache zookeeper 101
Apache zookeeper 101Apache zookeeper 101
Apache zookeeper 101Quach Tung
 
Extending CMS Made Simple
Extending CMS Made SimpleExtending CMS Made Simple
Extending CMS Made Simplecmsmssjg
 
Understanding Doctrine at True North PHP 2013
Understanding Doctrine at True North PHP 2013Understanding Doctrine at True North PHP 2013
Understanding Doctrine at True North PHP 2013Juti Noppornpitak
 
11 tools for your PHP devops stack
11 tools for your PHP devops stack11 tools for your PHP devops stack
11 tools for your PHP devops stackKris Buytaert
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper Omid Vahdaty
 

Similar a Condor tuning optimization for dynamic glideinWMS pools (20)

Understanding priorities in HTCondor
Understanding priorities in HTCondorUnderstanding priorities in HTCondor
Understanding priorities in HTCondor
 
Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMS
 
glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012
 
Condor overview - glideinWMS Training Jan 2012
Condor overview - glideinWMS Training Jan 2012Condor overview - glideinWMS Training Jan 2012
Condor overview - glideinWMS Training Jan 2012
 
Condor from the user point of view - glideinWMS Training Jan 2012
Condor from the user point of view - glideinWMS Training Jan 2012Condor from the user point of view - glideinWMS Training Jan 2012
Condor from the user point of view - glideinWMS Training Jan 2012
 
MySQL backup and restore performance
MySQL backup and restore performanceMySQL backup and restore performance
MySQL backup and restore performance
 
Building a continuous delivery platform for the biggest spike in e-commerce -...
Building a continuous delivery platform for the biggest spike in e-commerce -...Building a continuous delivery platform for the biggest spike in e-commerce -...
Building a continuous delivery platform for the biggest spike in e-commerce -...
 
glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012
glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012
glideinWMS Frontend Monitoring - glideinWMS Training Jan 2012
 
Monitoring with Ganglia
Monitoring with GangliaMonitoring with Ganglia
Monitoring with Ganglia
 
PGConf.ASIA 2019 Bali - Patroni in 2019 - Alexander Kukushkin
PGConf.ASIA 2019 Bali - Patroni in 2019 - Alexander KukushkinPGConf.ASIA 2019 Bali - Patroni in 2019 - Alexander Kukushkin
PGConf.ASIA 2019 Bali - Patroni in 2019 - Alexander Kukushkin
 
Buytaert kris tools
Buytaert kris toolsBuytaert kris tools
Buytaert kris tools
 
7 tools for your devops stack
7 tools for your devops stack7 tools for your devops stack
7 tools for your devops stack
 
Automotive Grade Linux and systemd
Automotive Grade Linux and systemdAutomotive Grade Linux and systemd
Automotive Grade Linux and systemd
 
Apache zookeeper 101
Apache zookeeper 101Apache zookeeper 101
Apache zookeeper 101
 
CFEngine 3
CFEngine 3CFEngine 3
CFEngine 3
 
Extending CMS Made Simple
Extending CMS Made SimpleExtending CMS Made Simple
Extending CMS Made Simple
 
Understanding Doctrine at True North PHP 2013
Understanding Doctrine at True North PHP 2013Understanding Doctrine at True North PHP 2013
Understanding Doctrine at True North PHP 2013
 
11 tools for your PHP devops stack
11 tools for your PHP devops stack11 tools for your PHP devops stack
11 tools for your PHP devops stack
 
Gatling
GatlingGatling
Gatling
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
 

Más de Igor Sfiligoi

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROIgor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingIgor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesIgor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateIgor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeIgor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessIgor Sfiligoi
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputIgor Sfiligoi
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsIgor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROIgor Sfiligoi
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstIgor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyIgor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCIgor Sfiligoi
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsIgor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsIgor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksIgor Sfiligoi
 

Más de Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Último

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Último (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Condor tuning optimization for dynamic glideinWMS pools

  • 1. glideinWMS Training @ UCSD Condor tunning by Igor Sfiligoi (UCSD) UCSD Jan 18th 2012 Condor Tunning 1
  • 2. Regulating User Priorities UCSD Jan 18th 2012 Condor Tunning 2
  • 3. User priorities ● By default, the Negotiator treats all users equally ● You get fair-share out of the box If you have two users, on average each gets ½ of slots ● If you have different needs, Negotiator supports ● Priority factors – PF reduces user priority – If users X and Y have PFX=(N-1)*PFY, on average user X gets 1/N of slots (with user Y the rest) ● Accounting groups (see next slide) http://research.cs.wisc.edu/condor/manual/v7.6/3_4User_Priorities.html UCSD Jan 18th 2012 Condor Tunning 3
  • 4. Accounting groups ● Users can be joined in accounting groups ● The Negotiator defines the groups, but jobs specify which group they belong to ● Each group can be given a quota ● Can be relative to the size of the pool ● Sum of group jobs cannot exceed it ● If quotas >100%, can be used for relative prio ● Here higher is better Jobs without any ● Each group will be given, group may never get anything on average, quota /sum(quotas) of slots G UCSD Jan 18th 2012 Condor Tunning 4
  • 5. Configuring AC ● To enable accounting groups, just define the quotas in the Negotiator config file ● Which jobs go into which group not defined here #condor_config.local #condor_config.local ## 1.0 is100% 1.0 is 100% GROUP_QUOTA_DYNAMIC_group_higgs == 3.2 GROUP_QUOTA_DYNAMIC_group_higgs 3.2 GROUP_QUOTA_DYNAMIC_group_b == GROUP_QUOTA_DYNAMIC_group_b 2.2 2.2 GROUP_QUOTA_DYNAMIC_group_k == GROUP_QUOTA_DYNAMIC_group_k 6.5 6.5 GROUP_QUOTA_DYNAMIC_group_susy == 8.8 GROUP_QUOTA_DYNAMIC_group_susy 8.8 UCSD Jan 18th 2012 Condor Tunning 5
  • 6. Using the AC ● Users must specify which group they belong to ● No automatic mapping or validation in Condor ● Based on trust ● Jobs must add to their submit file +AccountingGroup = "<group>.<user>" Universe Universe = vanilla = vanilla Executable = cosmos Executable = cosmos Arguments = -k 1543.3 Arguments = -k 1543.3 Output Output = cosmos.out = cosmos.out Input Input = cosmos.in = cosmos.in Log Log = cosmos.log = cosmos.log +AccountingGroup = "group_higgs.frieda" +AccountingGroup = "group_higgs.frieda" Queue 1 Queue 1 UCSD Jan 18th 2012 Condor Tunning 6
  • 7. Configuring PF Anyone can read ● Priority factors a dynamic property Can only be set by the Negotiator ● Set by cmdline tool condor_userprio administrator $ condor_userprio -all -allusers |grep user1@node1 $ condor_userprio -all -allusers |grep user1@node1 user1@node1 8016.22 8.02 10.00 0 15780.63 11/23/2011 05:59 12/30/2011 20:37 user1@node1 8016.22 8.02 10.00 0 15780.63 11/23/2011 05:59 12/30/2011 20:37 $ condor_userprio -setfactor user1@node1 1000 $ condor_userprio -setfactor user1@node1 1000 The priority factor of user1@node1 was set to 1000.000000 The priority factor of user1@node1 was set to 1000.000000 $ condor_userprio -all -allusers |grep user1@node1 $ condor_userprio -all -allusers |grep user1@node1 user1@node1 8016.22 8.02 1000.00 0 15780.63 11/23/2011 05:59 12/30/2011 20:37 user1@node1 8016.22 8.02 1000.00 0 15780.63 11/23/2011 05:59 12/30/2011 20:37 ● Although persistent between Negotiator restarts ● May want to set higher default PF (e.g. 1000) – default of 1, and user PF cannot go below – Attribute regulated by DEFAULT_PRIO_FACTOR http://research.cs.wisc.edu/condor/manual/v7.6/2_7Priorities_Preemption.html#sec:user-priority-explained UCSD Jan 18th 2012 Condor Tunning 7
  • 8. Negotiation cycle optimization UCSD Jan 18th 2012 Condor Tunning 8
  • 9. Negotiation internals ● The Negotiator does the matchmaking in loops ● Get all Machine ClassAds (from Collector) ● Get Job ClassAds (from Schedds) one by one, prioritized by user – Tell Schedd if a match has been found ● Sleep ● Repeat UCSD Jan 18th 2012 Condor Tunning 9
  • 10. Sleep time ● The Negotiator sleeps between cycles to save CPU ● But will be interrupted if new jobs are submitted ● Makes sense for static pools The sleep will not be interrupted if a new glidein join ● But can be a problem for dynamic ones like glideins Unmatched glideins ● Keep the sleep as low as feasible NEGOTIATOR_INTERVAL ● Installer sets it to 60s NEGOTIATOR_INTERVAL ==60 NEGOTIATOR_INTERVAL 60 http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorInterval UCSD Jan 18th 2012 Condor Tunning 10
  • 11. Preemption ● By default, Condor will preempt a user as soon as jobs from a higher priority users are available ● Preemption == wasted CPU Preemption==killing jobs ● The glideinWMS installer disables preemption ● Once a glidein is given to a user, he can keep it until the glidein dies ## Prevent preemption Prevent preemption Glidein lifetime PREEMPTION_REQUIREMENTS ==False PREEMPTION_REQUIREMENTS False is limited, so ## Speed up Matchmaking Speed up Matchmaking fair share NEGOTIATOR_CONSIDER_PREEMPTION ==False NEGOTIATOR_CONSIDER_PREEMPTION False still enforced long term http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorConsiderPreemption UCSD Jan 18th 2012 Condor Tunning 11
  • 12. Protecting against dead glideins ● Glideins can die ● Just a fact of life on the Grid ● But their ClassAds will stay in the Collector for about 20 mins ● If they were Idle at the time, will match jobs! ● Make sure the lowest priority user get stuck ● Order Machine ClassAds by freshness NEGOTIATOR_POST_JOB_RANK ==MY.LastHeardFrom NEGOTIATOR_POST_JOB_RANK MY.LastHeardFrom http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorPostJobRank UCSD Jan 18th 2012 Condor Tunning 12
  • 13. Protecting against slow schedds ● If one schedd gets stuck, you want to limit the damage, and match jobs from other schedds ● Negotiator can limit the time spent on each schedd Was really important before 7.6.X... less of an issue now ● The glideinWMS installer sets these limits ● See Condor manual for details NEGOTIATOR_MAX_TIME_PER_SUBMITTER=60 NEGOTIATOR_MAX_TIME_PER_SUBMITTER=60 NEGOTIATOR_MAX_TIME_PER_PIESPIN=20 NEGOTIATOR_MAX_TIME_PER_PIESPIN=20 http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorMaxTimePerSubmitter UCSD Jan 18th 2012 Condor Tunning 13
  • 14. One way matchmaking ● The Negotiator normally sends the match both the Schedd and the Startd ● Only Schedd strictly needed ● Given that the glideins can be behind a firewall, it is a good idea to disable this behavior NEGOTIATOR_INFORM_STARTD ==False NEGOTIATOR_INFORM_STARTD False UCSD Jan 18th 2012 Condor Tunning 14
  • 15. Collector tunning UCSD Jan 18th 2012 Condor Tunning 15
  • 16. Collector tunning ● The major tunning is the Collector tree ● Already discussed in the Installation section ● Three more optional tunnings ● Security session lifetime ● File descriptor limits ● Client handling parameters UCSD Jan 18th 2012 Condor Tunning 16
  • 17. Security session lifetime ● All communication to the Collector (leafs) is GSI based ● But GSI handshakes are expensive! ● Condor thus uses it just to exchange a shared secret (i.e. password) with the remote glidein ● Will use this passwd for all further communications ● But the passwd also has a limited lifetime To limit damage ● Default lifetime quite long in case it is stolen ● The glideinWMS installer Was months in old limits it to 12h versions of Condor! SEC_DAEMON_SESSION_DURATION == 50000 SEC_DAEMON_SESSION_DURATION 50000 http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SecDefaultSessionDuration http://iopscience.iop.org/1742-6596/219/4/042017 UCSD Jan 18th 2012 Condor Tunning 17
  • 18. File descriptor limit ● The Collector (leafs) also function as CCB ● Will use 5+ FDs for every glidein they serve! ● By default, OS only gives 1024 FDs to each process ● But Condor will limit itself to ~80% of that ● So it has a few FDs left for internal needs (e.g. logs) Not done by ● You may want to increase this limit glideinWMS installer ● Must be done as root The alternative is to just (and Collector usually not root) run more secondary collectors (recommended) ● And don't forget the system-wide limits /proc/sys/fs/file-max UCSD Jan 18th 2012 Condor Tunning 18
  • 19. Client handling limits ● Every time someone runs condor_status Used by the Frontend processes the Collector must act on it ● A query is also done implicitly when you run condor_q -name <schedd_name> ● Since the Collector is a busy process, the Condor team added the option to fork on query It only uses RAM if the parent ● This will use up RAM! memory pages change But not unusual on a busy Collector ● So there is a limit COLLECTOR_QUERY_WORKERS Defaults to 16 You may want to either lower or increase it UCSD Jan 18th 2012 Condor Tunning 19
  • 20. Firewall tunning ● If you decided you want to keep the firewall you may need to tune it ● Open ports for the Collector tree ● And possibly the Negotiator Negotiator port usually (if Schedds outside the perimeter) dynamic... will need to fix it with ● You may also need to NEGOTIATOR_ARGS = -f -p 9617 tune the firewall scalability knobs Statefull firewalls ## foriptables for iptables will have to track sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552; sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552; 5+ TCP connections sysctl -p; sysctl -p; per glidein echo 48194 >>/sys/module/ip_conntrack/parameters/hashsize echo 48194 /sys/module/ip_conntrack/parameters/hashsize UCSD Jan 18th 2012 Condor Tunning 20
  • 21. Schedd tunning UCSD Jan 18th 2012 Condor Tunning 21
  • 22. Schedd tunning ● The following knobs may need to be tuned ● Job limits ● Startup/Termination limits ● Client handling limits ● Match attributes ● Requirements defaults ● Periodic expressions ● You also may need to tune the system settings UCSD Jan 18th 2012 Condor Tunning 22
  • 23. Job limits ● As you may remember, each running job uses a non-negligible amount of resources ● Thus the Schedd has a limit on them MAX_JOB_RUNNING ● Set it to a value You may waste some that is right for your HW CPU on glideins, but better than crashing the submit node ## glideinWMS installerprovided value glideinWMS installer provided value MAX_JOBS_RUNNING == 6000 MAX_JOBS_RUNNING 6000 http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxJobRunning UCSD Jan 18th 2012 Condor Tunning 23
  • 24. Startup limits ● Each time a job is started ● The Schedd must create a shadow ● The Shadow must load the input files from disk ● The Shadow must transfer the files over the network ● If too many start at once, may overload the node ● Two sets of limits: Limit at the Schedd ● JOB_START_DELAY/JOB_START_COUNT ● MAX_CONCURRENT_UPLOADS Limit on the network http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:JobStartCount http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxConcurrentUploads UCSD Jan 18th 2012 Condor Tunning 24
  • 25. Termination limits ● Termination can be problematic as well Can easily overload ● Output files must be transferred back the submit node. – And you do not know how big will it be! ● If using glexec, an authentication call will be made on each glidein Each calling the site GUMS. Imagine O(1k) within a second! ● Similar limits: Limit at the Schedd ● JOB_STOP_DELAY/JOB_STOP_COUNT ● MAX_CONCURRENT_DOWNLOADS Limit on the network http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:JobStopCount http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxConcurrentDownloads UCSD Jan 18th 2012 Condor Tunning 25
  • 26. Client handling limits ● Every time someone runs condor_q Used by the Frontend processes the Schedd must act on it ● Since the Schedd is a busy process, the Condor team added the option to fork on query ● This will use up RAM! It only uses RAM if the parent memory pages change ● So there is a limit But not unusual on a busy Schedd SCHEDD_QUERY_WORKERS Defaults to 3 You will want to increase it http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:ScheddQueryWorkers UCSD Jan 18th 2012 Condor Tunning 26
  • 27. Match attributes for history ● It is often useful to know where were the jobs running ● By default, you get just the host name LastRemoteHost ● May be useful to know which glidein factory it came from, too (etc.) ● But you can get match info easily if you ... ● Add to the job submit files job_attribute = $$(glidein attribute) After matching, Schedd adds in the ClassAd MATCH_EXP_job_attribute UCSD Jan 18th 2012 Condor Tunning 27
  • 28. System level addition ● You can add them in the system config ## condor_config.local condor_config.local JOB_GLIDEIN_Factory =="$$(GLIDEIN_Factory:Unknown)" JOB_GLIDEIN_Factory "$$(GLIDEIN_Factory:Unknown)" JOB_GLIDEIN_Name == "$$(GLIDEIN_Name:Unknown)" Factory identification JOB_GLIDEIN_Name "$$(GLIDEIN_Name:Unknown)" JOB_GLIDEIN_Entry_Name =="$$(GLIDEIN_Entry_Name:Unknown)" JOB_GLIDEIN_Entry_Name "$$(GLIDEIN_Entry_Name:Unknown)" JOB_GLIDEIN_ClusterId == "$$(GLIDEIN_ClusterId:Unknown)" Glidein identification JOB_GLIDEIN_ClusterId "$$(GLIDEIN_ClusterId:Unknown)" (to give the Factory admins) JOB_GLIDEIN_ProcId == "$$(GLIDEIN_ProcId:Unknown)" JOB_GLIDEIN_ProcId "$$(GLIDEIN_ProcId:Unknown)" JOB_GLIDECLIENT_Name =="$$(GLIDECLIENT_Name:Unknown)" JOB_GLIDECLIENT_Name "$$(GLIDECLIENT_Name:Unknown)" Frontend identification JOB_GLIDECLIENT_Group == "$$((GLIDECLIENT_Group:Unknown)" JOB_GLIDECLIENT_Group "$$((GLIDECLIENT_Group:Unknown)" JOB_Site == "$$(GLIDEIN_Site:Unknown)" JOB_Site "$$(GLIDEIN_Site:Unknown)" Any other attribute SUBMIT_EXPRS == $(SUBMIT_EXPRS) SUBMIT_EXPRS $(SUBMIT_EXPRS) JOB_GLIDEIN_Factory JOB_GLIDEIN_Name JOB_GLIDEIN_Factory JOB_GLIDEIN_Name JOB_GLIDEIN_Entry_Name This is how you push JOB_GLIDEIN_Entry_Name them in the user job JOB_GLIDEIN_ClusterId JOB_GLIDEIN_ProcId JOB_GLIDEIN_ClusterId JOB_GLIDEIN_ProcId JOB_GLIDECLIENT_Name JOB_GLIDECLIENT_Group ClassAds JOB_GLIDECLIENT_Name JOB_GLIDECLIENT_Group JOB_Site JOB_Site UCSD Jan 18th 2012 Condor Tunning 28
  • 29. Additional job defaults ● With glideins, we often want empty user job requirements ● But Condor does not allow it ● It will forcibly add something about ● Arch e.g. (Arch == "INTEL") ● Disk e.g. (Disk>DiskUsage) ● Memory e.g. (Memory>ImageSize) ● Unless they are already there... so lets add them ## condor_config.local condor_config.local APPEND_REQ_VANILLA ==(Memory>=1)&&(Disk>=1)&&(Arch=!="fake") APPEND_REQ_VANILLA (Memory>=1)&&(Disk>=1)&&(Arch=!="fake") Just to make sure it always evaluates to True UCSD Jan 18th 2012 Condor Tunning 29
  • 30. Periodic expressions ● Unless you have very well behaved users, some of the jobs will be pathologically broken ● You want to detect them and The will be wasting CPU cycles at best remove the from the system ● Condor allows for periodic expressions SYSTEM_PERIODIC_REMOVE Just remove SYSTEM_PERIODIC_HOLD Leave in the queue, but do not match ● An arbitrary boolean expression ● What to look for comes with experience UCSD Jan 18th 2012 Condor Tunning 30
  • 31. Example periodic expression ## condor_config.local condor_config.local SYSTEM_PERIODIC_HOLD == SYSTEM_PERIODIC_HOLD ((ImageSize>2000000) |||| Too much memory ((ImageSize>2000000) (DiskSize>1000000) |||| (DiskSize>1000000) Too much disk space (((( JobStatus == 2 ) && JobStatus == 2 ) && ((CurrentTime-JobCurrentStartDate ) ) >100000)) |||| Took too long ((CurrentTime-JobCurrentStartDate > 100000)) (JobRunCount>5)) (JobRunCount>5)) Tried too many times SYSTEM_PERIODIC_HOLD_REASON == SYSTEM_PERIODIC_HOLD_REASON ifThenElse((ImageSize>2000000), ifThenElse((ImageSize>2000000), “Used too much memory”, In Condor 7.7.X “Used too much memory”, you can also provide … … ifThenElse((JobRunCount>5), a human readable reason string ifThenElse((JobRunCount>5), “Tried too many times”, “Tried too many times”, “Reason unknown”)...) “Reason unknown”)...) http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SystemPeriodicHold http://www.cs.wisc.edu/condor/manual/v7.7/3_3Configuration.html#param:SystemPeriodicHoldReason UCSD Jan 18th 2012 Condor Tunning 31
  • 32. File descriptors and process IDs ● Each Shadow keeps open O(10) FDs ● And we have a shadow per running job ● Make sure you have enough system-wide file descriptors ● Make also sure the system can support all the shadow PIDs ~$ echo 4849118 >> /proc/sys/fs/file-max ~$ echo 4849118 /proc/sys/fs/file-max ~$ cat /proc/sys/fs/file-nr ~$ cat /proc/sys/fs/file-nr 59160 00 4849118 59160 4849118 ~$ echo 128000 >> /proc/sys/kernel/pid_max ~$ echo 128000 /proc/sys/kernel/pid_max http://www.cs.wisc.edu/condor/condorg/linux_scalability.html UCSD Jan 18th 2012 Condor Tunning 32
  • 33. Firewall tunning ● If you decided you want to keep the firewall you may need to tune it ● At minimum, open the port of the Shared Port daemon ● You may also need to tune the firewall scalability knobs Statefull firewalls will have to track ## foriptables for iptables 2 connections sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552; sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552; per running job sysctl -p; sysctl -p; echo 48194 >>/sys/module/ip_conntrack/parameters/hashsize echo 48194 /sys/module/ip_conntrack/parameters/hashsize UCSD Jan 18th 2012 Condor Tunning 33
  • 34. The End UCSD Jan 18th 2012 Condor Tunning 34
  • 35. Pointers ● The official project Web page is http://tinyurl.com/glideinWMS ● glideinWMS development team is reachable at glideinwms-support@fnal.gov ● Condor Home Page http://www.cs.wisc.edu/condor/ ● Condor support condor-user@cs.wisc.edu condor-admin@cs.wisc.edu UCSD Jan 18th 2012 Condor Tunning 35
  • 36. Acknowledgments ● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI ● The glideinWMS factory operations at UCSD is sponsored by OSG ● The funding comes from NSF, DOE and the UC system UCSD Jan 18th 2012 Condor Tunning 36