Capacity Planning for N1
Sun Network 2003 Presentation
SunSigma DFSS Project P925
Adrian.Cockcroft@sun.com
Chief Architect - High Performance Technical Computing
August 29, 2003
Project: Capacity Planning for N1
          ID: P925



      What is N1?
    Datacenter Automation
       Manage “N” systems as if they were “1” system
       Solve the Total Cost of Ownership (TCO) problems
       Manage all the “fabrics” as one - Network/VLAN, SAN/Zone, power,
         consoles, cluster

    Heterogeneous Support
       Solaris, Linux, AIX, HP-UX, Windows, EMC etc…

    Layered Provisioning
       Platform/OS, Application, Service

    Roadmap Includes Acquisitions
       2001 Sun internal N1 architectural definition
       2002 Terraspring platform level virtualization
       2003 CenterRun Application level provisioning
       ……….


        Voice of the Customer
            “We want better performance at a lower price”
            “We want higher utilization”
            “We don’t want application performance to degrade at times of peak load”
            “We want more and faster application changes”
            “How do we do capacity planning with N1?”
        Scope…
DEFINE


             Capacity Planning for N1
             Define
                 Project goals, scope and plan, VOC, stakeholders
             Measure
                 Definition of Capacity Planning measurements
             Analyze
                 Gaps, N1CP Processes Concept Design, Survey
             Design
                 Prototype Use Cases
             Verify
                 Stakeholder communication and transition plan
             Monitor
                 N1 Capacity Planning implementation tracked as subgroup of N1 Strategic Working Group

MEASURE


          Translate VOC to Measurements
     “We want better performance at a lower price”
          Fast, well tuned and efficient systems
          Lower Total Cost of Ownership
          Flexibility - choice of systems by price, performance, reliability,
             scalability, compatibility and feature set
     “We want higher utilization”
          Consistently high utilization of expensive resources
     “We don’t want application performance to degrade at times of peak load”
         Consistent and fast application or service response times
         Headroom needed to handle peak loads
     “We want more and faster application changes”
         Flexible scenario planning, rapid provisioning

     Question: “My company already has capacity planning processes and
       tools” - do you agree or disagree with this statement?
MEASURE


          N1 as a Constraint and Opportunity
              Centralized control and monitoring
              Highly replicated hardware configurations
              Well defined workload and capacity characterization
              Arrays of load-balanced systems, structured network
              Large SMP nodes, standardized storage layout
              Web services workloads follow an “open system” queuing model, which is simple to plan against
              Dynamic system domains and virtualized provisioning allow rapid capacity adjustments and pooled resources
              Primary capacity metrics are CPU power and storage, secondary metrics (memory, network and thermal) may be over-provisioned but should be watched


MEASURE


          Utilization Definition
              Utilization is the proportion of busy time
              Always defined over a time interval
              Sum over devices (mean load level)
          [Figure: “OnCPU Scheduling for Each CPU”. OnCPU state and mean CPU utilization (0.56 marked) against time in microseconds.]
          [Figure: “usr+sys CPU for Peak Period”. CPU % (0 to 100) against time, with the Utilization level marked.]
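The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the sample busy times are hypothetical, and the data source for per-CPU busy time is left out.

```python
def utilization(busy_seconds, interval_seconds):
    """Proportion of the interval that one device was busy (0.0 to 1.0)."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return busy_seconds / interval_seconds


def mean_load_level(busy_seconds_per_cpu, interval_seconds):
    """Sum utilization over devices; the result is in units of CPUs."""
    return sum(utilization(b, interval_seconds) for b in busy_seconds_per_cpu)


# Hypothetical sample: busy time of four CPUs over one 60-second interval.
busy = [30.0, 45.0, 15.0, 60.0]
load = mean_load_level(busy, 60.0)
print(load)  # 2.5 CPUs busy on average
```

Summing rather than averaging keeps the result meaningful when the number of CPUs changes between intervals.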




MEASURE


          Headroom Definition
              Headroom is available usable resources
                  Total Capacity minus Peak Utilization and Margin
                  Applies to CPU, RAM, Net, Disk and OS
                  Depends upon workload mixture
                  Can be very complex to determine
          [Figure: “usr+sys CPU for Peak Period”. CPU % (0 to 100) against time, with bands labeled Utilization, Headroom and Margin from bottom to top.]
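As a rough sketch of the headroom arithmetic, assuming all three quantities are expressed in the same unit (CPUs here), the calculation could look like the following. The function name and the sample values are hypothetical.

```python
def headroom(total_capacity, peak_utilization, margin):
    """Headroom = total capacity - peak utilization - margin (same units).

    A negative result means the peak load is already eating into the margin.
    """
    return total_capacity - peak_utilization - margin


# Hypothetical system: 8 CPUs configured, 3.5 CPUs busy at peak,
# and 0.5 CPUs held back as safety margin.
available = headroom(8.0, 3.5, 0.5)
print(available)  # 4.0
```

The same function applies to RAM, network or disk as long as the unit is consistent, although the slide warns that determining the inputs can be complex.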




MEASURE


          CPU Capacity Measurements
              CPU utilization is defined as busy time divided by elapsed time for each CPU
              Number of CPUs is dynamic, so capacity at “100%” is not constant. Use units of “processors” to measure load.
              CPU type and speed varies so we need something like MIPS or M-Values for mixed systems
              CPU utilization should be managed within a range that safely minimizes headroom to give stable performance at minimum cost
              Process level CPU wait time measures the time a process spent on the run queue waiting for a free CPU
                  This allows response time increase to be observed directly so that increased capacity can be provisioned before headroom is exhausted
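One way to turn process-level CPU wait time into an observed response time increase is sketched below. The ratio used here is an assumption for illustration, not a formula given in the slides, and the per-process samples are hypothetical; how the on-CPU and wait times are collected is out of scope.

```python
def response_time_increase(cpu_seconds, wait_seconds):
    """Ratio of (on-CPU time + run-queue wait) to on-CPU time for a workload.

    1.0 means no queuing for CPU; larger values mean work is waiting.
    """
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return (cpu_seconds + wait_seconds) / cpu_seconds


# Hypothetical per-process samples: (on-CPU seconds, run-queue wait seconds).
samples = [(4.0, 1.0), (2.0, 0.5), (2.0, 0.5)]
cpu = sum(c for c, _ in samples)
wait = sum(w for _, w in samples)
factor = response_time_increase(cpu, wait)
print(factor)  # 1.25
```

Tracking this factor over time gives an early signal that headroom is shrinking, before utilization alone makes it obvious.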

MEASURE


          Response Time Definition
              Service time occurs while using a resource
              Queue time waits for access to a resource
              Response Time = Queue time + Service time
          Response time curves for random arrival of work from large unknown user population (e.g. the Internet!)
              R = S / (1 - (U/m)^m)
          [Figure: “Response Time Curves”. Response Time Increase Factor (0.00 to 10.00) against Mean CPU Load Level (0 to 4), with curves for One CPU, Two CPUs and Four CPUs.]
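The response time curve can be evaluated numerically. The sketch below reads the slide's formula as R = S / (1 - (U/m)^m), where U is the mean CPU load level in CPUs and m is the number of CPUs; that reading of the symbols, and the function name, are assumptions.

```python
def response_time_factor(load, cpus):
    """Response time increase factor R/S = 1 / (1 - (U/m)^m).

    load: mean CPU load level U, in CPUs (0 <= load < cpus)
    cpus: number of CPUs m
    """
    if not 0 <= load < cpus:
        raise ValueError("load must be in the range [0, cpus)")
    return 1.0 / (1.0 - (load / cpus) ** cpus)


# Same per-CPU utilization (50%) on one, two and four CPUs.
for m in (1, 2, 4):
    print(m, round(response_time_factor(0.5 * m, m), 3))
```

At 50% utilization the factor drops from 2.0 on one CPU towards 1.0 as CPUs are added, which is the shape the figure shows.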




MEASURE


          Response Time Curves
              Systems with many CPUs can run at higher utilization levels, but degrade more rapidly when they finally run out of capacity. Headroom margin should be set according to response time margin and CPU count.
              R = S / (1 - (U%)^m)
          [Figure: “Response Time Curves”. Response Time Increase Factor (0.00 to 10.00) against Total System Utilization % (0 to 100), with curves for One, Two, Four, Eight, 16, 32 and 64 CPUs and the Headroom margin marked.]
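To see how the headroom margin could be set from a response time margin and a CPU count, the formula can be inverted: if R/S must stay below a factor k, then U = (1 - 1/k)^(1/m). The sketch below assumes U is total system utilization as a fraction and uses a hypothetical limit of k = 2; neither the limit nor the function name comes from the slides.

```python
def max_utilization(cpus, max_factor):
    """Highest utilization (0..1) that keeps R/S at or below max_factor.

    Derived by solving R/S = 1 / (1 - U^m) for U.
    """
    if max_factor <= 1.0:
        raise ValueError("max_factor must be greater than 1")
    return (1.0 - 1.0 / max_factor) ** (1.0 / cpus)


# Hypothetical response time margin: at most a 2x increase.
for m in (1, 2, 4, 8, 16, 32, 64):
    u = max_utilization(m, 2.0)
    print(f"{m:2d} CPUs: utilization limit {u:.1%}, headroom margin {1 - u:.1%}")
```

The margin shrinks as the CPU count grows, matching the observation that large systems can run hotter but have less warning before they degrade.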




MEASURE


          CPU Scalability Differences
              SMP allows work to migrate between CPUs, “blades” don’t
                  Single queue of work gives lower response time for user sessions at high utilization than arrays of uniprocessor “blades”
                  Headroom margin on array of “blades” is constant as array grows
                  Two to four CPU systems need much less margin than Uni-CPUs
                  Measure and calibrate actual response curve per workload
              SMP R = S / (1 - (U/m)^m) vs. Blade R = S / (1 - U/m)
          [Figure: “Response Time Curves”. Response Time Increase Factor (0.00 to 10.00) against CPU Demand Level (0 to 4), with curves for 1 CPU/Blade, 2 CPU SMP, 4 CPU SMP, 2 Blades and 4 Blades.]
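The two formulas can be compared directly. In this sketch U is the CPU demand level in CPUs and m is the number of CPUs or blades, with demand assumed to be spread evenly across blades; the function names are hypothetical.

```python
def smp_factor(demand, cpus):
    """SMP: R/S = 1 / (1 - (U/m)^m), one shared queue of work."""
    return 1.0 / (1.0 - (demand / cpus) ** cpus)


def blade_factor(demand, blades):
    """Blades: R/S = 1 / (1 - U/m), each uniprocessor has its own queue."""
    return 1.0 / (1.0 - demand / blades)


# Same demand of 2 CPUs on a 4 CPU SMP and on 4 uniprocessor blades.
print(round(smp_factor(2.0, 4), 3), round(blade_factor(2.0, 4), 3))

# At a fixed 50% utilization per blade, the blade factor does not improve
# as the array grows.
print([blade_factor(0.5 * m, m) for m in (1, 2, 4)])
```

The second line of output stays at 2.0 for every array size, which is why the headroom margin on an array of blades is constant as the array grows.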



MEASURE


     CPU Measurement System Issues
          Clock sampled CPU usage
               Poor clock resolution at 10 ms (optionally 1 ms)
               Biased sample since clock schedules jobs
               Underestimates more at lower utilization
               Creates apparent lack of scalability
          Microstate measured CPU usage
               Measure state changes directly - “microstates”
               Per-CPU microstate based counters are not available
               Use microstates at process based workload level, sum over some or all processes as needed (can take a while on big systems)
               Microstate method simply extends to measuring services and mixed workloads


MEASURE


       N1 Capacity Planning CTQs
       CTQ Name                   Pri   Units   LSL            USL               Gauge Acc.   Budget Sigma
       CPU Utilization (TCO)      5     CPUs    30% of total                     99%          3.0
       CPU Responsiveness (SLA)   10    CPUs                   70-98% of total   99%          4.0

     Both of these Critical To Quality (CTQ) requirements are measured via the CPU load
     level, which can be measured accurately, with a Gauge accuracy estimated at 99% and a
     sigma goal based on defect cost. Using sampled CPU measurements, accuracy is estimated at 90%.
     For CPU Utilization, a defect is an unacceptable Total Cost of Ownership (TCO) and
     occurs if the total CPU load drops below the Lower Specification Limit (LSL) of 30%
     of the total configured, for a sample taken during the peak load period.
     For CPU Responsiveness, a defect is an overload leading to a Service Level Agreement
     (SLA) failure and occurs if the total CPU load goes above the Upper Specification
     Limit (USL), which is 70% of the total configured for uniprocessors, increasing for
     larger CPU counts.
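The two defect definitions can be expressed as a simple check on one peak-period sample. This is a sketch under stated assumptions: the function name and return labels are hypothetical, and because the slide does not give the USL for each CPU count, the USL is passed in as a parameter (0.70 for a uniprocessor, higher for larger systems).

```python
LSL_FRACTION = 0.30  # from the slide: 30% of the total configured CPUs


def classify_sample(load_cpus, configured_cpus, usl_fraction):
    """Classify one peak-period CPU load sample against the CTQ limits.

    Returns "tco_defect" below the LSL, "sla_defect" above the USL, else "ok".
    usl_fraction is an input because it depends on the CPU count.
    """
    if load_cpus < LSL_FRACTION * configured_cpus:
        return "tco_defect"
    if load_cpus > usl_fraction * configured_cpus:
        return "sla_defect"
    return "ok"


print(classify_sample(0.2, 1, 0.70))    # tco_defect: under-used uniprocessor
print(classify_sample(0.8, 1, 0.70))    # sla_defect: overloaded uniprocessor
print(classify_sample(30.0, 56, 0.90))  # ok, with a hypothetical 90% USL
```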
ANALYZE



          Concept Design - N1CP Roles
          Manager
               Application Architect
                    Developers
                    Database Administrators
               Systems Architect
                    Systems Administrators
                    Storage Administrators
                    Network Administrators
          Others?

          Question: What roles do you do?


ANALYZE



          Scenarios - Top Level Functional Breakdown

     [Diagram: flow of scenario boxes. Each scenario box is stacked on boxes labeled “Provision System level Applications”.]
          Install N1 Datacenter
          Over-Provision (System level, Applications)
          Right-size (System level, Applications) - repeat infrequently
          Re-Allocate Resources during Low load times - repeat on schedule
          Grow or borrow Capacity just before Overload occurs - repeat as needed
ANALYZE



                 Installation Sizing Scenario
       This scenario indicates the tasks for each role when an N1 datacenter fabric is created using
       currently available system level provisioning software. The set of tasks performed by each role
       in a scenario is called a “use case”. Future versions of N1 will configure services and policies
       during installation. Red arrows show the command flow between the roles.

       Tasks for each role, in time order:
           Manager: I want an N1 ready datacenter
           Application Architect: Choose and size applications and platforms
           Database Admin: Install generic database images
           Developer: Install generic application servers
           Systems Architect: Choose systems
           Systems Admin: Size systems mix; Build generic system images; Measure capacity of generic systems
           Network Admin: Size overall network; Set up switches and VLANs for N1
           Storage Admin: Size overall storage; Set up SANs and storage for N1



ANALYZE



                    Over-Provisioning Scenario
       This gives an indication of the tasks performed by each role as a new application is
       provisioned using the capabilities of today's N1 products. The initial goal is to over-provision
       the capacity for initial bring-up of the application, then later right-size it as its actual usage
       pattern becomes better understood. In future releases more and more of this activity will be
       automated, and more of the work will become pre-work related to setting up
       the overall N1 datacenter infrastructure.

       Tasks for each role, in time order:
           Manager: Provide an online service
           Application Architect: Use these apps
           Database Admin: Database versions and sizing; Configure database; Populate database
           Developer: App server versions and sizing; Configure app server; Acceptance test
           Systems Architect: Use these platforms; Define operations policies
           Systems Admin: Systems selection & versions; Build replicable system images; Use N1 GUI to over-provision initial system; Enable user access
           Network Admin: Network sizing; Provision Internet connection; Configure access and security
           Storage Admin: Storage sizing; Provision LUNs; Configure backup strategy

ANALYZE



                  Rightsizing Scenario
       Rightsizing adjusts the headroom for each component of the system to make sure that the
       usage level falls inside the specification limits. Rightsizing can be performed during an
       offline maintenance window but all the technologies exist to adjust domain size for tier 3
       systems, and adjust the number of tier 1 and tier 2 systems dynamically.



        Tasks for each role, in time order:
            Manager: Business level and trend plan
            Application Architect: (no tasks shown)
            Database Admin: Monitor database headroom (memory and tables); Increase headroom for bottleneck; Reduce headroom for under utilized database
            Developer: (no tasks shown)
            Systems Architect: (no tasks shown)
            Systems Admin: Monitor CPU, Network and memory; Increase headroom for bottleneck; Reduce headroom for under utilized systems
            Network Admin: Monitor WAN / Internet headroom; Increase headroom for bottleneck; Reduce headroom for under utilized bandwidth
            Storage Admin: Monitor storage headroom; Increase headroom for bottleneck; Reduce headroom for under utilized storage

ANALYZE



                   Re-Allocation Scenario
        Load levels vary during the day and the week. Regular times of low utilization can have
        other work performed - e.g. overnight batch jobs. Batch workloads that cannot run on the
        same systems due to configuration or security issues can run on systems (or Grids) that are
        provisioned each night using spare capacity from other systems.
        Tasks for each role, in time order:
            Manager: Batch workload capacity needed
            Application Architect: Define batch capable applications
            Database Admin: (no tasks shown)
            Developer: Build or configure batch capable applications
            Systems Architect: Define batch mechanism
            Systems Admin: Determine timing and depth of capacity to re-allocate; Move resources to Grid after peak load time; Bring resources back before peak load time
            Network Admin: (no tasks shown)
            Storage Admin: (no tasks shown)




ANALYZE



                   Overload Scenario
       Load levels vary during the day and the week in a fairly consistent and predictable manner. Sizing
       for the normal load level allows high utilization levels. Higher load levels can be handled as an
       exception by watching for abnormally high levels before the load peaks and borrowing capacity
       from lower priority applications such as development environments.

       Question: “Are dynamic capacity adjustments a mature and reliable technology?”

       Tasks for each role, in time order:
           Manager: Higher utilization needed to reduce cost of service; Negotiate victim to steal capacity from
           Application Architect: (no tasks shown)
           Database Admin: (no tasks shown)
           Developer: (no tasks shown)
           Systems Architect: (no tasks shown)
           Systems Admin: Determine normal load curve for time of day and day of week; Monitor deviations above normal load level; Provision extra capacity before it is needed
           Network Admin: (no tasks shown)
           Storage Admin: (no tasks shown)
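The Systems Admin tasks in this scenario (determine the normal load curve, then monitor deviations above it) might be sketched as follows. Everything here is illustrative: the function names, the (day of week, hour) slotting, the 1.2 tolerance factor and the sample history are assumptions, not values from the slides.

```python
from collections import defaultdict
from statistics import mean


def normal_load_curve(history):
    """Build the normal load level per (day_of_week, hour) slot.

    history: iterable of (day_of_week, hour, load_cpus) samples.
    """
    slots = defaultdict(list)
    for day, hour, load in history:
        slots[(day, hour)].append(load)
    return {slot: mean(loads) for slot, loads in slots.items()}


def needs_extra_capacity(curve, day, hour, current_load, tolerance=1.2):
    """True when the current load exceeds normal for this slot by the tolerance."""
    normal = curve.get((day, hour))
    if normal is None:
        return False  # no baseline yet for this slot
    return current_load > tolerance * normal


# Hypothetical history: three Mondays (day 0) at 20:00, load in CPUs.
history = [(0, 20, 30.0), (0, 20, 32.0), (0, 20, 34.0)]
curve = normal_load_curve(history)
print(needs_extra_capacity(curve, 0, 20, 40.0))  # True: well above normal
print(needs_extra_capacity(curve, 0, 20, 35.0))  # False: within tolerance
```

A check like this would have to run ahead of the expected peak, so that extra capacity can be borrowed before it is needed.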

ANALYZE


          Rightsizing Scenario
          Detailed Design Concept via an Example
          Large scale Internet workload
               Fairly predictable load shape
               Peaks every evening (use peak hours)
               Grows every week
          Key CTQs
               Performance during peak hour
               Cost of maintaining performance level
               Risk of downtime
          Tier 3 backend database server
               Primary bottleneck, over-provisioned elsewhere
               Highest cost of CPU headroom (E10K/F15K class)
               Initially 56 CPUs in domain, average 30 CPUs load
           –

ANALYZE


          CPU Load Level
          Monitor for days or weeks to establish baseline and time of
            peak load, then track that timeslot daily
          CPU load (units are CPUs, 56 configured) for a busy day:
          [Figure: Summed CPU Utilization. Y-axis: CPU Utilization Level, 0 to 50 CPUs; x-axis: Time of Day, 0:00 to 23:12; a "Peak 2 Hrs" period is marked.]
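
The baseline step above amounts to finding the busiest timeslot in a day of load samples. A minimal sketch of one way to do that, assuming evenly spaced samples already expressed in CPUs; the function name `busiest_window` and the hourly figures are hypothetical illustrations, not data from the presentation:

```python
def busiest_window(loads, width):
    """Return (start_index, mean_load) of the busiest run of `width` samples."""
    if width <= 0 or width > len(loads):
        raise ValueError("width must be between 1 and len(loads)")
    # Slide a fixed-width window across the day, keeping a running sum.
    window_sum = sum(loads[:width])
    best_start, best_sum = 0, window_sum
    for i in range(width, len(loads)):
        window_sum += loads[i] - loads[i - width]
        if window_sum > best_sum:
            best_start, best_sum = i - width + 1, window_sum
    return best_start, best_sum / width


# Hypothetical hourly CPU load for one day, in CPUs (56 configured).
hourly_load = [12, 10, 9, 9, 10, 14, 20, 28, 35, 40, 42, 41,
               39, 40, 43, 47, 48, 44, 36, 30, 25, 20, 16, 13]

start_hour, mean_load = busiest_window(hourly_load, width=2)
print(f"peak 2 hours start at {start_hour}:00, mean load {mean_load:.1f} CPUs")
```

Once the peak timeslot is known, only that slot needs to be tracked from day to day.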




23
ANALYZE     Project: Capacity Planning for N1
            ID: P925


          Utilization Distribution
          Capability plot for peak time shows the system is less than half
            utilized about 25% of the time, which is too much headroom. This
            defect rate corresponds to a Sigma level of 2.18.
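
The step from a defect rate to a Sigma level can be sketched as below, assuming the usual Six Sigma convention of adding a 1.5 shift to the normal quantile (the same 1.5 offset appears in the Zst formulas on the Headroom Calculations slide). The function name `short_term_sigma` is a hypothetical label. With a defect fraction of exactly 0.25 the result is roughly 2.17, so the 2.18 quoted above presumably comes from a measured fraction slightly under 25%.

```python
from statistics import NormalDist


def short_term_sigma(defect_fraction, shift=1.5):
    """Convert a defect fraction into a short-term sigma level (Zst)."""
    if not 0.0 < defect_fraction < 1.0:
        raise ValueError("defect_fraction must be strictly between 0 and 1")
    # Normal quantile of the non-defective proportion, plus the 1.5 shift.
    z_long_term = NormalDist().inv_cdf(1.0 - defect_fraction)
    return z_long_term + shift


sigma_at_25_percent = short_term_sigma(0.25)
print(f"{sigma_at_25_percent:.2f}")
```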




          [Figure: capability plot. X-axis: CPU Demand Level.]



24
ANALYZE     Project: Capacity Planning for N1
            ID: P925


          Increase Utilization
          Reduce system to 40 CPUs and assume a linear increase in utilization:
            predicted sigma = 5.2
          Over-simplified: headroom margin and non-linearities are not included
            in the plan, so add a little extra headroom to compensate




          [Figure: distribution plot. X-axis: CPU Demand Level.]


25
DESIGN                 Project: Capacity Planning for N1
                            ID: P925


                      Headroom Tool Prototype
                Solaris specific prototype
          _

                       Rapid prototype using SE Toolkit from http://www.setoolkit.com
                  –
                       Shows component level headroom vs. utilization goal
                  –
                       Automatic margin calculation based on CPU count
                  –
                       Samples every few minutes, reports every 30-60 minutes
                  –
                       Microstate based, sums over all processes
                  –
                       Headroom predictor uses mean plus two standard deviations
                  –
                       Text based, logs data to a daily file, 3.5 sigma headroom
                  –
                  Code p.=processor, r.=ram, n.=network, d.=disk, .st=status, .cf=configured,
                     .ll=min lsl, .ul=limit usl, .ld=mean load, .h%=headroom, .sd=std deviation,
                     .tco=TCO defect rate, .sla=SLA defect rate, .tK=throughput K, .rm=response
                     time in milliseconds, .rp=response time proportional increase

          time      pll   pul   pcf  pst    ptco  psla  pld   psd   ph%  ptK   prm   prp
          17:36:04  3.6   11.6  12   Green  0.00  0.00  5.26  0.28  50   15.8  1.05  1.08
          18:06:04  3.6   11.6  12   Green  0.00  0.00  4.90  0.38  51   13.9  1.01  1.06
          18:36:04  3.6   11.6  12   Blue   0.40  0.00  4.55  2.19  23   13.0  0.93  1.09
          19:06:03  3.6   11.6  12   Blue   1.00  0.00  3.02  0.17  71   12.7  0.86  1.05
          19:36:03  3.6   11.6  12   Blue   0.93  0.00  2.82  0.53  67   12.0  0.67  1.04

          Notes on the columns:
               time: Samples taken every two minutes and reported every 30 minutes
               pll, pul, pcf: 12 CPUs configured, Lower limit 30% = 3.6, Upper limit based on CPU count at 11.6
               pst, ptco, psla: Status is based on measured defect proportion of time that load level is
                   below pll=TCO or above pul=SLA limits
               pld, psd, ph%: Mean load level and standard deviation are compared to the upper limit to
                   calculate headroom
               ptK, prm, prp: CPU Throughput is based on voluntary context switches, prm is very short,
                   but prp defines a response time curve
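
As a cross-check on the log format, the sample rows above can be parsed and the headroom column recomputed from the mean load, standard deviation and upper limit. This is a rough sketch, assuming whitespace-separated fields in the column order of the header line; `parse_row` and `headroom_percent` are hypothetical helper names, not part of the prototype.

```python
COLUMNS = ["time", "pll", "pul", "pcf", "pst", "ptco", "psla",
           "pld", "psd", "ph%", "ptK", "prm", "prp"]
TEXT_COLUMNS = {"time", "pst"}  # every other column is numeric

LOG = """\
17:36:04 3.6 11.6 12 Green 0.00 0.00 5.26 0.28 50 15.8 1.05 1.08
18:06:04 3.6 11.6 12 Green 0.00 0.00 4.90 0.38 51 13.9 1.01 1.06
18:36:04 3.6 11.6 12 Blue 0.40 0.00 4.55 2.19 23 13.0 0.93 1.09
19:06:03 3.6 11.6 12 Blue 1.00 0.00 3.02 0.17 71 12.7 0.86 1.05
19:36:03 3.6 11.6 12 Blue 0.93 0.00 2.82 0.53 67 12.0 0.67 1.04
"""


def parse_row(line):
    """Split one log line into a dict keyed by column code."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {name: (value if name in TEXT_COLUMNS else float(value))
            for name, value in zip(COLUMNS, fields)}


def headroom_percent(row):
    """Headroom from mean plus two standard deviations versus the upper limit."""
    return 100.0 * (1.0 - (row["pld"] + 2.0 * row["psd"]) / row["pul"])


rows = [parse_row(line) for line in LOG.splitlines()]
recomputed = [round(headroom_percent(row)) for row in rows]
reported = [int(row["ph%"]) for row in rows]
print(recomputed, reported)
```

The 18:36 row shows why the standard deviation matters: the mean load is lower than in the earlier rows, but the much larger spread cuts the headroom to 23%.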
26
DESIGN     Project: Capacity Planning for N1
                ID: P925


              Headroom Calculations
         Set configured total to number of processors online
         conf = sysconf(_SC_NPROCESSORS_ONLN);

         Set lower spec limit to 30% for TCO failures
         lsl = conf * 0.3;

         Use response time goal of 3 times baseline on curve to
           determine margin for maximum load level
         rpgoal = 3.0;

         Calculate max load level from theoretical response time curve
         /* rp = R/S, rp = 1/(1-(U^m)) so U = exp(log((rp-1)/rp) / m) */
         usl = conf * exp(log((rpgoal-1.0)/rpgoal)/conf);

         Calculate headroom % from mean plus two standard
           deviations versus upper spec limit
         headp = 100.0 * (1.0 - (mean + 2.0*sd) / usl);

         Calculate Sigma Zst
         tco_sigma = 1.5 + (mean - lsl) / sd;
         sla_sigma = 1.5 + (usl - mean) / sd;
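
For readers without the SE Toolkit, the same calculations can be tried out in Python. This is a sketch of the formulas on this slide, not the prototype itself; the function names and the dictionary layout are hypothetical, and the CPU count is passed in rather than read with `sysconf`. The inputs are the 17:36:04 row of the sample output (mean 5.26, standard deviation 0.28, 12 CPUs).

```python
import math


def upper_spec_limit(cpus, rp_goal=3.0):
    """Load level (in CPUs) at which rp = 1/(1 - U**m) reaches rp_goal."""
    utilization = math.exp(math.log((rp_goal - 1.0) / rp_goal) / cpus)
    return cpus * utilization


def headroom_report(mean, sd, cpus, rp_goal=3.0, lower_fraction=0.3):
    """Apply the slide's limit, headroom and Zst formulas to one interval."""
    lsl = cpus * lower_fraction             # lower spec limit (TCO)
    usl = upper_spec_limit(cpus, rp_goal)   # upper spec limit (SLA)
    return {
        "lsl": lsl,
        "usl": usl,
        "headroom_pct": 100.0 * (1.0 - (mean + 2.0 * sd) / usl),
        "tco_sigma": 1.5 + (mean - lsl) / sd,
        "sla_sigma": 1.5 + (usl - mean) / sd,
    }


report = headroom_report(mean=5.26, sd=0.28, cpus=12)
for name, value in report.items():
    print(f"{name:>12}: {value:.2f}")
```

For 12 CPUs the limits come out at about 3.6 and 11.6, which agrees with the pll and pul columns in the sample output.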
27
DESIGN       Project: Capacity Planning for N1
                  ID: P925


              Design Optimization
         Compare the “traditional” approach with the new design
              Run the headroom tool on a big and busy server, collect data and show how a simplistic approach
                  compares with the method described in this project

              SunRay timesharing server monitored for several days. System is loaded to the limit at peak times,
                  but idle out of hours, so focus on a scheduled capacity reallocation scenario.
         Simplistic “Traditional” Approach
              Collect data using vmstat, sar, SunMC or 3rd party tools
              Plot CPU % busy - as shown on next slide
              There is spare capacity, but no indication of how many CPUs are unused
              Need extra information that this is a 12-CPU system
         N1CP Approach
              Collect data using headroom prototype
              Plot CPU load level in CPU units, no need to guess or replot data
              Calculate margin, headroom and sigma levels
              Plan capacity reallocation and recalculate margin, headroom and sigma levels
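
The difference between the two approaches is a unit conversion: %busy only becomes a CPU count once the configured capacity is known. A small sketch, using the summary figures reported on the N1CP view slides later in this section (mean load 7.03 CPUs; mean capacity 12.00 fixed, 9.52 after reallocation); the function names are hypothetical.

```python
def cpus_in_use(percent_busy, configured_cpus):
    """Convert a %busy reading into load measured in CPUs."""
    return percent_busy / 100.0 * configured_cpus


def utilization_percent(mean_load, mean_capacity):
    """Convert load in CPUs back into utilization of a given capacity."""
    return 100.0 * mean_load / mean_capacity


# 59% busy means nothing until it is known that 12 CPUs are configured.
approx_load = cpus_in_use(59, 12)

fixed_util = utilization_percent(7.03, 12.00)    # fixed 12-CPU configuration
dynamic_util = utilization_percent(7.03, 9.52)   # capacity varied during the day
print(f"{approx_load:.2f} CPUs, {fixed_util:.0f}% fixed, {dynamic_util:.0f}% dynamic")
```

The same mean load of 7.03 CPUs reads as 59% of a fixed 12-CPU system but 74% of a 9.52-CPU average, which is the utilization gain the reallocation plan is after.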
28
DESIGN                 Project: Capacity Planning for N1
                            ID: P925


              Simplistic View
         There is no indication of how many CPUs are in use, util = 59% overall
         [Figure: CPU Utilization Monday-Thursday. Y-axis: CPU %busy, 0% to 100%; x-axis: Time of Day.]

29
DESIGN                                                           Project: Capacity Planning for N1
                                                                      ID: P925



              N1CP View - CPU Counts
         There are 12 CPUs, 6 to 8 are free overnight, system overloads at peak times
         [Figure: Mean+2sd Load vs Configured and Upper Limit. Series: pcf, pul, pmd+2psd; y-axis: CPU Count, 0 to 14; x-axis: Time of Day.]

         Mean CPU Load    7.03
         Mean Util        59%
         Mean headroom    34%
         Mean capacity    12.00

         Summary    DPMO      Min Sigma
         TCO        110215    -1.5 Zst
         SLA        538        2.5 Zst

30
DESIGN                        Project: Capacity Planning for N1
                                   ID: P925


                               N1CP - Response Curve
                               System is close to overload, this timeshared workload has a flatter curve
                               than internet workloads (closed rather than open queuing model)
                               [Figure: Response Time vs Load Level. Y-axis: Response Increase, 0 to 2.5; x-axis: CPU Count, 0 to 12.]
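
For comparison with the measured curve, the theoretical curve from the Headroom Calculations slide, rp = 1/(1-(U^m)), can be tabulated directly. This sketch covers only that formula, not the closed-queue behaviour measured on the SunRay server; `response_increase` is a hypothetical name, and the single-CPU case is included purely to show how the CPU count changes the shape.

```python
def response_increase(load, cpus):
    """Theoretical response time increase rp = 1 / (1 - U**m)."""
    utilization = load / cpus
    if not 0.0 <= utilization < 1.0:
        raise ValueError("load must be at least 0 and below the CPU count")
    return 1.0 / (1.0 - utilization ** cpus)


for cpus in (1, 12):
    points = [(u, response_increase(u * cpus, cpus)) for u in (0.5, 0.8, 0.9)]
    curve = ", ".join(f"U={u:.1f}: {rp:.2f}x" for u, rp in points)
    print(f"{cpus:>2} CPUs -> {curve}")
```

Under this formula a 12-CPU system stays close to its baseline response time until utilization is high and then degrades sharply, which is why the upper limit of 11.6 sits so near the configured 12 CPUs.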

31
DESIGN                 Project: Capacity Planning for N1
                            ID: P925


              Simplistic - CPUs reallocated
         There is no indication of how many CPUs are in use
         [Figure: CPU Utilization with Capacity Optimization. Y-axis: CPU %busy, 0% to 100%; x-axis: Time of Day.]

32
DESIGN                   Project: Capacity Planning for N1
                              ID: P925



                        N1CP View - Dynamic!
                        Vary the CPU count and times daily, and borrow extra for the peak load

                        [Chart: CPU mean+2sd Load vs Config and Upper Limit]
                            Y axis: CPU Count (0 to 14)
                            X axis: Time of Day
                            Series: pcf, pul, pmd+2psd
                            Data labels: 3.2s, 3.2s, 3.5s, 4.3s, 6.3s, 3.6s, 5.2s, 3.2s, 5.7s

                        Mean CPU load      7.03
                        Mean Util          74%
                        Mean headroom      16%
                        Mean capacity      9.52

                        Predicted          Min Sigma
                        TCO                2.0 Zst
                        SLA                3.2 Zst
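
The sizing idea behind this chart (configure CPUs against the mean+2sd load, with a different count for different parts of the day) can be sketched roughly as follows. This is only an illustration, not the headroom tool itself: the sample data, the day and night windows, and the names `planning_load` and `cpus_for_window` are all hypothetical, and rounding up to a whole CPU is an assumption.

```python
import math
from statistics import mean, stdev


def planning_load(samples):
    """Mean plus two sample standard deviations of CPU load measurements."""
    return mean(samples) + 2 * stdev(samples)


def cpus_for_window(hourly_samples, hours):
    """Smallest whole CPU count that covers the worst mean+2sd load in a window."""
    peak = max(planning_load(hourly_samples[h]) for h in hours)
    return math.ceil(peak)


# Hypothetical CPU load samples (busy CPUs), keyed by hour of day.
hourly_samples = {
    3: [2.0, 2.5, 3.0],
    9: [6.0, 7.0, 8.5],
    15: [8.0, 9.0, 10.5],
    21: [3.0, 4.0, 5.5],
}

# Assumed windows: one reconfiguration in the morning, one in the evening.
day = cpus_for_window(hourly_samples, hours=[9, 15])
night = cpus_for_window(hourly_samples, hours=[21, 3])
print(f"day CPUs: {day}, night CPUs: {night}")
```

CPUs not needed by the night configuration would be the ones returned for other use until the next morning.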
33
DESIGN


              Summary
              Performance Impact
                  SLA Sigma levels improve from minimum of 2.5 Zst to 3.2 Zst
                  Improvement of 0.7 Sigma by allowing for extra peak load
                  Simplistic methods do not allow quality of service prediction
              Cost Impact
                  TCO Sigma levels improve from minimum of -1.5 Zst to 2.0 Zst
                  Improvement of 3.5 Sigma by reducing capacity from 12 to 9.5
              Observability Impact
                  Headroom tool prototype generates all required statistics
                  Sigma level is simply calculated, or headroom tool could print it
                  Simplistic methods do not show what is going on
              Complexity Impact
                  Dynamic reconfiguration must be enabled
                  One reconfiguration each morning and each evening
              Applicability (Assertions, out of scope for this project!)
                  CPU based example can be applied to blades, RAM, disk, net, thermal
                  Method can be extended from platform level to services
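
As a rough check on the figures in this summary, the arithmetic can be reproduced in a few lines. The deck does not give the formula behind its Zst values, so `sigma_level` below assumes the plain Z score (distance from the mean to a limit, in standard deviations); the function name and the sample values passed to it are hypothetical. The utilization and improvement figures use only numbers quoted on the slides.

```python
from statistics import mean, stdev


def sigma_level(samples, limit):
    """Assumed Z score: standard deviations between the sample mean and a limit."""
    return (limit - mean(samples)) / stdev(samples)


# Figures quoted on the slides.
mean_load = 7.03       # Mean CPU load
mean_capacity = 9.52   # Mean capacity
utilization = mean_load / mean_capacity

sla_gain = 3.2 - 2.5     # SLA minimum sigma, before and after
tco_gain = 2.0 - (-1.5)  # TCO minimum sigma, before and after

print(f"utilization: {utilization:.0%}")
print(f"SLA improvement: {sla_gain:.1f} sigma, TCO improvement: {tco_gain:.1f} sigma")

# Hypothetical samples: mean 7.0, sd 1.0, limit of 10 CPUs.
print(f"example sigma level: {sigma_level([6.0, 7.0, 8.0], 10.0):.1f}")
```

The utilization comes out at the 74% shown on the dynamic view, and the two differences match the 0.7 and 3.5 sigma improvements listed above.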
34
VERIFY


              N1 Console Screenshots




35
GRID



            Capacity for Sale
        Uses for Spare Capacity
            Carefully schedule batch work and backups
            Remotely support global timezones
            Run engineering dept. simulation jobs

        Grid Oriented Solutions
            Project Grid - departmental cluster (Sun Grid Engine)
            Enterprise Grid - collection of clusters forming a general
              purpose Grid service (Sun Grid Engine Enterprise Edition)
            The Global Grid - Internet level - GT2.2, OGSA/OGSI/GT3
            Provision an Enterprise Grid service using N1
            Join The Global Grid and sell or share capacity


36
GRID



              Relationships: N1 and Grid
     N1 is about provisioning things you own, Grid is about access to things you don’t own

                                         Infrastructure                 Business Model
          Things you own and control     N1                             Utility Computing
          Things you borrow or lease     Grid Services, Web Services    Utility Computing
37
GRID



        Capacity Flows in a Grid Enabled N1 Datacenter
        [Diagram labels, left to right]
            Purchase C.O.D.; Purchase Capacity; Retire Obsolete Capacity; Repair and Replace
            Capacity On Demand; Free Pool (Unused Resources)
            N1 Virtualized Datacenter
                Tier 3 Database Storage; Tier 2 App Servers; Tier 1 Web Servers; Tier 0 Web Front End
                Cluster Grid Compute and Storage Resources; Sun Grid Engine Enterprise Edition
            Utility Computing Capacity Requests
                Web User / Web Services; Grid User / Grid Services


38
GRID

              IT market segments by “need to share”
                             Defense              Commercial            Technical            Consumer
                             spooks               suits                 geeks                users

        What can be shared   Nothing              Hardware              Operating System     Everything

        What is trusted      Nothing, physical    N1, Server domains,   Grid, VPN,           P2P apps, SETI,
                             separation           VLAN and SAN Zone     encryption,          Kazaa, Limewire,
                             required             partitioning          firewalls            People!

        What is visible      Local systems,       Local systems         Everything in The    Everything
                             Project Grids        and Internet          Global Grid          including other
                                                                        community            users

        Primary constraints  CPU cycles,          Storage.              CPU cycles.          Network
                             Latency.             Organizational,       Organizational       bandwidth.
                             National             legal, contractual    issues               Know-how
                             security             issues



39
Questions?

Capacity Planning for N1
Adrian.Cockcroft@sun.com            Sun Sigma
                                    DFSS
                                    Project
                                    P925

Más contenido relacionado

La actualidad más candente

2013 OTM EU SIG evolv applications Data Management
2013 OTM EU SIG evolv applications Data Management2013 OTM EU SIG evolv applications Data Management
2013 OTM EU SIG evolv applications Data ManagementMavenWire
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interactionGovind Kanshi
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Govind Kanshi
 
OTM in the Cloud - OTM SIG 2012
OTM in the Cloud - OTM SIG 2012OTM in the Cloud - OTM SIG 2012
OTM in the Cloud - OTM SIG 2012MavenWire
 
Top 5 performance and capacity challenges for z/OS
Top 5 performance and capacity challenges for z/OS Top 5 performance and capacity challenges for z/OS
Top 5 performance and capacity challenges for z/OS Metron
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureRajesh Balamohan
 
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...VMworld
 
Virtualising Tier 1 Apps
Virtualising Tier 1 AppsVirtualising Tier 1 Apps
Virtualising Tier 1 AppsIwan Rahabok
 
Inspirage OTM in the Cloud
Inspirage OTM in the CloudInspirage OTM in the Cloud
Inspirage OTM in the CloudInspirage
 
02 2017 emea_roadshow_milan_ha
02 2017 emea_roadshow_milan_ha02 2017 emea_roadshow_milan_ha
02 2017 emea_roadshow_milan_hamlraviol
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDenish Patel
 
Munich 2016 - Z011601 Martin Packer - Parallel Sysplex Performance Topics topics
Munich 2016 - Z011601 Martin Packer - Parallel Sysplex Performance Topics topicsMunich 2016 - Z011601 Martin Packer - Parallel Sysplex Performance Topics topics
Munich 2016 - Z011601 Martin Packer - Parallel Sysplex Performance Topics topicsMartin Packer
 
Architecting with power vm
Architecting with power vmArchitecting with power vm
Architecting with power vmCharlie Cler
 
Capacity Management for system z license charge reporting
Capacity Management for system z  license charge reportingCapacity Management for system z  license charge reporting
Capacity Management for system z license charge reportingMetron
 
SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
SLA-aware Dynamic CPU Scaling in Business Cloud Computing EnvironmentsSLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
SLA-aware Dynamic CPU Scaling in Business Cloud Computing EnvironmentsZhenyun Zhuang
 
SAP Adaptive Computing Design
SAP Adaptive Computing DesignSAP Adaptive Computing Design
SAP Adaptive Computing DesignGary Jackson MBCS
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
ANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft VehiclesANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft VehiclesAnsys
 
zIIP Capacity Planning
zIIP Capacity PlanningzIIP Capacity Planning
zIIP Capacity PlanningMartin Packer
 

La actualidad más candente (20)

2013 OTM EU SIG evolv applications Data Management
2013 OTM EU SIG evolv applications Data Management2013 OTM EU SIG evolv applications Data Management
2013 OTM EU SIG evolv applications Data Management
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 
OTM in the Cloud - OTM SIG 2012
OTM in the Cloud - OTM SIG 2012OTM in the Cloud - OTM SIG 2012
OTM in the Cloud - OTM SIG 2012
 
Top 5 performance and capacity challenges for z/OS
Top 5 performance and capacity challenges for z/OS Top 5 performance and capacity challenges for z/OS
Top 5 performance and capacity challenges for z/OS
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
VMworld 2013: Strategic Reasons for Classifying Workloads for Tier 1 Virtuali...
 
Virtualising Tier 1 Apps
Virtualising Tier 1 AppsVirtualising Tier 1 Apps
Virtualising Tier 1 Apps
 
Inspirage OTM in the Cloud
Inspirage OTM in the CloudInspirage OTM in the Cloud
Inspirage OTM in the Cloud
 
02 2017 emea_roadshow_milan_ha
02 2017 emea_roadshow_milan_ha02 2017 emea_roadshow_milan_ha
02 2017 emea_roadshow_milan_ha
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQL
 
Munich 2016 - Z011601 Martin Packer - Parallel Sysplex Performance Topics topics
Munich 2016 - Z011601 Martin Packer - Parallel Sysplex Performance Topics topicsMunich 2016 - Z011601 Martin Packer - Parallel Sysplex Performance Topics topics
Munich 2016 - Z011601 Martin Packer - Parallel Sysplex Performance Topics topics
 
Architecting with power vm
Architecting with power vmArchitecting with power vm
Architecting with power vm
 
Capacity Management for system z license charge reporting
Capacity Management for system z  license charge reportingCapacity Management for system z  license charge reporting
Capacity Management for system z license charge reporting
 
SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
SLA-aware Dynamic CPU Scaling in Business Cloud Computing EnvironmentsSLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
 
SAP Adaptive Computing Design
SAP Adaptive Computing DesignSAP Adaptive Computing Design
SAP Adaptive Computing Design
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
ANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft VehiclesANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft Vehicles
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
zIIP Capacity Planning
zIIP Capacity PlanningzIIP Capacity Planning
zIIP Capacity Planning
 

Destacado

Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMOHD ARISH
 
Why average response time is not a right measure of your webapplication's per...
Why average response time is not a right measure of your webapplication's per...Why average response time is not a right measure of your webapplication's per...
Why average response time is not a right measure of your webapplication's per...Thoughtworks
 
Response time difference analysis of performance testing tools
Response time difference analysis of performance testing toolsResponse time difference analysis of performance testing tools
Response time difference analysis of performance testing toolsSpoorthi Sham
 
Network Critical
Network CriticalNetwork Critical
Network Criticalgigamon
 
Capacity Planning for Web Operations - Web20 Expo 2008
Capacity Planning for Web Operations - Web20 Expo 2008Capacity Planning for Web Operations - Web20 Expo 2008
Capacity Planning for Web Operations - Web20 Expo 2008John Allspaw
 
New Frameworks for Measuring Capacity and Assessing Performance
New Frameworks for Measuring Capacity and Assessing PerformanceNew Frameworks for Measuring Capacity and Assessing Performance
New Frameworks for Measuring Capacity and Assessing PerformanceTCC Group
 
SIP Trunking & Security in an Enterprise Network
SIP Trunking & Security  in an Enterprise NetworkSIP Trunking & Security  in an Enterprise Network
SIP Trunking & Security in an Enterprise NetworkDan York
 
Cctv And Ip Surveillance
Cctv And Ip SurveillanceCctv And Ip Surveillance
Cctv And Ip Surveillancefaleepay
 
Secure Network Design with High-Availability & VoIP
Secure Network Design with High-Availability & VoIPSecure Network Design with High-Availability & VoIP
Secure Network Design with High-Availability & VoIPArpan Patel
 
A level PE Info processing, memory and reaction time
A level PE Info processing, memory and reaction timeA level PE Info processing, memory and reaction time
A level PE Info processing, memory and reaction timeKerry Harrison
 
Supply chain management ch04 chopra
Supply chain management ch04 chopraSupply chain management ch04 chopra
Supply chain management ch04 chopraJamil Ahmed AKASH
 
Capacity Planning with reference to McDonlds
Capacity Planning with reference to McDonldsCapacity Planning with reference to McDonlds
Capacity Planning with reference to McDonldsSai Praveen Chettupalli
 
Process Strategies and Capacity Planning
Process Strategies and Capacity PlanningProcess Strategies and Capacity Planning
Process Strategies and Capacity PlanningJaisa Gapuz
 
Inventory Management
Inventory ManagementInventory Management
Inventory Managementanoos
 

Destacado (20)

Capacity planning
Capacity planning Capacity planning
Capacity planning
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
Capacity Management
Capacity ManagementCapacity Management
Capacity Management
 
Why average response time is not a right measure of your webapplication's per...
Why average response time is not a right measure of your webapplication's per...Why average response time is not a right measure of your webapplication's per...
Why average response time is not a right measure of your webapplication's per...
 
Response time difference analysis of performance testing tools
Response time difference analysis of performance testing toolsResponse time difference analysis of performance testing tools
Response time difference analysis of performance testing tools
 
Pro Viva Emmanuel
Pro Viva EmmanuelPro Viva Emmanuel
Pro Viva Emmanuel
 
Network Critical
Network CriticalNetwork Critical
Network Critical
 
Capacity Planning for Web Operations - Web20 Expo 2008
Capacity Planning for Web Operations - Web20 Expo 2008Capacity Planning for Web Operations - Web20 Expo 2008
Capacity Planning for Web Operations - Web20 Expo 2008
 
New Frameworks for Measuring Capacity and Assessing Performance
New Frameworks for Measuring Capacity and Assessing PerformanceNew Frameworks for Measuring Capacity and Assessing Performance
New Frameworks for Measuring Capacity and Assessing Performance
 
SIP Trunking & Security in an Enterprise Network
SIP Trunking & Security  in an Enterprise NetworkSIP Trunking & Security  in an Enterprise Network
SIP Trunking & Security in an Enterprise Network
 
Cctv And Ip Surveillance
Cctv And Ip SurveillanceCctv And Ip Surveillance
Cctv And Ip Surveillance
 
BBC - What is IPTV?
BBC - What is IPTV?BBC - What is IPTV?
BBC - What is IPTV?
 
Secure Network Design with High-Availability & VoIP
Secure Network Design with High-Availability & VoIPSecure Network Design with High-Availability & VoIP
Secure Network Design with High-Availability & VoIP
 
A level PE Info processing, memory and reaction time
A level PE Info processing, memory and reaction timeA level PE Info processing, memory and reaction time
A level PE Info processing, memory and reaction time
 
Supply chain management ch04 chopra
Supply chain management ch04 chopraSupply chain management ch04 chopra
Supply chain management ch04 chopra
 
Capacity Planning with reference to McDonlds
Capacity Planning with reference to McDonldsCapacity Planning with reference to McDonlds
Capacity Planning with reference to McDonlds
 
Inventory types
Inventory typesInventory types
Inventory types
 
Process Strategies and Capacity Planning
Process Strategies and Capacity PlanningProcess Strategies and Capacity Planning
Process Strategies and Capacity Planning
 
Cctv presentation
Cctv presentationCctv presentation
Cctv presentation
 
Inventory Management
Inventory ManagementInventory Management
Inventory Management
 

Similar a Capacity Planning for Virtualized Datacenters - Sun Network 2003

Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 
"Architecture assessment from classics to details", Dmytro Ovcharenko
"Architecture assessment from classics to details",  Dmytro Ovcharenko"Architecture assessment from classics to details",  Dmytro Ovcharenko
"Architecture assessment from classics to details", Dmytro OvcharenkoFwdays
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsBrendan Gregg
 
PowerArtist: RTL Design for Power Platform
PowerArtist: RTL Design for Power PlatformPowerArtist: RTL Design for Power Platform
PowerArtist: RTL Design for Power PlatformAnsys
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...Matteo Ferroni
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computingpurplesea
 
Iaetsd active resource provision in cloud computing
Iaetsd active resource provision in cloud computingIaetsd active resource provision in cloud computing
Iaetsd active resource provision in cloud computingIaetsd Iaetsd
 
Cloudsim & Green Cloud
Cloudsim & Green CloudCloudsim & Green Cloud
Cloudsim & Green CloudNeda Maleki
 
Cloudsim & greencloud
Cloudsim & greencloud Cloudsim & greencloud
Cloudsim & greencloud nedamaleki87
 
Adaptive Computing Using PlateSpin Orchestrate
Adaptive Computing Using PlateSpin OrchestrateAdaptive Computing Using PlateSpin Orchestrate
Adaptive Computing Using PlateSpin OrchestrateNovell
 
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...NECST Lab @ Politecnico di Milano
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connectAdrian Cockcroft
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsLEGATO project
 
CloudComputing_UNIT5.pdf
CloudComputing_UNIT5.pdfCloudComputing_UNIT5.pdf
CloudComputing_UNIT5.pdfkhan593595
 
Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...hrmalik20
 

Similar a Capacity Planning for Virtualized Datacenters - Sun Network 2003 (20)

Univa Presentation at DAC 2020
Univa Presentation at DAC 2020 Univa Presentation at DAC 2020
Univa Presentation at DAC 2020
 
Gupta_Keynote_VTDC-3
Gupta_Keynote_VTDC-3Gupta_Keynote_VTDC-3
Gupta_Keynote_VTDC-3
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
"Architecture assessment from classics to details", Dmytro Ovcharenko
"Architecture assessment from classics to details",  Dmytro Ovcharenko"Architecture assessment from classics to details",  Dmytro Ovcharenko
"Architecture assessment from classics to details", Dmytro Ovcharenko
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
PowerArtist: RTL Design for Power Platform
PowerArtist: RTL Design for Power PlatformPowerArtist: RTL Design for Power Platform
PowerArtist: RTL Design for Power Platform
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computing
 
Iaetsd active resource provision in cloud computing
Iaetsd active resource provision in cloud computingIaetsd active resource provision in cloud computing
Iaetsd active resource provision in cloud computing
 
Notes
NotesNotes
Notes
 
Cloudsim & Green Cloud
Cloudsim & Green CloudCloudsim & Green Cloud
Cloudsim & Green Cloud
 
Cloudsim & greencloud
Cloudsim & greencloud Cloudsim & greencloud
Cloudsim & greencloud
 
Adaptive Computing Using PlateSpin Orchestrate
Adaptive Computing Using PlateSpin OrchestrateAdaptive Computing Using PlateSpin Orchestrate
Adaptive Computing Using PlateSpin Orchestrate
 
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
XeMPUPiL: Towards Performance-aware Power Capping Orchestrator for the Xen Hy...
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connect
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
CloudComputing_UNIT5.pdf
CloudComputing_UNIT5.pdfCloudComputing_UNIT5.pdf
CloudComputing_UNIT5.pdf
 
Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...
 

Capacity Planning for Virtualized Datacenters - Sun Network 2003

  • 1. Capacity Planning for N1. Sun Network 2003 Presentation. SunSigma DFSS Project P925. Adrian.Cockcroft@sun.com, Chief Architect - High Performance Technical Computing. August 29, 2003
  • 2. Project: Capacity Planning for N1, ID: P925. What is N1? Datacenter Automation: manage “N” systems as if they were “1” system; solve the Total Cost of Ownership (TCO) problems; manage all the “fabrics” as one (network/VLAN, SAN/zone, power, consoles, cluster). Heterogeneous Support: Solaris, Linux, AIX, HP-UX, Windows, EMC, etc. Layered Provisioning: platform/OS, application, service. Roadmap Includes Acquisitions: 2001, Sun internal N1 architectural definition; 2002, Terraspring platform-level virtualization; 2003, CenterRun application-level provisioning; …
  • 3. Voice of the Customer. “We want better performance at a lower price.” “We want higher utilization.” “We don’t want application performance to degrade at times of peak load.” “We want more and faster application changes.” “How do we do capacity planning with N1?” Scope…
  • 4. DEFINE: Capacity Planning for N1. Define: project goals, scope and plan, VOC, stakeholders. Measure: definition of Capacity Planning measurements. Analyze: gaps, N1CP processes concept design, survey. Design: prototype use cases. Verify: stakeholder communication and transition plan. Monitor: N1 Capacity Planning implementation tracked as a subgroup of the N1 Strategic Working Group.
  • 5. MEASURE: Translate VOC to Measurements. “We want better performance at a lower price”: fast, well-tuned and efficient systems; lower Total Cost of Ownership; flexibility (choice of systems by price, performance, reliability, scalability, compatibility and feature set). “We want higher utilization”: consistently high utilization of expensive resources. “We don’t want application performance to degrade at times of peak load”: consistent and fast application or service response times; headroom needed to handle peak loads. “We want more and faster application changes”: flexible scenario planning, rapid provisioning. Question: “My company already has capacity planning processes and tools.” Do you agree or disagree with this statement?
  • 6. MEASURE: N1 as a Constraint and Opportunity. Centralized control and monitoring. Highly replicated hardware configurations. Well-defined workload and capacity characterization. Arrays of load-balanced systems, structured network. Large SMP nodes, standardized storage layout. Web services workloads follow an “open system” queuing model, which is simple to plan against. Dynamic system domains and virtualized provisioning allow rapid capacity adjustments and pooled resources. Primary capacity metrics are CPU power and storage; secondary metrics (memory, network and thermal) may be over-provisioned but should be watched.
  • 7. MEASURE: Utilization Definition. Utilization is the proportion of busy time. It is always defined over a time interval. Sum over devices. [Figures: “OnCPU Scheduling for Each CPU (mean load level)”, on-CPU time per CPU with a time axis in microseconds, annotated with a mean CPU utilization of 0.56; and “usr+sys CPU for Peak Period”, CPU % vs. time with the utilization region marked.]
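The utilization definition on this slide can be sketched in C as follows. This is a minimal illustration, not code from the deck: the function names are hypothetical, and returning zero for an empty interval is an assumption made so the function is total.

```c
#include <stddef.h>

/* Utilization of one device over a measurement interval:
 * busy time divided by elapsed time (both in the same units).
 * Assumption: an empty or negative interval reports zero. */
double utilization(double busy_time, double elapsed_time)
{
    if (elapsed_time <= 0.0)
        return 0.0;
    return busy_time / elapsed_time;
}

/* Load level summed over devices, in units of "devices busy".
 * busy[i] is the busy time of device i during the same interval. */
double summed_load(const double *busy, size_t n, double elapsed_time)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += utilization(busy[i], elapsed_time);
    return total;
}
```

For example, four CPUs that were busy for 30, 45, 60 and 15 seconds of a 60-second interval sum to a load level of 2.5 CPUs.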
  • 8. MEASURE: Headroom Definition. Headroom is available usable resources: Total Capacity minus Peak Utilization and Margin. Applies to CPU, RAM, Net, Disk and OS. Depends upon workload mixture. Can be very complex to determine. [Figure: “usr+sys CPU for Peak Period”, CPU % vs. time, with bands labeled Utilization, Headroom and Margin.]
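A minimal sketch of the headroom definition, reading it as capacity minus peak utilization minus margin. The function name is hypothetical, and clamping negative results to zero is an assumption rather than something the slide states.

```c
/* Headroom in the same units as capacity (for example, CPUs):
 * total capacity minus peak utilization and margin.
 * Assumption: an overloaded resource reports zero headroom
 * rather than a negative value. */
double headroom(double total_capacity, double peak_utilization, double margin)
{
    double h = total_capacity - peak_utilization - margin;
    return h > 0.0 ? h : 0.0;
}
```

With 12 CPUs configured, a peak of 7 CPUs and a margin of 2 CPUs, this gives 3 CPUs of headroom.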
  • 9. MEASURE: CPU Capacity Measurements. CPU utilization is defined as busy time divided by elapsed time for each CPU. The number of CPUs is dynamic, so capacity at “100%” is not constant; use units of “processors” to measure load. CPU type and speed vary, so we need something like MIPS or M-Values for mixed systems. CPU utilization should be managed within a range that safely minimizes headroom to give stable performance at minimum cost. Process-level CPU wait time measures the time a process spent on the run queue waiting for a free CPU. This allows the response time increase to be observed directly, so that increased capacity can be provisioned before headroom is exhausted.
  • 10. MEASURE: Response Time Definition. Service time occurs while using a resource. Queue time waits for access to a resource. Response Time = Queue time + Service time. Response time curves for random arrival of work from a large unknown user population (e.g. the Internet!): R = S / (1 - (U/m)^m), where R is response time, S is service time, U is the mean CPU load level and m is the number of CPUs. [Figure: “Response Time Curves”, response time increase factor vs. mean CPU load level (0 to 4) for one, two and four CPUs.]
  • 11. MEASURE: Response Time Curves. Systems with many CPUs can run at higher utilization levels, but degrade more rapidly when they finally run out of capacity. Headroom margin should be set according to response time margin and CPU count. R = S / (1 - (U%)^m), where U% is total system utilization. [Figure: “Response Time Curves”, response time increase factor vs. total system utilization % (0 to 100) for 1, 2, 4, 8, 16, 32 and 64 CPUs, with the headroom margin marked.]
  • 12. MEASURE: CPU Scalability Differences. SMP allows work to migrate between CPUs; “blades” don’t. A single queue of work gives lower response time for user sessions at high utilization than arrays of uniprocessor “blades”. Headroom margin on an array of “blades” is constant as the array grows. Two- to four-CPU systems need much less margin than uni-CPUs. Measure and calibrate the actual response curve per workload. Response time curves: SMP R = S / (1 - (U/m)^m) vs. Blade R = S / (1 - U/m). [Figure: response time increase factor vs. CPU demand level (0 to 4) for 1 CPU/blade, 2 CPU SMP, 4 CPU SMP, 2 blades and 4 blades.]
  • 13. MEASURE: CPU Measurement System Issues. Clock-sampled CPU usage: poor clock resolution at 10 ms (optionally 1 ms); biased sample, since the clock schedules jobs; underestimates more at lower utilization; creates an apparent lack of scalability. Microstate-measured CPU usage: measures state changes directly (“microstates”); per-CPU microstate-based counters are not available; use microstates at the process-based workload level, summing over some or all processes as needed (can take a while on big systems); the microstate method simply extends to measuring services and mixed workloads.
  • 14. MEASURE: N1 Capacity Planning CTQs.
      CTQ Name                  Pri  Units  LSL           USL              Gauge Acc.  Budget Sigma
      CPU Utilization (TCO)     5    CPUs   30% of total  -                99%         3.0
      CPU Responsiveness (SLA)  10   CPUs   -             70-98% of total  99%         4.0
    Both of these Critical To Quality (CTQ) requirements are measured via the CPU load level, which can be measured accurately: gauge accuracy is estimated at 99%, with a sigma goal based on defect cost. Using sampled CPU measurements, accuracy is estimated at 90%. For CPU Utilization, a defect is an unacceptable Total Cost of Ownership (TCO), and occurs if the total CPU load drops below the Lower Specification Limit (LSL) of 30% of the total configured, for a sample taken during the peak load period. For CPU Responsiveness, a defect is an overload leading to a Service Level Agreement (SLA) failure, and occurs if the total CPU load goes above the Upper Specification Limit (USL), which is 70% of the total configured for uniprocessors, increasing for larger CPU counts.
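The two defect definitions amount to checking each peak-period sample against the specification limits. The following is a sketch under assumptions: the type and function names are hypothetical, and the limits are passed in as CPU counts rather than derived from the configuration.

```c
#include <stddef.h>

typedef enum { CTQ_OK, CTQ_TCO_DEFECT, CTQ_SLA_DEFECT } ctq_status;

/* Classify one CPU load sample (in CPUs) against the spec limits:
 * below LSL is a TCO defect, above USL is an SLA defect. */
ctq_status classify_sample(double load, double lsl, double usl)
{
    if (load < lsl)
        return CTQ_TCO_DEFECT;
    if (load > usl)
        return CTQ_SLA_DEFECT;
    return CTQ_OK;
}

/* Proportion of samples that fall into the given defect class. */
double defect_proportion(const double *loads, size_t n,
                         double lsl, double usl, ctq_status kind)
{
    if (n == 0)
        return 0.0;
    size_t defects = 0;
    for (size_t i = 0; i < n; i++)
        if (classify_sample(loads[i], lsl, usl) == kind)
            defects++;
    return (double)defects / (double)n;
}
```

For a 12-CPU system with limits of 3.6 and 11.6 CPUs, a sample of 3.02 CPUs is a TCO defect and a sample of 5.26 CPUs is within specification.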
  • 15. ANALYZE: Concept Design - N1CP Roles. Manager. Application Architect: Developers, Database Administrators. Systems Architect: Systems Administrators, Storage Administrators, Network Administrators. Others? Question: What roles do you do?
  • 16. ANALYZE: Scenarios - Top Level Functional Breakdown. [Diagram: Install N1 Datacenter; Over-Provision Applications; Right-size Applications; Re-Allocate Resources during low load times; Grow or borrow Capacity just before Overload occurs. Each step provisions at the system level and at the application level. The loops between steps are labeled “Repeat infrequently”, “Repeat on schedule” and “Repeat as needed”, in that order.]
  • 17. ANALYZE: Installation Sizing Scenario. This scenario indicates the tasks for each role when an N1 datacenter fabric is created using currently available system-level provisioning software. The tasks performed by each role in a scenario are called a “use case”. Future versions of N1 will configure services and policies during installation. Red arrows show the command flow between the roles. [Diagram: tasks by role over time. Manager: I want an N1 ready datacenter. Application Architect: choose and size applications. Database Admin: install generic database. Developer: install generic application servers. Systems Architect: choose systems and platforms. Systems Admin: size systems mix and images. Network Admin: size overall network. Storage Admin: size overall storage. Later steps: build generic system images; set up switches and VLANs for N1; set up SANs and storage for N1; measure capacity of generic systems.]
  • 18. ANALYZE: Over-Provisioning Scenario. This gives an indication of the tasks performed by each role as a new application is provisioned using the capabilities of today's N1 products. The initial goal is to over-provision the capacity for initial bring-up of the application, then later right-size it as its actual usage pattern becomes better understood. In future releases more and more of this activity will be automated, and more of the work will become pre-work related to setting up the overall N1 datacenter infrastructure. [Diagram: tasks by role over time. Manager: provide an online service. Application Architect: use these apps. Database Admin: database versions and sizing; configure database; populate database. Developer: app server versions and sizing; configure app server; acceptance test. Systems Architect: use these platforms; define operations policies. Systems Admin: systems selection and versions; build replicable system images; use N1 GUI to over-provision initial system. Network Admin: network sizing; provision Internet connection; configure access and security. Storage Admin: storage sizing; provision LUNs; configure backup strategy. Final step: enable user access.]
  • 19. ANALYZE: Rightsizing Scenario. Rightsizing adjusts the headroom for each component of the system to make sure that the usage level falls inside the specification limits. Rightsizing can be performed during an offline maintenance window, but all the technologies exist to adjust domain size for tier 3 systems, and to adjust the number of tier 1 and tier 2 systems dynamically. [Diagram: tasks by role over time. Manager: business level and trend plan. Database Admin: monitor database headroom (memory and tables). Systems Admin: monitor CPU, network and memory. Network Admin: monitor WAN / Internet headroom. Storage Admin: monitor storage headroom. Each of these four roles then increases headroom for a bottleneck, or reduces headroom for an under-utilized database, systems, bandwidth or storage, respectively.]
  • 20. ANALYZE: Re-Allocation Scenario. Load levels vary during the day and the week. Regular times of low utilization can have other work performed, e.g. overnight batch jobs. Batch workloads that cannot run on the same systems due to configuration or security issues can run on systems (or Grids) that are provisioned each night using spare capacity from other systems. [Diagram: tasks by role over time, in this order: batch workload capacity needed (Manager); define batch capable applications; build or configure batch capable applications; define batch mechanism; determine timing and depth of capacity to re-allocate; move resources to Grid after peak load time; bring resources back before peak load time.]
  • 21. ANALYZE: Overload Scenario. Load levels vary during the day and the week in a fairly consistent and predictable manner. Sizing for the normal load level allows high utilization levels. Higher load levels can be handled as an exception by watching for abnormally high levels before the load peaks and borrowing capacity from lower priority applications such as development environments. Question: “Are dynamic capacity adjustments a mature and reliable technology?” [Diagram: tasks by role over time, in this order: higher utilization needed to reduce cost of service (Manager); determine normal load curve for time of day and day of week; negotiate victim to steal capacity from; monitor deviations above normal load level; provision extra capacity before it is needed.]
  • 22. ANALYZE: Rightsizing Scenario Detailed Design. Concept via an example. Large scale Internet workload: fairly predictable load shape; peaks every evening (use peak hours); grows every week. Key CTQs: performance during peak hour; cost of maintaining performance level; risk of downtime. Tier 3 backend database server: primary bottleneck, over-provisioned elsewhere; highest cost of CPU headroom (E10K/F15K class); initially 56 CPUs in domain, average 30 CPUs load.
  • 23. ANALYZE: CPU Load Level. Monitor for days or weeks to establish a baseline and the time of peak load, then track that timeslot daily. CPU load (units are CPUs, 56 configured) for a busy day. [Figure: “Summed CPU Utilization”, CPU utilization level (0 to 50 CPUs) vs. time of day, with a two-hour peak period marked.]
  • 24. ANALYZE: Utilization Distribution. The capability plot for peak time shows the system is less than half utilized about 25% of the time: too much headroom. The defect rate corresponds to a Sigma level of 2.18. [Figure: capability plot of CPU demand level.]
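As a rough cross-check of the quoted Sigma level, and assuming the usual Six Sigma convention of a 1.5 sigma shift (the same 1.5 offset appears in the headroom calculations later in the deck), a defect proportion p maps to a short-term sigma as shown below. Treating “about 25%” as exactly p = 0.25 is an assumption, so the result only approximates the 2.18 on the slide.

```latex
Z_{st} = 1.5 + \Phi^{-1}(1 - p),
\qquad p \approx 0.25 \;\Rightarrow\; Z_{st} \approx 1.5 + 0.674 \approx 2.17
```

Here \Phi^{-1} is the inverse of the standard normal cumulative distribution function.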
  • 25. ANALYZE: Increase Utilization. Reduce the system to 40 CPUs and assume a linear increase in utilization: predicted sigma = 5.2. This is over-simplified, since headroom margin and non-linearities are not included in the plan, so add a little extra headroom to compensate. [Figure: capability plot of CPU demand level.]
  • 26. DESIGN: Headroom Tool Prototype. Solaris-specific prototype: rapid prototype using SE Toolkit from http://www.setoolkit.com; shows component-level headroom vs. utilization goal; automatic margin calculation based on CPU count; samples every few minutes, reports every 30-60 minutes; microstate based, sums over all processes; headroom predictor uses mean plus two standard deviations; text based, logs data to a daily file, 3.5 sigma headroom.
    Code: p.=processor, r.=ram, n.=network, d.=disk, .st=status, .cf=configured, .ll=min lsl, .ul=limit usl, .ld=mean load, .h%=headroom, .sd=std deviation, .tco=TCO defect rate, .sla=SLA defect rate, .tK=throughput K, .rm=response time in milliseconds, .rp=response time proportional increase
      time      pll  pul   pcf  pst    ptco  psla  pld   psd   ph%  ptK   prm   prp
      17:36:04  3.6  11.6  12   Green  0.00  0.00  5.26  0.28  50   15.8  1.05  1.08
      18:06:04  3.6  11.6  12   Green  0.00  0.00  4.90  0.38  51   13.9  1.01  1.06
      18:36:04  3.6  11.6  12   Blue   0.40  0.00  4.55  2.19  23   13.0  0.93  1.09
      19:06:03  3.6  11.6  12   Blue   1.00  0.00  3.02  0.17  71   12.7  0.86  1.05
      19:36:03  3.6  11.6  12   Blue   0.93  0.00  2.82  0.53  67   12.0  0.67  1.04
    Notes: Samples are taken every two minutes and reported every 30 minutes. 12 CPUs configured; lower limit 30% = 3.6; upper limit based on CPU count at 11.6. Status is based on the measured defect proportion of time that the load level is below pll (TCO) or above pul (SLA) limits. Mean load level and standard deviation are compared to the upper limit to calculate headroom. CPU throughput is based on voluntary context switches; prm is very short, but prp defines a response time curve.
  • 27. DESIGN: Headroom Calculations
      Set configured total to number of processors online
          conf = sysconf(_SC_NPROCESSORS_ONLN);
      Set lower spec limit to 30% for TCO failures
          lsl = conf * 0.3;
      Use response time goal of 3 times baseline on curve to determine margin for maximum load level
          rpgoal = 3.0;
      Calculate max load level from theoretical response time curve
          /* rp = R/S, rp = 1/(1-(U^m)) so U = exp(log((rp-1)/rp)/m) */
          usl = conf * exp(log((rpgoal-1.0)/rpgoal)/conf);
      Calculate headroom % from mean plus two standard deviations versus upper spec limit
          headp = 100.0 * (1.0 - (mean + 2.0*sd) / usl);
      Calculate Sigma Zst
          tco_sigma = 1.5 + (mean - lsl) / sd;
          sla_sigma = 1.5 + (usl - mean) / sd;
  • 28. DESIGN: Design Optimization
      Compare the “traditional” approach with the new design
      - Run the headroom tool on a big and busy server, collect data, and show how a simplistic approach compares with the method described in this project.
      - SunRay timesharing server monitored for several days. The system is loaded to the limit at peak times but idle out of hours, so focus on a scheduled capacity reallocation scenario.
      Simplistic “Traditional” Approach
      - Collect data using vmstat, sar, SunMC or 3rd party tools
      - Plot CPU % busy, as shown on the next slide
      - There is spare capacity, but no indication of how many CPUs are unused
      - Need extra information that this is a 12-CPU system
      N1CP Approach
      - Collect data using the headroom prototype
      - Plot CPU load level in CPU units, no need to guess or replot data
      - Calculate margin, headroom and sigma levels
      - Plan capacity reallocation and recalculate margin, headroom and sigma levels
  • 29. DESIGN: Simplistic View
      - [Plot: CPU Utilization Monday-Thursday. y-axis: CPU %busy, 0% to 100%. x-axis: Time of Day.]
      - There is no indication of how many CPUs are in use; util = 59% overall.
  • 30. DESIGN: N1CP View - CPU Counts
      - There are 12 CPUs, 6 to 8 are free overnight, system overloads at peak times.
      - [Plot: Mean+2sd Load vs Configured and Upper Limit. Series: pcf, pul, pmd+2psd. y-axis: CPU Count, 0 to 14. x-axis: Time of Day.]
      Summary
      - Mean CPU Load 7.03
      - Mean Util 59%
      - Mean headroom 34%
      - Mean capacity 12.00
      DPMO and Min Sigma
      - TCO: 110215 DPMO, -1.5 Zst
      - SLA: 538 DPMO, 2.5 Zst
  • 31. DESIGN: N1CP - Response Curve
      - System is close to overload. This timeshared workload has a flatter curve than internet workloads (closed rather than open queuing model).
      - [Plot: Response Time vs Load Level. y-axis: Response Increase, 0 to 2.5. x-axis: CPU Count, 0 to 12.]
  • 32. DESIGN: Simplistic - CPUs reallocated
      - [Plot: CPU Utilization with Capacity Optimization. y-axis: CPU %busy, 0% to 100%. x-axis: Time of Day.]
      - There is no indication of how many CPUs are in use.
  • 33. DESIGN: N1CP View - Dynamic!
      - Vary the CPU count and times daily, and borrow extra for the peak load.
      - [Plot: CPU mean+2sd Load vs Config and Upper Limit. Series: pcf, pul, pmd+2psd. y-axis: CPU Count, 0 to 14. x-axis: Time of Day. Sigma annotations on the plot: 3.2s, 3.2s, 3.5s, 4.3s, 6.3s, 3.6s, 5.2s, 3.2s, 5.7s.]
      Summary
      - Mean CPU load 7.03
      - Mean Util 74%
      - Mean headroom 16%
      - Mean capacity 9.52
      Predicted Min Sigma
      - TCO: 2.0 Zst
      - SLA: 3.2 Zst
  • 34. DESIGN: Summary
      Performance Impact
      - SLA Sigma levels improve from minimum of 2.5 Zst to 3.2 Zst
      - Improvement of 0.7 Sigma by allowing for extra peak load
      - Simplistic methods do not allow quality of service prediction
      Cost Impact
      - TCO Sigma levels improve from minimum of -1.5 Zst to 2.0 Zst
      - Improvement of 3.5 Sigma by reducing capacity from 12 to 9.5
      Observability Impact
      - Headroom tool prototype generates all required statistics
      - Sigma level is simply calculated, or headroom tool could print it
      - Simplistic methods do not show what is going on
      Complexity Impact
      - Dynamic reconfiguration must be enabled
      - One reconfiguration each morning and each evening
      Applicability (Assertions, out of scope for this project!)
      - CPU based example can be applied to blades, RAM, disk, net, thermal
      - Method can be extended from platform level to services
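The utilization figures behind the cost impact can be reproduced from the mean load and mean capacity reported on the two N1CP View slides. This is a minimal sketch with an illustrative function name; it assumes mean utilization is simply mean load divided by mean configured capacity, both in CPU units.

```c
#include <assert.h>

/* Mean utilization in percent from mean load and mean capacity (CPU units). */
double mean_utilization(double mean_load, double mean_capacity)
{
    assert(mean_capacity > 0.0);
    return 100.0 * mean_load / mean_capacity;
}
```

With a mean load of 7.03 CPUs, a fixed capacity of 12.00 gives about 59% and a dynamic mean capacity of 9.52 gives about 74%, in line with the Mean Util values on those slides.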
  • 35. VERIFY: N1 Console Screenshots
  • 36. GRID: Capacity for Sale
      Uses for Spare Capacity
      - Carefully schedule batch work and backups
      - Remotely support global timezones
      - Run engineering dept. simulation jobs
      Grid Oriented Solutions
      - Project Grid: departmental cluster (Sun Grid Engine)
      - Enterprise Grid: collection of clusters forming a general purpose Grid service (Sun Grid Engine Enterprise Edition)
      - The Global Grid: Internet level (GT2.2, OGSA/OGSI/GT3)
      - Provision an Enterprise Grid service using N1
      - Join The Global Grid and sell or share capacity
  • 37. GRID: Relationships: N1 and Grid
      N1 is about provisioning things you own, Grid is about access to things you don’t own
                                      Infrastructure                 Business Model
        Things you own and control    N1                             Utility Computing
        Things you borrow or lease    Grid Services, Web Services    Utility Computing
  • 38. GRID: Capacity Flows in a Grid Enabled N1 Datacenter
      - [Diagram. Recoverable labels: Utility Computing; N1 Virtualized Datacenter; Capacity Requests; Capacity On Demand; Purchase C.O.D.; Purchase Capacity; Tier 0, Tier 1, Tier 2, Tier 3; Web Front End; Web Servers; App Servers; Database Storage; Web User / Web Services; Free Pool; Sun Grid Engine Enterprise Edition; Cluster Grid Compute and Storage Resources; Unused Grid Resources; Grid User / Grid Services; Retire Obsolete Capacity; Repair and Replace.]
  • 39. GRID: IT market segments by “need to share”
      Segments: Defense (spooks), Commercial (suits), Technical (geeks), Consumer (users)
      - What can be shared: Nothing (Defense), Hardware (Commercial), Operating System (Technical), Everything (Consumer)
      - Nothing, N1, Server P2P apps, Grid, VPN, What is physical domains, VLAN SETI, Kazaa, encryption, separation and SAN Zone Limewire, trusted firewalls required partitioning People!
      - Everything in The Everything What is Local systems, Local systems Global Grid including other and Internet visible Project Grids community users
      - Storage. CPU cycles, CPU cycles. Network Organizational, Primary Latency. bandwidth. Organizational legal, constraints National issues Know-how contractual security issues
  • 40. Questions? Capacity Planning for N1 Adrian.Cockcroft@sun.com Sun Sigma DFSS Project P925