Code Optimization & Performance Tuning using Intel VTune
Installing Windows XP Professional Using Attended Installation

Objectives


                In this session, you will learn to:
                   Measure performance-related data for processors
                   Identify the hierarchy of memory
                   Benchmark processor performance




     Ver. 1.0                                                        Slide 1 of 23

Examining Processor Specifications


                Processor:
                   Executes the instructions in a program and calculates the
                   result.
                   Should be used optimally by the application.
                   Its performance also affects application performance.
                   Its performance should be measured to know how the
                   processor is utilized.





Identifying Processor Performance


                Processors consist of functional units that execute specific
                instructions.
                Different types of processors execute instructions at
                different speeds.
                Before beginning to optimize the application performance,
                you need to:
                   Identify processor speed
                   Identify the execution process
                   Identify the functional units of a processor





Identifying Processor Performance (Contd.)


                Pipelining is an important concept used in high-performance
                computing.
                Pipelining is shown in the following figure.
                                 Cycle one         Cycle two         Cycle three       Cycle four        Cycle five        Cycle six
                Instruction 1    Read instruction  Read data         Compute           Write result
                Instruction 2                      Read instruction  Read data         Compute           Write result
                Instruction 3                                        Read instruction  Read data         Compute           Write result

                Horizontal axis: number of clock cycles (0 to 6).





Identifying Processor Performance (Contd.)


                Pipelining has multiple stages.
                Different parts of the pipeline perform different jobs.
                Some parts of the pipeline can be duplicated so that less
                work is done at each stage.
                Pipelining has a substantial impact on the performance of the
                application.
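                The effect of pipelining on cycle counts can be sketched in a few
                lines of Python. This is an idealized model that assumes one stage
                per clock cycle and no stalls; the function names and the stage list
                are illustrative and do not come from the slides.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes every stage before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when a new instruction enters the pipeline every cycle.

    The first instruction takes num_stages cycles; each later instruction
    completes one cycle after its predecessor (no stalls assumed).
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def schedule(num_instructions: int, stages: list[str]) -> list[dict[int, str]]:
    """Return, per instruction, a map from clock cycle (1-based) to stage name."""
    return [
        {start + offset + 1: stage for offset, stage in enumerate(stages)}
        for start in range(num_instructions)
    ]


# Stage names follow the pipelining figure: four stages per instruction.
STAGES = ["read instruction", "read data", "compute", "write result"]

if __name__ == "__main__":
    print(unpipelined_cycles(3, len(STAGES)))  # 12
    print(pipelined_cycles(3, len(STAGES)))    # 6
    for number, row in enumerate(schedule(3, STAGES), start=1):
        print(f"Instruction {number}: {row}")
```

                With three instructions and four stages, the pipelined count of six
                cycles matches the figure, against twelve cycles without overlap.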





Identifying Processor Performance (Contd.)


                A process consists of different phases of processor and
                memory utilization.
                A process goes through the following sequence of phases:
                ►   Phase 1: Memory burst
                       Read the instruction to be executed.
                       Read the data from the memory.
                ►   Phase 2: CPU burst
                       During this time, the process is either running or
                       waiting for the processor.
                ►   Phase 3: Memory burst
                       During this time, the process is waiting for the memory
                       write operation.





Identifying Processor Performance (Contd.)


                Instructions for different applications are of diverse types.
                Typically, each application will have multiple types of
                instructions.
                Different parts of the processor, called functional units,
                execute different types of instructions.
                Functional units are of the following types:
                    Memory operations
                    Integer operations
                    Floating-point operations





Measuring Processor Performance


                Processor performance is measured in terms of the
                following parameters:
                ►   Branch mispredictions
                       A misprediction means that the branch executed is not the
                       same as the one predicted by the processor.
                       In such a case, there is an additional overhead in loading
                       the data values for the branch not executed by the
                       processor.
                ►   Loads/Stores complete
                       Loads refer to the process of loading data from the
                       memory, and stores refer to writing data back to the
                       memory, per unit time.
                ►   Throughput
                       It refers to the number of processes that complete their
                       execution per unit time.
                ►   Turnaround time
                       It refers to the amount of time taken to execute a
                       particular process. It is also called execution time.
                ►   Instruction execution time
                       It refers to the execution time for an instruction.
                ►   Program execution time
                       It refers to the execution time for a program.
                       It is the sum total of the execution time for each
                       instruction.
                ►   Waiting time
                       It refers to the amount of time a process has been waiting
                       in the ready queue.
                ►   Response time
                       It refers to the amount of time taken to generate a
                       response to a request.
                ►   CPU utilization
                       It refers to the fraction of time a process is using the
                       CPU.
                ►   CPU efficiency
                       It refers to the fraction of time the CPU is processing
                       instructions.
                       The difference between CPU utilization and CPU efficiency
                       is that CPU utilization is the fraction of time when the
                       CPU is not idle, while CPU efficiency is the amount of
                       time when the CPU is computing instructions.
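                Several of these parameters can be computed from a simple schedule.
                The sketch below assumes a single CPU running processes first-come,
                first-served, and uses the usual operating-system textbook
                definitions (turnaround measured from arrival to completion); the
                `ProcessRecord` type, `fcfs_metrics` function, and sample values are
                hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    name: str
    arrival: float  # time the process entered the ready queue
    burst: float    # CPU time the process needs


def fcfs_metrics(processes: list[ProcessRecord]) -> dict:
    """Run processes first-come, first-served on one CPU and report the metrics."""
    clock = 0.0   # current time on the simulated CPU
    busy = 0.0    # total time the CPU spent executing processes
    per_process = {}
    for p in sorted(processes, key=lambda rec: rec.arrival):
        start = max(clock, p.arrival)  # the CPU may sit idle until the process arrives
        finish = start + p.burst
        per_process[p.name] = {
            "waiting": start - p.arrival,      # time spent in the ready queue
            "turnaround": finish - p.arrival,  # arrival to completion
        }
        busy += p.burst
        clock = finish
    if clock == 0:
        return {"per_process": per_process, "throughput": 0.0, "cpu_utilization": 0.0}
    return {
        "per_process": per_process,
        "throughput": len(processes) / clock,  # processes completed per unit time
        "cpu_utilization": busy / clock,       # fraction of time the CPU is not idle
    }


if __name__ == "__main__":
    sample = [
        ProcessRecord("A", arrival=0, burst=4),
        ProcessRecord("B", arrival=1, burst=3),
        ProcessRecord("C", arrival=10, burst=2),
    ]
    print(fcfs_metrics(sample))
```

                In the sample, process B waits three time units behind A, and the
                CPU is idle between time 7 and time 10, so utilization is 9/12.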


Measuring Processor Performance (Contd.)


                    Some standard metrics to measure the processor
                    performance are:
                ►      Instructions retired
                          This metric reports the number of instructions that are
                          retired during program execution.
                          When the execution of the instructions is complete, the
                          processor does not require the instructions any longer.
                          Thus, when the processor discards these instructions,
                          they are said to be retired.
                ►      Clock Cycles Per Instruction Retired (CPI)
                          CPI is the ratio of the number of clock cycles to the
                          number of instructions retired.
                          It is a measure of a processor's internal resource
                          utilization.
                          A high value indicates low resource utilization.
                ►      Percentage of floating-point instructions
                          This metric measures the percentage of retired
                          floating-point instructions.
                          A high percentage of floating-point instructions
                          indicates that the program is using only a specific
                          resource while other resources are idle.
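                As a small worked example, CPI and the floating-point percentage
                follow directly from three counter values. The counts below are
                made-up sample numbers, not measurements, and the function names are
                illustrative.

```python
def cycles_per_instruction(clock_cycles: int, instructions_retired: int) -> float:
    """CPI: clock cycles divided by the number of instructions retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return clock_cycles / instructions_retired


def floating_point_percentage(fp_retired: int, instructions_retired: int) -> float:
    """Share of retired instructions that were floating-point, in percent."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return 100.0 * fp_retired / instructions_retired


if __name__ == "__main__":
    # Hypothetical counter readings for one program run.
    cycles = 1_200_000
    retired = 800_000
    fp_retired = 200_000
    print(cycles_per_instruction(cycles, retired))         # 1.5
    print(floating_point_percentage(fp_retired, retired))  # 25.0
```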





Just a minute


                How can you measure processor performance?


                Answer:
                   Processor performance is measured in terms of the following
                   parameters:
                      Branch mispredictions
                      Loads/Stores complete
                      Throughput
                      Turnaround time
                      Instruction execution time
                      Program execution time
                      Waiting time
                      Response time
                      CPU utilization
                      CPU efficiency

Examining Memory Specifications


                The performance of a processor also depends on how fast
                data can be read from and written to the main memory.
                Memory speed is considerably slower than processor
                speed.
                The difference in the speeds of the processor and the
                memory affects application performance.
                In spite of computers with better processing power, the
                impact of processor speed on the performance of
                applications is not substantial.
                The solution is to minimize the mismatch between the
                processor and memory speeds.
                To optimize application performance, it is important to
                understand the memory hierarchy on a computer and the
                performance of different components of the memory.


Understanding the Memory Hierarchy


                    The memory hierarchy on a computer system, from the
                    fastest and smallest level to the slowest and largest:

                ►   Registers
                       Registers speed up the execution of instructions by
                       providing fast access to intermediate values computed
                       during a calculation.
                ►   Level 1 cache
                       This is the lowest level of cache memory, which is faster
                       and smaller.
                ►   Level 2 cache
                       It is larger in size but slower than the L1 cache.
                ►   Main memory
                       It is slower and cheaper than cache memory but faster and
                       more expensive than virtual memory.
                       It is measured in megabytes.
                ►   Virtual memory
                       The processor cannot directly access virtual memory.
                       When data referenced by a virtual address is requested,
                       the virtual address is translated to a main memory
                       address.

Just a minute


                What is the purpose of cache memory?




                Answer:
                   Cache memory reduces the mismatch in the speeds of the
                   processor and the main memory.




Understanding Memory Performance


                When executing an instruction, the processor waits for the
                data to be fetched from the memory.
                The processor cannot execute any other instruction while
                waiting because the previous instructions are loaded into
                registers.
                To achieve optimal performance, you must store the data as
                near as possible to the processor so that the processor is
                not idle.
                This helps to reduce the time utilized for memory access
                and improve processor utilization.





Understanding Memory Performance (Contd.)


                You can calculate the time taken for memory access by
                knowing the hit and miss ratios.
                   The hit ratio is the ratio of the number of times the
                   required data is available to the total number of times data
                   is requested from memory.
                   The miss ratio is the ratio of the number of times the data is
                   not found to the total number of times data is requested from
                   memory.
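                One common way to turn these ratios into an access time is to add
                the expected miss cost to the hit time. The sketch below assumes a
                single cache level in front of main memory; the timing values are
                hypothetical and the function names are illustrative.

```python
def hit_ratio(hits: int, accesses: int) -> float:
    """Fraction of memory requests that found the data available."""
    return hits / accesses


def miss_ratio(hits: int, accesses: int) -> float:
    """Fraction of memory requests that did not find the data."""
    return (accesses - hits) / accesses


def average_access_time(hits: int, accesses: int,
                        hit_time_ns: float, miss_penalty_ns: float) -> float:
    """Average time per access: every access pays the hit time,
    and misses additionally pay the penalty of going to main memory."""
    return hit_time_ns + miss_ratio(hits, accesses) * miss_penalty_ns


if __name__ == "__main__":
    # Hypothetical numbers: 950 hits out of 1000 requests,
    # 1 ns per cache hit, 100 ns extra for each miss.
    print(hit_ratio(950, 1000))
    print(miss_ratio(950, 1000))
    print(average_access_time(950, 1000, hit_time_ns=1.0, miss_penalty_ns=100.0))
```

                Under these assumed numbers, a 5% miss ratio raises the average
                access time from 1 ns to about 6 ns, which is why small changes in
                the hit ratio matter.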





Understanding Memory Performance (Contd.)


                To improve the performance of memory, you should ensure
                that the data that the processor requested is at the nearest
                location.
                For this, you must be able to predict which data the
                processor will reference.
                This can be accomplished using the principle of locality of
                reference.
                The two types of locality of reference are:
                ►   Spatial locality
                       Memory locations near each other are usually used
                       together.
                       If a program accesses a particular memory location, it
                       might soon access a nearby memory location. This is
                       called spatial locality.
                ►   Temporal locality
                       If a program accesses a particular memory location, it
                       might soon access the same memory location. This is
                       called temporal locality.
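                The effect of locality can be illustrated with a toy cache
                simulation. The sketch below assumes a direct-mapped cache with
                8 lines of 4 elements each and a 16 x 16 array stored row by row;
                these sizes, and the `count_hits` helper, are illustrative choices
                rather than a model of any real processor.

```python
def count_hits(addresses, line_size=4, num_lines=8):
    """Simulate a direct-mapped cache and count hits for a sequence of addresses."""
    lines = [None] * num_lines     # memory block currently held by each cache line
    hits = 0
    for addr in addresses:
        block = addr // line_size  # memory block the address falls in
        index = block % num_lines  # cache line that block maps to
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block   # miss: load the whole block into the line
    return hits


ROWS, COLS = 16, 16

# Row-by-row traversal touches neighbouring addresses (good spatial locality).
row_major = [r * COLS + c for r in range(ROWS) for c in range(COLS)]

# Column-by-column traversal jumps COLS elements on every access.
column_major = [r * COLS + c for c in range(COLS) for r in range(ROWS)]

# Re-reading one address exercises temporal locality.
repeated = [42] * 10

if __name__ == "__main__":
    print(count_hits(row_major))     # 192 hits out of 256 accesses
    print(count_hits(column_major))  # 0 hits out of 256 accesses
    print(count_hits(repeated))      # 9 hits out of 10 accesses
```

                Both traversals read the same 256 elements; only the order differs.
                In this toy model the row-by-row order reuses each loaded block
                three more times, while the column-by-column order evicts every
                block before it is reused.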




Analyzing Issues Affecting Memory Performance


                Some of the issues that affect memory performance are:
                ►   Cache compulsory loads
                       When the required data is not found in the cache, it has
                       to be loaded in the cache. This is known as a cache
                       compulsory load.
                       This occurs when the data is loaded for the first time
                       into the cache.
                ►   Cache capacity loads
                       At times, the cache has to remove recently used data to
                       accommodate other data requested by the processor.
                       This is because the capacity of the cache is limited.
                ►   Cache conflict loads
                       Cache conflict loads occur if the processor accesses five
                       or more units of data that use the same row.
                       You can avoid cache conflict loads by changing memory
                       alignment, using registers for holding data, or using
                       algorithms that use fewer regions of memory.
                ►   Cache efficiency
                       Cache efficiency is the ratio of data loaded into the
                       cache to the data used.
                ►   Data alignment
                       Data alignment is the organization of data in memory.
                       Effective data alignment can improve cache efficiency.
                ►   Software prefetch
                       Software prefetch enables a processor to load a specific
                       location of memory before it is required for processing.
                       As a result, the time taken for reads and writes is
                       reduced by the amount of time that is saved while the
                       data is being loaded in the cache.
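                The "five or more units of data that use the same row" condition
                can be reproduced with a small simulation. The sketch below assumes
                a 4-way set-associative cache with least-recently-used replacement,
                which is one configuration consistent with that statement; the
                slides do not name the associativity or the replacement policy, and
                `count_misses` is a hypothetical helper.

```python
from collections import OrderedDict


def count_misses(blocks, num_sets=4, ways=4):
    """Count misses in a set-associative cache with LRU replacement.

    Each block maps to row (set) block % num_sets; a row holds up to `ways` blocks.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in blocks:
        row = sets[block % num_sets]
        if block in row:
            row.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(row) == ways:
                row.popitem(last=False)  # row is full: evict least recently used
            row[block] = True
    return misses


ROUNDS = 10

# Four blocks sharing row 0: they all fit, so only the first loads miss.
four_same_row = [0, 4, 8, 12] * ROUNDS

# Five blocks sharing row 0: one more than the row can hold.
five_same_row = [0, 4, 8, 12, 16] * ROUNDS

# Five blocks spread across rows, as a different data layout might achieve.
five_spread = [0, 1, 2, 3, 4] * ROUNDS

if __name__ == "__main__":
    print(count_misses(four_same_row))  # 4 (compulsory loads only)
    print(count_misses(five_same_row))  # 50 (every access misses)
    print(count_misses(five_spread))    # 5 (compulsory loads only)
```

                In this model the first load of each block is a compulsory load, and
                the repeated misses in the five-block case are conflict loads: the
                cache as a whole has room, but the one row does not.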





Benchmarking


                A benchmark is a standard that is used for comparison.
                In terms of application performance, you can consider
                processor and memory benchmarks.
                To arrive at a specific benchmark, you can use tests to
                compare the performance of hardware and software running
                a specified workload.
                If you use graphic applications, a benchmark that tests
                graphics speed might be useful.





Benchmarking (Contd.)


                The different types of benchmarks are:
                ►   Single stream benchmarks
                       Single stream benchmarks measure the time taken by the
                       computer to execute a collection of programs.
                ►   Throughput benchmarks
                       Throughput benchmarks benchmark processor performance
                       for several jobs or a mix of codes running simultaneously.
                ►   Interactive benchmarks
                       Interactive benchmarks benchmark the components of a
                       computer such as input/output system, operating system,
                       and networks.





Just a minute


                What are various benchmarks for measuring processor
                performance?




                Answer:
                   The different types of benchmarks are:
                      Single stream benchmarks
                      Throughput benchmarks
                      Interactive benchmarks



Reading CPU Cycles to Measure Processor Performance


                  The benchmarks for processor performance are:
                     Read Time Stamp Counter (RDTSC)
                     Million Instructions Per Second (MIPS)
                     Million Floating-Point Operations Per Second (MFLOPS)
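                  A rate such as MFLOPS is a count of operations divided by the
                  elapsed time, in millions. Python cannot issue the RDTSC
                  instruction directly, so the sketch below uses
                  `time.perf_counter_ns` from the standard library as a stand-in
                  for reading a cycle counter; the function names are illustrative,
                  and the result mostly reflects interpreter overhead rather than
                  raw processor speed.

```python
import time


def millions_per_second(count: int, elapsed_seconds: float) -> float:
    """Convert an operation count and elapsed time into millions per second."""
    return count / elapsed_seconds / 1e6


def time_multiplies(num_operations: int) -> float:
    """Time a loop of floating-point multiplies and return the elapsed seconds."""
    x = 1.0
    start = time.perf_counter_ns()       # stand-in for reading a cycle counter
    for _ in range(num_operations):
        x *= 1.0000001                   # one floating-point multiply per iteration
    elapsed_ns = time.perf_counter_ns() - start
    return max(elapsed_ns, 1) / 1e9      # guard against a zero reading


if __name__ == "__main__":
    operations = 1_000_000
    elapsed = time_multiplies(operations)
    print(f"elapsed: {elapsed:.4f} s")
    print(f"rate: {millions_per_second(operations, elapsed):.2f} million multiplies/s")
```

                  The same `millions_per_second` calculation gives MIPS when the
                  count is the number of instructions executed instead of the
                  number of floating-point operations.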





Summary


                In this session, you learned that:
                   Application performance is closely related to hardware
                   resources, such as processors and memory.
                   Processor speed is measured in clock cycles per second. This
                   is an indication of the number of instructions executed in unit
                   time.
                   Pipelining is an approach used for high-performance
                   computing to obtain maximum processor output.
                   The execution process of an instruction consists of CPU and
                   memory bursts.
                   A processor contains different functional units for executing
                   memory, integer, and floating-point instructions.





Summary (Contd.)


                 Processor performance can be measured in terms of branch
                 mispredictions, loads/stores complete, throughput, turnaround
                 time, instruction execution time, program execution time,
                 waiting time, response time, CPU utilization, and CPU
                 efficiency.
                 Computer memory consists of registers, cache memory, main
                 memory, and virtual memory.
                 The performance of memory depends on the speed of the
                 memory.
                 Cache compulsory loads, cache capacity loads, cache conflict
                 loads, data alignment, and the software prefetch capability
                 affect memory performance.
                 Performance benchmarking is the process of defining
                 standards for application performance in terms of processors
                 and memory.


 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Último (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Install Windows XP Using Attended Installation

  • 1. Code Optimization & Performance Tuning using Intel VTune Installing Windows XP Professional Using Attended Installation Objectives In this session, you will learn to: Measure performance-related data for processors Identify the hierarchy of memory Benchmark processor performance Ver. 1.0 Slide 1 of 23
  • 2. Examining Processor Specifications: The processor computes the instructions in a program and calculates the result. It should be used optimally by the application. Processor performance also affects application performance, so it should be measured to show how the processor is utilized.
  • 3. Identifying Processor Performance: Processors consist of functional units that execute specific instructions. Different types of processors execute instructions at different speeds. Before beginning to optimize application performance, you need to identify the processor speed, the execution process, and the functional units of the processor.
  • 4. Identifying Processor Performance (Contd.): Pipelining is an important concept used in high-performance computing. Pipelining is shown in the following figure. [Figure: Instruction 1, Instruction 2, and Instruction 3 pipelined over clock cycles one to six. Each instruction passes through four stages: read the instruction, read the data, compute the instruction, write the result. Horizontal axis: number of clock cycles, 0 to 6.]
  • 5. Identifying Processor Performance (Contd.): Pipelining has multiple stages, and different parts of the pipeline perform different jobs. Some parts of the pipeline can be duplicated so that less work is done at each stage. Pipelining has a substantial impact on the performance of the application.
  • 6. Identifying Processor Performance (Contd.): A process consists of different phases of processor and memory utilization. A process follows this sequence: Phase 1, memory burst: read the instruction to be executed and read the data from the memory. Phase 2, CPU burst: during this time, the process is either running or waiting for the processor. Phase 3, memory burst: during this time, the process is waiting for the memory write operation.
  • 7. Identifying Processor Performance (Contd.): Instructions for different applications are of diverse types, and each application typically has multiple types of instructions. Different parts of the processor, called functional units, execute different types of instructions. Functional units handle the following types of operations: memory operations, integer operations, and floating-point operations.
  • 8. Measuring Processor Performance: Processor performance is measured in terms of the following parameters. Branch mispredictions: the branch executed is not the same as the one predicted by the processor; in such a case, there is an additional overhead in loading the data values for the branch not executed by the processor. Loads/stores complete: loads refer to loading data from the memory and stores refer to writing data back to the memory, counted per unit time. Throughput: the number of processes that complete their execution per unit time. Turnaround time: the amount of time taken to execute a particular process; it is also called execution time. Instruction execution time: the execution time for an instruction. Program execution time: the execution time for a program; it is the sum total of the execution time for each instruction. Waiting time: the amount of time a process has been waiting in the ready queue. Response time: the amount of time taken to generate a response to a request. CPU utilization: the fraction of time a process is using the CPU. CPU efficiency: the fraction of time the CPU is processing instructions. The difference between CPU utilization and CPU efficiency is that CPU utilization is the fraction of time when the CPU is not idle, while CPU efficiency is the amount of time when the CPU is computing instructions.
  • 9. Measuring Processor Performance (Contd.): Some standard metrics to measure processor performance are the following. Instructions retired: this metric reports the number of instructions that are retired during program execution. When the execution of the instructions is complete, the processor does not require the instructions any longer. Thus, when the processor discards these instructions, they are said to be retired. Clock Cycles Per Instruction Retired (CPI): CPI is the ratio of the number of clock cycles to the number of instructions retired. It is a measure of a processor's internal resource utilization. A high value indicates low resource utilization. Percentage of floating-point instructions: this metric measures the percentage of retired floating-point instructions. A high percentage of floating-point instructions indicates that the program is using only a specific resource while other resources are idle.
  • 10. Just a minute: How can you measure processor performance? Answer: Processor performance is measured in terms of the following parameters: branch mispredictions, loads/stores complete, throughput, turnaround time, instruction execution time, program execution time, waiting time, response time, CPU utilization, and CPU efficiency.
  • 11. Examining Memory Specifications: The performance of a processor also depends on how fast data can be read from and written to the main memory. Memory speed is considerably slower than processor speed. The difference in the speeds of the processor and the memory affects application performance. In spite of computers with better processing power, the impact of processor speed on the performance of applications is not substantial. The solution is to minimize the mismatch between the processor and memory speeds. To optimize application performance, it is important to understand the memory hierarchy on a computer and the performance of different components of the memory.
  • 12. Understanding the Memory Hierarchy: The following figure shows the memory hierarchy on a computer system, ordered from faster and smaller at the top to slower and larger at the bottom. Registers: registers speed up the execution of instructions by providing fast access to intermediate values computed during a calculation. Level 1 cache: this is the lowest level of cache memory, which is faster and smaller. Level 2 cache: it is larger in size but slower than the L1 cache. Main memory: it is slower and cheaper than cache memory but faster and more expensive than virtual memory. It is measured in megabytes. Virtual memory: the processor cannot directly access virtual memory. When data referenced by a virtual address is requested, the virtual address is translated to a main memory address.
  • 13. Just a minute: What is the purpose of cache memory? Answer: Cache memory reduces the mismatch in the speeds of the processor and the main memory.
  • 14. Understanding Memory Performance: When executing an instruction, the processor waits for the data to be fetched from the memory. The processor cannot execute any other instruction while waiting because the previous instructions are loaded into registers. To achieve optimal performance, you must store the data as near as possible to the processor so that the processor is not idle. This helps to reduce the time utilized for memory access and improve processor utilization.
  • 15. Understanding Memory Performance (Contd.): You can calculate the time taken for memory access by knowing the hit and miss ratios. The hit ratio is the ratio of the number of times the required data is available to the total number of times data is requested from memory. The miss ratio is the ratio of the number of times the data is not found to the total number of times data is requested from memory.
  • 16. Understanding Memory Performance (Contd.): To improve the performance of memory, you should ensure that the data that the processor requests is at the nearest location. For this, you must be able to predict which data the processor will reference. This can be accomplished using the principle of locality of reference. The two types of locality of reference are the following. Spatial locality: memory locations near each other are usually used together. If a program accesses a particular memory location, it might soon access a nearby memory location. This is called spatial locality. Temporal locality: if a program accesses a particular memory location, it might soon access the same memory location again. This is called temporal locality.
  • 17. Analyzing Issues Affecting Memory Performance: Some of the issues that affect memory performance are the following. Cache compulsory loads: when the required data is not found in the cache, it has to be loaded in the cache. This is known as a cache compulsory load. This occurs when the data is loaded for the first time in the cache. Cache capacity loads: at times, the cache has to remove recently used data to accommodate other data requested by the processor. This is because the capacity of the cache is limited. Cache conflict loads: cache conflict loads occur if the processor accesses five or more units of data that use the same row. You can avoid cache conflict loads by changing memory alignment, using registers for holding data, or using algorithms that use fewer regions of memory. Cache efficiency: cache efficiency is the ratio of data loaded into the cache to the data used. Data alignment: data alignment is the organization of data in memory. Effective data alignment can improve cache efficiency. Software prefetch: software prefetch enables a processor to load a specific location of memory before it is required for processing. As a result, the time taken for reads and writes is reduced by the amount of time that is saved while the data is being loaded in the cache.
  • 18. Benchmarking: A benchmark is a standard that is used for comparison. In terms of application performance, you can consider processor and memory benchmarks. To arrive at a specific benchmark, you can use tests to compare the performance of hardware and software running a specified workload. If you use graphic applications, a benchmark that tests graphics speed might be useful.
  • 19. Benchmarking (Contd.): The different types of benchmarks are the following. Single stream benchmarks: single stream benchmarks measure the time taken by the computer to execute a collection of programs. Throughput benchmarks: throughput benchmarks benchmark processor performance for several jobs or a mix of codes running simultaneously. Interactive benchmarks: interactive benchmarks benchmark the components of a computer such as the input/output system, operating system, and networks.
  • 20. Just a minute: What are various benchmarks for measuring processor performance? Answer: The different types of benchmarks are single stream benchmarks, throughput benchmarks, and interactive benchmarks.
  • 21. Reading CPU Cycles to Measure Processor Performance: The benchmarks for processor performance are Read Time Stamp Counter (RDTSC), Million Instructions Per Second (MIPS), and Million Floating Point Multiply Operations (MFLOPS).
  • 22. Summary: In this session, you learned that: Application performance is closely related to hardware resources, such as processors and memory. Processor speed is measured in clock cycles per second. This is an indication of the number of instructions executed in unit time. Pipelining is an approach used for high-performance computing to obtain maximum processor output. The execution process of an instruction consists of CPU and memory bursts. A processor contains different functional units for executing memory, integer, and floating-point instructions.
  • 23. Summary (Contd.): Processor performance can be measured in terms of branch mispredictions, loads/stores complete, throughput, turnaround time, instruction execution time, program execution time, waiting time, response time, CPU utilization, and CPU efficiency. Computer memory consists of registers, cache memory, main memory, and virtual memory. The performance of memory depends on the speed of the memory. Cache compulsory loads, cache capacity loads, cache conflict loads, data alignment, and the software prefetch capability affect memory performance. Performance benchmarking is the process of defining standards for application performance in terms of processors and memory.
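
The pipelining figure on slide 4 and the metrics on slides 9 and 21 come down to simple arithmetic, which the following Python sketch illustrates. It is not taken from the slides: the function names are hypothetical, the pipeline model assumes an ideal pipeline with no stalls, and the sample counts are made-up values rather than measurements from VTune or any other tool.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Clock cycles for an ideal pipeline (assumption: no stalls).

    The first instruction needs num_stages cycles; each later
    instruction finishes one cycle after the one before it.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + num_instructions - 1


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Clock cycles when each instruction runs all stages before the next starts."""
    return num_instructions * num_stages


def cpi(clock_cycles: int, instructions_retired: int) -> float:
    """Clock cycles per instruction retired (slide 9)."""
    return clock_cycles / instructions_retired


def percent_floating_point(fp_retired: int, instructions_retired: int) -> float:
    """Percentage of retired instructions that are floating-point (slide 9)."""
    return 100.0 * fp_retired / instructions_retired


def mips(instructions: int, seconds: float) -> float:
    """Million instructions per second (slide 21)."""
    return instructions / (seconds * 1_000_000)


if __name__ == "__main__":
    # Three instructions with four stages each, as in the slide 4 figure.
    print("pipelined cycles:  ", pipeline_cycles(3, 4))
    print("unpipelined cycles:", unpipelined_cycles(3, 4))
    # Hypothetical counter values, chosen only for illustration.
    print("CPI:", cpi(clock_cycles=6, instructions_retired=3))
    print("FP %:", percent_floating_point(fp_retired=25, instructions_retired=200))
    print("MIPS:", mips(instructions=50_000_000, seconds=0.5))
```

With three instructions and four stages, the ideal pipeline model gives the six clock cycles shown in the figure, compared with twelve if the instructions ran one after another.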

Editor's notes

  1. Initiate the discussion by asking the students how hardware considerations can help enhance the performance of an application. Explain that using the available resources, such as processor and memory, in an efficient manner can improve the performance of an application. Also ask students what Hyper-Threading Technology is. Hyper-Threading Technology enables multi-threaded software applications to execute threads in parallel. Threading was enabled in software by splitting instructions into multiple streams so that multiple processors could act upon them. Hyper-Threading Technology, by contrast, uses processor-level threading, which offers more efficient use of processor resources.
  2. Ask students why it is necessary to understand the processor specifications to optimize performance of your application. Explain in detail the processor specifications, such as processor speed, functional units, and process execution. Ask them about the pipelining process and latency period of an instruction.
  3. Ask students why it is necessary to understand the processor specifications to optimize performance of your application. Explain in detail the processor specifications, such as processor speed, functional units, and process execution. Ask them about the pipelining process and latency period of an instruction.
  4. In this slide and the next slide, explain the concept of pipelining. Explain the different functional units of the processor. You can explain processor architecture using the following example: The Mobile Intel Celeron Processor for Embedded Computing is available at a 1.2 GHz frequency. It has a 400 MHz processor system bus delivering 3.2 GB of data per second into and out of the processor. It uses Hyper-pipelined technology. The functional units of the processor include two Arithmetic Logic Units and a floating-point unit. It has 128-bit floating-point registers and an additional register for data movement. It supports 128-bit SIMD integer arithmetic operations and 128-bit SIMD double-precision floating-point operations. The Software Prefetch functionality of a Mobile Intel Celeron Processor anticipates the data needed by an application and pre-loads it. Explain that to identify processor speed, you need to consider the latency period of an instruction and the length of instructions. Ask students how identifying the different phases of processor and memory utilization can help to optimize the performance of an application.
  5. Explain the terms displayed on the slide with the help of animations.
  6. Ask students to name the standard metrics used to measure the performance of a processor. Ask students what retired events are. Retired events are events that occur due to instructions that are committed to the machine state. For example, when measuring the Loads retired event, a load occurring on a mispredicted path is not counted. Explain in detail the Instructions Retired, CPI, and Percentage of Floating-Point Instructions standard metrics. Ask students what Instructions Retired are. Instructions Retired are the instructions that are committed to the processor state or executed completely. The Instructions Retired metric reports the number of instructions that the processor discards after completing them during execution of the program. CPI is the ratio of the number of clock cycles to the number of instructions retired. Percentage of Floating-Point Instructions measures the percentage of retired floating-point instructions.
  7. Ask students how understanding the memory specifications can enable you to enhance the performance of your application. Explain that the computer memory is a combination of various types of memory and that to get the optimal performance you need to understand the memory hierarchy.
  8. Explain the different levels of the memory hierarchy as displayed on the slide. Registers enable fast execution of instructions as they provide fast access to values computed during a calculation. Explain the multiple levels of cache memory. Main memory is the primary storage of a computer and is directly connected to the processor. Explain the process of paging in virtual memory.
  9. Ask how mismatch in memory and processor speed can decrease the performance of an application. Ask how you can calculate the time taken for memory access.
  10. Explain the hit and miss ratios as given in the slide. Ask the following question: If data is requested 78 times and is found in the cache 56 times, and every other time it has to be loaded from the main memory, what is the cache miss ratio? Answer: The miss ratio is (78 - 56)/78 ≈ 0.28.
  11. Ask students why the data that the processor requests should be at the nearest location. Tell the students that for this, they should be able to predict the data that the processor will reference. Explain the different types of locality of reference mentioned in the slide. Ask which applications exhibit spatial locality.
  12. Ask students why the data that the processor requests should be at the nearest location. Explain the various issues that affect memory performance. While explaining cache conflict loads, explain that the data in the cache is organized in rows. If multiple data items (five or more) from a single row are accessed by different processes at the same time, a cache conflict load occurs.
  13. Ask students why benchmarks are used to achieve optimal application performance. Give an example: if you use graphic applications, a benchmark that tests graphics speed can be useful.
  14. Ask students about the different types of benchmarks used. Explain the various types of benchmarks. Explain that single stream benchmarks measure the time that the computer takes to execute a collection of programs.
  15. Ask about the different types of benchmarks used for processor performance. Explain in detail the benchmarks for processor performance. Explain that MIPS, or Million Instructions Per Second, is a processor benchmark that refers to the number of low-level machine code instructions, in millions, that a processor can execute in one second. Also explain that MFLOPS refers to how many million floating-point multiply operations can be performed per second.
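
The hit and miss ratios from slide 15, and the worked question in note 10, can be checked with a short Python sketch. The function names are hypothetical, and the access-time formula is a simplified weighted-average model that the slides do not spell out. The cache and main-memory latencies are illustrative assumptions, not measured values.

```python
def hit_ratio(hits: int, requests: int) -> float:
    """Fraction of requests for which the data was found in the cache."""
    return hits / requests


def miss_ratio(hits: int, requests: int) -> float:
    """Fraction of requests for which the data was not found in the cache."""
    return (requests - hits) / requests


def average_access_time(hit: float, cache_time: float, memory_time: float) -> float:
    """Simplified model (assumption): weight each latency by how often it applies."""
    return hit * cache_time + (1.0 - hit) * memory_time


if __name__ == "__main__":
    requests, hits = 78, 56  # figures from the question in note 10
    h = hit_ratio(hits, requests)
    m = miss_ratio(hits, requests)
    print(f"hit ratio:  {h:.2f}")
    print(f"miss ratio: {m:.2f}")

    # Hypothetical latencies in nanoseconds, chosen only for illustration.
    cache_ns, memory_ns = 2.0, 100.0
    print(f"average access time: {average_access_time(h, cache_ns, memory_ns):.1f} ns")
```

Because every request is either a hit or a miss, the two ratios always sum to 1, so only one of them needs to be measured.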