6. Dynamic Capacitance is the ratio of the electrostatic charge on a conductor to the potential difference between the conductors required to maintain that charge.
7. The more pipeline stages there are, the more instructions can be in flight in the pipeline at once.
8. A higher number of pipeline stages reduces IPC: with a k-stage pipeline, n instructions take k + (n - 1) cycles, so IPC = n / (k + (n - 1)).
9. The lower IPC is offset by a higher clock rate, since deeper pipelining shortens each stage and hence the cycle time.
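The pipeline timing claim above can be sketched numerically. This is a toy model of an ideal pipeline with no stalls (the function names are illustrative, not from the slides):

```python
# Ideal k-stage pipeline: the first instruction finishes after k cycles
# and each subsequent instruction completes one cycle later, so n
# instructions take k + (n - 1) cycles in total.

def pipeline_cycles(n_instructions, k_stages):
    return k_stages + (n_instructions - 1)

def ipc(n_instructions, k_stages):
    return n_instructions / pipeline_cycles(n_instructions, k_stages)

n = 1000
print(f"IPC (5 stages):  {ipc(n, 5):.3f}")
print(f"IPC (14 stages): {ipc(n, 14):.3f}")
# The deeper pipeline has lower IPC by this formula, but each stage is
# shorter, so the clock rate (and often total throughput) can be higher.
```

This makes the trade-off concrete: deepening the pipeline from 5 to 14 stages lowers IPC only slightly for large n, while the shorter stages allow a much faster clock.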
10. Each x86 instruction is CISC-based, so it is decoded into simpler micro-operations (micro-ops).
3/25/2011 3 AN ARCHITECTURE PERSPECTIVE
12. SSE instructions provide 128-bit SIMD operations, including 128-bit integer arithmetic and 128-bit double-precision floating-point operations.
13. They reduce the overall number of instructions required to execute a particular program task.
14. They accelerate a broad range of applications, including video, speech, image, and photo processing; encryption; and financial, engineering, and scientific applications.
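The way packed SIMD operations reduce instruction count can be modeled in plain Python. This is only a conceptual sketch of a 128-bit register holding two 64-bit doubles, not real SSE code:

```python
import struct

# Conceptual model of a 128-bit SIMD register as a 16-byte buffer
# holding two 64-bit doubles. One "packed add" applies the operation
# to both lanes at once, which is how SSE cuts instruction count
# relative to scalar code.

def packed_add_pd(reg_a, reg_b):
    """Lane-wise add of two 128-bit registers (16-byte buffers)."""
    a0, a1 = struct.unpack("<2d", reg_a)
    b0, b1 = struct.unpack("<2d", reg_b)
    return struct.pack("<2d", a0 + b0, a1 + b1)

xmm0 = struct.pack("<2d", 1.0, 2.0)
xmm1 = struct.pack("<2d", 3.0, 4.0)
print(struct.unpack("<2d", packed_add_pd(xmm0, xmm1)))  # (4.0, 6.0)
```

In hardware, the real instruction (e.g. `ADDPD`) performs both lane additions in a single operation, which is where the instruction-count savings come from.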
87. In previous generation processors, each incoming instruction was individually decoded and executed.
88. Macrofusion enables common instruction pairs (such as a compare followed by a conditional jump) to be combined into a single internal instruction (micro-op) during decoding.
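The macrofusion idea can be sketched as a toy decoder that fuses a compare with the conditional jump that follows it. The mnemonics and matching rule here are purely illustrative:

```python
# Toy decoder model of macrofusion: a CMP immediately followed by a
# conditional jump (any "j..." mnemonic here) is fused into one
# micro-op; every other instruction decodes one-to-one.

def decode_with_macrofusion(instructions):
    micro_ops = []
    i = 0
    while i < len(instructions):
        if (instructions[i].startswith("cmp")
                and i + 1 < len(instructions)
                and instructions[i + 1].startswith("j")):
            # Two instructions become a single internal micro-op.
            micro_ops.append(instructions[i] + " + " + instructions[i + 1])
            i += 2
        else:
            micro_ops.append(instructions[i])
            i += 1
    return micro_ops

program = ["mov eax, [mem]", "cmp eax, 0", "jne loop", "add ebx, 1"]
fused = decode_with_macrofusion(program)
print(len(program), "instructions ->", len(fused), "micro-ops")  # 4 -> 3
```

Fewer micro-ops means less work for every downstream pipeline stage, which is the point of performing the fusion at decode time.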
93. The Core microarchitecture executes one 128-bit SSE instruction per clock cycle.
95. Intelligent algorithms identify which loads are independent of stores and can therefore be issued ahead of them, ensuring that no data-location dependencies are violated.
96. If the speculative load turns out to be invalid, the processor detects the conflict, reloads the correct data, and re-executes the instruction.
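The detect-and-replay behavior described above can be sketched as a small simulation. The structure and names are invented for illustration; real hardware compares physical addresses in the load/store queues:

```python
# Sketch of memory disambiguation: a load is hoisted ahead of an older
# store on the prediction that their addresses differ. When the store
# completes, the addresses are compared; on a conflict the load is
# replayed with the correct data.

def speculative_load(memory, load_addr, pending_store):
    store_addr, store_value = pending_store
    value = memory.get(load_addr)          # load executed early (speculative)
    memory[store_addr] = store_value       # older store completes
    if load_addr == store_addr:            # conflict detected
        value = memory[load_addr]          # reload correct data, replay
        replayed = True
    else:
        replayed = False
    return value, replayed

mem = {0x100: 7, 0x200: 9}
print(speculative_load(mem, 0x100, (0x200, 5)))   # (7, False): independent
print(speculative_load(mem, 0x100, (0x100, 42)))  # (42, True): conflict, replayed
```

The win is the common case: most loads really are independent of nearby stores, so the speculation usually pays off and the replay path is rare.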
98. Data has to be stored in only one place, which every core can access, thereby optimizing cache resources.
99. When one core has minimal cache requirements, other cores can increase their percentage of L2 cache.
100. Load-based sharing reduces cache misses and increases performance.
101. The advantages are a higher cache-hit rate, reduced bus traffic, and lower latency to data.
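The demand-based sharing of the L2 described above can be sketched as a simple allocation policy. The way count and proportional rule here are invented for illustration; the real hardware policy is not specified in these notes:

```python
# Toy model of demand-based sharing of a unified L2 cache between two
# cores: each core's share of the cache ways scales with its recent
# miss demand, while each core always keeps at least one way.

L2_WAYS = 16  # hypothetical 16-way shared L2

def allocate_ways(demand_core0, demand_core1):
    total = demand_core0 + demand_core1
    if total == 0:
        return L2_WAYS // 2, L2_WAYS // 2
    ways0 = round(L2_WAYS * demand_core0 / total)
    ways0 = min(max(ways0, 1), L2_WAYS - 1)
    return ways0, L2_WAYS - ways0

print(allocate_ways(50, 50))  # balanced demand -> (8, 8)
print(allocate_ways(90, 10))  # core 0 dominates -> (14, 2)
```

When one core has minimal cache needs, the other core's share grows automatically, which is the behavior point 99 describes.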
103. Includes an advanced power-gating capability in which ultra fine-grained logic control turns on individual processor logic subsystems only if and when they are needed.
104. Many buses and arrays are split so that the portions required only in some modes of operation can be put into a low-power state when not needed.
105. Implementing power gating reduced the power footprint considerably compared to previous processors.
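The effect of gating off unused subsystems can be sketched with a toy power model. The unit names and wattages below are invented purely for illustration:

```python
# Toy power model of fine-grained power gating: only subsystems needed
# by the current workload draw power; gated-off units contribute zero.
# Unit names and per-unit wattages are hypothetical.

UNIT_POWER_W = {"fetch": 2.0, "decode": 3.0, "alu": 5.0,
                "simd": 6.0, "l2_bank1": 1.5}

def chip_power(active_units):
    return sum(UNIT_POWER_W[u] for u in active_units)

# Integer-only workload: the SIMD unit and a spare L2 bank are gated off.
print(chip_power({"fetch", "decode", "alu"}))  # 10.0 (W)
print(chip_power(UNIT_POWER_W))                # 17.5 (W), everything on
```

The saving scales with how much of the chip a given workload leaves idle, which is why fine-grained (per-subsystem) gating beats coarse whole-core gating.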
148. Once a loop is identified, the traditional branch-prediction, fetch, and decode phases of execution are temporarily turned off while the loop executes.
149. This saves the cycles that would otherwise be wasted in these pipeline stages on a repeated set of instructions.
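The loop-detection behavior can be sketched as a small simulation counting front-end work. The detection threshold and buffer structure here are illustrative, not the real hardware parameters:

```python
# Sketch of a Loop Stream Detector: once a short loop has repeated
# enough times, later iterations are streamed from a small buffer,
# skipping the fetch/decode (and branch-prediction) stages entirely.

def run_loop(body, iterations, lsd_threshold=2):
    fetch_decode_ops = 0
    loop_buffer = None
    for it in range(iterations):
        if loop_buffer is None:
            fetch_decode_ops += len(body)   # normal front-end path
            if it + 1 >= lsd_threshold:
                loop_buffer = list(body)    # loop detected: capture it
        # else: instructions issue straight from the loop buffer,
        # costing no fetch/decode work in this model.
    return fetch_decode_ops

body = ["load", "add", "cmp", "jne"]
print(run_loop(body, iterations=100))  # 8 front-end ops instead of 400
```

For a hot loop the front end does a fixed amount of work regardless of trip count, which is exactly the cycle (and power) saving point 149 describes.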
151. New Second-Level Branch Target Buffer: To improve branch predictions in large coded apps (e.g., database applications).
152. New Renamed Return Stack Buffer: Stores forward and return pointers associated with calls and returns.
157. This boosts XML parsing speed and enables faster searching, pattern matching, lexing, tokenizing, and regular-expression evaluation.
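One of the SSE4.2 string-instruction modes ("equal any") scans a 16-byte block for the first byte matching any character in a set. The Python loop below models only the semantics, not the single-instruction hardware execution:

```python
# Conceptual model of an SSE4.2 string comparison in "equal any" mode:
# find the index of the first byte in a 16-byte block that belongs to a
# character set. In hardware (PCMPISTRI-style) this is one instruction,
# which is what accelerates tokenizing and lexing.

def find_first_in_set(block16, charset):
    """Index of the first byte of the block found in charset, or -1."""
    for i, ch in enumerate(block16[:16]):
        if ch in charset:
            return i
    return -1

xml_delimiters = set(b"<>/= \t")
print(find_first_in_set(b"version='1.0' ?>", xml_delimiters))  # 7: the '='
```

An XML parser applies this block-at-a-time scan to skip runs of ordinary characters and jump straight to the next structural delimiter.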
162. Automatically puts the processor and memory into the lowest available power states that still meet the requirements of the current workload.
164. The hypervisor can therefore pin a virtual machine to a specific execution core and its dedicated memory.
168. Speeds data movement and eliminates much of the performance overhead by giving designated virtual machines their own dedicated I/O devices, reducing the VMM's overhead in managing I/O traffic.
171. By performing routing functions to and from virtual machines in dedicated network silicon, it speeds delivery and reduces the load on the VMM and the server's processors.
172. This improves throughput by up to two times compared with non-hardware-assisted devices.
179. The Reorder Buffer has been made a third larger, up from 96 to 128 entries.
180. The Reservation Station (which schedules operations onto available execution units) has been given four extra slots, allowing 36 entries.
200. For non-multitasking loads, only the core most optimized for the workload is powered on and the rest are powered off.