Computer Organization and
      Architecture

      (3 Credits/SKS)



  Prof. Dr. Bagio Budiardjo
  Semester Genap 2010/2011
About the Course :
Course Objectives: After completing this course, students
  are expected to understand and be able to analyze
  computer architecture, in particular instruction-set
  design (e.g. addressing modes) and its influence on
  performance. Students are also expected to understand
  the meaning of computer organization, that is, the
  interconnection of the computer sub-systems (CPU,
  memory, bus and I/O) that form a computing system.
  Students are also expected to understand a more advanced
  technique in processor design : pipelining.
Key words : architecture, instruction-set design, computer
  organization, performance, processor design, and
  pipelining techniques
About the grading scheme :

• This part is actually not too rigid but it will
  appear as the combination of : homework, quiz,
  exercise, mid-test and final-test; whenever
  possible.
• One scheme possible is :
  Homework                : 15% (4)
  Mid test                : 40 %
  Final Test              : 45 %
• Grading the homework : maximum 5 points
  each. Three levels of grading : Good (5), OK (3),
  and Bad (2).
The books and supporting materials :
•   William Stallings's book, Computer Organization and
    Architecture, Seventh Edition, Prentice Hall, 2006, will be used
    as the main reference for this lecture. A new edition of this
    book was issued in 2010 but is not yet available in
    Jakarta.
•   Another good, classic book is Logic and Computer Design
    Fundamentals by M. Morris Mano and Charles R. Kime
    (Pearson Asia, 2004), but it puts too much stress on digital logic.
    We use material from this book to explain the hardware
    design of computer components, whenever possible.
•   Chapters covered will be Chapters 1, 2, 3, 4, 5, 10, 11
    and 13 (Stallings). Additional material about pipelining is
    taken from another book.
Books and supporting materials - continued
• There will be no handouts (unless it is very important).
• Lecture notes are distributed on memory stick/CD; the SAP
  can be downloaded from SIAK-NG
• Students are encouraged to read books/papers in this field
  of study.


Schedule of class :
• At scheduled time and place (K-102) for about 120
  minutes
• Lecture will be given mainly using LCD projector
About the “course direction”
Why do we study Computer Architecture ?
  History :
  Courses under this name were taught at many
  universities long before microprocessors
  existed. Years ago, people studied mainframe
  architectures : IBM S/370, CDC Cyber, CRAY,
  Amdahl, etc.
  Since microprocessors emerged, the course has
  changed slightly to cover more advanced
  topics : computer design and performance issues
About the “course direction”
  [Figure: Computer Organization & Architecture (OAK) and its related fields]
  • Micro & Embedded
     – Microprocessors : application of µproc
     – Embedded Systems : embedding µproc-based intelligence
       into new systems/devices
  • Processor Architecture & Design : analyzing processor design,
    emphasizing how to obtain the best processing speed
    (cost effectiveness)
  • Parallel & Distributed Computing Systems : organizing
    processors/computing systems to obtain better speed-up with
    a different processing paradigm
About the “course direction” - continued
This course is aimed at :
       1. Explaining the phenomena of computer
          architecture and computer design
          Knowing the basic instruction cycle and its
          implication to processing speed
       2. Studying the “key” problems :
          a. CPU memory bottleneck
          b. CPU I/O devices problems
       3. Studying how the “performance” could be
          improved
          example : CPU-memory : cache memory
       4. How could we improve execution speed
          with other techniques ?
          Example : pipelining
Reasons for studying
           Computer Architecture
           (Stallings's arguments)

• Able to select “proper” computer systems for a
  particular environment (cost and effectiveness)
• Able to analyze a processor “embedded” in an
  environment, e.g. to analyze the use of a
  processor in an automobile, and to use the proper
  tools for the analysis
• Able to choose proper software for a particular
  computer system
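View of a Computer System
– Processor Organization : Another view
  [Figure: block diagram of the CPU (Central Processing Unit)]
  • Control Unit
  • Registers IR, PC, MAR and MBR
  • MMU (Memory Management Unit) and cache memory, on the path
    to/from memory
  • General registers R1, R2 and R3
  • ALU1 and ALU2 feeding the ADDER, with the result in ALU3
  • FPU (Floating Point Unit)
  • BUS connecting the components
  Issues : clock speed, gating signal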
Implementation in CHIP
Frequently Asked Question


    What is the role of the CPU clock ?
    What is the difference between a P IV/2.4 G and a
    P IV/3.0 G ? (CPU clock speeds of 2.4 and 3.0 GHz)

    Consider an instruction of a CPU :

    AR    R1, R2
                   (add register: add the content of R1 and the content of
                    register R2, place the result in R1)
– Execution steps of         AR      R1,R2
      The “possible” micro-execution steps are :

      a. ALU1 ← [R1]         {content of R1 is moved to ALU1}
      b. ALU2 ← [R2]         {content of R2 is moved to ALU2}
      c. ADD                 {content of ALU1 + ALU2 = ALU3}
      d. R1 ← [ALU3]         {Result of addition is moved to R1}


  If each micro-step is executed in “one” clock cycle,
  then this AR instruction needs 4 clock cycles.
  For the time being, we ignore the fetch cycle
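
The four micro-steps can be traced with a small simulation. This is only a sketch: the register names come from the slide, while the function name and the dictionary used to hold the register contents are assumptions made for illustration.

```python
# Hypothetical register-transfer model of AR R1,R2 on a single-bus CPU.
# One transfer (or one ADD) is assumed to take exactly one clock cycle.

def execute_ar_single_bus(regs):
    """Run the four micro-steps of AR R1,R2; return (registers, clock cycles)."""
    state = dict(regs)
    cycles = 0

    state["ALU1"] = state["R1"]                      # a. ALU1 <- [R1]
    cycles += 1
    state["ALU2"] = state["R2"]                      # b. ALU2 <- [R2]
    cycles += 1
    state["ALU3"] = state["ALU1"] + state["ALU2"]    # c. ADD
    cycles += 1
    state["R1"] = state["ALU3"]                      # d. R1 <- [ALU3]
    cycles += 1
    return state, cycles


final_state, cycles = execute_ar_single_bus({"R1": 5, "R2": 7})
print(final_state["R1"], cycles)  # 12 4
```

Each step uses the bus once, which is why the count cannot drop below four on a single bus.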
Question : How do we fetch the instruction?
               (from memory)

• The procedure that brings an instruction from memory
  to the CPU (IR) is called the instruction fetch
• PC always holds the address of the (next) instruction in
  memory
• PC transfers the address to MAR, and memory is READ
• PC is usually incremented by 1 (to point to the next instruction)
• The instruction is placed by memory in MBR
• The content of MBR is transferred to IR
  (instruction is fetched, ready to be executed)
Question : How do we fetch the instruction?
               (from memory) - continued

• Or with register transfer language, we could express the
  fetch cycle as

  1.   MAR ← [PC]
  2.   READ (memory) and wait for completion
  3.   IR ← [MBR]

  In terms of the CPU clock, these steps may take up to 50 CPU
  clocks, depending on the memory clock speed.
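
The fetch cycle above can be sketched in code. The memory is modelled as a plain dictionary, and the `memory_wait_clocks` parameter is a hypothetical stand-in for however many CPU clocks the memory READ takes; the PC increment is assumed to overlap with the memory wait and cost no extra clock.

```python
# Hypothetical model of the instruction fetch cycle in register transfer steps.

def fetch(memory, regs, memory_wait_clocks=1):
    """Perform one instruction fetch; return (updated registers, CPU clocks used)."""
    r = dict(regs)
    clocks = 0

    r["MAR"] = r["PC"]              # 1. MAR <- [PC]
    clocks += 1
    r["MBR"] = memory[r["MAR"]]     # 2. READ (memory) and wait for completion
    clocks += memory_wait_clocks
    r["PC"] += 1                    #    PC now points to the next instruction
    r["IR"] = r["MBR"]              # 3. IR <- [MBR]
    clocks += 1
    return r, clocks


memory = {100: "AR R1,R2"}
regs = {"PC": 100, "MAR": None, "MBR": None, "IR": None}
regs, clocks = fetch(memory, regs, memory_wait_clocks=48)
print(regs["IR"], regs["PC"], clocks)  # AR R1,R2 101 50
```

With a slow memory (48 wait clocks in this made-up example) the fetch dominates the 4 clocks of the execute phase, which is the CPU-memory bottleneck the course returns to later.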
– Processor Organization – continued.1
  [Figure: single-bus CPU (Control Unit, IR, PC, MAR, MBR, R1, R2, R3,
  ALU1, ALU2, ADDER, ALU3, BUS, path to/from memory)]
  Step shown : ALU1 ← [R1]
  Legend : marked paths/units are not active
– Processor Organization – continued.2
  [Figure: single-bus CPU (Control Unit, IR, PC, MAR, MBR, R1, R2, R3,
  ALU1, ALU2, ADDER, ALU3, BUS, path to/from memory)]
  Step shown : ALU2 ← [R2]
  Legend : marked paths/components are not active
– Processor Organization – continued.3
  [Figure: single-bus CPU (Control Unit, IR, PC, MAR, MBR, R1, R2, R3,
  ALU1, ALU2, ADDER, ALU3, BUS, path to/from memory)]
  Step shown : ADD
  Legend : marked paths/components are not active
– Processor Organization – continued.4
  [Figure: single-bus CPU (Control Unit, IR, PC, MAR, MBR, R1, R2, R3,
  ALU1, ALU2, ADDER, ALU3, BUS, path to/from memory)]
  Step shown : R1 ← [ALU3]
  Legend : marked paths/components are not active
Analysis of Instruction Cycle

• With single bus, it is slow, since in each “clock”
  only one transfer could be executed
• Is there any other way to “improve” the speed?
• Dual bus processor may be faster
• Additional processor cost
Dual processor-bus : A way to improve speed
  [Figure: DUAL BUS CPU. Buses 1 and 2 connect R1, R2, R3, ALU1, ALU2,
  ADDER, ALU3 and the other components (Control Unit, IR, PC, MAR, MBR)]

  1. ALU1 ← [R1] (bus1)
     ALU2 ← [R2] (bus2)
  2. ADD
  3. R1 ← [ALU3] (bus1)
  Only 3 clock cycles needed : 25% fewer cycles than the 4 cycles
  of the single-bus design

  How about this :
  1. ALU1 ← [R1] (bus1)
     ALU2 ← [R2] (bus2)
     ADD
  2. R1 ← [ALU3] (bus1)
  Only 2 clock cycles needed : 50% fewer cycles than the
  single-bus design
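
The percentages on this slide compare cycle counts against the 4 cycles of the single-bus design. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the speedup factor is added here only to distinguish "fewer cycles" from "times faster".

```python
# Compare a bus design against a baseline by clock cycles per instruction.

def cycle_savings(base_cycles, new_cycles):
    """Return (fraction of cycles saved, speedup factor) versus the baseline."""
    saved = (base_cycles - new_cycles) / base_cycles
    speedup = base_cycles / new_cycles
    return saved, speedup


for label, cycles in [("dual bus, 3 steps", 3), ("dual bus, 2 steps", 2)]:
    saved, speedup = cycle_savings(4, cycles)
    print(f"{label}: {saved:.0%} fewer cycles, {speedup:.2f}x speedup")
# dual bus, 3 steps: 25% fewer cycles, 1.33x speedup
# dual bus, 2 steps: 50% fewer cycles, 2.00x speedup
```

Halving the cycle count doubles the execution rate, so the 2-step version is 2x as fast as the single-bus design, assuming the clock period stays the same.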
Triple processor-bus : Can the processing speed be improved?
  [Figure: Triple Bus CPU. Buses 1, 2 and 3 connect R1, R2, R3, ALU1, ALU2,
  ADDER, ALU3 and the other components (Control Unit, IR, PC, MAR, MBR).
  Please notice the direction of the arrows]

  If all the CPU components (registers, ALUs and adder)
  could work in one third (1/3) of a clock cycle (transfer of bits,
  adding numbers), how many clock cycles are needed to complete an
  addition operation (ADD R1,R2) ?
  Write down the steps in “register transfer” (micro-instruction)
  language!
Program Execution
•   A scientific program written in assembly language is run on a
    microprocessor with a 1 GHz clock. To complete the program, it needs
    to execute :
    a. 150,000 arithmetic instructions (e.g. ADD R1,R2; MUL R1,R3;
    etc.)
    b. 250,000 register transfer instructions (e.g. MOV R1,R2; etc.)
    c. 100,000 memory access instructions (e.g. LOAD R1,X; STORE
    R2,Y; etc.)
    If arithmetic instructions need 2 clocks on average (to complete),
    register transfer instructions need 1 clock and memory access
    instructions need 10 clocks, calculate the average CPI (clocks per
    instruction) of the program.
    How much time (in seconds) is needed to complete the program?
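
One way to work this exercise is to weight each instruction class by its count. The sketch below uses the numbers from the slide; the function name and the list-of-pairs layout are assumptions chosen for illustration.

```python
# Average CPI as a weighted mean of clocks per instruction class.

def average_cpi(mix):
    """mix: list of (instruction count, clocks per instruction) pairs.

    Returns (average CPI, total clock cycles).
    """
    total_instructions = sum(count for count, _ in mix)
    total_clocks = sum(count * clocks for count, clocks in mix)
    return total_clocks / total_instructions, total_clocks


mix = [
    (150_000, 2),    # arithmetic instructions
    (250_000, 1),    # register transfer instructions
    (100_000, 10),   # memory access instructions
]
cpi, total_clocks = average_cpi(mix)
clock_hz = 1_000_000_000              # 1 GHz clock
exec_time_s = total_clocks / clock_hz

print(f"CPI = {cpi:.2f}, time = {exec_time_s * 1000:.2f} ms")
# CPI = 3.10, time = 1.55 ms
```

Memory access instructions are only 20% of the mix but account for about two thirds of the 1,550,000 clocks, which is why the memory path gets so much attention in processor design.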
Can it be “one clock?” – Yes it can !
Views of Other Books on “Micro Operations”

• The bus is called the “data path”
• It consists not only of the bus (a bunch of wires) but also of
  other digital devices
• Enable signals are asserted to speed up execution
• Additional (processor) cost
Datapath Example :
                Taken from Morris Mano’s book
•   Four parallel-load registers
•   Two mux-based register selectors
•   Register destination decoder
•   Mux B for external constant input
•   Buses A and B with external address and data outputs
•   ALU and Shifter with Mux F for output select
•   Mux D for external data input
•   Logic for generating status bits V, C, N, Z
[Figure: datapath block diagram. Register file (R0 to R3 with Load inputs,
Load enable/Write, Destination select decoder on D address, A select and
B select muxes on A address and B address), MUX B (MB select, Constant in),
Bus A and Bus B (Address Out, Data Out), function unit (ALU with G select,
Shifter with H select, MUX F with MF select, Zero Detect, status bits
V, C, N, Z), MUX D (MD select, Data In), Bus D]
Datapath Example: Performing a Microoperation
Microoperation: R0 ← R1 + R2
• Apply 01 to A select to place the contents of R1 onto Bus A
• Apply 10 to B select to place the contents of R2 onto B data, and
  apply 0 to MB select to place B data on Bus B
• Apply 0010 to G select to perform the addition G = Bus A + Bus B
• Apply 0 to MF select and 0 to MD select to place the value of G onto
  Bus D
• Apply 00 to Destination select to enable the Load input to R0
• Apply 1 to Load Enable to force the Load input to R0 to 1 so that R0 is
  loaded on the clock pulse (not shown)
• The overall microoperation requires 1 clock cycle (!)
[Figure: the same datapath block diagram, with the select values above applied]
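
The select signals listed for R0 ← R1 + R2 can be traced with a small model of the datapath. This is a sketch under stated assumptions: only the encodings quoted on the slide are modelled, the meaning of the "1" inputs of MUX B and MUX D is inferred from the figure labels, and the 8-bit width and all function and parameter names are hypothetical.

```python
# Minimal model of one clock cycle of the register-file datapath.

def datapath_cycle(regs, a_select, b_select, mb_select, g_select,
                   mf_select, md_select, dest_select, load_enable,
                   constant_in=0, data_in=0, width=8):
    """Return the register file contents after one clock pulse."""
    mask = (1 << width) - 1
    bus_a = regs[a_select]                        # A select mux drives Bus A
    b_data = regs[b_select]                       # B select mux drives B data
    bus_b = constant_in if mb_select else b_data  # MUX B: 0 = B data, 1 = Constant in (inferred)

    if g_select != 0b0010:
        raise NotImplementedError("only G select 0010 (A + B) is modelled")
    g = (bus_a + bus_b) & mask                    # ALU output G

    if mf_select != 0:
        raise NotImplementedError("shifter output H is not modelled")
    f = g                                         # MUX F: 0 = G
    bus_d = data_in if md_select else f           # MUX D: 0 = F, 1 = Data In (inferred)

    new_regs = list(regs)
    if load_enable:                               # decoder enables Load of one register
        new_regs[dest_select] = bus_d
    return new_regs


regs = [0, 3, 4, 0]                               # R0..R3
regs = datapath_cycle(regs, a_select=0b01, b_select=0b10, mb_select=0,
                      g_select=0b0010, mf_select=0, md_select=0,
                      dest_select=0b00, load_enable=1)
print(regs)  # [7, 3, 4, 0]
```

All the selects are applied at once and the result is latched on a single clock pulse, which is how the whole microoperation fits in one cycle.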
Lesson Learned
• We could improve the instruction execution speed by
  increasing processor clock speed (can we?)
• We could improve the instruction execution speed by
  implementing dual bus (can we?)
• We can overcome (partly) the CPU-Memory bottleneck by
  inserting cache memory between CPU and Main Memory
  (can we?)
• Is there any other way to improve instruction execution
  speed (increasing performance)? - pipelining
• Do these improvements incur extra cost? (cost vs
  performance issue)
What do we get after studying Computer
 Architecture ?

• It is always a complicated problem to answer.
• Basically we learn about processor design
  issues, namely the hardware of a computer, but it is
  taught through “software” logic.
• At least we know about basic building blocks of a
  computer
• We know the design development trends
What is our topic ?
Instruction Set Architecture (ISA)

             Application
              Program

           Compiler   OS

                ISA
                CPU
               Design

               Circuit
               Design

                Chip
               Layout
Chapter 1 : Introduction
1. 1. Introduction : Organization & Architecture

• Organization and Architecture : two terms that are often
  confused
• Computer organization refers to the operational units and
  their interconnections that realize the architectural
  specifications (!)
• Computer Architecture refers to those attributes of a
  system visible to a programmer, or put another way, those
  attributes that have a direct impact on the logical execution
  of a program (!)
• The latter definition (architecture) is more concerned with
  performance than the first one (organization)
1. 1. Introduction - continued
• Architecture is more concerned with the basic instruction
  design, which may lead to better performance of the system
• Organization is the implementation of a computer
  system in terms of the interconnection of its functional units :
  CPU, memory, bus and I/O devices.
• Example : the IBM S/370 family architecture. There are
  plenty of IBM products with the same architecture
  (S/370) but different organizations, depending on their
  price/performance targets. Cost and performance
  differentiate the organizations
• So, the organization of a computer is the implementation of
  its architecture, tailored to fit the intended price and
  performance targets.
Chapter 2 :
Computer Evolution and
    Performance
ENIAC - background
• Electronic Numerical Integrator And Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
   – Too late for war effort
• Used until 1955
ENIAC - details
•   Decimal (not binary)
•   20 accumulators of 10 digits
•   Programmed manually by switches
•   18,000 vacuum tubes
•   30 tons
•   15,000 square feet
•   140 kW power consumption
•   5,000 additions per second
ENIAC
Another View of ENIAC




Structure of von Neumann machine
IAS - details
• 1000 x 40 bit words
   – Binary number
   – 2 x 20 bit instructions
• Set of registers (storage in CPU)
   – Memory Buffer Register
   – Memory Address Register
   – Instruction Register
   – Instruction Buffer Register
   – Program Counter
   – Accumulator
   – Multiplier Quotient
2.1. Evolution and Performance - history
• 1946 : von Neumann and his colleagues proposed the IAS
  computer (IAS = Institute for Advanced Study)
• The design included :
   – main memory
   – ALU
   – Control Unit
   – I/O
• First stored-program computer, able to perform :
  +, -, ×, ÷
• The “father” of all modern computers/processors
Structure of IAS
IAS
2.1. Evolution and Performance - history
IAS components are :
• MBR (memory buffer register), MAR (memory address
  register), IR (instruction register), IBR (instruction buffer
  register), PC (program counter), AC (accumulator) and
  MQ (multiplier quotient), memory (1000 locations)
• 20 bit instruction : 8 bit opcode, 12 bit address (addressing
  one of 1000 memory locations - 0 to 999)
• 40 bit data word : 1 sign bit + 39 bit value
• Operations : data transfer between registers and ALU,
  unconditional branch, conditional branch, arithmetic,
  address modify
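The instruction format above can be illustrated with a small sketch. It assumes that the left instruction occupies the high-order 20 bits of the 40 bit word and that the opcode precedes the address; the slide does not state the bit order, so treat both as assumptions. The function names and the opcode values (0x01, 0x05) are hypothetical and chosen only for illustration.

```python
OPCODE_BITS = 8
ADDRESS_BITS = 12
INSTRUCTION_BITS = OPCODE_BITS + ADDRESS_BITS   # 20 bits per instruction
WORD_BITS = 2 * INSTRUCTION_BITS                # 40 bits per memory word


def pack_instruction(opcode, address):
    """Build one 20 bit instruction (assumed layout: opcode | address)."""
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= address < (1 << ADDRESS_BITS)
    return (opcode << ADDRESS_BITS) | address


def pack_word(left, right):
    """Pack two (opcode, address) pairs into one 40 bit word.

    Assumption: the left instruction sits in the high-order 20 bits.
    """
    return (pack_instruction(*left) << INSTRUCTION_BITS) | pack_instruction(*right)


def split_instruction(instruction):
    """Split a 20 bit instruction into (opcode, address)."""
    opcode = (instruction >> ADDRESS_BITS) & ((1 << OPCODE_BITS) - 1)
    address = instruction & ((1 << ADDRESS_BITS) - 1)
    return opcode, address


def split_word(word):
    """Split a 40 bit word into its left and right instructions."""
    mask = (1 << INSTRUCTION_BITS) - 1
    left = (word >> INSTRUCTION_BITS) & mask
    right = word & mask
    return split_instruction(left), split_instruction(right)


# Hypothetical opcodes 0x01 and 0x05 addressing locations 500 and 501.
word = pack_word((0x01, 500), (0x05, 501))
print(split_word(word))  # ((1, 500), (5, 501))
```

Note that a 12 bit address field can name 4096 locations, so the 1000 locations of the IAS memory (0 to 999) fit with room to spare.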
2.1. Evolution - History of Commercial computers
• First Generation : 1950 Mauchly & Eckert developed
  UNIVAC I, used by the Bureau of the Census
• Then UNIVAC II appeared, followed by the UNIVAC 1100
  series (1103, 1104, 1105, 1106, 1108) - vacuum tubes and later
  transistors
• Second Generation : Transistors, IBM 7094 (NCR, RCA and
  others tried to develop their own versions, but these were
  not commercially successful)
• Third Generation : Integrated Circuit (IC) - SSI. IBM S/360
  was the successful example
• Later generations (possibly fourth and fifth) : LSI and VLSI
  technology
2.1. Evolution - history of commercial computers

                               Table 2.1
                  Approx                              Speed
Generation        Time          Technology            (operations/sec)
----------------------------------------------------------------------
1.                1946-57       Vacuum tube                     40,000
2.                1958-64       Transistor                     200,000
3.                1965-71       SSI & MSI                    1,000,000
4.                1972-77       LSI                         10,000,000
5.                1978-         VLSI                       100,000,000
----------------------------------------------------------------------
Vacuum Tubes
Transistor
2.1. Evolution - System 360 Family

                              Model   Model   Model   Model   Model
Characteristic                30      40      50      65      75
-------------------------------------------------------------------
Max memory size (Bytes)       64K     256K    256K    512K    512K
Memory data rate (MB/s)       0.5     0.8     2.0     8.0     16.0
Processor cycle time (µs)     1.0     0.625   0.5     0.25    0.2
Relative speed                1       3.5     10      21      50
Max number of data channels   3       3       4       6       6
Max channel data rate (KB/s)  250     400     800     1250    1250
-------------------------------------------------------------------
• The family architecture gave rise to the terms : upward and
    downward compatible
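The table invites a quick comparison between cycle time and relative speed. The sketch below uses only the figures from the table; reading the processor cycle time as one clock period (so that a cycle of T µs corresponds to 1/T MHz) is an assumption, and the helper names are hypothetical.

```python
models = ["30", "40", "50", "65", "75"]
cycle_time_us = [1.0, 0.625, 0.5, 0.25, 0.2]   # from the table
relative_speed = [1, 3.5, 10, 21, 50]          # from the table


def clock_rate_mhz(cycle_us):
    """A cycle time of T microseconds corresponds to 1/T MHz."""
    return 1.0 / cycle_us


def gain_beyond_clock(model_index):
    """Relative speed divided by the cycle-time improvement over Model 30."""
    clock_gain = cycle_time_us[0] / cycle_time_us[model_index]
    return relative_speed[model_index] / clock_gain


for i, model in enumerate(models):
    print(f"Model {model}: {clock_rate_mhz(cycle_time_us[i]):.2f} MHz, "
          f"relative speed {relative_speed[i]}, "
          f"gain not explained by cycle time x{gain_beyond_clock(i):.1f}")
```

Under this reading, Model 75 has a cycle time only 5 times shorter than Model 30 but is rated 50 times faster, so most of the gain comes from the organization (memory data rate, data channels, and so on) rather than from the clock alone. This is the same architecture implemented with different organizations.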
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
   – Up to 100 devices on a chip
• Medium scale integration - to 1971
   – 100-3,000 devices on a chip
• Large scale integration - 1971-1977
   – 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 to date
   – 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration
   – Over 100,000,000 devices on a chip
Moore’s Law
•   Increased density of components on chip
•   Gordon Moore - cofounder of Intel
•   Number of transistors on a chip will double every year
•   Since 1970’s development has slowed a little
     – Number of transistors doubles every 18 months
•   Cost of a chip has remained almost unchanged
•   Higher packing density means shorter electrical paths,
    giving higher performance
•   Smaller size gives increased flexibility
•   Reduced power and cooling requirements
•   Fewer interconnections increases reliability
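The two doubling rates quoted above can be compared with a few lines of arithmetic. This is a minimal sketch that assumes idealized exponential growth; the function name is hypothetical.

```python
def growth_factor(years, doubling_period_years):
    """Factor by which the transistor count grows after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# Doubling every year versus doubling every 18 months, over 3 years.
print(growth_factor(3, 1.0))   # 8.0
print(growth_factor(3, 1.5))   # 4.0
```

Doubling every 18 months is therefore the same rate as quadrupling every 3 years.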
Moore’s Law
Growth in CPU Transistor Count
IBM 360 series
• 1964
• Replaced (& not compatible with) 7000 series
• First planned “family” of computers
   – Similar or identical instruction sets
   – Similar or identical O/S
   – Increasing speed
   – Increasing number of I/O ports (i.e. more terminals)
   – Increased memory size
   – Increased cost
• Multiplexed switch structure
2.1. Evolution - Later generations

• Semiconductor memories :
  1K, 4K, 16K, 64K, 256K, 1M, 4M, 16 Mbits on a single chip
  At present : 256 Mbit, 512 Mbit per chip
• Microprocessors appeared :
  Intel 4004 (1971), Intel 8008 (72), Intel 8080 (8 bit-74),
  8086 (16 bit-78), 80386 (32 bit-85) onward.
• At almost the same time : Motorola 6800 (8 bit), 68000
  (16 bit), 68010 (16 bit), 68020 (32 bit), 68030/40 (32 bit)
• Motorola’s products later disappeared from the commercial
  market
• Intel products have dominated the market since the
  appearance of the IBM PC
2.1. Evolution of Microprocessors
                                     Table 2.2
-------------------------------------------------------------------------
Feature                    8008     8080     8086     80386    80486
-------------------------------------------------------------------------
Year introduced            1972     1974     1978     1985     1989
# of instructions          66       111      133      154      235
Address bus width (bits)   8        16       20       32       32
Data bus width (bits)      8        8        16       32       32
# of registers             8        8        16       8        8
Memory addressability      16 KB    64 KB    1 MB     4 GB     4 GB
Bus bandwidth (MB/s)       -        0.75     5        32       32
Reg-reg add time (µs)      -        1.3      0.3      0.125    0.06
-------------------------------------------------------------------------
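The link between address bus width and memory addressability in Table 2.2 can be checked directly. The sketch below assumes byte-addressable memory, so that n address bits reach 2^n bytes; the function names are hypothetical. The 8008 column is left out because its 16 KB entry does not follow from the 8 listed for its address bus width.

```python
def addressable_bytes(address_bits):
    """Number of bytes reachable with `address_bits` address lines,
    assuming byte-addressable memory."""
    return 1 << address_bits


def human_readable(num_bytes):
    """Format a power-of-two byte count as KB, MB or GB."""
    for unit, size in (("GB", 1 << 30), ("MB", 1 << 20), ("KB", 1 << 10)):
        if num_bytes >= size and num_bytes % size == 0:
            return f"{num_bytes // size} {unit}"
    return f"{num_bytes} B"


# Address bus widths taken from Table 2.2.
for chip, bits in (("8080", 16), ("8086", 20), ("80386", 32), ("80486", 32)):
    print(chip, bits, "bits ->", human_readable(addressable_bytes(bits)))
```

The results (64 KB, 1 MB, 4 GB, 4 GB) match the memory addressability row of the table.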
2.2. Designing for Performance
• The price of µprocessors continues to drop every year
• $1000 buys an advanced system today : in it you
  may find more than 100 million transistors !
• Even 100 million sheets of toilet paper cost more !!
• Computing power is virtually free !!
• People now solve problems never thought possible
  before : image processing, speech recognition,
  videoconferencing, multimedia authoring, etc.
• We need more and more computing power
• The organization and architecture of today’s processors
  remain basically the same as those of the IAS !
• What differs are the techniques used to improve speed and
  efficiency !
2.2. Designing - µprocessor speed

• Intel Pentium and PowerPC follow Moore’s Law :
  by shrinking the line widths in IC chips by 10%, the industry can
  produce a new IC with 4 times the transistor density every 3 years !
• The same law holds for DRAM (Dynamic Random Access
  Memory)
• Even when capacity increases, speed does not increase
  automatically
• More work on instruction design is needed
• Also, techniques for faster instruction execution must be
  developed : branch prediction, data flow analysis and
  speculative execution
Pentium Evolution (1)
•   8080
     – first general purpose microprocessor
     – 8 bit data path
     – Used in first personal computer – Altair
•   8086
     – much more powerful
     – 16 bit
     – instruction cache, prefetches a few instructions
     – 8088 (8 bit external bus) used in first IBM PC
•   80286
     – 16 Mbyte memory addressable
     – up from 1 Mbyte
•   80386
     – 32 bit
     – Support for multitasking
Pentium Evolution (2)
• 80486
   – sophisticated powerful cache and instruction pipelining
   – built in maths co-processor
• Pentium
   – Superscalar
   – Multiple instructions executed in parallel
• Pentium Pro
   – Increased superscalar organization
   – Aggressive register renaming
   – branch prediction
   – data flow analysis
   – speculative execution
Pentium Evolution (3)
• Pentium II
   – MMX technology
   – graphics, video & audio processing
• Pentium III
   – Additional floating point instructions for 3D graphics
• Pentium 4
   – Note Arabic rather than Roman numerals
   – Further floating point and multimedia enhancements
• Itanium
   – 64 bit
   – see chapter 15
• See Intel web pages for detailed information on processors
Intel Microprocessor Performance
Summary: Important Points

•   Organization and Architecture
•   Family Architectures
•   Function of a Computer (Data Processing, Control, Data movement)
•   Birth of computers (ENIAC - decimal, by Mauchly & Eckert; IAS - binary)
•   Microprocessors (Intel 4004, 8008, 8080, 8086/16 bit, 80386/32 bit)
•   IAS Instructions
•   Von Neumann bottleneck
•   Increasing clock speed, wider buses, cache memory
•   Losers : e.g. Motorola microprocessors, Radio Shack
•   Denser transistors on a single chip (4 times every 3 years, by
    shrinking lines by 10%)

Oak 0-2011

  • 1. Computer Organization and Architecture (3 Credits/SKS) Prof. Dr. Bagio Budiardjo Semester Genap 2010/2011
  • 2. About the Course : Course Objectives: After completing this course the students are expected to understand and to be able to analyze the computer architecture, in particular the instruction-set design (e.g. addressing modes), and its influence to performance. The students are also expected to understand the meaning of computer organization, that is, the interconnections of computer sub-systems : CPU, memory, bus and I/O from a computing system. The student is expected to understand the more advanced technique in processor design : pipelining. Key words : architecture, instruction-set design, computer organization, performance, processor design and, pipelining techniques
  • 3. About the grading scheme : • This part is actually not too rigid but it will appear as the combination of : homework, quiz, exercise, mid-test and final-test; whenever possible. • One scheme possible is : Homework : 15% (4) Mid test : 40 % Final Test : 45 % • Grading the homework : Maximum point , 5 point each. Three levels of grading :Good(5), OK(3), and Bad(2).
  • 4. The books and supporting materials : • Williams Stalling’s book titled Computer Organization and Architecture, Seventh Edition, Prentice Hall 2006; will be used as the main reference for this lecture. There is a new edition of this book, issued in 2010 but up till now is still unavailable in Jakarta. • The classic book is good (Logic and Computer Design Fundamentals) , by Morris M Manno and Charles Kilme - Pearson Asia – 2004), but too many stresses on digital logics. We use materials from this book to explain the hardware design of computer components, whenever possible • Chapters covered will be : Chapters: 1, 2, 3, 4, 5, 10 and 11 and 13 (Stalling’s). Additional materials about pipelining are taken from another book.
  • 5. Books and supporting materials - continued • There will be no handouts (unless it is very important). • Lecture notes are given through memory stick/CD, SAP could be downloaded from SIAK-NG • Students are encouraged to read books/papers in this field of study. Schedule of class : • At scheduled time and place (K-102) for about 120 minutes • Lecture will be given mainly using LCD projector
  • 6. About the “course direction” Why do we study Computer Architecture ? History : Course under this name has been taught in many universities long before the microprocessors exist. Years ago, people studied mainframe architectures : IBM S/370, CDC Cyber, CRAY, Amdahl, etc. Since the microprocessors emerge, this course is changed slightly to cope with more advanced topics: Computer design and performance issues
  • 7. About the “course direction” Computer Organization & Architecture Micro & Embedded OAK Microprocessors Processors Architecture & Design Processors Architecture & Design Application of µproc Analyzing processor design emphasizing Analyzing & Implementing on how to obtain Systems to achieve Computer better processing speed (Cost effectiveness) best processing speed – Cost effectiveness Parallel & Distributed Embedded Systems Computing Systems embedding µproc based Organizing Processors/Computing intelligence to new system/device systems to obtain better speed up with different processing paradigm
  • 8. About the “course direction” - continued This course is aimed at : 1. Explaining the phenomena of computer architecture and computer design Knowing the basic instruction cycle and its implication to processing speed 2. Studying the “key” problems : a. CPU memory bottleneck b. CPU I/O devices problems 3. Studying how the “performance” could be improved example : CPU-memory : cache memory 4. How could we improve execution speed with other techniques ? Example : pipelining
  • 9. Reasons for studying Computer Architecture (Stalling’s arguments) • Able to select “proper” computer systems for a particular environment (cost and effectiveness) • Able to analyzed a processor “embedded” to an environment. Able to analyzed the use of processor in automobile, able to use proper tools to analyzed • Able to choose proper software for a particular computer system
  • 10. View of a Computer System
  • 11. – Processor Organization : Another view CPU : Central Processing Unit Control Unit MMU : Mem Mng. Unit IR To/from PC memory R1 Cache MAR memory MBR R2 ALU1 ALU2 R3 ADDER Issues : ALU3 Clock speed, Gating signal FPU : Floating Point Unit BUS
  • 13. Frequently Asked Question What is the role of CPU clock ? What is the difference between P IV/2.4 G & P IV/3.0 G ? (CPU - clock speed 2.4 and 3.0 Ghz) Consider an instruction of a CPU : AR R1, R2 (add register, content of R1 and content of register R2, place result in R1)
  • 14. – Execution steps of AR R1,R2 The “possible” micro-execution steps are : a. ALU1 ← [R1] {content of R1 is moved to ALU1} b. ALU2 ← [R2] {content of R2 is moved to ALU2} c. ADD {content of ALU1 + ALU2 = ALU3} d. R1 ← [ALU3] {Result of addition is moved to R1} If, each micro-step is executed in “one” clock-cycle, then this AR instruction needs 4 clock-cycles. For the time being, we ignore the fetch cycle
  • 15. Question : How do we fetch the instruction? (from memory) • There is a procedure to bring an instruction from memory to CPU (IR), is called the instruction fetch • PC always hold the address of (next) instruction in memory • PC tranfer the address to MAR, and READ memory • PC ususally is icremented by 1 (point to next instruction) • Instruction is placed by memory in MBR • Content of MBR is transferred to IR (instruction is fetched, ready to be executed)
  • 16. Question : How do we fetch the instruction? (from memory) - continued • Or with register transfer language, we could express the fetch cycle as 1. MAR ← [PC] 2. READ (memory) and wait for completion 3. IR ← [MBR] In terms of CPU clock, this steps may take up to 50 CPU clocks depending on the memory clock speed.
  • 17. – Processor Organization – continued.1 Control Unit IR To/from memory PC R1 MAR MBR R2 ALU1 ALU2 R3 ADDER ALU3 ALU1 [R1] BUS : jalur/unit tidak aktif
  • 18. – Processor Organization – continued.2 Control Unit IR To/from memory PC R1 MAR MBR R2 ALU1 ALU2 R3 ADDER ALU2 [R2] ALU3 : jalur/komponen tdk aktif BUS
  • 19. – Processor Organization – continued.3 Control Unit IR To/from memory PC R1 MAR MBR R2 ALU1 ALU2 R3 ADDER ADD ALU3 : jalur/komponen tdk aktif BUS
  • 20. – Processor Organization – continued.4 Control Unit IR To/from memory PC R1 MAR MBR R2 ALU1 ALU2 R3 ADDER R1 [ALU3] ALU3 : jalur/komponen tdk aktif BUS
  • 21. Analysis of Instruction Cycle • With single bus, it is slow, since in each “clock” only one transfer could be executed • Is there any other way to “improve” the speed? • Dual bus processor may be faster • Additional processor cost
  • 22. Dual processor-bus : A way to improve speed 1. ALU1 ← [R1] (bus1) 1 2 ALU2 ← [R2] (bus2) Other components (Control Unit,IR,PC, 2. ADD MAR,MBR) 3. R1 ← [ALU3] (bus1) R1 Only 3 clocks cycles needed, R2 25% faster ALU1 ALU2 How about this : R3 1. ALU1 ← [R1] (bus1) ADDER ALU2 ← [R2] (bus2) ADD 2. R1 ← [ALU3] (bus1) ALU3 Only 2 clocks cycles needed, DUAL BUS 50% faster
  • 23. Triple processor-bus : Can the processing speed imrpoved? 1 2 3 Other components (Control Unit,IR,PC, MAR,MBR) R1 Please notice the direction of arrows R2 ALU1 ALU2 If all the CPU components R3 (registers, ALUs and adder) could work in a one third (1/3) clock ADDER cycle (transfer of bits, adding numbers), how many clock (s) needed to complete an addition operation (ADD R1,R2) ? ALU3 Write down the “register transfer” (micro instruction steps) language! Triple Bus
  • 24. Program Execution • A scientific program using assembly language is run on a microprocessor with 1 Ghz clock. To complete the program , it needs to execute : a. 150.000 arithmetic instructions (e.g ADD R1,R2; MUL R1,R3; etc) b. 250.000 register transfer instructions (e.g MOV R1,R2; etc) c. 100.000 memory access instructions (e.g LOAD R1,X; STORE R2,Y; etc). If, average arithmetic instructions need 2 clocks (to complete), average register transfer instructions need 1 clock and average memory access instructions need 10 clocks; calculate the average CPI (clock per instruction) of the above mentioned program. How many times it needs to complete the program (in seconds)?
  • 25. Can it be “one clock?” – Yes it can ! Views of Other Books on “Micro Operations” • The Bus is called “data path” • It is not only consist of bus (a bunch of wires), but other digital devices • Enable signals is forced to fasten execution • Additional (processor) cost
  • 26. Datapath Example : Taken from Morris Manno’s book Load enable A select B select Write A address B address • Four parallel-load D data n registers Load R0 2 2 n n • Two mux-based Load R1 register selectors n 0 1 MUX 2 • Register destination n 0 1 3 MUX decoder Load n R2 2 3 n • Mux B for external Load R3 constant input 0 1 2 3 n n Register file n Decoder • A data B data Buses A and B with external 2 D address Constant in n n Destination select address and data outputs MB select n 1 MUX B 0 n Address Bus A Out • ALU and Shifter with A B Bus B n n Data Out G select H select Mux F for output select V 4 A S2:0 || Cin B 0 2 S IR B Shifter IL 0 Arithmetic/logic • Mux D for external data input N C unit (ALU) G n H n Z Zero Detect • Logic for generating status bits MF select 0 MUX F 1 Function unit V, C, N, Z F n n Data In MD select 0 1 Bus D MUX D n
  • 27. Datapath Example: Performing a Microoperation Microoperation: R0 ← R1 + R2 Load enable A select B select Write A address B address D data n  Apply 01 to A select to place Load R0 2 2 contents of R1 onto Bus A n n  Apply 10 to B select to place Load R1 contents of R2 onto B data and n 0 1 MUX apply 0 to MB select to place n 0 2 3 B data on Bus B Load R2 1 2 MUX 3  Apply 0010 to G select to perform n n addition G = Bus A + Bus B Load R3 n n  Apply 0 to MF select and 0 to MD 0 1 2 3 n Register file Decoder select to place the value of G onto 2 D address Constant in n A data n B data BUS D Destination select MB select n 1 0 MUX B  Apply 00 to Destination select to Bus A Bus B n n Address Out Data enable the Load input to R0 G select A B H select n B Out 4 A B 2  Apply 1 to Load Enable to force the V S2:0 || Cin Arithmetic/logic 0 S IR Shifter IL 0 unit (ALU) Load input to R0 to 1 so that R0 is N C G n H n loaded on the clock pulse (not shown) Z Zero Detect 0 1 MF select MUX F Function unit  The overall microoperation requires F n n Data In 1 clock cycle (!) n MD select Bus D 0 1 MUX D
  • 28. Lesson Learned • We could improve the instruction execution speed by increasing processor clock speed (can we?) • We could improve the instruction execution speed by implementing dual bus (can we?) • We can overcome (partly) the CPU-Memory bottleneck by inserting cache memory between CPU and Main Memory (can we?) • Is there any other way to improve instruction execution speed (increasing performance)? - pipelining • Are these improvements need extra cost? (cost vs performance issue)
  • 29. What do we get after studying Computer Architecture ? • It is always a complicated problem to answer. • Basically we learn about the processor design issues, namely hardware of a computer but it was taught through “software” logics. • At least we know about basic building blocks of a computer • We know the design development trends
  • 30. What is our topic ? Intruction Set Architecture(ISA) Application Program Compiler OS ISA CPU Design Circuit Design Chip Layout
  • 31. Chapter 1 : Introduction
  • 32. 1. 1. Introduction : Organization & Architecture • Organization and Architecture : two jargons that are often confusing • Computer organization refers to the operational units and their interconnections that realize the architectural specifications (!) • Computer Architecture refers to those attributes of a system visible to a programmer, or put another way, those attributes that have a direct impact on the logical execution of a program (!) • The later definition (architecture) concerns more about the performance, compared to the first one (organization)
  • 33. 1. 1. Introduction - continued • Architecture concerns more about the basic instruction design, that may lead to better performance of the system • Organization, is the implementation of computer system, in terms of its interconnection of functional units : CPU, memory, bus and I/O devices. • Example : IBM/S-370 family architecture. There are plenty of IBM products having the same architecture (S- 370) but different organization, depending on its price/performance measures. Cost and performance differs the organizations • So, organization of a computer is the implementation of its architecture, but tailored to fit the intended price and performance measures.
  • 34. Chapter 2 : Computer Evolution and Performance
  • 35. ENIAC - background • Electronic Numerical Integrator And Computer • Eckert and Mauchly • University of Pennsylvania • Trajectory tables for weapons • Started 1943 • Finished 1946 – Too late for war effort • Used until 1955
  • 36. ENIAC - details • Decimal (not binary) • 20 accumulators of 10 digits • Programmed manually by switches • 18,000 vacuum tubes • 30 tons • 15,000 square feet • 140 kW power consumption • 5,000 additions per second
  • 37. ENIAC
  • 38. ENIAC
  • 39. Another View of ENIAC
  • 40. Structure of von Neumann machine
  • 41. IAS - details • 1000 x 40 bit words – Binary number – 2 x 20 bit instructions • Set of registers (storage in CPU) – Memory Buffer Register – Memory Address Register – Instruction Register – Instruction Buffer Register – Program Counter – Accumulator – Multiplier Quotient
  • 42. 2.1. Evolution and Performance - history • 1946 : von Neumann and his colleagues proposed the IAS computer (named after the Institute for Advanced Study) • The design included : – main memory – ALU – Control Unit – I/O • First stored-program design, able to perform : +, -, ×, ÷ • The “father” of all modern computers/processors
  • 44. IAS
  • 46. 2.1. Evolution and Performance - history IAS components are : • MBR (memory buffer register), MAR (memory address register), IR (instruction register), IBR (instruction buffer register), PC (program counter), AC (accumulator) and MQ (multiplier quotient), memory (1000 locations) • 20 bit instruction : 8 bit opcode, 12 bit address (addressing one of 1000 memory locations - 0 to 999) • 40 bit data word : 1 sign bit plus 39 bit value • Operations : data transfer between registers and ALU, unconditional branch, conditional branch, arithmetic, address modify
  • 47. 2.1. Evolution - History of Commercial computers • First Generation : 1950 Mauchly & Eckert developed UNIVAC I, used by the Census Bureau • Then UNIVAC II appeared, and the line later grew into the UNIVAC 1100 series (1103, 1104, 1105, 1106, 1108) - vacuum tubes and later transistors • Second Generation : Transistors, IBM 7094 (NCR, RCA and others also tried to develop their own versions, but these were not commercially successful) • Third Generation : Integrated Circuit (IC) - SSI. The IBM S/360 was the successful example • Later generations (possibly fourth and fifth) : LSI and VLSI technology
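The word layout on this slide (two 20 bit instructions per 40 bit word, each with an 8 bit opcode and a 12 bit address) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the opcode values used in the example are arbitrary numbers rather than real IAS opcodes, and the sketch assumes the opcode sits in the high-order bits of each instruction and the left instruction in the high-order half of the word.

```python
# Field widths taken from the slide; everything else is an assumption.
INSTR_BITS = 20    # one instruction
OPCODE_BITS = 8    # assumed to be the high-order bits of the instruction
ADDR_BITS = 12     # assumed to be the low-order bits of the instruction


def split_instruction(instr):
    """Split one 20 bit instruction into (opcode, address)."""
    if not 0 <= instr < (1 << INSTR_BITS):
        raise ValueError("instruction does not fit in 20 bits")
    opcode = instr >> ADDR_BITS
    address = instr & ((1 << ADDR_BITS) - 1)
    return opcode, address


def split_word(word):
    """Split a 40 bit word into its left and right instructions."""
    if not 0 <= word < (1 << (2 * INSTR_BITS)):
        raise ValueError("word does not fit in 40 bits")
    left = word >> INSTR_BITS
    right = word & ((1 << INSTR_BITS) - 1)
    return split_instruction(left), split_instruction(right)


def pack_word(left, right):
    """Build a 40 bit word from two (opcode, address) pairs."""
    word = 0
    for opcode, address in (left, right):
        if not 0 <= opcode < (1 << OPCODE_BITS):
            raise ValueError("opcode does not fit in 8 bits")
        if not 0 <= address < (1 << ADDR_BITS):
            raise ValueError("address does not fit in 12 bits")
        word = (word << INSTR_BITS) | (opcode << ADDR_BITS) | address
    return word


if __name__ == "__main__":
    # Arbitrary example values: opcode 1 at address 500, opcode 5 at address 501.
    word = pack_word((1, 500), (5, 501))
    print(split_word(word))
```

A 12 bit address field can name 4096 locations, so it comfortably covers the 1000 locations (0 to 999) mentioned on the slide.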
  • 48. 2.1. Evolution - history of commercial computers
    Table 2.1
    Generation   Time      Technology    Approx. speed (opr/sec)
    --------------------------------------------------------------
    1            1946-57   Vacuum tube           40,000
    2            1958-64   Transistor           200,000
    3            1965-71   SSI & MSI          1,000,000
    4            1972-77   LSI               10,000,000
    5            1978-     VLSI             100,000,000
    --------------------------------------------------------------
  • 51. 2.1. Evolution - System 360 Family
    Characteristic               Model 30   Model 40   Model 50   Model 65   Model 75
    -----------------------------------------------------------------------------------
    Max memory size (Bytes)      64K        256K       256K       512K       512K
    Memory data-rate (MB/s)      0.5        0.8        2.0        8.0        16.0
    Processor cycle time (µs)    1.0        0.625      0.5        0.25       0.2
    Relative speed               1          3.5        10         21         50
    Max number of data channels  3          3          4          6          6
    Max chan. data-rate (KB/s)   250        400        800        1250       1250
    -----------------------------------------------------------------------------------
    • The family architecture gave rise to the terms : upward and downward compatible
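A short calculation on the table's figures helps show why organization matters as much as clock speed. The sketch below assumes that "processor cycle time" can be read as the clock period, so the clock rate in MHz is simply 1 divided by the cycle time in µs; the function names are hypothetical, and the data values are copied from the table.

```python
# Values copied from the System/360 family table.
MODELS = [30, 40, 50, 65, 75]
CYCLE_TIME_US = [1.0, 0.625, 0.5, 0.25, 0.2]
RELATIVE_SPEED = [1, 3.5, 10, 21, 50]


def clock_rate_mhz(cycle_time_us):
    """Clock rate in MHz, assuming the cycle time is the clock period in µs."""
    return 1.0 / cycle_time_us


def compare_to_base():
    """Return (model, clock MHz, clock ratio vs Model 30, relative speed) rows."""
    base_cycle = CYCLE_TIME_US[0]
    rows = []
    for model, cycle, speed in zip(MODELS, CYCLE_TIME_US, RELATIVE_SPEED):
        clock_ratio = base_cycle / cycle
        rows.append((model, clock_rate_mhz(cycle), clock_ratio, speed))
    return rows


if __name__ == "__main__":
    for model, mhz, ratio, speed in compare_to_base():
        print(f"Model {model}: {mhz:.2f} MHz, clock x{ratio:.2f}, speed x{speed}")
```

Under that assumption the Model 75 clock is only 5 times faster than the Model 30, yet its relative speed is 50. The rest of the gain has to come from the organization (for example the much higher memory data rate), not from the clock alone.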
  • 52. Generations of Computer • Vacuum tube - 1946-1957 • Transistor - 1958-1964 • Small scale integration - 1965 on – Up to 100 devices on a chip • Medium scale integration - to 1971 – 100-3,000 devices on a chip • Large scale integration - 1971-1977 – 3,000 - 100,000 devices on a chip • Very large scale integration - 1978 to date – 100,000 - 100,000,000 devices on a chip • Ultra large scale integration – Over 100,000,000 devices on a chip
  • 53. Moore’s Law • Increased density of components on chip • Gordon Moore - cofounder of Intel • Number of transistors on a chip will double every year • Since the 1970s development has slowed a little – Number of transistors doubles every 18 months • Cost of a chip has remained almost unchanged • Higher packing density means shorter electrical paths, giving higher performance • Smaller size gives increased flexibility • Reduced power and cooling requirements • Fewer interconnections increase reliability
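The doubling rule can be turned into a small calculation. The sketch below is a plain exponential-growth model, not a prediction; the function names and the starting count of 1000 are hypothetical. It also checks that the "4 times every 3 years" figure quoted on a later slide is the same statement as "doubles every 18 months".

```python
import math


def projected_count(initial_count, years, doubling_period_years):
    """Transistor count after `years`, doubling once per doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


def doubling_period(growth_factor, years):
    """Doubling period (in years) implied by growing `growth_factor` times in `years`."""
    return years * math.log(2) / math.log(growth_factor)


if __name__ == "__main__":
    # Hypothetical starting point of 1000 transistors.
    print(projected_count(1000, 3, 1.5))   # growth over 3 years at 18 months per doubling
    print(doubling_period(4, 3))           # period implied by 4x growth in 3 years
```

Three years contain two 18 month periods, so the count doubles twice, which gives the factor of 4.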
  • 55. Growth in CPU Transistor Count
  • 56. Growth in CPU Transistor Count
  • 57. IBM 360 series • 1964 • Replaced (& not compatible with) 7000 series • First planned “family” of computers – Similar or identical instruction sets – Similar or identical O/S – Increasing speed – Increasing number of I/O ports (i.e. more terminals) – Increased memory size – Increased cost • Multiplexed switch structure
  • 58. 2.1. Evolution - Later generations • Semiconductor memories : 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16 Mbits on a single chip. At present : 256 Mbit, 512 Mbit per chip • Microprocessors appeared : Intel 4004 (1971), Intel 8008 (72), Intel 8080 (8 bit-74), 8086 (16 bit-78), 80386 (32 bit-85) onward. • At almost the same time : Motorola 6800 (8 bit), 68000 (16 bit), 68010 (16 bit), 68020 (32 bit), 68030/40 (32 bit) • Then Motorola’s products disappeared commercially • Intel products have dominated the market since the appearance of the IBM PC
  • 59. 2.1. Evolution of Microprocessors
    Table 2.2
    Feature                     8008    8080    8086    80386   80486
    --------------------------------------------------------------------
    Year introduced             1972    1974    1978    1985    1989
    # of instructions           66      111     133     154     235
    Address bus width (bits)    8       16      20      32      32
    Data bus width (bits)       8       8       16      32      32
    # of registers              8       8       16      8       8
    Memory addressability       16KB    64KB    1 MB    4 GB    4 GB
    Bus bandwidth (MB/s)        -       0.75    5       32      32
    Reg-Reg add time (µs)       -       1.3     0.3     0.125   0.06
    --------------------------------------------------------------------
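The "Memory addressability" row follows from the "Address bus width" row: with byte addressing, n address bits can name 2^n bytes. A minimal sketch of that relationship is shown below; the function names are hypothetical and byte-addressable memory is assumed.

```python
def addressable_bytes(address_bits):
    """Number of byte locations reachable with the given number of address bits."""
    return 1 << address_bits


def format_size(num_bytes):
    """Format a power-of-two byte count using KB/MB/GB (multiples of 1024)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = num_bytes
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size} {unit}"
        size //= 1024


if __name__ == "__main__":
    # Address widths from the table for the 8080, 8086 and 80386/80486.
    for chip, bits in [("8080", 16), ("8086", 20), ("80386", 32)]:
        print(chip, bits, "bits ->", format_size(addressable_bytes(bits)))
```

The 8080, 8086 and 80386/80486 columns match this rule. The 8008 column, as printed, does not: 8 address bits would give only 256 bytes, while 16 KB corresponds to 14 address bits.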
  • 60. 2.2. Designing for Performance • The price of µprocessors continues to drop every year • $1000 for an advanced system is today’s price : in it you may find more than 100 million transistors ! • Even 100 million pieces of toilet paper cost more !! • Computing power is practically free !! • People solve problems that were never thought possible before : image processing, speech recognition, videoconferencing, multimedia authoring, etc. • We need more and more computing power • The organization and architecture of today’s processors remain (basically) the same as those of the IAS ! • What differs are the techniques used to improve speed and efficiency !
  • 61. 2.2. Designing - µprocessor speed • Intel Pentium and PowerPC follow Moore’s Law : by shrinking the size of the lines in IC chips by 10%, the industry can get a new IC with 4 times the transistor density every 3 years ! • The above law also holds for DRAM (Dynamic Random Access Memory) • Even if capacity increases, speed does not increase automatically • More work is needed in designing instructions • Also, techniques for faster instruction execution must be developed : branch prediction, data flow analysis and speculative execution
  • 63. Pentium Evolution (1) • 8080 – first general purpose microprocessor – 8 bit data path – Used in first personal computer – Altair • 8086 – much more powerful – 16 bit – instruction cache, prefetches a few instructions – 8088 (8 bit external bus) used in first IBM PC • 80286 – 16 Mbyte memory addressable – up from 1 Mbyte • 80386 – 32 bit – Support for multitasking
  • 64. Pentium Evolution (2) • 80486 – sophisticated powerful cache and instruction pipelining – built in maths co-processor • Pentium – Superscalar – Multiple instructions executed in parallel • Pentium Pro – Increased superscalar organization – Aggressive register renaming – branch prediction – data flow analysis – speculative execution
  • 65. Pentium Evolution (3) • Pentium II – MMX technology – graphics, video & audio processing • Pentium III – Additional floating point instructions for 3D graphics • Pentium 4 – Note Arabic rather than Roman numerals – Further floating point and multimedia enhancements • Itanium – 64 bit – see chapter 15 • See Intel web pages for detailed information on processors
  • 68. Summary: Important Points • Organization and Architecture • Family Architectures • Functions of a Computer (Data Processing, Control, Data Movement) • Birth of Computers (ENIAC - decimal, IAS - binary), Mauchly-Eckert • Microprocessors (Intel 4004, 8008, 8080, 8086/16 bit, 80386/32 bit) • IAS Instructions • Von Neumann bottleneck • Increasing clock speed, making the bus wider, cache memory • Losers : e.g. Motorola microprocessors, Radio Shack • Denser transistors on a single chip (4 times every 3 years, by shrinking lines by 10%)