Vector computing

VECTOR PROCESSOR
• Vector processors are special-purpose computers
suited to a range of (scientific) computing tasks.
• Vector processors provide vector instructions. These
instructions operate in a pipeline.

OBJECTIVE
• Small program size
• No wastage
• Keeping the functional units (FUs) and the register
buses fed

OPERATIONS
• Add two vectors to produce a third.
• Subtract two vectors to produce a third.
• Multiply two vectors to produce a third.
• Divide two vectors to produce a third.
• Load a vector from memory.
• Store a vector to memory.
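
The six operations listed above can be sketched in a few lines of Python. This is only an illustrative model, not how any real vector unit is built: plain lists stand in for vector registers, the list `memory` stands in for main memory, and the helper names `vector_op`, `vector_load` and `vector_store` are made up for this example.

```python
import operator

# Illustrative model only: lists play the role of vector registers and
# `memory` plays the role of main memory. All names are hypothetical.

def vector_op(op, a, b):
    """Apply op to element i of a and element i of b, for every i."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same length")
    return [op(x, y) for x, y in zip(a, b)]

def vector_load(memory, base, length):
    """Copy `length` consecutive words starting at `base` into a new vector."""
    return memory[base:base + length]

def vector_store(memory, base, vec):
    """Write the vector to consecutive words starting at `base`."""
    memory[base:base + len(vec)] = vec

memory = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0, 0.0, 0.0, 0.0, 0.0]
v1 = vector_load(memory, 0, 4)
v2 = vector_load(memory, 4, 4)

v_sum = vector_op(operator.add, v1, v2)       # add two vectors
v_diff = vector_op(operator.sub, v2, v1)      # subtract two vectors
v_prod = vector_op(operator.mul, v1, v2)      # multiply two vectors
v_quot = vector_op(operator.truediv, v2, v1)  # divide two vectors

vector_store(memory, 8, v_sum)                # store the result vector
```

Each call to `vector_op` corresponds to a single vector instruction that produces a whole result vector.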

PROPERTIES
• Vector processors reduce the required fetch and decode
bandwidth because fewer instructions are fetched.
• They also exploit data parallelism in large scientific
and multimedia applications.
• Many performance optimization schemes are used
in vector processors.
• Strip mining generates code that splits a vector
operation into register-sized pieces, so that it still
works when the vector operands are shorter or longer
than the vector registers.
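
Strip mining can be sketched as a loop over register-sized strips. The maximum vector length `MVL = 8` and the function name `strip_mined_add` are assumptions made for this example. The inner list comprehension stands in for one vector add instruction executed with vector length `vl`.

```python
MVL = 8  # assumed maximum vector length: elements per vector register

def strip_mined_add(a, b, mvl=MVL):
    """Add two equal-length arrays in strips of at most `mvl` elements.

    Returns the result and the vector length used for each strip.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    n = len(a)
    result = [0] * n
    strip_lengths = []
    for start in range(0, n, mvl):
        vl = min(mvl, n - start)  # the last strip may be shorter than mvl
        # One "vector instruction" operating on vl elements:
        result[start:start + vl] = [a[start + i] + b[start + i] for i in range(vl)]
        strip_lengths.append(vl)
    return result, strip_lengths

# 20 elements with 8-element registers: two full strips and one short strip.
total, strips = strip_mined_add(list(range(20)), [100] * 20)

# An operand shorter than the register needs a single short strip.
short_total, short_strips = strip_mined_add([1, 2, 3], [4, 5, 6])
```

The same loop handles operands longer than the register (several strips) and shorter than it (one strip with a reduced vector length).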

PROPERTIES
• Vector chaining, the equivalent of forwarding in
vector processors, is used when there is a data
dependency among vector instructions.
• Special scatter and gather instructions are provided
to efficiently operate on sparse matrices.
• Instructions are designed so that every vector
arithmetic instruction only allows element N of one
vector register to take part in operations with
element N of other vector registers.
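
The scatter and gather instructions mentioned above can be sketched as follows. This is a simplified model: `memory` is a plain list, `index_vector` holds offsets from `base`, and the function names are chosen for this example rather than taken from any real instruction set.

```python
def gather(memory, base, index_vector):
    """Load memory[base + k] for each index k into a dense vector."""
    return [memory[base + k] for k in index_vector]

def scatter(memory, base, index_vector, values):
    """Store each value back to memory[base + k] for the matching index k."""
    for k, value in zip(index_vector, values):
        memory[base + k] = value

# A sparse row stored densely in memory; only positions 1, 4 and 6 are nonzero.
row = [0, 5, 0, 0, 7, 0, 9, 0]
nonzero_positions = [1, 4, 6]

packed = gather(row, 0, nonzero_positions)    # dense vector of the nonzeros
doubled = [2 * x for x in packed]             # element-wise vector operation
scatter(row, 0, nonzero_positions, doubled)   # write results back in place
```

Only the three nonzero elements pass through the vector operation; the zeros are never loaded.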

PROPERTIES
• Based on how operands are fetched, vector
processors fall into two categories.
• In a memory-memory architecture, operands are
streamed directly from memory to the functional units,
and results are written back to memory as the vector
operation proceeds.
• In a vector-register architecture, operands are read
into vector registers, from which they are fed to the
functional units, and results are written to vector
registers.

ADVANTAGES
• Each result within a vector operation is independent
of the others, so deep pipelines can be kept full.
• One vector instruction replaces a whole loop, so
loop-control branches largely disappear.
• Memory access patterns are known in advance, which
suits interleaved memory.
• Mature, well-developed compiler technology
• Compact: one short instruction describes N
operations

NEW TERMS FOR VECTOR PROCESSORS
• Initiation rate: the rate at which a vector unit
consumes new operands and produces new results.
• Chime: an approximate timing measure for a vector
sequence. It ignores the start-up overhead of a
vector operation.

NEW TERMS FOR VECTOR PROCESSORS
• Convoy: the set of vector instructions that could
potentially begin execution together in one clock period.
The instructions in a convoy must complete before new
instructions can begin.
• Vector start-up time: the overhead to start execution
of a vector instruction, related to the pipeline depth
of the functional unit.
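
As a rough worked example of these terms, the sketch below estimates the cycle count of a vector sequence. It assumes one element is processed per clock cycle, so one convoy takes about one chime of `vector_length` cycles, and it assumes convoys do not overlap. The start-up values are invented for illustration and do not come from any real machine.

```python
def chime_estimate(num_convoys, vector_length):
    """Cycles for a vector sequence, ignoring start-up overhead.

    Assumes one element per clock, so each convoy costs one chime
    of `vector_length` cycles.
    """
    return num_convoys * vector_length

def estimate_with_startup(convoy_startups, vector_length):
    """Same estimate, adding a start-up cost (in cycles) for each convoy.

    Assumes each convoy completes before the next one begins.
    """
    return sum(startup + vector_length for startup in convoy_startups)

# Hypothetical sequence: 4 convoys operating on 64-element vectors.
startups = [12, 7, 12, 6]  # made-up start-up cycles per convoy
ideal_cycles = chime_estimate(len(startups), 64)
cycles_with_startup = estimate_with_startup(startups, 64)
```

The gap between the two numbers is the total start-up overhead, which matters less as vectors get longer.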

PROPOSED VECTOR PROCESSOR
• CODE (Clustered Organization for Decoupled
Execution) is a proposed vector architecture intended
to overcome some limitations of conventional
vector processors.

REASONS
• Complexity of the central vector register file (VRF): in
a processor with N vector functional units (VFUs),
the register file needs approximately 3N access
ports. VRF area, power consumption and latency
grow as O(N*N), O(log N) and O(N)
respectively.
• Difficulty of implementing precise exceptions: to
implement in-order commit, a large reorder buffer (ROB)
is needed, with at least one vector register per VFU.

REASONS
• To support virtual memory, a large TLB is
needed, with enough entries to translate
all the virtual addresses generated by a vector
instruction.
• Vector processors need expensive on-chip memory
for low latency.
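
To see why the TLB requirement can be large, the sketch below counts how many distinct virtual pages a single strided vector access touches. The 4096-byte page size, the 64-element vector length and the function name `pages_touched` are assumptions made for this example.

```python
def pages_touched(base_address, stride_bytes, vector_length, page_size=4096):
    """Count the distinct virtual pages referenced by one strided vector access."""
    pages = {
        (base_address + i * stride_bytes) // page_size
        for i in range(vector_length)
    }
    return len(pages)

# Unit-stride access to 64 eight-byte elements starting at a page boundary.
unit_stride_pages = pages_touched(0, 8, 64)

# The same access starting near the end of a page crosses into the next page.
crossing_pages = pages_touched(4000, 8, 64)

# A stride of one full page puts every element on a different page.
page_stride_pages = pages_touched(0, 4096, 64)
```

In the worst case each element needs its own translation, so one vector instruction can require as many TLB entries as it has elements.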

SOME FEATURES OF CODE
• In the CODE architecture, vector registers are
organized into clusters.
• CODE can hide communication latency by having
the output interface look ahead into the
instruction queue and start executing register move
instructions.
• CODE supports precise exceptions using a history
buffer.
• To reduce the size of the TLB, CODE proposes an
ISA-level change.

The effect of cache design on
vector computers
• Numerical programs tend to have
data sets that are too large for current cache sizes.
Sweep accesses over a large vector
can result in complete reloading of the cache.
• To achieve high memory bandwidth, vector computers use
register files and
highly interleaved memories.
• Address sequentiation

Proposals of cache schemes
• Cache schemes such as the prime-mapped cache
have been proposed and studied. This cache
organization minimizes the cache misses caused by
cache line interference, which has been shown to be
critical in numerical applications.
• The cache lookup time of the new mapping scheme
stays the same as that of conventional caches. Cache
addresses for the prime-mapped cache can be
generated in parallel with normal address
calculations.
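
The interference effect can be illustrated with a small experiment. The sketch assumes, beyond what the slides state, that a prime-mapped cache indexes its lines with the line address modulo a Mersenne prime 2^c - 1, while a conventional cache uses modulo 2^c. The helper `mersenne_mod` shows one way such a modulus can be computed with shifts and adds only, which is one plausible reason the index could be cheap to generate. All names and sizes are chosen for this example.

```python
def distinct_lines_used(num_lines, stride, accesses):
    """Count the cache lines hit by a strided sweep of line addresses."""
    return len({(i * stride) % num_lines for i in range(accesses)})

def mersenne_mod(x, c):
    """Compute x mod (2**c - 1) for x >= 0 using only shifts, masks and adds."""
    m = (1 << c) - 1
    while x > m:
        x = (x & m) + (x >> c)  # fold the high bits onto the low bits
    return 0 if x == m else x

C = 7
conventional_lines = 2 ** C      # 128 lines, power-of-two mapping
prime_lines = 2 ** C - 1         # 127 lines, a Mersenne prime

# Sweep 64 line addresses with a stride of 16 lines.
conventional_used = distinct_lines_used(conventional_lines, 16, 64)
prime_used = distinct_lines_used(prime_lines, 16, 64)
```

With the power-of-two mapping the strided sweep keeps landing on the same few lines and evicts its own data, while the prime modulus spreads the same accesses over distinct lines.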

Conclusion
• Vector supercomputers
• Vector instruction
• Commodity technology like SMT
• Superscalar microprocessor
• Embedded and multimedia applications

